skills/docs/research/agent-capability-matrix.md
dan 8c033eedd1 docs: add Gemini path fix (includeDirectories setting)
Closes: skills-8nl, skills-bo8

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 19:35:28 -08:00

# Agent Capability Matrix
> **Date:** 2026-01-09
> **Status:** Research complete
> **Related:** [skills-fqu](../../.beads/), [Cross-agent epic](skills-hf1)
## Overview
Comparison of AI coding agent capabilities for cross-agent skill portability and quality gate design.
| Agent | Vendor | Open Source | Primary Language |
|-------|--------|-------------|------------------|
| **Claude Code** | Anthropic | No | TypeScript |
| **Gemini CLI** | Google | Yes (Apache 2.0) | TypeScript |
| **OpenCode** | OpenCode | Yes | Go + TypeScript |
| **Codex CLI** | OpenAI | Yes | Rust |
## Capability Matrix
### Core Features
| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
|------------|-------------|------------|----------|-----------|
| **Hooks/Lifecycle** | ✅ 9 events | ✅ 8+ events | ✅ 32+ events | ⚠️ Limited |
| **Subagent Spawning** | ✅ Task tool | ⚠️ Via MCP | ✅ Native | ⚠️ Recent |
| **File System Access** | ✅ Full | ✅ Full | ✅ Configurable | 🔒 Sandboxed |
| **Bash Execution** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **State Persistence** | ✅ Sessions | ✅ Auto-save | ✅ Multi-level | ✅ Client-side |
| **MCP Support** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **Web Search** | ✅ Built-in | ✅ Grounded | ✅ Via tool | ✅ Built-in |
### Context & Models
| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
|------------|-------------|------------|----------|-----------|
| **Context Window** | 200K tokens | 1M tokens (2M soon) | Model-dependent | 128K-400K tokens |
| **Auto Compaction** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **Model Support** | Claude only | Gemini only | 75+ providers | GPT-5.x |
| **Custom Models** | ❌ No | ❌ No | ✅ Yes | ❌ No |
### Security & Sandboxing
| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
|------------|-------------|------------|----------|-----------|
| **OS Sandboxing** | ❌ No | ❌ No | ❌ No | ✅ Yes |
| **Permission System** | ✅ Approval | ✅ Approval | ✅ Granular | ✅ Approval |
| **Tool Restrictions** | ❌ No | ✅ excludeTools | ✅ Per-tool | ✅ Sandbox modes |
| **Path Restrictions** | ❌ No | ⚠️ ReadFile limited to workspace + includeDirectories | ✅ external_directory | ✅ Workspace only |
---
## Detailed Breakdown
### 1. Hooks / Lifecycle Events
**Critical for quality gates** - hooks enable mechanical enforcement.
#### Claude Code
```
SessionStart, SessionEnd, PreToolUse, PostToolUse,
PreCompact, UserPromptSubmit, Stop, SubagentStop, Notification
```
- Executable scripts via hooks.json
- Timeout configurable per hook
- Can block operations (Stop hook for quality gates)
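
A Stop-hook gate could look roughly like the sketch below. It assumes the hook receives a JSON payload on stdin and that printing a JSON object with `"decision": "block"` prevents the stop; that matches my reading of the Claude Code hook docs but should be verified against them. The `.review/review_status.json` file and its `status` field are hypothetical, not part of Claude Code.

```python
import json
import sys
from pathlib import Path
from typing import Optional

# Hypothetical state file written by a reviewer; not part of Claude Code.
STATE_FILE = Path(".review/review_status.json")


def decide(hook_input: dict, status: Optional[str]) -> dict:
    """Return the hook response: an empty dict allows the stop, 'block' prevents it."""
    # Assumed input field: skip the gate if this hook already blocked once,
    # so the agent cannot get stuck in a loop.
    if hook_input.get("stop_hook_active"):
        return {}
    if status == "APPROVED":
        return {}
    return {
        "decision": "block",
        "reason": f"Review status is {status or 'missing'}; run the reviewer before stopping.",
    }


def read_status(path: Path) -> Optional[str]:
    """Read the reviewer's verdict, or None if no review has been posted."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8")).get("status")


def main() -> None:
    """Hook entry point: read the payload from stdin, print the decision as JSON."""
    hook_input = json.load(sys.stdin)
    print(json.dumps(decide(hook_input, read_status(STATE_FILE))))
```

To use this as a hook script, call `main()` at the end of the file and register the script for the Stop event. Keeping `decide` free of I/O makes the gate logic testable without running an agent.
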
#### Gemini CLI
```
8+ events including:
SessionStart, SessionEnd, PreCompress, BeforeModel, AfterModel,
BeforeToolSelection, Notification
```
- Similar to Claude Code architecture
- Scripts or npm plugin packages
- Disabled by default for security
#### OpenCode
```
32+ events including:
session.*, file.*, command.*, permission.*, message.*,
todo.*, lsp.*, pty.*, tui.*
```
- Most comprehensive event system
- JS/TS modules (not shell scripts)
- Plugin-based architecture
#### Codex CLI
```
Limited: notification hooks, approval hooks, event loop lifecycle
```
- Less mature hook system
- Community requesting more hooks
- Tool-call events with approval flags
**Cross-Agent Implication:** Only Claude Code and Gemini have Stop-equivalent hooks for blocking exit. OpenCode and Codex would need protocol-based enforcement.
---
### 2. Subagent Spawning
**Critical for orchestrator pattern** - can any agent spawn others?
#### Claude Code
- **Task tool** - Native subagent spawning
- Multiple agent types (Explore, Plan, Bash, etc.)
- Returns results to parent
- Can run in background
#### Gemini CLI
- **Limited native** - Experimental YOLO mode
- **Via MCP** - PAL server enables cross-CLI spawning
- Can spawn Claude Code as subagent and vice versa
#### OpenCode
- **Native support** - Primary agents spawn subagents
- Session forking with parent tracking
- `@agent-name` mentions for manual invocation
- Configurable agent modes (primary/subagent/all)
#### Codex CLI
- **Recent addition** - Multi-conversation agent control
- Can run as MCP server for other agents
- Sandbox restrictions complicate spawning
**Cross-Agent Implication:** MCP is the universal bridge. Any agent can spawn any other via the MCP server pattern.
---
### 3. File System Access
**Critical for skills** - can agents read skill files?
#### Claude Code
- Full filesystem access
- No path restrictions
- User permission level
#### Gemini CLI
- Full filesystem access
- **Known issue:** the ReadFile tool is restricted to workspace directories
- Symlinked paths that resolve outside the workspace (like ~/.claude/skills/) may be blocked
- **Fix:** Add to `~/.gemini/settings.json`:
```json
{ "context": { "includeDirectories": ["~/.claude/skills"] } }
```
- Or use `gemini --include-directories ~/.claude/skills`
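
If the setting has to be applied by a script rather than by hand, a minimal sketch is shown below. It assumes the settings file is plain JSON without comments; the function names are hypothetical, and only the `context.includeDirectories` key comes from the fix above.

```python
import json
from pathlib import Path


def add_include_directory(settings: dict, directory: str) -> dict:
    """Return a copy of settings with directory added to context.includeDirectories."""
    updated = json.loads(json.dumps(settings))  # cheap deep copy for JSON data
    context = updated.setdefault("context", {})
    dirs = context.setdefault("includeDirectories", [])
    if directory not in dirs:  # keep the operation idempotent
        dirs.append(directory)
    return updated


def update_settings_file(path: Path, directory: str) -> None:
    """Rewrite the settings file in place, creating it if it does not exist."""
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    merged = add_include_directory(current, directory)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

For example, `update_settings_file(Path.home() / ".gemini" / "settings.json", "~/.claude/skills")` would add the skills directory while leaving other settings untouched.
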
#### OpenCode
- Configurable permissions per operation
- `external_directory` permission for outside-project access
- Granular: ask/allow/deny per operation type
#### Codex CLI
- **Sandboxed by default** - workspace-only writes
- Network disabled by default
- `--add-dir` for selective access
- `danger-full-access` mode available
**Cross-Agent Implication:** Gemini's path restrictions (skills-bo8) are the main blocker. Skills must live in the workspace, be added through `includeDirectories`, or be read through shell workarounds.
---
### 4. CLI Tool Execution
**Critical for skills using helper scripts.**
| Agent | Method | Safety |
|-------|--------|--------|
| Claude Code | Bash tool | Approval prompts |
| Gemini CLI | `run_shell_command` (bash -c) | Approval prompts |
| OpenCode | Bash tool with glob patterns | Granular permissions |
| Codex CLI | `!` prefix or shell tool | Sandbox + approval |
All agents can run CLI tools. Key differences:
- Codex has OS-level sandboxing
- OpenCode has most granular permission patterns
- Gemini and Claude have similar approval models
---
### 5. State Persistence
**Critical for quality gates** - need to track review status.
| Agent | Session Storage | Cross-Session Memory |
|-------|-----------------|---------------------|
| Claude Code | Yes | CLAUDE.md |
| Gemini CLI | Auto-save to ~/.gemini/ | GEMINI.md, save_memory tool |
| OpenCode | Multi-level | Config files |
| Codex CLI | Client-side only | ~/.codex/sessions/ |
**Cross-Agent Implication:** For cross-agent state, need external store (jwz, beads, or simple files). Agent-native persistence is agent-specific.
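
The "simple files" option can be sketched as an append-only JSON Lines log that any agent can write to through Bash. This is an illustration only: the record fields and function names are hypothetical and do not reflect how jwz or beads store state.

```python
import json
import time
from pathlib import Path
from typing import Optional


def post_status(store: Path, task_id: str, agent: str, status: str) -> None:
    """Append one status record to the store as a single JSON line."""
    store.parent.mkdir(parents=True, exist_ok=True)
    record = {"task": task_id, "agent": agent, "status": status, "ts": time.time()}
    with store.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def latest_status(store: Path, task_id: str) -> Optional[str]:
    """Return the most recently posted status for task_id, or None if there is none."""
    if not store.exists():
        return None
    latest = None
    for line in store.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record["task"] == task_id:
            latest = record["status"]  # later lines override earlier ones
    return latest
```

Appending instead of rewriting keeps a review history and avoids read-modify-write races when two agents post close together, at the cost of a linear scan on read.
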
---
### 6. Built-in Tools
#### Claude Code (11 tools)
Read, Write, Edit, Glob, Grep, Bash, Task, WebFetch, WebSearch, TodoWrite, NotebookEdit
#### Gemini CLI (12+ tools)
read_file, write_file, replace, glob, list_directory, run_shell_command,
web_fetch, google_web_search, save_memory, write_todos, codebase_investigator,
search_file_content, read_many_files
#### OpenCode
File ops, Bash, Web fetch, LSP integration, Git/VCS, PTY management, MCP servers, Custom plugins
#### Codex CLI
read_file, list_dir, glob_file_search, apply_patch, git, rg (search),
shell, todo_write, web_search, image support
**Parity:** All four have core file, shell, and search tools. They differ in naming and in extras (LSP, code execution, etc.).
---
## Cross-Agent Quality Gate Analysis
### What Works Across All Agents
| Component | Approach |
|-----------|----------|
| **State storage** | External CLI tool (jwz, bd, or file-based) |
| **Reviewer invocation** | Any agent can spawn reviewer via Bash/MCP |
| **Issue tracking** | External CLI (beads, tissue) |
| **Second opinions** | orch works from any agent with Bash |
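
Reviewer invocation through Bash might be wrapped as in the sketch below. The non-interactive invocations in `HEADLESS_COMMANDS` are assumptions based on each CLI's commonly documented usage and should be checked against `--help` for the installed version; the function names are hypothetical.

```python
import subprocess
from typing import List

# Assumed non-interactive invocations; verify against each CLI before relying on them.
HEADLESS_COMMANDS = {
    "claude": ["claude", "-p"],
    "gemini": ["gemini", "-p"],
    "codex": ["codex", "exec"],
    "opencode": ["opencode", "run"],
}


def reviewer_command(agent: str, prompt: str) -> List[str]:
    """Build the argv for running `agent` once with `prompt`."""
    if agent not in HEADLESS_COMMANDS:
        raise ValueError(f"unknown agent: {agent}")
    return HEADLESS_COMMANDS[agent] + [prompt]


def run_reviewer(agent: str, prompt: str, timeout_s: int = 600) -> str:
    """Run the reviewer as a subprocess and return its stdout."""
    result = subprocess.run(
        reviewer_command(agent, prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise if the reviewer exits non-zero
    )
    return result.stdout
```

Passing the prompt as a single argv element avoids shell quoting problems, since no shell is involved in the call.
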
### What Doesn't Work Across Agents
| Component | Problem |
|-----------|---------|
| **Stop hook** | Claude/Gemini only - no equivalent in OpenCode/Codex |
| **Mechanical blocking** | Can't prevent exit without hooks |
| **Native subagents** | Different spawning mechanisms |
### Recommended Cross-Agent Pattern
```
┌─────────────────────────────────────────────┐
│ Orchestrator (any agent) │
│ │
│ 1. Start work │
│ 2. Spawn worker agent (via MCP/Bash) │
│ 3. Worker completes, posts to state store │
│ 4. Orchestrator spawns reviewer │
│ 5. Reviewer posts APPROVED/ISSUES │
│ 6. Orchestrator checks state, gates exit │
└─────────────────────────────────────────────┘
```
The **orchestrator** enforces the gate, not hooks. Works with any agent.
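
The six steps can be sketched as a loop in which the orchestrator owns the gate. The spawn and state-store functions are injected as hypothetical callables, so the sketch is independent of any one agent; the retry limit `max_rounds` is an addition for illustration and is not part of the pattern above.

```python
from typing import Callable


def run_gated_task(
    spawn_worker: Callable[[str, int], None],
    spawn_reviewer: Callable[[str, int], None],
    read_verdict: Callable[[str], str],
    task: str,
    max_rounds: int = 3,
) -> bool:
    """Return True once the reviewer approves, False if the rounds run out."""
    for round_no in range(1, max_rounds + 1):
        spawn_worker(task, round_no)    # steps 2-3: worker runs, posts to state store
        spawn_reviewer(task, round_no)  # steps 4-5: reviewer posts APPROVED/ISSUES
        if read_verdict(task) == "APPROVED":  # step 6: gate on stored state
            return True
    return False
```

Because the verdict is read back from the state store instead of from the reviewer's return value, the same loop works whether the reviewer was spawned via MCP or Bash.
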
---
## Recommendations
### For Portable Skills
1. **Use SKILL.md format** - All agents can read markdown
2. **Avoid agent-specific features** - No hooks in skill logic
3. **CLI tools for actions** - All agents can run Bash
4. **External state** - beads/jwz for cross-agent coordination
### For Quality Gates
1. **Orchestrator pattern** - Gate logic in orchestrator, not hooks
2. **Protocol-based** - Agents follow instructions, post to state store
3. **Hybrid** - Use hooks where available (Claude/Gemini), protocol elsewhere
### For Subagent Research Sandbox (skills-ut4)
| Agent | Sandbox Approach |
|-------|------------------|
| Claude Code | No native sandbox - rely on Task agent type restrictions |
| Gemini CLI | `excludeTools` setting to disable write tools |
| OpenCode | Permission config: `edit: "deny", bash: "deny"` |
| Codex CLI | Native sandbox - best support for read-only research |
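
For the two agents that are restricted through configuration, the fragments could be generated as in the sketch below. The placement of `excludeTools` as a top-level settings key is an assumption, and treating `run_shell_command` as a write tool is a judgment call; the tool names come from the Gemini CLI tool list in this document and the OpenCode values from the table above. `read_only_config` is a hypothetical helper.

```python
import json

# Tool names taken from the Gemini CLI tool list in this document.
# run_shell_command is included because a shell can write files.
GEMINI_WRITE_TOOLS = ["write_file", "replace", "run_shell_command"]


def read_only_config(agent: str) -> dict:
    """Return a config fragment that disables write-capable tools for `agent`."""
    if agent == "gemini-cli":
        # Assumed placement: a top-level `excludeTools` key in settings.json.
        return {"excludeTools": list(GEMINI_WRITE_TOOLS)}
    if agent == "opencode":
        # Mirrors the permission values in the table above.
        return {"permission": {"edit": "deny", "bash": "deny"}}
    # Claude Code and Codex CLI are restricted by other means (agent types, sandbox).
    raise ValueError(f"{agent} is not restricted through a config fragment here")


print(json.dumps(read_only_config("opencode"), indent=2))
```
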
---
## Open Questions
1. Can MCP server sandboxing be enforced by the server, not client?
2. Is there a standard for "read-only agent" across platforms?
3. Should we build a universal sandbox wrapper script?
4. How do we handle agents that ignore protocol instructions?
---
## References
- [Claude Code Docs](https://docs.anthropic.com/claude-code)
- [Gemini CLI GitHub](https://github.com/google-gemini/gemini-cli)
- [OpenCode Docs](https://opencode.ai/docs/)
- [Codex CLI Docs](https://developers.openai.com/codex/cli/)
- [MCP Specification](https://modelcontextprotocol.io/)