skills/docs/research/agent-capability-matrix.md

# Agent Capability Matrix

> **Date:** 2026-01-09
> **Status:** Research complete
> **Related:** [skills-fqu](../../.beads/), [Cross-agent epic](skills-hf1)

## Overview

Comparison of AI coding agent capabilities for cross-agent skill portability and quality gate design.

| Agent | Vendor | Open Source | Primary Language |
|-------|--------|-------------|------------------|
| **Claude Code** | Anthropic | No | TypeScript |
| **Gemini CLI** | Google | Yes (Apache 2.0) | TypeScript |
| **OpenCode** | OpenCode | Yes | Go + TypeScript |
| **Codex CLI** | OpenAI | Yes | Rust |

## Capability Matrix

### Core Features

| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
|------------|-------------|------------|----------|-----------|
| **Hooks/Lifecycle** | ✅ 9 events | ✅ 8+ events | ✅ 32+ events | ⚠️ Limited |
| **Subagent Spawning** | ✅ Task tool | ⚠️ Via MCP | ✅ Native | ⚠️ Recent |
| **File System Access** | ✅ Full | ✅ Full | ✅ Configurable | 🔒 Sandboxed |
| **Bash Execution** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **State Persistence** | ✅ Sessions | ✅ Auto-save | ✅ Multi-level | ✅ Client-side |
| **MCP Support** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **Web Search** | ✅ Built-in | ✅ Grounded | ✅ Via tool | ✅ Built-in |

### Context & Models

| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
|------------|-------------|------------|----------|-----------|
| **Context Window** | 200K tokens | 1M tokens (2M soon) | Model-dependent | 128K-400K tokens |
| **Auto Compaction** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **Model Support** | Claude only | Gemini only | 75+ providers | GPT-5.x |
| **Custom Models** | ❌ No | ❌ No | ✅ Yes | ❌ No |

### Security & Sandboxing

| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
|------------|-------------|------------|----------|-----------|
| **OS Sandboxing** | ❌ No | ❌ No | ❌ No | ✅ Yes |
| **Permission System** | ✅ Approval | ✅ Approval | ✅ Granular | ✅ Approval |
| **Tool Restrictions** | ❌ No | ✅ excludeTools | ✅ Per-tool | ✅ Sandbox modes |
| **Path Restrictions** | ❌ No | ❌ No | ✅ external_directory | ✅ Workspace only |

---

## Detailed Breakdown

### 1. Hooks / Lifecycle Events

**Critical for quality gates** - hooks enable mechanical enforcement.

#### Claude Code
```
SessionStart, SessionEnd, PreToolUse, PostToolUse,
PreCompact, UserPromptSubmit, Stop, SubagentStop, Notification
```
- Executable scripts via hooks.json
- Timeout configurable per hook
- Can block operations (Stop hook for quality gates)

#### Gemini CLI
```
8+ events including:
SessionStart, SessionEnd, PreCompress, BeforeModel, AfterModel,
BeforeToolSelection, Notification
```
- Similar to Claude Code architecture
- Scripts or npm plugin packages
- Disabled by default for security

#### OpenCode
```
32+ events including:
session.*, file.*, command.*, permission.*, message.*,
todo.*, lsp.*, pty.*, tui.*
```
- Most comprehensive event system
- JS/TS modules (not shell scripts)
- Plugin-based architecture

#### Codex CLI
```
Limited: notification hooks, approval hooks, event loop lifecycle
```
- Less mature hook system
- Community requesting more hooks
- Tool-call events with approval flags

**Cross-Agent Implication:** Only Claude Code and Gemini have Stop-equivalent hooks for blocking exit. OpenCode and Codex would need protocol-based enforcement.
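
Where a Stop hook exists, the gate logic can be a small script. A minimal sketch for Claude Code, assuming the hook receives a JSON payload on stdin that includes a `stop_hook_active` flag, and that printing `{"decision": "block", "reason": ...}` prevents the stop (verify both against the current hooks reference). The state file `.review-state.json` and its `status` field are hypothetical.

```python
"""Stop-hook sketch: block exit until a review is recorded as APPROVED."""
import json
import sys
from pathlib import Path
from typing import Optional

# Hypothetical state file written by the reviewer; not part of any agent.
STATE_FILE = Path(".review-state.json")


def stop_decision(payload: dict, state: dict) -> Optional[dict]:
    """Return a blocking decision, or None to let the agent stop."""
    # If this hook already forced a continuation, let the agent stop
    # rather than looping forever.
    if payload.get("stop_hook_active"):
        return None
    if state.get("status") == "APPROVED":
        return None
    status = state.get("status", "MISSING")
    return {
        "decision": "block",
        "reason": f"Review status is {status}; run the reviewer first.",
    }


def main() -> int:
    """Read the hook payload from stdin and print a decision if blocking."""
    raw = sys.stdin.read()
    payload = json.loads(raw) if raw.strip() else {}
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    decision = stop_decision(payload, state)
    if decision is not None:
        print(json.dumps(decision))
    return 0

# When installed as a hook script, finish with: raise SystemExit(main())
```

Keeping the decision in a pure function means the same check can be reused by an orchestrator on agents that lack a Stop hook.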

---

### 2. Subagent Spawning

**Critical for orchestrator pattern** - can any agent spawn others?

#### Claude Code
- **Task tool** - Native subagent spawning
- Multiple agent types (Explore, Plan, Bash, etc.)
- Returns results to parent
- Can run in background

#### Gemini CLI
- **Limited native** - Experimental YOLO mode
- **Via MCP** - PAL server enables cross-CLI spawning
- Can spawn Claude Code as subagent and vice versa

#### OpenCode
- **Native support** - Primary agents spawn subagents
- Session forking with parent tracking
- `@agent-name` mentions for manual invocation
- Configurable agent modes (primary/subagent/all)

#### Codex CLI
- **Recent addition** - Multi-conversation agent control
- Can run as MCP server for other agents
- Sandbox restrictions complicate spawning

**Cross-Agent Implication:** MCP is the universal bridge. In principle, any agent can spawn any other by calling it through an MCP server.

---

### 3. File System Access

**Critical for skills** - can agents read skill files?

#### Claude Code
- Full filesystem access
- No path restrictions
- User permission level

#### Gemini CLI
- Full filesystem access through shell commands
- **Known issue:** the ReadFile tool is restricted to workspace directories
- Symlinked paths (like ~/.claude/skills/) may be blocked
- **Fix:** Add to `~/.gemini/settings.json`:
  ```json
  { "context": { "includeDirectories": ["~/.claude/skills"] } }
  ```
- Or use `gemini --include-directories ~/.claude/skills`

#### OpenCode
- Configurable permissions per operation
- `external_directory` permission for outside-project access
- Granular: ask/allow/deny per operation type

#### Codex CLI
- **Sandboxed by default** - workspace-only writes
- Network disabled by default
- `--add-dir` for selective access
- `danger-full-access` mode available

**Cross-Agent Implication:** Gemini's path restrictions (skills-bo8) are the main blocker. Skills need to live inside the workspace, be added via `includeDirectories`, or be read through shell workarounds.
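
To find which skills would hit this restriction, one option is to resolve symlinks and check whether each skill file really lives under the workspace. A minimal sketch; the `*/SKILL.md` layout and both function names are assumptions for illustration.

```python
from pathlib import Path
from typing import List


def escapes_workspace(path: str, workspace: str) -> bool:
    """True if `path`, after resolving symlinks, lies outside `workspace`."""
    resolved = Path(path).expanduser().resolve()
    root = Path(workspace).expanduser().resolve()
    return resolved != root and root not in resolved.parents


def find_external_skills(skills_dir: str, workspace: str) -> List[str]:
    """List SKILL.md files whose real location is outside the workspace."""
    base = Path(skills_dir).expanduser()
    return sorted(
        str(p)
        for p in base.glob("*/SKILL.md")
        if escapes_workspace(str(p), workspace)
    )
```

Any path this returns is a candidate for `includeDirectories` or for copying into the workspace.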

---

### 4. CLI Tool Execution

**Critical for skills using helper scripts.**

| Agent | Method | Safety |
|-------|--------|--------|
| Claude Code | Bash tool | Approval prompts |
| Gemini CLI | `run_shell_command` (bash -c) | Approval prompts |
| OpenCode | Bash tool with glob patterns | Granular permissions |
| Codex CLI | `!` prefix or shell tool | Sandbox + approval |

All agents can run CLI tools. Key differences:
- Codex has OS-level sandboxing
- OpenCode has most granular permission patterns
- Gemini and Claude have similar approval models

---

### 5. State Persistence

**Critical for quality gates** - need to track review status.

| Agent | Session Storage | Cross-Session Memory |
|-------|-----------------|---------------------|
| Claude Code | Yes | CLAUDE.md |
| Gemini CLI | Auto-save to ~/.gemini/ | GEMINI.md, save_memory tool |
| OpenCode | Multi-level | Config files |
| Codex CLI | Client-side only | ~/.codex/sessions/ |

**Cross-Agent Implication:** Cross-agent state needs an external store (jwz, beads, or simple files), because each agent's native persistence is readable only by that agent.
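
The "simple files" option can be as small as one JSON file that any agent updates through Bash. A minimal sketch, not a replacement for jwz or beads; the function names and the `status` payload are hypothetical, and concurrent writers are not locked against each other.

```python
import json
import os
import tempfile
from pathlib import Path


def post_state(store: str, key: str, value: dict) -> None:
    """Record `value` under `key` in a JSON file, replacing it atomically."""
    path = Path(store)
    data = json.loads(path.read_text()) if path.exists() else {}
    data[key] = value
    # Write to a temp file in the same directory, then rename over the
    # target so readers never see a half-written file. The read-modify-write
    # itself is not locked, so simultaneous writers can lose an update.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh, indent=2)
    os.replace(tmp, str(path))


def read_state(store: str, key: str) -> dict:
    """Return the value stored under `key`, or {} if nothing is recorded."""
    path = Path(store)
    if not path.exists():
        return {}
    return json.loads(path.read_text()).get(key, {})
```

A reviewer would call `post_state("state.json", "task-1", {"status": "APPROVED"})` and the orchestrator would poll `read_state("state.json", "task-1")`.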

---

### 6. Built-in Tools

#### Claude Code (11 tools)
Read, Write, Edit, Glob, Grep, Bash, Task, WebFetch, WebSearch, TodoWrite, NotebookEdit

#### Gemini CLI (12+ tools)
read_file, write_file, replace, glob, list_directory, run_shell_command,
web_fetch, google_web_search, save_memory, write_todos, codebase_investigator,
search_file_content, read_many_files

#### OpenCode
File ops, Bash, Web fetch, LSP integration, Git/VCS, PTY management, MCP servers, Custom plugins

#### Codex CLI
read_file, list_dir, glob_file_search, apply_patch, git, rg (search),
shell, todo_write, web_search, image support

**Parity:** All four agents cover the core file, shell, and search operations. They differ in tool naming and in extras (LSP, code execution, etc.).
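
Because the differences are mostly naming, a skill that mentions a tool by name could be translated with a lookup table. A sketch built from the tool lists above; pairing the tools by capability (for example `Grep` with `search_file_content` and `rg`) is an assumption, and OpenCode is left out because only tool categories are listed for it.

```python
from typing import Optional

# Tool names copied from the lists above; the grouping into rows is an
# assumption about which tools are functional equivalents.
TOOL_NAMES = {
    "read": {"claude": "Read", "gemini": "read_file", "codex": "read_file"},
    "glob": {"claude": "Glob", "gemini": "glob", "codex": "glob_file_search"},
    "search": {"claude": "Grep", "gemini": "search_file_content", "codex": "rg"},
    "shell": {"claude": "Bash", "gemini": "run_shell_command", "codex": "shell"},
    "web_search": {"claude": "WebSearch", "gemini": "google_web_search", "codex": "web_search"},
    "todo": {"claude": "TodoWrite", "gemini": "write_todos", "codex": "todo_write"},
}


def translate(tool: str, source: str, target: str) -> Optional[str]:
    """Map a tool name from one agent to another, or None if unmapped."""
    for names in TOOL_NAMES.values():
        if names.get(source) == tool:
            return names.get(target)
    return None
```

Tools without a counterpart in the table (for example `NotebookEdit`) return `None`, which flags an agent-specific dependency in the skill.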

---

## Cross-Agent Quality Gate Analysis

### What Works Across All Agents

| Component | Approach |
|-----------|----------|
| **State storage** | External CLI tool (jwz, bd, or file-based) |
| **Reviewer invocation** | Any agent can spawn reviewer via Bash/MCP |
| **Issue tracking** | External CLI (beads, tissue) |
| **Second opinions** | orch works from any agent with Bash |

### What Doesn't Work Across Agents

| Component | Problem |
|-----------|---------|
| **Stop hook** | Claude/Gemini only - no equivalent in OpenCode/Codex |
| **Mechanical blocking** | Can't prevent exit without hooks |
| **Native subagents** | Different spawning mechanisms |

### Recommended Cross-Agent Pattern

```
┌─────────────────────────────────────────────┐
│  Orchestrator (any agent)                    │
│                                              │
│  1. Start work                               │
│  2. Spawn worker agent (via MCP/Bash)        │
│  3. Worker completes, posts to state store   │
│  4. Orchestrator spawns reviewer             │
│  5. Reviewer posts APPROVED/ISSUES           │
│  6. Orchestrator checks state, gates exit    │
└─────────────────────────────────────────────┘
```

The **orchestrator**, not a hook, enforces the gate, so the pattern works with any agent.
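
The six steps can be sketched as a loop with the agent-specific parts injected as callables. This is an illustration under stated assumptions, not a reference implementation: `run_gated`, its parameters, and the retry cap `MAX_ROUNDS` are hypothetical, and `read_status` stands in for whatever state store is used.

```python
from typing import Callable

MAX_ROUNDS = 3  # hypothetical cap on work/review cycles


def run_gated(
    task: str,
    spawn_worker: Callable[[str], None],
    spawn_reviewer: Callable[[str], None],
    read_status: Callable[[str], str],
    max_rounds: int = MAX_ROUNDS,
) -> bool:
    """Return True only once the reviewer has posted APPROVED."""
    for _ in range(max_rounds):
        spawn_worker(task)    # steps 2-3: worker runs and posts its result
        spawn_reviewer(task)  # steps 4-5: reviewer posts APPROVED/ISSUES
        if read_status(task) == "APPROVED":  # step 6: gate on stored state
            return True
    return False  # gate stays closed; escalate to a human
```

The orchestrator treats anything other than `APPROVED` as a closed gate, so a reviewer that crashes or posts nothing cannot accidentally pass the work.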

---

## Recommendations

### For Portable Skills

1. **Use SKILL.md format** - All agents can read markdown
2. **Avoid agent-specific features** - No hooks in skill logic
3. **CLI tools for actions** - All agents can run Bash
4. **External state** - beads/jwz for cross-agent coordination

### For Quality Gates

1. **Orchestrator pattern** - Gate logic in orchestrator, not hooks
2. **Protocol-based** - Agents follow instructions, post to state store
3. **Hybrid** - Use hooks where available (Claude/Gemini), protocol elsewhere

### For Subagent Research Sandbox (skills-ut4)

| Agent | Sandbox Approach |
|-------|------------------|
| Claude Code | No native sandbox - rely on Task agent type restrictions |
| Gemini CLI | `excludeTools` setting to disable write tools |
| OpenCode | Permission config: `edit: "deny", bash: "deny"` |
| Codex CLI | Native sandbox - best support for read-only research |

---

## Open Questions

1. Can MCP server sandboxing be enforced by the server, not client?
2. Is there a standard for "read-only agent" across platforms?
3. Should we build a universal sandbox wrapper script?
4. How do we handle agents that ignore protocol instructions?

---

## References

- [Claude Code Docs](https://docs.anthropic.com/claude-code)
- [Gemini CLI GitHub](https://github.com/google-gemini/gemini-cli)
- [OpenCode Docs](https://opencode.ai/docs/)
- [Codex CLI Docs](https://developers.openai.com/codex/cli/)
- [MCP Specification](https://modelcontextprotocol.io/)