dan/skills

dan ec2d856c05 docs: add agent capability matrix for cross-agent design

Comprehensive comparison of Claude Code, Gemini CLI, OpenCode, and Codex:
- Hooks/lifecycle events (Claude/Gemini best, OpenCode most comprehensive)
- Subagent spawning (MCP is universal bridge)
- File access (Gemini has path restrictions - skills-bo8)
- Sandboxing (Codex has OS-level, others approval-based)
- State persistence (need external store for cross-agent)

Key finding: Orchestrator pattern works across all agents.
Stop hooks only in Claude/Gemini - others need protocol-based gates.

Closes: skills-fqu

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-09 17:32:17 -08:00

Agent Capability Matrix

Date: 2026-01-09
Status: Research complete
Related: skills-fqu, Cross-agent epic

Overview

Comparison of AI coding agent capabilities for cross-agent skill portability and quality gate design.

Agent	Vendor	Open Source	Primary Language
Claude Code	Anthropic	No	TypeScript
Gemini CLI	Google	Yes (Apache 2.0)	TypeScript
OpenCode	OpenCode	Yes	Go + TypeScript
Codex CLI	OpenAI	Yes	Rust

Capability Matrix

Core Features

Capability	Claude Code	Gemini CLI	OpenCode	Codex CLI
Hooks/Lifecycle	✅ 9 events	✅ 8+ events	✅ 32+ events	⚠️ Limited
Subagent Spawning	✅ Task tool	⚠️ Via MCP	✅ Native	⚠️ Recent
File System Access	✅ Full	⚠️ Full (ReadFile limited to workspace)	✅ Configurable	🔒 Sandboxed
Bash Execution	✅ Yes	✅ Yes	✅ Yes	✅ Yes
State Persistence	✅ Sessions	✅ Auto-save	✅ Multi-level	✅ Client-side
MCP Support	✅ Yes	✅ Yes	✅ Yes	✅ Yes
Web Search	✅ Built-in	✅ Grounded	✅ Via tool	✅ Built-in

Context & Models

Capability	Claude Code	Gemini CLI	OpenCode	Codex CLI
Context Window	200K tokens	1M (2M soon)	Model-dependent	128-400K
Auto Compaction	✅ Yes	✅ Yes	✅ Yes	✅ Yes
Model Support	Claude only	Gemini only	75+ providers	GPT-5.x
Custom Models	❌ No	❌ No	✅ Yes	❌ No

Security & Sandboxing

Capability	Claude Code	Gemini CLI	OpenCode	Codex CLI
OS Sandboxing	❌ No	❌ No	❌ No	✅ Yes
Permission System	✅ Approval	✅ Approval	✅ Granular	✅ Approval
Tool Restrictions	❌ No	✅ excludeTools	✅ Per-tool	✅ Sandbox modes
Path Restrictions	❌ No	⚠️ ReadFile workspace-only	✅ external_directory	✅ Workspace only

Detailed Breakdown

1. Hooks / Lifecycle Events

Critical for quality gates - hooks enable mechanical enforcement.

Claude Code

SessionStart, SessionEnd, PreToolUse, PostToolUse,
PreCompact, UserPromptSubmit, Stop, SubagentStop, Notification

Executable scripts via hooks.json
Timeout configurable per hook
Can block operations (Stop hook for quality gates)

Gemini CLI

SessionStart, SessionEnd, PreCompress, BeforeModel, AfterModel,
BeforeToolSelection, Notification

Similar to Claude Code architecture
Scripts or npm plugin packages
Disabled by default for security

OpenCode

32+ events including:
session.*, file.*, command.*, permission.*, message.*,
todo.*, lsp.*, pty.*, tui.*

Most comprehensive event system
JS/TS modules (not shell scripts)
Plugin-based architecture

Codex CLI

Limited: notification hooks, approval hooks, event loop lifecycle

Less mature hook system
Community requesting more hooks
Tool-call events with approval flags

Cross-Agent Implication: Only Claude Code and Gemini CLI have Stop-equivalent hooks that can block exit. OpenCode and Codex CLI would need protocol-based enforcement, where the gate lives in instructions and an orchestrator check rather than in a hook.
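
Where a Stop-style hook is available, the gate itself can be a small script. The following is a minimal sketch, assuming the hook receives a JSON payload on stdin and that printing a JSON object with `"decision": "block"` keeps the agent working; that contract should be checked against each agent's hook documentation. The state file `.review-state.json`, its `status` field, and the `stop_hook_active` payload flag as used here are illustrative assumptions, not something this research confirmed.

```python
import json
import sys
from pathlib import Path

# Hypothetical state file written by a reviewer; not part of any agent's API.
STATE_FILE = Path(".review-state.json")


def gate_decision(payload: dict, state: dict) -> dict:
    """Return the hook response: an empty dict allows exit, 'block' refuses it."""
    # Assumed flag marking a stop that was already blocked once;
    # honouring it avoids blocking forever.
    if payload.get("stop_hook_active"):
        return {}
    if state.get("status") == "APPROVED":
        return {}
    status = state.get("status", "MISSING")
    return {
        "decision": "block",
        "reason": f"Review status is {status}; run the reviewer first.",
    }


def main() -> None:
    """Read the hook payload from stdin and print the gate decision as JSON."""
    raw = sys.stdin.read()
    payload = json.loads(raw) if raw.strip() else {}
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    print(json.dumps(gate_decision(payload, state)))
```

A script registered as the hook would call `main()` at its entry point. Keeping the decision in the pure function `gate_decision` means the same logic can be reused by an orchestrator on agents that have no blocking hook.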

2. Subagent Spawning

Critical for orchestrator pattern - can any agent spawn others?

Claude Code

Task tool - Native subagent spawning
Multiple agent types (Explore, Plan, Bash, etc.)
Returns results to parent
Can run in background

Gemini CLI

Limited native - Experimental YOLO mode
Via MCP - PAL server enables cross-CLI spawning
Can spawn Claude Code as subagent and vice versa

OpenCode

Native support - Primary agents spawn subagents
Session forking with parent tracking
@agent-name mentions for manual invocation
Configurable agent modes (primary/subagent/all)

Codex CLI

Recent addition - Multi-conversation agent control
Can run as MCP server for other agents
Sandbox restrictions complicate spawning

Cross-Agent Implication: MCP is the universal bridge. Any agent can spawn any other through the MCP server pattern.

3. File System Access

Critical for skills - can agents read skill files?

Claude Code

Full filesystem access
No path restrictions
User permission level

Gemini CLI

Full filesystem access
Known issue: ReadFile restricts to workspace directories
Symlinked paths (like ~/.claude/skills/) may be blocked
Workaround: use shell cat command

OpenCode

Configurable permissions per operation
external_directory permission for outside-project access
Granular: ask/allow/deny per operation type

Codex CLI

Sandboxed by default - workspace-only writes
Network disabled by default
--add-dir for selective access
danger-full-access mode available

Cross-Agent Implication: Gemini's path restrictions (skills-bo8) are the main blocker. Skills need to live inside the workspace or be read through a shell workaround.
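
The choice between a native read and the shell workaround can be made mechanically. This is a rough sketch, assuming the restriction applies to the symlink-resolved path (the notes above only say symlinked paths "may be blocked"); `read_strategy` and its return values are hypothetical names for illustration.

```python
from pathlib import Path


def read_strategy(skill_file: Path, workspace: Path) -> str:
    """Pick how an agent with workspace-only file reads should load a skill."""
    # Resolve symlinks first: a link inside the workspace can still point outside.
    target = skill_file.resolve()
    root = workspace.resolve()
    if target == root or root in target.parents:
        return "read_file"
    # Outside the workspace: fall back to the shell `cat` workaround.
    return "shell_cat"
```

For example, `read_strategy(Path.home() / ".claude/skills/review/SKILL.md", Path.cwd())` would select the shell fallback whenever the skill directory resolves to somewhere outside the current project.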

4. CLI Tool Execution

Critical for skills using helper scripts.

Agent	Method	Safety
Claude Code	Bash tool	Approval prompts
Gemini CLI	`run_shell_command` (bash -c)	Approval prompts
OpenCode	Bash tool with glob patterns	Granular permissions
Codex CLI	`!` prefix or shell tool	Sandbox + approval

All agents can run CLI tools. Key differences:

Codex has OS-level sandboxing
OpenCode has most granular permission patterns
Gemini and Claude have similar approval models

5. State Persistence

Critical for quality gates - need to track review status.

Agent	Session Storage	Cross-Session Memory
Claude Code	Yes	CLAUDE.md
Gemini CLI	Auto-save to ~/.gemini/	GEMINI.md, save_memory tool
OpenCode	Multi-level	Config files
Codex CLI	Client-side only	~/.codex/sessions/

Cross-Agent Implication: Cross-agent state needs an external store (jwz, beads, or simple files), because each agent's native persistence is readable only by that agent.
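
The "simple files" option can be as small as one JSON file that every agent reads and writes through a helper script. A minimal sketch follows; the file layout, the function names, and the `PENDING` default are assumptions made for illustration and do not reflect how jwz or beads store data.

```python
import json
import os
import tempfile
from pathlib import Path


def post_status(store: Path, task_id: str, status: str, author: str) -> None:
    """Record a status for a task, replacing the store file atomically."""
    data = json.loads(store.read_text()) if store.exists() else {}
    data[task_id] = {"status": status, "author": author}
    # Write to a temp file in the same directory, then rename over the original,
    # so a concurrent reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=store.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(data, handle, indent=2)
    os.replace(tmp, store)


def get_status(store: Path, task_id: str) -> str:
    """Return the recorded status, or 'PENDING' when nothing was posted."""
    if not store.exists():
        return "PENDING"
    return json.loads(store.read_text()).get(task_id, {}).get("status", "PENDING")
```

The rename protects readers, but two writers doing read-modify-write at the same time can still lose an update, so this fits the one-worker, one-reviewer flow better than heavy concurrency.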

6. Built-in Tools

Claude Code (11 tools)

Read, Write, Edit, Glob, Grep, Bash, Task, WebFetch, WebSearch, TodoWrite, NotebookEdit

Gemini CLI (12+ tools)

read_file, write_file, replace, glob, list_directory, run_shell_command, web_fetch, google_web_search, save_memory, write_todos, codebase_investigator, search_file_content, read_many_files

OpenCode

File ops, Bash, Web fetch, LSP integration, Git/VCS, PTY management, MCP servers, Custom plugins

Codex CLI

read_file, list_dir, glob_file_search, apply_patch, git, rg (search), shell, todo_write, web_search, image support

Parity: All four have core file, shell, and search tools. They differ in naming and in extras (LSP, code execution, etc.).
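
Because the differences are mostly naming, a portable skill can refer to generic operations and translate them per agent. The lookup below is a sketch built only from the tool lists above; OpenCode is left out because this document lists its tool categories rather than tool names, and `tool_for` is a hypothetical helper.

```python
# Generic operation -> tool name per agent, taken from the lists above.
TOOL_NAMES = {
    "read": {"claude": "Read", "gemini": "read_file", "codex": "read_file"},
    "shell": {"claude": "Bash", "gemini": "run_shell_command", "codex": "shell"},
    "glob": {"claude": "Glob", "gemini": "glob", "codex": "glob_file_search"},
    "search": {"claude": "Grep", "gemini": "search_file_content", "codex": "rg"},
    "web_search": {
        "claude": "WebSearch",
        "gemini": "google_web_search",
        "codex": "web_search",
    },
}


def tool_for(operation: str, agent: str) -> str:
    """Translate a generic operation into the agent's own tool name."""
    try:
        return TOOL_NAMES[operation][agent]
    except KeyError:
        raise LookupError(f"no known {operation!r} tool for agent {agent!r}") from None
```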

Cross-Agent Quality Gate Analysis

What Works Across All Agents

Component	Approach
State storage	External CLI tool (jwz, bd, or file-based)
Reviewer invocation	Any agent can spawn reviewer via Bash/MCP
Issue tracking	External CLI (beads, tissue)
Second opinions	orch works from any agent with Bash
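
Reviewer invocation over Bash comes down to running another agent's CLI non-interactively. The sketch below assumes each CLI has a one-shot mode reachable with the argument prefixes shown; these invocations are assumptions to confirm against each tool's `--help` output, and `reviewer_argv` and `run_reviewer` are hypothetical helper names.

```python
import subprocess

# Assumed non-interactive invocations; confirm against each CLI's --help.
HEADLESS_ARGV = {
    "claude": ["claude", "-p"],
    "gemini": ["gemini", "-p"],
    "codex": ["codex", "exec"],
    "opencode": ["opencode", "run"],
}


def reviewer_argv(agent: str, prompt: str) -> list[str]:
    """Build the argument vector that runs one prompt through an agent CLI."""
    if agent not in HEADLESS_ARGV:
        raise ValueError(f"unknown agent: {agent}")
    # The prompt is passed as a single argv entry, so no shell quoting is needed.
    return [*HEADLESS_ARGV[agent], prompt]


def run_reviewer(agent: str, prompt: str, timeout_s: int = 600) -> str:
    """Run the reviewer and return its stdout; raises on non-zero exit or timeout."""
    result = subprocess.run(
        reviewer_argv(agent, prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout
```

Passing an argument list rather than a shell string keeps review prompts that contain quotes or newlines from being reinterpreted by the shell.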

What Doesn't Work Across Agents

Component	Problem
Stop hook	Claude/Gemini only - no equivalent in OpenCode/Codex
Mechanical blocking	Can't prevent exit without hooks
Native subagents	Different spawning mechanisms

Recommended Cross-Agent Pattern

┌─────────────────────────────────────────────┐
│  Orchestrator (any agent)                    │
│                                              │
│  1. Start work                               │
│  2. Spawn worker agent (via MCP/Bash)        │
│  3. Worker completes, posts to state store   │
│  4. Orchestrator spawns reviewer             │
│  5. Reviewer posts APPROVED/ISSUES           │
│  6. Orchestrator checks state, gates exit    │
└─────────────────────────────────────────────┘

The orchestrator, not a hook, enforces the gate, so the pattern works with any agent.
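
The six steps in the diagram can be sketched as a short loop. This is a minimal illustration, assuming the worker, the reviewer, and the state store are supplied as callables; `run_gated_task` and the `max_rounds` retry limit are hypothetical and not part of the pattern as researched.

```python
from typing import Callable


def run_gated_task(
    spawn_worker: Callable[[], None],
    spawn_reviewer: Callable[[], None],
    read_status: Callable[[], str],
    max_rounds: int = 3,
) -> bool:
    """Run worker then reviewer until the reviewer approves or rounds run out."""
    for _ in range(max_rounds):
        spawn_worker()    # steps 2-3: worker runs and posts to the state store
        spawn_reviewer()  # steps 4-5: reviewer posts APPROVED or ISSUES
        if read_status() == "APPROVED":  # step 6: the gate itself
            return True
    # Never approved: the caller should refuse to finish the task.
    return False
```

The callables keep the loop independent of how agents are launched (MCP or Bash) and of which state store is used, which is the point of moving the gate out of hooks.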

Recommendations

For Portable Skills

Use SKILL.md format - All agents can read markdown
Avoid agent-specific features - No hooks in skill logic
CLI tools for actions - All agents can run Bash
External state - beads/jwz for cross-agent coordination

For Quality Gates

Orchestrator pattern - Gate logic in orchestrator, not hooks
Protocol-based - Agents follow instructions, post to state store
Hybrid - Use hooks where available (Claude/Gemini), protocol elsewhere

For Subagent Research Sandbox (skills-ut4)

Agent	Sandbox Approach
Claude Code	No native sandbox - rely on Task agent type restrictions
Gemini CLI	`excludeTools` setting to disable write tools
OpenCode	Permission config: `edit: "deny", bash: "deny"`
Codex CLI	Native sandbox - best support for read-only research
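
The table above can be turned into per-agent settings fragments. The sketch below is illustrative only: the Gemini tool names come from the tool list in section 6, the OpenCode values come from the table, and the enclosing `permission` key and the Codex `sandbox_mode` name are assumptions to check against each agent's configuration reference. `read_only_settings` is a hypothetical helper.

```python
def read_only_settings(agent: str) -> dict:
    """Return an illustrative read-only config fragment for one agent."""
    if agent == "gemini":
        # Disable the write-capable tools named in the Gemini CLI tool list.
        return {"excludeTools": ["write_file", "replace", "run_shell_command"]}
    if agent == "opencode":
        # Values from the table; the enclosing key name is an assumption.
        return {"permission": {"edit": "deny", "bash": "deny"}}
    if agent == "codex":
        # Assumed mode name; this document only says sandbox modes exist.
        return {"sandbox_mode": "read-only"}
    # Claude Code has no native sandbox setting according to the table.
    raise ValueError(f"no read-only settings known for agent: {agent}")
```

Excluding `run_shell_command` on Gemini CLI also removes the `cat` workaround from section 3, so skills for a read-only Gemini subagent would need to live inside the workspace.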

Open Questions

Can MCP server sandboxing be enforced by the server, not client?
Is there a standard for "read-only agent" across platforms?
Should we build a universal sandbox wrapper script?
How do we handle agents that ignore protocol instructions?
