Agent Capability Matrix

Date: 2026-01-09
Status: Research complete
Related: skills-fqu, Cross-agent epic

Overview

Comparison of AI coding agent capabilities for cross-agent skill portability and quality gate design.

| Agent | Vendor | Open Source | Primary Language |
| --- | --- | --- | --- |
| Claude Code | Anthropic | No | TypeScript |
| Gemini CLI | Google | Yes (Apache 2.0) | TypeScript |
| OpenCode | OpenCode | Yes | Go + TypeScript |
| Codex CLI | OpenAI | Yes | Rust |

Capability Matrix

Core Features

| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
| --- | --- | --- | --- | --- |
| Hooks/Lifecycle | 9 events | 8+ events | 32+ events | ⚠️ Limited |
| Subagent Spawning | Task tool | ⚠️ Via MCP | Native | ⚠️ Recent |
| File System Access | Full | Full | Configurable | 🔒 Sandboxed |
| Bash Execution | Yes | Yes | Yes | Yes |
| State Persistence | Sessions | Auto-save | Multi-level | Client-side |
| MCP Support | Yes | Yes | Yes | Yes |
| Web Search | Built-in | Grounded | Via tool | Built-in |

Context & Models

| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
| --- | --- | --- | --- | --- |
| Context Window | 200K tokens | 1M (2M soon) | Model-dependent | 128-400K |
| Auto Compaction | Yes | Yes | Yes | Yes |
| Model Support | Claude only | Gemini only | 75+ providers | GPT-5.x |
| Custom Models | No | No | Yes | No |

Security & Sandboxing

| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
| --- | --- | --- | --- | --- |
| OS Sandboxing | No | No | No | Yes |
| Permission System | Approval | Approval | Granular | Approval |
| Tool Restrictions | No | excludeTools | Per-tool | Sandbox modes |
| Path Restrictions | No | No | external_directory | Workspace only |

Detailed Breakdown

1. Hooks / Lifecycle Events

Critical for quality gates - hooks enable mechanical enforcement.

Claude Code

SessionStart, SessionEnd, PreToolUse, PostToolUse,
PreCompact, UserPromptSubmit, Stop, SubagentStop, Notification
  • Executable scripts via hooks.json
  • Timeout configurable per hook
  • Can block operations (Stop hook for quality gates)
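
A Stop hook used as a quality gate might look roughly like the sketch below. It assumes the hook receives a JSON payload on stdin, that the payload carries a `stop_hook_active` flag, and that exit status 2 blocks the stop and passes stderr back to the agent. All three should be checked against the current Claude Code hooks documentation. The state file `.review-state.json` and its `status` field are hypothetical.

```python
import json
import sys
from pathlib import Path

# Hypothetical state file written by a reviewer; not part of any agent.
STATE_FILE = Path(".review-state.json")


def gate_decision(hook_input: dict, state: dict) -> tuple[int, str]:
    """Return (exit_code, message) for the hook process."""
    # Assumed field: true when the agent is already continuing because of an
    # earlier Stop-hook block. Allow the exit to avoid an endless loop.
    if hook_input.get("stop_hook_active"):
        return 0, ""
    if state.get("status") == "APPROVED":
        return 0, ""
    # Assumed convention: exit code 2 blocks the stop.
    return 2, "Review not approved yet; request a review before stopping."


def main() -> int:
    """Read the hook payload and review state, then report the decision."""
    raw = sys.stdin.read()
    hook_input = json.loads(raw) if raw.strip() else {}
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    code, message = gate_decision(hook_input, state)
    if message:
        print(message, file=sys.stderr)
    return code
```

To run this as a hook script, the file would end with `sys.exit(main())`. Keeping the decision in `gate_decision` makes the gate testable without an agent attached.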

Gemini CLI

8+ events including:
SessionStart, SessionEnd, PreCompress, BeforeModel, AfterModel,
BeforeToolSelection, Notification
  • Similar to Claude Code architecture
  • Scripts or npm plugin packages
  • Disabled by default for security

OpenCode

32+ events including:
session.*, file.*, command.*, permission.*, message.*,
todo.*, lsp.*, pty.*, tui.*
  • Most comprehensive event system
  • JS/TS modules (not shell scripts)
  • Plugin-based architecture

Codex CLI

Limited: notification hooks, approval hooks, event loop lifecycle
  • Less mature hook system
  • Community requesting more hooks
  • Tool-call events with approval flags

Cross-Agent Implication: Only Claude Code and Gemini have Stop-equivalent hooks for blocking exit. OpenCode and Codex would need protocol-based enforcement.


2. Subagent Spawning

Critical for orchestrator pattern - can any agent spawn others?

Claude Code

  • Task tool - Native subagent spawning
  • Multiple agent types (Explore, Plan, Bash, etc.)
  • Returns results to parent
  • Can run in background

Gemini CLI

  • Limited native - Experimental YOLO mode
  • Via MCP - PAL server enables cross-CLI spawning
  • Can spawn Claude Code as subagent and vice versa

OpenCode

  • Native support - Primary agents spawn subagents
  • Session forking with parent tracking
  • @agent-name mentions for manual invocation
  • Configurable agent modes (primary/subagent/all)

Codex CLI

  • Recent addition - Multi-conversation agent control
  • Can run as MCP server for other agents
  • Sandbox restrictions complicate spawning

Cross-Agent Implication: MCP is the universal bridge. Any agent can spawn any other via MCP server pattern.


3. File System Access

Critical for skills - can agents read skill files?

Claude Code

  • Full filesystem access
  • No path restrictions
  • User permission level

Gemini CLI

  • Full filesystem access
  • Known issue: ReadFile restricts to workspace directories
  • Symlinked paths (like ~/.claude/skills/) may be blocked
  • Fix: Add to ~/.gemini/settings.json:
    { "context": { "includeDirectories": ["~/.claude/skills"] } }
    
  • Or use gemini --include-directories ~/.claude/skills
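
The settings change above can be applied idempotently with a small script. This is a minimal sketch: it assumes the settings file is plain JSON without comments, and the helper names `add_include_directory` and `patch_settings_file` are illustrative only.

```python
import json
from pathlib import Path


def add_include_directory(settings: dict, directory: str) -> dict:
    """Return a copy of settings with directory in context.includeDirectories."""
    updated = json.loads(json.dumps(settings))  # cheap deep copy for JSON data
    context = updated.setdefault("context", {})
    directories = context.setdefault("includeDirectories", [])
    if directory not in directories:
        directories.append(directory)
    return updated


def patch_settings_file(path: Path, directory: str) -> None:
    """Read, update, and rewrite a settings file, creating it if missing."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    updated = add_include_directory(settings, directory)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

For example, `patch_settings_file(Path.home() / ".gemini" / "settings.json", "~/.claude/skills")` would add the skills directory while leaving other settings untouched. Running it twice does not duplicate the entry.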

OpenCode

  • Configurable permissions per operation
  • external_directory permission for outside-project access
  • Granular: ask/allow/deny per operation type

Codex CLI

  • Sandboxed by default - workspace-only writes
  • Network disabled by default
  • --add-dir for selective access
  • danger-full-access mode available

Cross-Agent Implication: Gemini's path restrictions (skills-bo8) are the main blocker. Skills need to be in the workspace, be added through includeDirectories, or be read via shell workarounds.


4. CLI Tool Execution

Critical for skills using helper scripts.

| Agent | Method | Safety |
| --- | --- | --- |
| Claude Code | Bash tool | Approval prompts |
| Gemini CLI | run_shell_command (bash -c) | Approval prompts |
| OpenCode | Bash tool with glob patterns | Granular permissions |
| Codex CLI | ! prefix or shell tool | Sandbox + approval |

All agents can run CLI tools. Key differences:

  • Codex has OS-level sandboxing
  • OpenCode has most granular permission patterns
  • Gemini and Claude have similar approval models

5. State Persistence

Critical for quality gates - need to track review status.

| Agent | Session Storage | Cross-Session Memory |
| --- | --- | --- |
| Claude Code | Yes | CLAUDE.md |
| Gemini CLI | Auto-save to ~/.gemini/ | GEMINI.md, save_memory tool |
| OpenCode | Multi-level | Config files |
| Codex CLI | Client-side only | ~/.codex/sessions/ |

Cross-Agent Implication: Cross-agent state needs an external store (jwz, beads, or simple files), because each agent's native persistence is specific to that agent.
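
The "simple files" option could be as small as an append-only JSON Lines log that any agent writes through a helper script. The sketch below is one possible shape, not the jwz or beads format; the class name, record fields, and status strings are assumptions for illustration.

```python
import json
import time
from pathlib import Path
from typing import Optional


class FileStateStore:
    """Append-only JSON Lines store shared by agents via the filesystem."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def post(self, task_id: str, agent: str, status: str) -> None:
        """Append one status record for a task."""
        record = {"task": task_id, "agent": agent, "status": status, "ts": time.time()}
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")

    def latest_status(self, task_id: str) -> Optional[str]:
        """Return the most recently posted status for a task, if any."""
        if not self.path.exists():
            return None
        status = None
        with self.path.open(encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue
                record = json.loads(line)
                if record["task"] == task_id:
                    status = record["status"]
        return status
```

Appending rather than rewriting keeps a history of who posted what. The sketch does no file locking, so concurrent writers would need extra care.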


6. Built-in Tools

Claude Code (11 tools)

Read, Write, Edit, Glob, Grep, Bash, Task, WebFetch, WebSearch, TodoWrite, NotebookEdit

Gemini CLI (12+ tools)

read_file, write_file, replace, glob, list_directory, run_shell_command, web_fetch, google_web_search, save_memory, write_todos, codebase_investigator, search_file_content, read_many_files

OpenCode

File ops, Bash, Web fetch, LSP integration, Git/VCS, PTY management, MCP servers, Custom plugins

Codex CLI

read_file, list_dir, glob_file_search, apply_patch, git, rg (search), shell, todo_write, web_search, image support

Parity: All four agents provide core file, shell, and search tools. They differ in naming and in extras (LSP, code execution, etc.).
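
The naming differences can be captured in a lookup table so that skill tooling refers to capabilities rather than tool names. The mapping below is a rough sketch built from the tool lists above. Pairing tools as equivalents (for example `Write` with `apply_patch`) is an assumption, and OpenCode is omitted because only tool categories are listed for it.

```python
from typing import Optional

# Capability -> per-agent tool name, taken from the lists above.
# The equivalences are approximate and should be verified per agent.
TOOL_NAMES = {
    "read_file": {"claude": "Read", "gemini": "read_file", "codex": "read_file"},
    "write_file": {"claude": "Write", "gemini": "write_file", "codex": "apply_patch"},
    "find_files": {"claude": "Glob", "gemini": "glob", "codex": "glob_file_search"},
    "search_text": {"claude": "Grep", "gemini": "search_file_content", "codex": "rg"},
    "shell": {"claude": "Bash", "gemini": "run_shell_command", "codex": "shell"},
    "web_search": {"claude": "WebSearch", "gemini": "google_web_search", "codex": "web_search"},
    "web_fetch": {"claude": "WebFetch", "gemini": "web_fetch"},
    "todos": {"claude": "TodoWrite", "gemini": "write_todos", "codex": "todo_write"},
}


def tool_for(capability: str, agent: str) -> Optional[str]:
    """Return the agent's tool name for a capability, or None if unknown."""
    return TOOL_NAMES.get(capability, {}).get(agent)


def shared_capabilities(agents: list[str]) -> list[str]:
    """List capabilities that every given agent has a named tool for."""
    return [
        capability
        for capability, names in TOOL_NAMES.items()
        if all(agent in names for agent in agents)
    ]
```

A skill linter could use `shared_capabilities` to flag instructions that depend on a tool one of the target agents lacks.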


Cross-Agent Quality Gate Analysis

What Works Across All Agents

| Component | Approach |
| --- | --- |
| State storage | External CLI tool (jwz, bd, or file-based) |
| Reviewer invocation | Any agent can spawn reviewer via Bash/MCP |
| Issue tracking | External CLI (beads, tissue) |
| Second opinions | orch works from any agent with Bash |
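
Reviewer invocation via Bash amounts to running another agent's CLI non-interactively. The sketch below builds such a command. The non-interactive entry points (`claude -p`, `gemini -p`, `codex exec`, `opencode run`) are not taken from this document; they are assumptions to verify against each CLI's `--help` output. The function names are illustrative.

```python
import subprocess

# Assumed non-interactive invocations; verify against each CLI's help text.
NONINTERACTIVE_PREFIX = {
    "claude": ["claude", "-p"],
    "gemini": ["gemini", "-p"],
    "codex": ["codex", "exec"],
    "opencode": ["opencode", "run"],
}


def reviewer_command(agent: str, prompt: str) -> list[str]:
    """Build the argv list for a one-shot reviewer run."""
    try:
        prefix = NONINTERACTIVE_PREFIX[agent]
    except KeyError:
        raise ValueError(f"unknown agent: {agent}") from None
    return [*prefix, prompt]


def run_reviewer(agent: str, prompt: str, timeout: float = 600.0) -> str:
    """Run the reviewer and return its stdout; raises on non-zero exit."""
    result = subprocess.run(
        reviewer_command(agent, prompt),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Passing the prompt as a single argv element avoids shell quoting problems when the prompt contains quotes or newlines.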

What Doesn't Work Across Agents

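| Component | Problem |
| --- | --- |
| Stop hook | Claude/Gemini only - no equivalent in OpenCode/Codex |
| Mechanical blocking | Can't prevent exit without hooks |
| Native subagents | Different spawning mechanisms |

Because of these gaps, a portable gate moves enforcement into an orchestrator:
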
┌─────────────────────────────────────────────┐
│  Orchestrator (any agent)                    │
│                                              │
│  1. Start work                               │
│  2. Spawn worker agent (via MCP/Bash)        │
│  3. Worker completes, posts to state store   │
│  4. Orchestrator spawns reviewer             │
│  5. Reviewer posts APPROVED/ISSUES           │
│  6. Orchestrator checks state, gates exit    │
└─────────────────────────────────────────────┘

The orchestrator enforces the gate, not hooks. Works with any agent.
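
The six steps in the diagram can be sketched as a small loop. This is a minimal illustration under stated assumptions: the worker and reviewer are passed in as callables (in practice they would shell out to an agent or call an MCP server), an in-memory dict stands in for the external state store, and the retry limit `max_rounds` is hypothetical.

```python
from typing import Callable

# A step receives the task id and the shared state, and posts its result there.
Step = Callable[[str, dict], None]


def run_gated_task(
    task_id: str,
    worker: Step,
    reviewer: Step,
    state: dict,
    max_rounds: int = 3,
) -> bool:
    """Run worker/reviewer rounds until the reviewer approves or rounds run out."""
    for _ in range(max_rounds):
        worker(task_id, state)    # steps 2-3: worker runs and posts to the store
        reviewer(task_id, state)  # steps 4-5: reviewer posts APPROVED or ISSUES
        if state.get(task_id) == "APPROVED":  # step 6: orchestrator checks state
            return True
    return False  # gate stays closed; the orchestrator should not report success
```

The gate lives entirely in the return value, so the same loop works whether the worker is Claude Code, Gemini CLI, OpenCode, or Codex CLI.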


Recommendations

For Portable Skills

  1. Use SKILL.md format - All agents can read markdown
  2. Avoid agent-specific features - No hooks in skill logic
  3. CLI tools for actions - All agents can run Bash
  4. External state - beads/jwz for cross-agent coordination

For Quality Gates

  1. Orchestrator pattern - Gate logic in orchestrator, not hooks
  2. Protocol-based - Agents follow instructions, post to state store
  3. Hybrid - Use hooks where available (Claude/Gemini), protocol elsewhere

For Subagent Research Sandbox (skills-ut4)

| Agent | Sandbox Approach |
| --- | --- |
| Claude Code | No native sandbox - rely on Task agent type restrictions |
| Gemini CLI | excludeTools setting to disable write tools |
| OpenCode | Permission config: edit: "deny", bash: "deny" |
| Codex CLI | Native sandbox - best support for read-only research |
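
For the two agents that are restricted through configuration, the read-only settings could be generated from one place. The sketch below renders JSON fragments based on the table. The tool names passed to `excludeTools` come from the Gemini CLI tool list earlier in this document. The enclosing `permission` key for OpenCode, and where each fragment sits inside its config file, are assumptions to check against each agent's documentation.

```python
import json
from typing import Optional

# Fragments derived from the table above. Exact placement inside each agent's
# config file is an assumption.
READ_ONLY_FRAGMENTS = {
    "gemini": {"excludeTools": ["write_file", "replace", "run_shell_command"]},
    "opencode": {"permission": {"edit": "deny", "bash": "deny"}},
}


def read_only_fragment(agent: str) -> Optional[str]:
    """Return a JSON settings fragment for a read-only agent, if one applies."""
    fragment = READ_ONLY_FRAGMENTS.get(agent)
    if fragment is None:
        # Claude Code and Codex CLI are restricted by other means (see table).
        return None
    return json.dumps(fragment, indent=2)
```

Agents without a fragment return `None`, which a wrapper script can treat as "use the agent's own sandbox or agent-type restrictions instead".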

Open Questions

  1. Can MCP server sandboxing be enforced by the server, not client?
  2. Is there a standard for "read-only agent" across platforms?
  3. Should we build a universal sandbox wrapper script?
  4. How do we handle agents that ignore protocol instructions?

References