docs: add agent capability matrix for cross-agent design

Comprehensive comparison of Claude Code, Gemini CLI, OpenCode, and Codex:
- Hooks/lifecycle events (Claude/Gemini best, OpenCode most comprehensive)
- Subagent spawning (MCP is universal bridge)
- File access (Gemini has path restrictions - skills-bo8)
- Sandboxing (Codex has OS-level, others approval-based)
- State persistence (need external store for cross-agent)

Key finding: Orchestrator pattern works across all agents.
Stop hooks only in Claude/Gemini - others need protocol-based gates.

Closes: skills-fqu

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
dan 2026-01-09 17:32:17 -08:00
parent c94def1c61
commit ec2d856c05
2 changed files with 293 additions and 1 deletions


@ -80,7 +80,7 @@
{"id":"skills-ebl","title":"Benchmark vision model UI understanding","description":"## Goal\nMeasure how well vision models can answer UI questions from screenshots.\n\n## Test cases\n1. **Element location**: \"Where is the Save button?\" → coordinates\n2. **Element identification**: \"What buttons are visible?\" → list\n3. **State detection**: \"Is the checkbox checked?\" → boolean\n4. **Text extraction**: \"What does the error message say?\" → text\n5. **Layout understanding**: \"What's in the sidebar?\" → structure\n\n## Metrics\n- Accuracy: Does the answer match ground truth?\n- Precision: How close are coordinates to actual element centers?\n- Latency: Time from query to response\n- Cost: Tokens consumed per query\n\n## Prompt engineering questions\n- Does adding a grid overlay help coordinate precision?\n- What prompt format gives most actionable coordinates?\n- Can we get bounding boxes vs point coordinates?\n\n## Comparison baseline\n- Manual annotation of test screenshots\n- AT-SPI data (once enabled) as ground truth\n\n## Depends on\n- Test screenshots from real apps\n- Ground truth annotations","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-17T14:13:10.038933798-08:00","updated_at":"2025-12-29T15:26:19.822655148-05:00","closed_at":"2025-12-29T15:26:19.822655148-05:00","close_reason":"Benchmark complete. Vision models excellent for semantic understanding, approximate for coordinates. Recommend hybrid AT-SPI + vision. See docs/research/vision-ui-benchmark-2025-12-29.md"}
{"id":"skills-f2p","title":"Skills + Molecules Integration","description":"Integrate skills with beads molecules system.\n\nDesign work tracked in dotfiles (dotfiles-jjb).\n\nComponents:\n- Checklist support (lightweight skills)\n- Audit integration (bd audit for skill execution)\n- Skill frontmatter for triggers/tracking\n- Proto packaging alongside skills\n\nSee: ~/proj/dotfiles ADR work","status":"closed","priority":2,"issue_type":"epic","created_at":"2025-12-23T17:58:55.999438985-05:00","updated_at":"2025-12-23T19:22:38.577280129-05:00","closed_at":"2025-12-23T19:22:38.577280129-05:00","close_reason":"Superseded by skills-4u0 (migrated from dotfiles)","dependencies":[{"issue_id":"skills-f2p","depends_on_id":"skills-vpy","type":"blocks","created_at":"2025-12-23T17:59:17.976956454-05:00","created_by":"daemon"},{"issue_id":"skills-f2p","depends_on_id":"skills-u3d","type":"blocks","created_at":"2025-12-23T17:59:18.015216054-05:00","created_by":"daemon"}]}
{"id":"skills-fo3","title":"Compare WORKFLOWS.md with upstream","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-03T20:15:54.283175561-08:00","updated_at":"2025-12-03T20:19:28.897037199-08:00","closed_at":"2025-12-03T20:19:28.897037199-08:00","dependencies":[{"issue_id":"skills-fo3","depends_on_id":"skills-ebh","type":"discovered-from","created_at":"2025-12-03T20:15:54.286009672-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"skills-fqu","title":"Research: Agent capability matrix","description":"Document what each agent can and cannot do for cross-agent design decisions.\n\nAgents to cover:\n- Claude Code (claude CLI)\n- Gemini (gemini CLI / AI Studio)\n- OpenCode\n- Codex (OpenAI)\n\nCapabilities to assess:\n- Hooks / lifecycle events\n- Subagent spawning\n- File system access (paths, restrictions)\n- CLI tool execution\n- State persistence\n- Context window / memory\n\nOutput: Matrix showing capability parity and gaps","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-09T17:14:20.541961958-08:00","created_by":"dan","updated_at":"2026-01-09T17:14:20.541961958-08:00"}
{"id":"skills-fqu","title":"Research: Agent capability matrix","description":"Document what each agent can and cannot do for cross-agent design decisions.\n\nAgents to cover:\n- Claude Code (claude CLI)\n- Gemini (gemini CLI / AI Studio)\n- OpenCode\n- Codex (OpenAI)\n\nCapabilities to assess:\n- Hooks / lifecycle events\n- Subagent spawning\n- File system access (paths, restrictions)\n- CLI tool execution\n- State persistence\n- Context window / memory\n\nOutput: Matrix showing capability parity and gaps","status":"in_progress","priority":2,"issue_type":"task","created_at":"2026-01-09T17:14:20.541961958-08:00","created_by":"dan","updated_at":"2026-01-09T17:16:13.794753056-08:00"}
{"id":"skills-fvc","title":"Code Review: {{target}}","description":"Multi-lens code review workflow for {{target}}.\n\n## Philosophy\nThe LLM stays in the loop at every step - this is agent-assisted review, not automated parsing. The agent applies judgment about what's worth filing, how to prioritize, and what context to include.\n\n## Variables\n- target: File or directory to review\n\n## Workflow\n1. Explore codebase to find candidates (if target is directory)\n2. Run lenses via orch consensus for multi-model perspective\n3. Analyze findings - LLM synthesizes across lenses and models\n4. File issues with judgment - group related, set priorities, add context\n5. Summarize for digest\n\n## Lenses Available\n- bloat: size, complexity, SRP violations\n- smells: readability, naming, control flow\n- dead-code: unused, unreachable, obsolete\n- redundancy: duplication, YAGNI, parallel systems","status":"closed","priority":2,"issue_type":"epic","created_at":"2025-12-25T10:10:57.652098447-05:00","updated_at":"2025-12-26T23:22:41.408582818-05:00","closed_at":"2025-12-26T23:22:41.408582818-05:00","close_reason":"Replaced by /code-review skill","labels":["template"]}
{"id":"skills-fvc.1","title":"Run bloat lens on {{target}}","description":"Run bloat review lens via orch:\n\n```bash\norch consensus \"$(cat ~/.config/lenses/bloat.md)\" flash gemini --file {{target}} --mode open\n```\n\nLook for: file size, function length, complexity, SRP violations.\nRecord findings for later filing.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-25T10:13:59.789715667-05:00","updated_at":"2025-12-26T23:22:41.416754154-05:00","closed_at":"2025-12-26T23:22:41.416754154-05:00","close_reason":"Replaced by /code-review skill","dependencies":[{"issue_id":"skills-fvc.1","depends_on_id":"skills-fvc","type":"parent-child","created_at":"2025-12-25T10:13:59.80248308-05:00","created_by":"daemon"}]}
{"id":"skills-fvc.2","title":"Run smells lens on {{target}}","description":"Run smells review lens via orch:\n\n```bash\norch consensus \"$(cat ~/.config/lenses/smells.md)\" flash gemini --file {{target}} --mode open\n```\n\nLook for: naming issues, control flow smells, data smells, structural issues.\nRecord findings for later filing.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-25T10:16:13.977562568-05:00","updated_at":"2025-12-26T23:22:41.423564011-05:00","closed_at":"2025-12-26T23:22:41.423564011-05:00","close_reason":"Replaced by /code-review skill","dependencies":[{"issue_id":"skills-fvc.2","depends_on_id":"skills-fvc","type":"parent-child","created_at":"2025-12-25T10:16:13.989662453-05:00","created_by":"daemon"}]}
@ -128,6 +128,7 @@
{"id":"skills-ty7","title":"Define trace levels (audit vs debug)","description":"Two trace levels to manage noise vs utility:\n\n1. Audit trace (minimal, safe, always on):\n - skill id/ref, start/end\n - high-level checkpoints\n - artifact hashes/paths\n - exit status\n\n2. Debug trace (opt-in, verbose):\n - tool calls with args\n - stdout/stderr snippets\n - expanded inputs\n - timing details\n\nConsider OpenTelemetry span model as reference.\nGPT proposed this; Gemini focused on rotation/caps instead.","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-23T19:49:48.514684945-05:00","updated_at":"2025-12-29T13:55:35.838961236-05:00","closed_at":"2025-12-29T13:55:35.838961236-05:00","close_reason":"Parked with ADR-001: skills-molecules integration deferred. Current simpler approach (skills as standalone) works well. Revisit when complex orchestration needed."}
{"id":"skills-u3d","title":"Define skill trigger conditions","description":"How does an agent know WHEN to apply a skill/checklist?\n\nOptions:\n- frontmatter triggers: field with patterns\n- File-based detection\n- Agent judgment from description\n- Beads hooks on state transitions\n- LLM-based pattern detection","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T17:59:09.69468767-05:00","updated_at":"2025-12-28T22:25:38.579989006-05:00","closed_at":"2025-12-28T22:25:38.579989006-05:00","close_reason":"Resolved: agent judgment from description is the standard. Good descriptions + 'When to Use' sections are sufficient. No new trigger mechanism needed - would add complexity without clear benefit."}
{"id":"skills-uan","title":"worklog: merge Guidelines and Remember sections","description":"Guidelines (8 points) and Remember (6 points) sections overlap significantly - both emphasize comprehensiveness, future context, semantic compression. Consolidate into single principles list. Found by bloat lens review.","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-25T02:03:16.148596791-05:00","updated_at":"2025-12-27T10:05:51.527595332-05:00","closed_at":"2025-12-27T10:05:51.527595332-05:00","close_reason":"Closed"}
{"id":"skills-ut4","title":"Investigate: Sandbox for research-only subagents","description":"Can we ensure research/explore subagents run in a restricted sandbox?\n\n## Context\nWhen spawning subagents for research tasks (codebase exploration, web search, reading files), they should be read-only and sandboxed - no writes, no destructive commands.\n\n## Questions to Answer\n1. Does Claude Code Task tool support sandbox restrictions for subagents?\n2. Can we pass sandbox mode to Gemini CLI subagents?\n3. How does OpenCode's permission system work for subagents?\n4. Can Codex subagents inherit sandbox restrictions?\n\n## Desired Behavior\n- Research subagent can: Read, Grep, Glob, WebFetch, WebSearch\n- Research subagent cannot: Write, Edit, Bash (destructive), delete\n\n## Security Benefit\nPrevents research tasks from accidentally (or maliciously) modifying files or running destructive commands.\n\nRelated: Cross-agent quality gate architecture (skills-3ja)","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-09T17:31:05.49739394-08:00","created_by":"dan","updated_at":"2026-01-09T17:31:05.49739394-08:00"}
{"id":"skills-uz4","title":"Compare RESUMABILITY.md with upstream","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-03T20:15:54.897754095-08:00","updated_at":"2025-12-03T20:19:29.384645842-08:00","closed_at":"2025-12-03T20:19:29.384645842-08:00","dependencies":[{"issue_id":"skills-uz4","depends_on_id":"skills-ebh","type":"discovered-from","created_at":"2025-12-03T20:15:54.899671178-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"skills-vb5","title":"Resolve web search design questions","description":"web_search_brainstorm.md has unanswered design questions: single smart skill vs explicit flags, specific sources priority, raw links vs summaries. Need user input to finalize web-search/web-research direction.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-11-30T11:58:33.482270742-08:00","updated_at":"2025-12-28T22:21:05.814118092-05:00","closed_at":"2025-12-28T22:21:05.814118092-05:00","close_reason":"Resolved: keep 2 skills, web-search for OpenCode only (Claude has built-in), web-research for both. Source filtering via WebSearch domains. Summaries by default."}
{"id":"skills-vjm","title":"Refactor update-agent-context.sh: reduce nesting depth","description":"File: .specify/scripts/bash/update-agent-context.sh\n\nIssues:\n- update_existing_agent_file() has 4-level deep nesting (lines 360-499)\n- State machine with multiple variables: in_tech_section, in_changes_section, tech_entries_added\n- 70+ lines of while loop processing\n\nFix:\n- Extract file processing to separate function\n- Consider sed/awk for line-based transformations\n- Use guard clauses to reduce nesting\n\nSeverity: HIGH","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-24T02:50:57.874439288-05:00","updated_at":"2025-12-25T01:44:58.38265672-05:00","closed_at":"2025-12-25T01:44:58.38265672-05:00","close_reason":"update-agent-context.sh is .specify upstream code, not maintained here"}


@ -0,0 +1,291 @@
# Agent Capability Matrix
> **Date:** 2026-01-09
> **Status:** Research complete
> **Related:** [skills-fqu](../../.beads/), [Cross-agent epic](skills-hf1)
## Overview
Comparison of AI coding agent capabilities for cross-agent skill portability and quality gate design.
| Agent | Vendor | Open Source | Primary Language |
|-------|--------|-------------|------------------|
| **Claude Code** | Anthropic | No | TypeScript |
| **Gemini CLI** | Google | Yes (Apache 2.0) | TypeScript |
| **OpenCode** | OpenCode | Yes | Go + TypeScript |
| **Codex CLI** | OpenAI | Yes | Rust |
## Capability Matrix
### Core Features
| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
|------------|-------------|------------|----------|-----------|
| **Hooks/Lifecycle** | ✅ 9 events | ✅ 8+ events | ✅ 32+ events | ⚠️ Limited |
| **Subagent Spawning** | ✅ Task tool | ⚠️ Via MCP | ✅ Native | ⚠️ Recent |
| **File System Access** | ✅ Full | ✅ Full | ✅ Configurable | 🔒 Sandboxed |
| **Bash Execution** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **State Persistence** | ✅ Sessions | ✅ Auto-save | ✅ Multi-level | ✅ Client-side |
| **MCP Support** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **Web Search** | ✅ Built-in | ✅ Grounded | ✅ Via tool | ✅ Built-in |
### Context & Models
| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
|------------|-------------|------------|----------|-----------|
| **Context Window** | 200K tokens | 1M (2M soon) | Model-dependent | 128-400K |
| **Auto Compaction** | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| **Model Support** | Claude only | Gemini only | 75+ providers | GPT-5.x |
| **Custom Models** | ❌ No | ❌ No | ✅ Yes | ❌ No |
### Security & Sandboxing
| Capability | Claude Code | Gemini CLI | OpenCode | Codex CLI |
|------------|-------------|------------|----------|-----------|
| **OS Sandboxing** | ❌ No | ❌ No | ❌ No | ✅ Yes |
| **Permission System** | ✅ Approval | ✅ Approval | ✅ Granular | ✅ Approval |
| **Tool Restrictions** | ❌ No | ✅ excludeTools | ✅ Per-tool | ✅ Sandbox modes |
| **Path Restrictions** | ❌ No | ⚠️ ReadFile workspace-only | ✅ external_directory | ✅ Workspace only |
---
## Detailed Breakdown
### 1. Hooks / Lifecycle Events
**Critical for quality gates** - hooks enable mechanical enforcement.
#### Claude Code
```
SessionStart, SessionEnd, PreToolUse, PostToolUse,
PreCompact, UserPromptSubmit, Stop, SubagentStop, Notification
```
- Executable scripts via hooks.json
- Timeout configurable per hook
- Can block operations (Stop hook for quality gates)
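
A minimal sketch of a Stop hook used as a quality gate, under stated assumptions: the hook receives a JSON object on stdin, a `stop_hook_active` field marks a stop that follows an earlier block, and printing `{"decision": "block", "reason": ...}` keeps the agent working. These field names should be checked against the current Claude Code hook documentation. The `.review-state.json` file and its `status` key are hypothetical.

```python
import json
import sys
from pathlib import Path
from typing import Optional

# Hypothetical file where a reviewer records its verdict.
STATE_FILE = Path(".review-state.json")


def load_state(path: Path) -> dict:
    """Read the review state; a missing or corrupt file counts as empty."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def decide(hook_input: dict, state: dict) -> Optional[dict]:
    """Return a blocking decision, or None to let the agent stop."""
    # Assumed field: true when the agent is already continuing because of
    # an earlier block. Letting it through avoids an endless loop.
    if hook_input.get("stop_hook_active"):
        return None
    if state.get("status") == "APPROVED":
        return None
    status = state.get("status", "MISSING")
    return {
        "decision": "block",
        "reason": "Review status is %r; run the reviewer before stopping." % status,
    }


def main() -> int:
    """Entry point for the hook script: stdin JSON in, optional decision out."""
    decision = decide(json.load(sys.stdin), load_state(STATE_FILE))
    if decision is not None:
        print(json.dumps(decision))
    return 0

# In the real hook script, finish with: sys.exit(main())
```

Keeping `decide` free of I/O means the gate logic can be exercised without running an agent.
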
#### Gemini CLI
```
SessionStart, SessionEnd, PreCompress, BeforeModel, AfterModel,
BeforeToolSelection, Notification
```
- Similar to Claude Code architecture
- Scripts or npm plugin packages
- Disabled by default for security
#### OpenCode
```
32+ events including:
session.*, file.*, command.*, permission.*, message.*,
todo.*, lsp.*, pty.*, tui.*
```
- Most comprehensive event system
- JS/TS modules (not shell scripts)
- Plugin-based architecture
#### Codex CLI
```
Limited: notification hooks, approval hooks, event loop lifecycle
```
- Less mature hook system
- Community requesting more hooks
- Tool-call events with approval flags
**Cross-Agent Implication:** Only Claude Code and Gemini have Stop-equivalent hooks for blocking exit. OpenCode and Codex would need protocol-based enforcement.
---
### 2. Subagent Spawning
**Critical for orchestrator pattern** - can any agent spawn others?
#### Claude Code
- **Task tool** - Native subagent spawning
- Multiple agent types (Explore, Plan, Bash, etc.)
- Returns results to parent
- Can run in background
#### Gemini CLI
- **Limited native** - Experimental YOLO mode
- **Via MCP** - PAL server enables cross-CLI spawning
- Can spawn Claude Code as subagent and vice versa
#### OpenCode
- **Native support** - Primary agents spawn subagents
- Session forking with parent tracking
- `@agent-name` mentions for manual invocation
- Configurable agent modes (primary/subagent/all)
#### Codex CLI
- **Recent addition** - Multi-conversation agent control
- Can run as MCP server for other agents
- Sandbox restrictions complicate spawning
**Cross-Agent Implication:** MCP is the universal bridge. Any agent can spawn any other via MCP server pattern.
---
### 3. File System Access
**Critical for skills** - can agents read skill files?
#### Claude Code
- Full filesystem access
- No path restrictions
- User permission level
#### Gemini CLI
- Full filesystem access through shell commands
- **Known issue:** the ReadFile tool is restricted to workspace directories
- Symlinked paths (like ~/.claude/skills/) may be blocked
- Workaround: read the file with a shell `cat` command
#### OpenCode
- Configurable permissions per operation
- `external_directory` permission for outside-project access
- Granular: ask/allow/deny per operation type
#### Codex CLI
- **Sandboxed by default** - workspace-only writes
- Network disabled by default
- `--add-dir` for selective access
- `danger-full-access` mode available
**Cross-Agent Implication:** Gemini's path restrictions (skills-bo8) are the main blocker. Skills need to be in workspace or use shell workarounds.
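
A rough sketch of how a skill loader might pick between the two options above. It assumes the restriction applies to the symlink-resolved path, which this document has not confirmed. `read_strategy` and its return labels are hypothetical names.

```python
import os
from pathlib import Path


def is_inside(path: Path, workspace: Path) -> bool:
    """True if path, after resolving symlinks, lies within workspace."""
    real = Path(os.path.realpath(path))
    root = Path(os.path.realpath(workspace))
    return real == root or root in real.parents


def read_strategy(skill_file: Path, workspace: Path) -> str:
    """Pick how an agent with workspace-only file reads should load a skill."""
    if is_inside(skill_file, workspace):
        return "read_file"  # the native file tool should accept this path
    return "shell_cat"      # fall back to reading through the shell
```

A symlink inside the workspace that points outside it resolves to the outside path, so it is routed to the shell fallback.
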
---
### 4. CLI Tool Execution
**Critical for skills using helper scripts.**
| Agent | Method | Safety |
|-------|--------|--------|
| Claude Code | Bash tool | Approval prompts |
| Gemini CLI | `run_shell_command` (bash -c) | Approval prompts |
| OpenCode | Bash tool with glob patterns | Granular permissions |
| Codex CLI | `!` prefix or shell tool | Sandbox + approval |
All agents can run CLI tools. Key differences:
- Codex has OS-level sandboxing
- OpenCode has most granular permission patterns
- Gemini and Claude have similar approval models
---
### 5. State Persistence
**Critical for quality gates** - need to track review status.
| Agent | Session Storage | Cross-Session Memory |
|-------|-----------------|---------------------|
| Claude Code | Yes | CLAUDE.md |
| Gemini CLI | Auto-save to ~/.gemini/ | GEMINI.md, save_memory tool |
| OpenCode | Multi-level | Config files |
| Codex CLI | Client-side only | ~/.codex/sessions/ |
**Cross-Agent Implication:** Cross-agent state needs an external store (jwz, beads, or simple files), because each agent's native persistence is specific to that agent.
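
A minimal sketch of the "simple files" option: one JSON file that any agent can update through a helper script. The file layout and the function names are assumptions for illustration, not the jwz or beads format.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def read_all(store: Path) -> dict:
    """Return the whole state, or an empty dict if nothing was posted yet."""
    if not store.exists():
        return {}
    return json.loads(store.read_text())


def post_status(store: Path, task_id: str, status: str, by: str) -> None:
    """Record a status for task_id, replacing the file atomically."""
    state = read_all(store)
    state[task_id] = {"status": status, "by": by}
    # Write to a temp file in the same directory, then rename over the
    # original, so readers never see a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=store.parent, prefix=store.name)
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp, indent=2)
    os.replace(tmp_name, store)


def get_status(store: Path, task_id: str) -> Optional[str]:
    """Return the last posted status for task_id, or None."""
    entry = read_all(store).get(task_id)
    return entry["status"] if entry else None
```

The rename keeps readers safe, but two writers racing on read-modify-write can still lose an update. A lock or an append-only log would be needed if agents post concurrently.
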
---
### 6. Built-in Tools
#### Claude Code (~10 tools)
Read, Write, Edit, Glob, Grep, Bash, Task, WebFetch, WebSearch, TodoWrite, NotebookEdit
#### Gemini CLI (12+ tools)
read_file, write_file, replace, glob, list_directory, run_shell_command,
web_fetch, google_web_search, save_memory, write_todos, codebase_investigator,
search_file_content, read_many_files
#### OpenCode
File ops, Bash, Web fetch, LSP integration, Git/VCS, PTY management, MCP servers, Custom plugins
#### Codex CLI
read_file, list_dir, glob_file_search, apply_patch, git, rg (search),
shell, todo_write, web_search, image support
**Parity:** All four agents provide core file, shell, and search tools. They differ in tool naming and in extras (LSP, code execution, etc.).
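
One way a portable skill could cope with the naming differences is a small lookup table. The tool names below come from the lists above; pairing them by capability is an inference, and OpenCode is omitted because its tool names are not listed here. `TOOL_NAMES`, `tool_for`, and `render` are hypothetical.

```python
from typing import Optional

# Capability -> agent -> tool name (names taken from the lists above).
TOOL_NAMES = {
    "read": {"claude": "Read", "gemini": "read_file", "codex": "read_file"},
    "glob": {"claude": "Glob", "gemini": "glob", "codex": "glob_file_search"},
    "search": {"claude": "Grep", "gemini": "search_file_content", "codex": "rg"},
    "shell": {"claude": "Bash", "gemini": "run_shell_command", "codex": "shell"},
    "web_search": {
        "claude": "WebSearch",
        "gemini": "google_web_search",
        "codex": "web_search",
    },
    "todo": {"claude": "TodoWrite", "gemini": "write_todos", "codex": "todo_write"},
}


def tool_for(capability: str, agent: str) -> Optional[str]:
    """Return the agent's tool name for a capability, or None if unknown."""
    return TOOL_NAMES.get(capability, {}).get(agent)


def render(template: str, agent: str) -> str:
    """Fill {capability} placeholders in skill text with the agent's tool names."""
    names = {cap: tools[agent] for cap, tools in TOOL_NAMES.items() if agent in tools}
    return template.format(**names)
```

For example, `render("Use {read} then {shell}.", "gemini")` substitutes `read_file` and `run_shell_command`.
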
---
## Cross-Agent Quality Gate Analysis
### What Works Across All Agents
| Component | Approach |
|-----------|----------|
| **State storage** | External CLI tool (jwz, bd, or file-based) |
| **Reviewer invocation** | Any agent can spawn reviewer via Bash/MCP |
| **Issue tracking** | External CLI (beads, tissue) |
| **Second opinions** | orch works from any agent with Bash |
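
A sketch of the Bash route for reviewer invocation: build a non-interactive command line for whichever agent is acting as reviewer. The entry points in `HEADLESS` are assumptions based on each CLI's headless mode and should be confirmed with `--help` before use. `build_command` and `spawn` are hypothetical helpers.

```python
import subprocess
from typing import List

# Assumed non-interactive entry points; verify against each CLI's --help.
HEADLESS = {
    "claude": ["claude", "-p"],
    "gemini": ["gemini", "-p"],
    "codex": ["codex", "exec"],
    "opencode": ["opencode", "run"],
}


def build_command(agent: str, prompt: str) -> List[str]:
    """Return the argv list that runs one prompt through the given agent."""
    try:
        prefix = HEADLESS[agent]
    except KeyError:
        raise ValueError("unknown agent: %s" % agent) from None
    return [*prefix, prompt]


def spawn(agent: str, prompt: str, timeout_s: int = 600) -> str:
    """Run the agent once and return its stdout; raises on non-zero exit."""
    result = subprocess.run(
        build_command(agent, prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout
```

Passing the prompt as a single argv element avoids shell quoting problems, since no shell is involved.
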
### What Doesn't Work Across Agents
| Component | Problem |
|-----------|---------|
| **Stop hook** | Claude/Gemini only - no equivalent in OpenCode/Codex |
| **Mechanical blocking** | Can't prevent exit without hooks |
| **Native subagents** | Different spawning mechanisms |
### Recommended Cross-Agent Pattern
```
┌─────────────────────────────────────────────┐
│ Orchestrator (any agent) │
│ │
│ 1. Start work │
│ 2. Spawn worker agent (via MCP/Bash) │
│ 3. Worker completes, posts to state store │
│ 4. Orchestrator spawns reviewer │
│ 5. Reviewer posts APPROVED/ISSUES │
│ 6. Orchestrator checks state, gates exit │
└─────────────────────────────────────────────┘
```
The **orchestrator** enforces the gate, not hooks. Works with any agent.
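
The steps in the diagram can be sketched as a plain loop. For brevity this version passes results directly between callables instead of going through a state store, and the `worker` and `reviewer` callables stand in for whatever spawning mechanism is used. `run_gated`, the verdict format, and `max_rounds` are illustrative assumptions.

```python
from typing import Callable

# A spawner takes a prompt and returns the agent's final text output.
Spawn = Callable[[str], str]


def run_gated(task: str, worker: Spawn, reviewer: Spawn, max_rounds: int = 3) -> dict:
    """Run worker, then reviewer, until APPROVED or the rounds run out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback = ""
    result = ""
    for round_no in range(1, max_rounds + 1):
        prompt = task if not feedback else f"{task}\n\nReviewer feedback:\n{feedback}"
        result = worker(prompt)
        verdict = reviewer(
            "Review this work. Reply APPROVED or ISSUES: <details>.\n\n" + result
        )
        if verdict.strip().upper().startswith("APPROVED"):
            return {"status": "APPROVED", "rounds": round_no, "result": result}
        feedback = verdict  # carried into the next worker prompt
    # Gate stays closed: the orchestrator reports failure instead of exiting clean.
    return {"status": "ISSUES", "rounds": max_rounds, "result": result, "feedback": feedback}
```

Because the gate lives in ordinary control flow, it does not depend on any agent's hook system. It does depend on the reviewer following the reply format.
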
---
## Recommendations
### For Portable Skills
1. **Use SKILL.md format** - All agents can read markdown
2. **Avoid agent-specific features** - No hooks in skill logic
3. **CLI tools for actions** - All agents can run Bash
4. **External state** - beads/jwz for cross-agent coordination
### For Quality Gates
1. **Orchestrator pattern** - Gate logic in orchestrator, not hooks
2. **Protocol-based** - Agents follow instructions, post to state store
3. **Hybrid** - Use hooks where available (Claude/Gemini), protocol elsewhere
### For Subagent Research Sandbox (skills-ut4)
| Agent | Sandbox Approach |
|-------|------------------|
| Claude Code | No native sandbox - rely on Task agent type restrictions |
| Gemini CLI | `excludeTools` setting to disable write tools |
| OpenCode | Permission config: `edit: "deny", bash: "deny"` |
| Codex CLI | Native sandbox - best support for read-only research |
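
The table can be turned into per-agent settings fragments. This is a sketch with several assumptions: the Gemini tool names are taken from the built-in tools list in this document, the `permission` and `sandbox_mode` key names are assumed from each tool's config format, and where each fragment goes (settings file, CLI flag) has to be checked per agent. Claude Code is left out because the table lists no native sandbox for it.

```python
# Gemini tools that can modify files or run commands (names from this document).
GEMINI_WRITE_TOOLS = ["write_file", "replace", "run_shell_command"]


def read_only_profile(agent: str) -> dict:
    """Return a settings fragment intended to make the agent read-only."""
    if agent == "gemini":
        return {"excludeTools": list(GEMINI_WRITE_TOOLS)}
    if agent == "opencode":
        return {"permission": {"edit": "deny", "bash": "deny"}}
    if agent == "codex":
        # Assumed name of the most restrictive Codex sandbox mode.
        return {"sandbox_mode": "read-only"}
    raise ValueError("no read-only profile known for agent: %s" % agent)
```

Denying shell access entirely also blocks read-only commands such as `cat`, which conflicts with the shell workaround for Gemini's path restrictions. A finer-grained allowlist may be needed there.
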
---
## Open Questions
1. Can MCP server sandboxing be enforced by the server, not client?
2. Is there a standard for "read-only agent" across platforms?
3. Should we build a universal sandbox wrapper script?
4. How do we handle agents that ignore protocol instructions?
---
## References
- [Claude Code Docs](https://docs.anthropic.com/claude-code)
- [Gemini CLI GitHub](https://github.com/google-gemini/gemini-cli)
- [OpenCode Docs](https://opencode.ai/docs/)
- [Codex CLI Docs](https://developers.openai.com/codex/cli/)
- [MCP Specification](https://modelcontextprotocol.io/)