docs: web research on cross-agent patterns (via orch)

Key findings from gemini --websearch:
- Manager-Worker orchestration (Maestro pattern)
- alice/idle adversarial review gates (emes)
- Git-as-state for agent coordination
- tissue for machine-first issue tracking
- Circuit breakers: semantic drift, three-strike, budget limits
- Sandboxing: Wasm and Docker playgrounds

Validates our direction: beads, orch, file-based coordination.
Gaps: orchestrator-enforced gates, agent messaging, sandboxing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 17:50:37 -08:00 · 2026-01-09 17:50:37 -08:00 · c14075ae7e
parent e367be6cb6
commit c14075ae7e
1 changed files with 227 additions and 0 deletions
--- a/docs/research/cross-agent-patterns-web-research.md
+++ b/docs/research/cross-agent-patterns-web-research.md
@@ -0,0 +1,227 @@
+# Cross-Agent Patterns: Web Research Synthesis
+
+> **Date:** 2026-01-09
+> **Method:** orch web research (gemini --websearch)
+> **Related:** [skills-hf1](../../.beads/) (Cross-agent epic)
+
+## Executive Summary
+
+Multi-agent AI coding has shifted from single "do it all" agents to **specialized agent teams** with distinct roles. Key patterns emerging in 2025-2026:
+
+1. **Manager-Worker orchestration** - Central agent delegates to specialists
+2. **Adversarial review gates** - Separate reviewer blocks completion until approved
+3. **Git-as-state** - Repository is source of truth for agent coordination
+4. **Machine-first issue tracking** - File-based, no API needed (tissue)
+5. **Circuit breakers** - Prevent infinite loops via semantic drift detection, three-strike rules, and token budgets
+
+---
+
+## 1. Multi-Agent Orchestration Patterns
+
+### The "Maestro" Pattern (Centralized Orchestration)
+
+A central Manager agent breaks down features into atomic tasks and delegates to Workers.
+
+**Workflow:**
+1. **Manager (e.g., Claude Sonnet):** Reads SPEC.md, creates plan.md
+2. **Worker A (e.g., Gemini CLI):** "Scaffold database schema" - writes code, tests, reports back
+3. **Worker B (e.g., OpenCode):** "Write API endpoints" - waits for A, then executes
+
+**Tooling:** Maestro CLI orchestrates Claude Code, Codex, local LLMs via `orchestrator.md`.
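
The dependency ordering in the workflow above (Worker B waits for Worker A) can be sketched as a small dispatcher. This is an illustrative sketch, not Maestro's implementation: the `Task` shape, the task names, and the `run_worker` callback are all hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class Task:
    """One atomic unit of work the Manager hands to a Worker (hypothetical shape)."""
    name: str
    worker: str
    depends_on: list[str] = field(default_factory=list)


def dispatch(tasks: list[Task], run_worker: Callable[[Task], str]) -> list[str]:
    """Run tasks in dependency order and collect each worker's report."""
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.depends_on) for t in tasks}
    reports = []
    # static_order() yields each task only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        reports.append(run_worker(by_name[name]))
    return reports


# Usage: the plan lists B first, but A still runs first because B depends on it.
plan = [
    Task("api-endpoints", worker="worker-b", depends_on=["db-schema"]),
    Task("db-schema", worker="worker-a"),
]
reports = dispatch(plan, lambda t: f"{t.worker} finished {t.name}")
```

A real orchestrator would replace the lambda with a call into an agent CLI and would likely run independent tasks in parallel; this sketch only shows the ordering guarantee.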
+
+### The "Inspector-Fixer" Loop (Dual-Agent)
+
+Strict separation: one agent writes, one agent finds faults. Never switch roles.
+
+- **Fixer (Claude Code):** Monitors `todo.md`, implements tasks, moves to `done.md`
+- **Inspector (Gemini Pro):** Scans codebase, appends issues to `todo.md` (never fixes)
+
+**Key insight:** Inspector creates work for Fixer, not the other way around.
+
+---
+
+## 2. Adversarial Quality Gates
+
+### The alice/idle Pattern (emes)
+
+From `evil-mind-evil-sword/idle` - review gate for autonomous agents.
+
+**Mechanism:**
+1. **Stop Hook:** Intercepts agent's exit/finish signal
+2. **Idle State:** Forces mandatory pause before completion
+3. **Alice (Reviewer):** Read-only Opus agent inspects work
+4. **Blocking:** If Alice finds issues → creates tissue issues → main agent loops back
+5. **Exit:** Only when Alice returns `COMPLETE`
+
+**Critical design:** Enforcement is *external* to worker's prompt - system-level intervention.
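
The mechanism above can be sketched as a loop owned by the system rather than by the worker. This is a minimal sketch under assumptions, not the actual idle implementation: `work` and `review` are hypothetical callables standing in for the main agent and Alice, and `max_rounds` is an added safety limit.

```python
from typing import Callable

COMPLETE = "COMPLETE"


def run_with_review_gate(
    work: Callable[[list[str]], str],
    review: Callable[[str], list[str]],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Keep the worker looping until the reviewer reports no open issues.

    `work` receives the issues from the previous review and returns a result;
    `review` returns a list of issues, where an empty list means COMPLETE.
    """
    issues: list[str] = []
    for round_no in range(1, max_rounds + 1):
        result = work(issues)    # worker believes it is finished
        issues = review(result)  # stop hook: the reviewer inspects the result
        if not issues:
            return COMPLETE, round_no  # only the reviewer releases the worker
    raise RuntimeError(f"review gate still blocked after {max_rounds} rounds")


# Usage with toy stand-ins for the two agents.
def worker(issues: list[str]) -> str:
    return "patched" if issues else "draft"


def alice(result: str) -> list[str]:
    return [] if result == "patched" else ["missing error handling"]


status, rounds = run_with_review_gate(worker, alice)
```

The worker never decides when it is done; the loop does, which mirrors the point that enforcement sits outside the worker's prompt.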
+
+### Red Team Pre-Commit Hook
+
+Before commit, code passes through hostile reviewer from different model family:
+
+```bash
+# Review only the staged changes, i.e. what the commit will actually contain.
+git diff --cached | llm --model o1-preview --system "You are a hostile security auditor. Block this commit if you find any logic gaps."
+```
+
+If rejected, original agent receives critique and retries.
+
+### Test-Driven Agent Development (TDAD)
+
+Agents forbidden from writing implementation until failing test exists:
+
+1. Agent generates `test_feature.py`
+2. System executes → asserts **FAILURE**
+3. Agent generates `feature.py`
+4. System executes → asserts **SUCCESS**
+
+Prevents "placebo tests" that always pass regardless of logic.
+
+---
+
+## 3. Circuit Breaker Patterns
+
+### Semantic Drift Detection
+
+Use an embedding model to compare recent entries in the agent's thought trace:
+- If the last 3 thoughts are >95% similar to one another → circuit breaks
+- Forces strategy shift, not just retry
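
One way to read the rule above is pairwise cosine similarity over the last three thought embeddings. This sketch assumes that interpretation: the embedding vectors are supplied by some embedding model outside the snippet, and `is_stuck`, `window`, and `threshold` are hypothetical names.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_stuck(
    embeddings: list[list[float]], window: int = 3, threshold: float = 0.95
) -> bool:
    """True when every pair among the last `window` thoughts exceeds the threshold."""
    recent = embeddings[-window:]
    if len(recent) < window:
        return False  # not enough history to judge
    return all(cosine(a, b) > threshold for a, b in combinations(recent, 2))


# Usage with toy 2-d vectors in place of real embeddings.
looping = [[1.0, 0.0], [0.99, 0.05], [1.0, 0.02]]
progressing = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
stuck_flags = (is_stuck(looping), is_stuck(progressing))
```

When `is_stuck` returns true, the orchestrator would prompt for a change of strategy instead of allowing another retry.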
+
+### Three-Strike Tool Rule
+
+Same error message 3 times → inject system prompt:
+> "You are stuck. You must try a different tool or approach. Do not retry the previous action."
+
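The rule can be sketched as a small counter placed between the tool runner and the agent. This sketch assumes "3 times" means three consecutive identical messages; the `ThreeStrike` class is hypothetical.

```python
from typing import Optional

NUDGE = (
    "You are stuck. You must try a different tool or approach. "
    "Do not retry the previous action."
)


class ThreeStrike:
    """Counts consecutive identical error messages from tool calls."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.last_error: Optional[str] = None
        self.count = 0

    def record(self, error: Optional[str]) -> Optional[str]:
        """Return the nudge prompt on the third identical error, else None.

        Pass None for a successful tool call; that resets the streak.
        """
        if error is None or error != self.last_error:
            self.last_error = error
            self.count = 1 if error is not None else 0
            return None
        self.count += 1
        if self.count >= self.limit:
            # Reset so the nudge is not injected again on every later step.
            self.count = 0
            self.last_error = None
            return NUDGE
        return None


# Usage: the same error three times in a row triggers the injected prompt.
breaker = ThreeStrike()
outcomes = [breaker.record("ENOENT: no such file") for _ in range(3)]
```
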
+### Budget-Based Interrupts
+
+Replace step limits with token budgets per sub-task:
+- If 50% budget burned on first step of 5-step plan → pause
+- Request plan refinement or human intervention
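
The budget check can be sketched as a comparison between actual spend and an even per-step share of the budget. This is a minimal sketch: the `TokenBudget` class is hypothetical, and the `tolerance` of 2.5 is chosen only so that the example above (50% spent during step 1 of 5) triggers a pause.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    total: int              # tokens allotted to the whole sub-task
    steps: int              # number of planned steps
    spent: int = 0
    tolerance: float = 2.5  # allowed overshoot versus an even per-step share

    def charge(self, tokens: int) -> None:
        self.spent += tokens

    def should_pause(self, steps_done: int) -> bool:
        """Pause when spend runs far ahead of plan progress or exceeds the total."""
        if self.spent >= self.total:
            return True
        # Expected spend if tokens were split evenly across steps; count at
        # least one step so the check is meaningful during the first step.
        expected = self.total * max(steps_done, 1) / self.steps
        return self.spent >= expected * self.tolerance


# Usage: a 5-step plan with a 10,000-token budget.
budget = TokenBudget(total=10_000, steps=5)
budget.charge(3_000)
early = budget.should_pause(steps_done=0)   # 30% spent: keep going
budget.charge(2_000)
late = budget.should_pause(steps_done=0)    # 50% spent on step 1: pause
```

On a pause, the orchestrator would request a refined plan or escalate to a human, as described above.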
+
+---
+
+## 4. State Management (Non-MCP Alternatives)
+
+### Git-as-State
+
+Repository is source of truth:
+- **Coordination:** Agent A writes file, Agent B reads diff
+- **State:** Current HEAD of branch
+- **History:** Git log = replayable decision history (immutable as long as history is not rewritten)
+
+### jwz (Agent Messaging)
+
+From emes - lightweight email-thread-style messaging:
+- Async communication between agents
+- Preserves threading for topic separation
+- Keeps side conversations out of the main context window
+
+### The "Postbox" File Pattern
+
+Agents share state via filesystem:
+- Standard: `.ai/context.md` or `claudecode.md`
+- Agents dump thought process, obstacles, decisions before signing off
+- Next agent reads to "load" state
+
+**Tooling:** Vibe Kanban - markdown board multiple agents read/write to.
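
The sign-off and load steps can be sketched with plain file I/O. This is an illustrative sketch: the entry layout (agent heading, Decisions, Obstacles) and the helper names are hypothetical, and only the `.ai/context.md` path comes from the notes above.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sign_off(
    context_file: Path, agent: str, decisions: list[str], obstacles: list[str]
) -> None:
    """Append a handoff entry before the agent exits."""
    context_file.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [f"## {agent} @ {stamp}", "", "### Decisions"]
    lines += [f"- {d}" for d in decisions]
    lines += ["", "### Obstacles"]
    lines += [f"- {o}" for o in obstacles]
    # Append rather than overwrite so earlier handoffs stay readable.
    with context_file.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n\n")


def load_state(context_file: Path) -> str:
    """The next agent reads the whole file to pick up where the last one stopped."""
    return context_file.read_text(encoding="utf-8") if context_file.exists() else ""


# Usage: one agent signs off, the next one loads the shared context.
with tempfile.TemporaryDirectory() as tmp:
    postbox = Path(tmp) / ".ai" / "context.md"
    sign_off(postbox, "agent-a", ["chose SQLite"], ["migration tests flaky"])
    handoff = load_state(postbox)
```
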
+
+### Git Worktrees for Isolation
+
+Prevent agents overwriting each other in real-time:
+- Agent A: `worktree/feature-login`
+- Agent B: `worktree/refactor-db`
+- Merge via PRs with standard conflict resolution
+
+---
+
+## 5. Machine-First Issue Tracking
+
+### tissue (emes)
+
+From `evil-mind-evil-sword/tissue` - headless issue tracker for agents.
+
+**Design:**
+- **File-based:** Issues stored as JSON/Markdown in `.tissue/`
+- **No API:** Agents use standard file tools (read, write)
+- **Git-native:** Issues version-controlled with code
+- **Branch-aware:** Issue state branches with code branches
+
+**Workflow:**
+1. Alice: `tissue new --title "Memory Leak"` → `.tissue/issues/1.json`
+2. Worker reads issue, fixes code
+3. Worker updates JSON status to `resolved`
+4. Alice verifies, merges branch (code fix + closed issue)
+
+**Why this matters:** No desync between issue tracker and code state.
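
The workflow above needs nothing beyond file reads and writes. The sketch below illustrates that idea only; it does not reproduce tissue's actual on-disk schema or CLI, and the field names (`id`, `title`, `status`) and helper functions are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def new_issue(root: Path, title: str) -> Path:
    """Create the next numbered issue file under <root>/issues/."""
    issues = root / "issues"
    issues.mkdir(parents=True, exist_ok=True)
    next_id = 1 + max((int(p.stem) for p in issues.glob("*.json")), default=0)
    path = issues / f"{next_id}.json"
    issue = {"id": next_id, "title": title, "status": "open"}
    path.write_text(json.dumps(issue, indent=2))
    return path


def set_status(path: Path, status: str) -> None:
    """Rewrite the issue in place; git then versions the change with the code."""
    issue = json.loads(path.read_text())
    issue["status"] = status
    path.write_text(json.dumps(issue, indent=2))


def open_issues(root: Path) -> list[str]:
    """Titles of all issues that are not yet resolved."""
    issues = [json.loads(p.read_text()) for p in sorted((root / "issues").glob("*.json"))]
    return [i["title"] for i in issues if i["status"] != "resolved"]


# Usage: reviewer files an issue, worker resolves it, reviewer re-checks.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".tissue"
    issue_path = new_issue(root, "Memory Leak")
    before = open_issues(root)
    set_status(issue_path, "resolved")
    after = open_issues(root)
    issue_name = issue_path.name
```

Because the issue file lives in the same tree as the code, committing the fix and the status change together keeps tracker and code state on the same branch.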
+
+---
+
+## 6. Sandboxing & Security
+
+### Wasm Sandboxing (Browser-style)
+
+Tools like Pyodide run agent-generated code in a WebAssembly sandbox:
+- No host OS filesystem access unless explicitly granted
+- NVIDIA uses this approach for data visualization code
+
+### Docker Playgrounds
+
+Agent runs in disposable container, not user's shell:
+- Commands trapped and executed in isolation
+- Permission scoping: read/write only in project directory
+- Read-only access to rest of system
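
The permission scoping above can be sketched as a helper that builds the `docker run` argument list. This is a sketch under assumptions: the `sandboxed_command` helper, the `/workspace` mount point, the default image, and the disabled network are illustrative choices, not taken from the research.

```python
from pathlib import Path


def sandboxed_command(
    project_dir: Path, command: list[str], image: str = "python:3.12-slim"
) -> list[str]:
    """Build a `docker run` argv that confines writes to the project directory."""
    project = project_dir.resolve()
    return [
        "docker", "run",
        "--rm",               # disposable: the container is deleted on exit
        "--read-only",        # the container's root filesystem is read-only
        "--network", "none",  # no network access (extra restriction, assumed)
        "--volume", f"{project}:/workspace:rw",  # only the project dir is writable
        "--workdir", "/workspace",
        image,
        *command,
    ]


# Usage: build the command; pass it to subprocess.run() to actually execute it.
argv = sandboxed_command(Path("."), ["python", "-m", "unittest"])
```

Note that in this sketch the agent sees the container image's filesystem read-only, not the host's; exposing host paths read-only would require additional `:ro` mounts.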
+
+---
+
+## Recommended Stack (2025-2026)
+
+Based on research, the "Golden Path" for multi-agent teams:
+
+| Role | Tool | Notes |
+|------|------|-------|
+| **Orchestrator** | Maestro or custom | Manages high-level plan |
+| **Primary Coder** | Claude Code | Via MCP for tool access |
+| **Reviewer/QA** | Gemini Pro or Opus | "Watchful Inspector" role |
+| **State** | Git + shared docs | `docs/arch.md`, worktrees |
+| **Issues** | tissue or beads | File-based, git-native |
+| **Messaging** | jwz | Async agent communication |
+
+---
+
+## Key Insights for Our Cross-Agent Work
+
+### What Aligns With Our Direction
+
+| Pattern | Our Equivalent | Status |
+|---------|---------------|--------|
+| Git-as-state | beads (.beads/ in repo) | ✅ Have |
+| Machine-first issues | beads, tissue | ✅ Have |
+| File-based coordination | SKILL.md, AGENTS.md | ✅ Have |
+| Multi-model consensus | orch | ✅ Have |
+| Adversarial review | alice pattern | 🔄 Researching |
+
+### Gaps to Address
+
+| Gap | Pattern to Adopt |
+|-----|------------------|
+| Stop hook support (Claude/Gemini only) | Orchestrator-enforced gate |
+| Agent messaging | Consider jwz or build on beads |
+| Sandbox for research agents | Docker/Wasm or OS-level |
+| Circuit breakers | Semantic drift + three-strike |
+
+### Recommended Next Steps
+
+1. **Prototype orchestrator pattern** - Central agent enforces review gate
+2. **Evaluate jwz** - Could complement beads for transient state
+3. **Implement circuit breakers** - Semantic drift detection
+4. **Sandbox research** - Docker-based for research subagents
+
+---
+
+## Sources
+
+- orch web research via gemini --websearch
+- GitHub: evil-mind-evil-sword/idle (alice pattern)
+- GitHub: evil-mind-evil-sword/tissue (machine-first issues)
+- Maestro CLI documentation
+- Google ADK event streams
+- Anthropic MCP specification