From c14075ae7ee71c7b904b86daf226cf9bb056c9ee Mon Sep 17 00:00:00 2001
From: dan
Date: Fri, 9 Jan 2026 17:50:37 -0800
Subject: [PATCH] docs: web research on cross-agent patterns (via orch)

Key findings from gemini --websearch:
- Manager-Worker orchestration (Maestro pattern)
- alice/idle adversarial review gates (emes)
- Git-as-state for agent coordination
- tissue for machine-first issue tracking
- Circuit breakers: semantic drift, three-strike, budget limits
- Sandboxing: Wasm and Docker playgrounds

Validates our direction: beads, orch, file-based coordination.
Gaps: orchestrator-enforced gates, agent messaging, sandboxing.

Co-Authored-By: Claude Opus 4.5
---
 .../cross-agent-patterns-web-research.md | 227 ++++++++++++++++++
 1 file changed, 227 insertions(+)
 create mode 100644 docs/research/cross-agent-patterns-web-research.md

diff --git a/docs/research/cross-agent-patterns-web-research.md b/docs/research/cross-agent-patterns-web-research.md
new file mode 100644
index 0000000..d9d2456
--- /dev/null
+++ b/docs/research/cross-agent-patterns-web-research.md
@@ -0,0 +1,227 @@
+# Cross-Agent Patterns: Web Research Synthesis
+
+> **Date:** 2026-01-09
+> **Method:** orch web research (gemini --websearch)
+> **Related:** [skills-hf1](../../.beads/) (Cross-agent epic)
+
+## Executive Summary
+
+Multi-agent AI coding has shifted from single "do it all" agents to **specialized agent teams** with distinct roles. Key patterns emerging in 2025-2026:
+
+1. **Manager-Worker orchestration** - Central agent delegates to specialists
+2. **Adversarial review gates** - Separate reviewer blocks completion until approved
+3. **Git-as-state** - Repository is source of truth for agent coordination
+4. **Machine-first issue tracking** - File-based, no API needed (tissue)
+5. **Circuit breakers** - Prevent infinite loops via semantic drift detection
+
+---
+
+## 1. Multi-Agent Orchestration Patterns
+
+### The "Maestro" Pattern (Centralized Orchestration)
+
+A central Manager agent breaks down features into atomic tasks and delegates to Workers.
+
+**Workflow:**
+1. **Manager (e.g., Claude Sonnet):** Reads SPEC.md, creates plan.md
+2. **Worker A (e.g., Gemini CLI):** "Scaffold database schema" - writes code, tests, reports back
+3. **Worker B (e.g., OpenCode):** "Write API endpoints" - waits for A, then executes
+
+**Tooling:** Maestro CLI orchestrates Claude Code, Codex, local LLMs via `orchestrator.md`.
+
+### The "Inspector-Fixer" Loop (Dual-Agent)
+
+Strict separation: one agent writes, one agent finds faults. Never switch roles.
+
+- **Fixer (Claude Code):** Monitors `todo.md`, implements tasks, moves to `done.md`
+- **Inspector (Gemini Pro):** Scans codebase, appends issues to `todo.md` (never fixes)
+
+**Key insight:** Inspector creates work for Fixer, not the other way around.
+
+---
+
+## 2. Adversarial Quality Gates
+
+### The alice/idle Pattern (emes)
+
+From `evil-mind-evil-sword/idle` - review gate for autonomous agents.
+
+**Mechanism:**
+1. **Stop Hook:** Intercepts agent's exit/finish signal
+2. **Idle State:** Forces mandatory pause before completion
+3. **Alice (Reviewer):** Read-only Opus agent inspects work
+4. **Blocking:** If Alice finds issues → creates tissue issues → main agent loops back
+5. **Exit:** Only when Alice returns `COMPLETE`
+
+**Critical design:** Enforcement is *external* to worker's prompt - system-level intervention.
+
+### Red Team Pre-Commit Hook
+
+Before commit, code passes through hostile reviewer from different model family:
+
+```bash
+git diff | llm --model o1-preview --system "You are a hostile security auditor. Block this commit if you find any logic gaps."
+```
+
+If rejected, original agent receives critique and retries.
+
+### Test-Driven Agent Development (TDAD)
+
+Agents forbidden from writing implementation until failing test exists:
+
+1. Agent generates `test_feature.py`
+2. System executes → asserts **FAILURE**
+3. Agent generates `feature.py`
+4. System executes → asserts **SUCCESS**
+
+Prevents "placebo tests" that always pass regardless of logic.
+
+---
+
+## 3. Circuit Breaker Patterns
+
+### Semantic Drift Detection
+
+Use embedding model to check similarity of agent's thought trace:
+- If last 3 thoughts are >95% similar → circuit breaks
+- Forces strategy shift, not just retry
+
+### Three-Strike Tool Rule
+
+Same error message 3 times → inject system prompt:
+> "You are stuck. You must try a different tool or approach. Do not retry the previous action."
+
+### Budget-Based Interrupts
+
+Replace step limits with token budgets per sub-task:
+- If 50% budget burned on first step of 5-step plan → pause
+- Request plan refinement or human intervention
+
+---
+
+## 4. State Management (Non-MCP Alternatives)
+
+### Git-as-State
+
+Repository is source of truth:
+- **Coordination:** Agent A writes file, Agent B reads diff
+- **State:** Current HEAD of branch
+- **History:** Git log = immutable, replayable decision history
+
+### jwz (Agent Messaging)
+
+From emes - lightweight email-thread-style messaging:
+- Async communication between agents
+- Preserves threading for topic separation
+- Doesn't confuse main context window
+
+### The "Postbox" File Pattern
+
+Agents share state via filesystem:
+- Standard: `.ai/context.md` or `claudecode.md`
+- Agents dump thought process, obstacles, decisions before signing off
+- Next agent reads to "load" state
+
+**Tooling:** Vibe Kanban - markdown board multiple agents read/write to.
+
+### Git Worktrees for Isolation
+
+Prevent agents overwriting each other in real-time:
+- Agent A: `worktree/feature-login`
+- Agent B: `worktree/refactor-db`
+- Merge via PRs with standard conflict resolution
+
+---
+
+## 5. Machine-First Issue Tracking
+
+### tissue (emes)
+
+From `evil-mind-evil-sword/tissue` - headless issue tracker for agents.
+
+**Design:**
+- **File-based:** Issues stored as JSON/Markdown in `.tissue/`
+- **No API:** Agents use standard file tools (read, write)
+- **Git-native:** Issues version-controlled with code
+- **Branch-aware:** Issue state branches with code branches
+
+**Workflow:**
+1. Alice: `tissue new --title "Memory Leak"` → `.tissue/issues/1.json`
+2. Worker reads issue, fixes code
+3. Worker updates JSON status to `resolved`
+4. Alice verifies, merges branch (code fix + closed issue)
+
+**Why this matters:** No desync between issue tracker and code state.
+
+---
+
+## 6. Sandboxing & Security
+
+### Wasm Sandboxing (Browser-style)
+
+Tools like Pyodide run agent-generated code in WebAssembly sandbox:
+- No host OS filesystem access unless explicit
+- NVIDIA uses for data visualization code
+
+### Docker Playgrounds
+
+Agent runs in disposable container, not user's shell:
+- Commands trapped and executed in isolation
+- Permission scoping: read/write only in project directory
+- Read-only access to rest of system
+
+---
+
+## Recommended Stack (2025-2026)
+
+Based on research, the "Golden Path" for multi-agent teams:
+
+| Role | Tool | Notes |
+|------|------|-------|
+| **Orchestrator** | Maestro or custom | Manages high-level plan |
+| **Primary Coder** | Claude Code | Via MCP for tool access |
+| **Reviewer/QA** | Gemini Pro or Opus | "Watchful Inspector" role |
+| **State** | Git + shared docs | `docs/arch.md`, worktrees |
+| **Issues** | tissue or beads | File-based, git-native |
+| **Messaging** | jwz | Async agent communication |
+
+---
+
+## Key Insights for Our Cross-Agent Work
+
+### What Aligns With Our Direction
+
+| Pattern | Our Equivalent | Status |
+|---------|---------------|--------|
+| Git-as-state | beads (.beads/ in repo) | ✅ Have |
+| Machine-first issues | beads, tissue | ✅ Have |
+| File-based coordination | SKILL.md, AGENTS.md | ✅ Have |
+| Multi-model consensus | orch | ✅ Have |
+| Adversarial review | alice pattern | 🔄 Researching |
+
+### Gaps to Address
+
+| Gap | Pattern to Adopt |
+|-----|------------------|
+| Stop hook (Claude/Gemini only) | Orchestrator-enforced gate |
+| Agent messaging | Consider jwz or build on beads |
+| Sandbox for research agents | Docker/Wasm or OS-level |
+| Circuit breakers | Semantic drift + three-strike |
+
+### Recommended Next Steps
+
+1. **Prototype orchestrator pattern** - Central agent enforces review gate
+2. **Evaluate jwz** - Could complement beads for transient state
+3. **Implement circuit breakers** - Semantic drift detection
+4. **Sandbox research** - Docker-based for research subagents
+
+---
+
+## Sources
+
+- orch web research via gemini --websearch
+- GitHub: evil-mind-evil-sword/idle (alice pattern)
+- GitHub: evil-mind-evil-sword/tissue (machine-first issues)
+- Maestro CLI documentation
+- Google ADK event streams
+- Anthropic MCP specification
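
The circuit breakers described in section 3 (semantic drift and the three-strike tool rule) could be prototyped roughly as follows. This is a minimal Python sketch under stated assumptions, not code from any of the cited tools: `CircuitBreaker`, `text_similarity`, and all parameter names are hypothetical, "same error message 3 times" is read as three consecutive identical messages, and `difflib.SequenceMatcher` is only a stdlib stand-in for the embedding-model similarity the notes call for.

```python
from collections import deque
from difflib import SequenceMatcher
from typing import Callable, Deque, Optional

# Prompt text quoted from the "Three-Strike Tool Rule" notes.
STUCK_PROMPT = (
    "You are stuck. You must try a different tool or approach. "
    "Do not retry the previous action."
)


def text_similarity(a: str, b: str) -> float:
    """Stand-in for embedding cosine similarity (assumption: stdlib only)."""
    return SequenceMatcher(None, a, b).ratio()


class CircuitBreaker:
    """Hypothetical helper tracking one agent run's thoughts and errors."""

    def __init__(
        self,
        similarity: Callable[[str, str], float] = text_similarity,
        drift_threshold: float = 0.95,
        window: int = 3,
        max_strikes: int = 3,
    ) -> None:
        self.similarity = similarity
        self.drift_threshold = drift_threshold
        self.window = window
        self.thoughts: Deque[str] = deque(maxlen=window)
        self.max_strikes = max_strikes
        self.last_error: Optional[str] = None
        self.strikes = 0

    def record_thought(self, thought: str) -> bool:
        """Return True when every pair in the last `window` thoughts exceeds the threshold."""
        self.thoughts.append(thought)
        if len(self.thoughts) < self.window:
            return False
        items = list(self.thoughts)
        return all(
            self.similarity(items[i], items[j]) > self.drift_threshold
            for i in range(len(items))
            for j in range(i + 1, len(items))
        )

    def record_error(self, message: str) -> Optional[str]:
        """Return STUCK_PROMPT once the same error repeats `max_strikes` times in a row."""
        if message == self.last_error:
            self.strikes += 1
        else:
            self.last_error, self.strikes = message, 1
        if self.strikes >= self.max_strikes:
            # Reset so the prompt is injected once per streak, not on every later repeat.
            self.last_error, self.strikes = None, 0
            return STUCK_PROMPT
        return None
```

An orchestrator loop would call `record_thought` after each reasoning step and `record_error` after each failed tool call, forcing a strategy change when either one trips. A real embedding model can be plugged in through the `similarity` argument without changing the rest of the sketch.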