dan/skills

dan c14075ae7e docs: web research on cross-agent patterns (via orch)

Key findings from gemini --websearch:
- Manager-Worker orchestration (Maestro pattern)
- alice/idle adversarial review gates (emes)
- Git-as-state for agent coordination
- tissue for machine-first issue tracking
- Circuit breakers: semantic drift, three-strike, budget limits
- Sandboxing: Wasm and Docker playgrounds

Validates our direction: beads, orch, file-based coordination.
Gaps: orchestrator-enforced gates, agent messaging, sandboxing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-09 17:50:37 -08:00


Cross-Agent Patterns: Web Research Synthesis

Date: 2026-01-09
Method: orch web research (gemini --websearch)
Related: skills-hf1 (Cross-agent epic)

Executive Summary

Multi-agent AI coding has shifted from single "do it all" agents to specialized agent teams with distinct roles. Key patterns emerging in 2025-2026:

Manager-Worker orchestration - Central agent delegates to specialists
Adversarial review gates - Separate reviewer blocks completion until approved
Git-as-state - Repository is source of truth for agent coordination
Machine-first issue tracking - File-based, no API needed (tissue)
Circuit breakers - Prevent infinite loops via semantic drift detection

1. Multi-Agent Orchestration Patterns

The "Maestro" Pattern (Centralized Orchestration)

A central Manager agent breaks down features into atomic tasks and delegates to Workers.

Workflow:

Manager (e.g., Claude Sonnet): Reads SPEC.md, creates plan.md
Worker A (e.g., Gemini CLI): "Scaffold database schema" - writes code, tests, reports back
Worker B (e.g., OpenCode): "Write API endpoints" - waits for A, then executes
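
The "waits for A, then executes" ordering can be sketched as a small dependency graph. This is a minimal illustration and not how Maestro itself is implemented; the task names and the `plan_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return tasks in an order where every task follows its prerequisites."""
    # Each key is a task; its value is the set of tasks it must wait for.
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical plan: the API task depends on the schema task.
plan = {
    "scaffold-db-schema": set(),
    "write-api-endpoints": {"scaffold-db-schema"},
}
order = plan_order(plan)
```

A Manager could hand out tasks in this order and dispatch a task only after all of its prerequisites have reported back.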

Tooling: Maestro CLI orchestrates Claude Code, Codex, local LLMs via orchestrator.md.

The "Inspector-Fixer" Loop (Dual-Agent)

Strict separation: one agent writes, one agent finds faults. Never switch roles.

Fixer (Claude Code): Monitors todo.md, implements tasks, moves to done.md
Inspector (Gemini Pro): Scans codebase, appends issues to todo.md (never fixes)

Key insight: Inspector creates work for Fixer, not the other way around.
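
One way to picture the role separation is as two functions with disjoint write permissions. This is a sketch under assumptions: the one-task-per-line layout of todo.md and both helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def inspector_report(todo: Path, issue: str) -> None:
    """Inspector side: append an issue to todo.md and do nothing else."""
    with todo.open("a", encoding="utf-8") as f:
        f.write(f"- {issue}\n")


def fixer_complete_next(todo: Path, done: Path) -> Optional[str]:
    """Fixer side: move the oldest task from todo.md to done.md.

    Meant to be called after the fix is implemented.
    Returns the moved task, or None when there is nothing to do.
    """
    if not todo.exists():
        return None
    lines = todo.read_text(encoding="utf-8").splitlines()
    if not lines:
        return None
    task, rest = lines[0], lines[1:]
    todo.write_text("".join(line + "\n" for line in rest), encoding="utf-8")
    with done.open("a", encoding="utf-8") as f:
        f.write(task + "\n")
    return task


with tempfile.TemporaryDirectory() as tmp:
    todo, done = Path(tmp) / "todo.md", Path(tmp) / "done.md"
    inspector_report(todo, "handle empty input in parse()")
    moved = fixer_complete_next(todo, done)
    remaining = todo.read_text(encoding="utf-8")
    finished = done.read_text(encoding="utf-8")
```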

2. Adversarial Quality Gates

The alice/idle Pattern (emes)

From evil-mind-evil-sword/idle - review gate for autonomous agents.

Mechanism:

Stop Hook: Intercepts agent's exit/finish signal
Idle State: Forces mandatory pause before completion
Alice (Reviewer): Read-only Opus agent inspects work
Blocking: If Alice finds issues → creates tissue issues → main agent loops back
Exit: Only when Alice returns COMPLETE

Critical design: Enforcement is external to worker's prompt - system-level intervention.
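
The control flow of such a gate might look like the loop below. This is a sketch, not the idle implementation: `work` and `review` are hypothetical callables standing in for the main agent and the reviewer, and an empty issue list stands in for COMPLETE.

```python
from typing import Callable, List


def run_with_review_gate(
    work: Callable[[List[str]], str],
    review: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> str:
    """Loop the worker until the reviewer reports no issues.

    The gate lives outside the worker: the worker cannot finish on its own.
    """
    issues: List[str] = []
    for _ in range(max_rounds):
        result = work(issues)    # worker sees open issues from the last review
        issues = review(result)  # reviewer only inspects; empty list = COMPLETE
        if not issues:
            return result
    raise RuntimeError("review gate not passed within max_rounds")


def worker(open_issues: List[str]) -> str:
    return "fixed" if open_issues else "draft"


def reviewer(result: str) -> List[str]:
    return ["missing tests"] if result == "draft" else []


final = run_with_review_gate(worker, reviewer)
```

The `max_rounds` cap is an added assumption so that a gate which never passes surfaces as an error instead of looping forever.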

Red Team Pre-Commit Hook

Before each commit, the diff passes through a hostile reviewer from a different model family:

git diff | llm --model o1-preview --system "You are a hostile security auditor. Block this commit if you find any logic gaps."

If rejected, original agent receives critique and retries.

Test-Driven Agent Development (TDAD)

Agents are forbidden from writing implementation code until a failing test exists:

Agent generates test_feature.py
System executes → asserts FAILURE
Agent generates feature.py
System executes → asserts SUCCESS

Prevents "placebo tests" that always pass regardless of logic.
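
The fail-then-pass check can be sketched as a gate that runs the same test twice. This is a minimal illustration: the stub that stands in for the "no implementation yet" state and all function names are hypothetical.

```python
from typing import Callable


def runs_clean(test: Callable[[Callable], None], impl: Callable) -> bool:
    """Return True if the test passes against the given implementation."""
    try:
        test(impl)
        return True
    except Exception:
        return False


def red_green_gate(test: Callable[[Callable], None], stub: Callable, impl: Callable) -> bool:
    """Accept only if the test fails against the stub and passes against impl."""
    failed_first = not runs_clean(test, stub)
    passed_after = runs_clean(test, impl)
    return failed_first and passed_after


def stub(x):
    raise NotImplementedError


def double(x):
    return x * 2


def real_test(f):
    assert f(2) == 4


def placebo_test(f):
    assert True  # never exercises f, so it passes regardless of logic


accepted = red_green_gate(real_test, stub, double)
rejected = red_green_gate(placebo_test, stub, double)
```

The placebo test is rejected because it already passes before any implementation exists.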

3. Circuit Breaker Patterns

Semantic Drift Detection

Use embedding model to check similarity of agent's thought trace:

If last 3 thoughts are >95% similar → circuit breaks
Forces strategy shift, not just retry
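
A minimal sketch of the check, assuming ">95% similar" means pairwise cosine similarity above 0.95 and that thought embeddings are already available as plain float vectors. The function names and toy vectors are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_stuck(embeddings: List[Sequence[float]], window: int = 3, threshold: float = 0.95) -> bool:
    """True when every pair among the last `window` embeddings exceeds the threshold."""
    recent = embeddings[-window:]
    if len(recent) < window:
        return False
    return all(
        cosine(recent[i], recent[j]) > threshold
        for i in range(window)
        for j in range(i + 1, window)
    )


looping = [[1.0, 0.0], [0.99, 0.05], [1.0, 0.01]]
exploring = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
looping_stuck = is_stuck(looping)
exploring_stuck = is_stuck(exploring)
```

When `is_stuck` returns True, the orchestrator would force a strategy change instead of allowing another retry.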

Three-Strike Tool Rule

Same error message 3 times → inject system prompt:

"You are stuck. You must try a different tool or approach. Do not retry the previous action."

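The rule can be sketched as a small counter. This assumes the three errors must be consecutive and identical, which the source does not specify; the class name is hypothetical.

```python
from typing import Optional

NUDGE = (
    "You are stuck. You must try a different tool or approach. "
    "Do not retry the previous action."
)


class ThreeStrikeBreaker:
    """Counts consecutive identical tool errors (assumed interpretation)."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.last_error: Optional[str] = None
        self.count = 0

    def observe(self, error: Optional[str]) -> Optional[str]:
        """Record a tool result; return the nudge prompt on the limit-th strike."""
        if error is None:  # a successful call resets the streak
            self.last_error, self.count = None, 0
            return None
        if error == self.last_error:
            self.count += 1
        else:
            self.last_error, self.count = error, 1
        if self.count >= self.limit:
            self.count = 0  # start a fresh streak after intervening
            return NUDGE
        return None


breaker = ThreeStrikeBreaker()
results = [breaker.observe("FileNotFoundError: config.yaml") for _ in range(3)]
```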
Budget-Based Interrupts

Replace step limits with token budgets per sub-task:

If 50% budget burned on first step of 5-step plan → pause
Request plan refinement or human intervention
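
One possible way to express the rule is to compare the share of budget spent with the share of the plan reached. This is a sketch: the `tolerance` factor of 2.0 is an assumption chosen so that the 50%-on-step-1-of-5 example trips the breaker.

```python
def should_pause(
    tokens_used: int,
    token_budget: int,
    current_step: int,
    total_steps: int,
    tolerance: float = 2.0,
) -> bool:
    """True when spending runs far ahead of plan progress.

    current_step is 1-based: the step currently being executed.
    """
    spent = tokens_used / token_budget
    expected = current_step / total_steps
    return spent > expected * tolerance


burned_half_on_step_one = should_pause(5000, 10000, current_step=1, total_steps=5)
on_track = should_pause(1500, 10000, current_step=1, total_steps=5)
```

A True result would trigger a pause plus a request for plan refinement or human intervention.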

4. State Management (Non-MCP Alternatives)

Git-as-State

Repository is source of truth:

Coordination: Agent A writes file, Agent B reads diff
State: Current HEAD of branch
History: Git log = immutable, replayable decision history

jwz (Agent Messaging)

From emes - lightweight email-thread-style messaging:

Async communication between agents
Preserves threading for topic separation
Doesn't clutter the main context window

The "Postbox" File Pattern

Agents share state via filesystem:

Standard: .ai/context.md or claudecode.md
Agents dump thought process, obstacles, decisions before signing off
Next agent reads to "load" state
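
A handoff through such a file could be sketched as follows. The note layout and both helper names are hypothetical; only the `.ai/context.md` path comes from the list above.

```python
import tempfile
from pathlib import Path
from typing import List


def sign_off(context_file: Path, agent: str, decisions: List[str], obstacles: List[str]) -> None:
    """Append a handoff note before the agent exits."""
    context_file.parent.mkdir(parents=True, exist_ok=True)
    note = [f"Handoff from {agent}", "Decisions:"]
    note += [f"- {d}" for d in decisions]
    note += ["Obstacles:"]
    note += [f"- {o}" for o in obstacles]
    with context_file.open("a", encoding="utf-8") as f:
        f.write("\n".join(note) + "\n\n")


def load_state(context_file: Path) -> str:
    """Read everything earlier agents left behind; empty if nothing yet."""
    return context_file.read_text(encoding="utf-8") if context_file.exists() else ""


with tempfile.TemporaryDirectory() as tmp:
    postbox = Path(tmp) / ".ai" / "context.md"
    sign_off(postbox, "agent-a", ["use SQLite for the prototype"], ["migrations untested"])
    state = load_state(postbox)
```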

Tooling: Vibe Kanban - markdown board multiple agents read/write to.

Git Worktrees for Isolation

Prevent agents overwriting each other in real-time:

Agent A: worktree/feature-login
Agent B: worktree/refactor-db
Merge via PRs with standard conflict resolution
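
Creating one worktree per agent might be scripted like this. The sketch only builds the `git worktree add` command and does not run it; the branch names are assumptions derived from the paths above.

```python
from typing import List


def worktree_add_argv(path: str, branch: str) -> List[str]:
    """Build the command that creates a new branch checked out at `path`."""
    return ["git", "worktree", "add", "-b", branch, path]


agent_a = worktree_add_argv("worktree/feature-login", "feature-login")
agent_b = worktree_add_argv("worktree/refactor-db", "refactor-db")
```

Each list can be passed to `subprocess.run(argv, check=True)` from inside the main repository.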

5. Machine-First Issue Tracking

tissue (emes)

From evil-mind-evil-sword/tissue - headless issue tracker for agents.

Design:

File-based: Issues stored as JSON/Markdown in .tissue/
No API: Agents use standard file tools (read, write)
Git-native: Issues version-controlled with code
Branch-aware: Issue state branches with code branches

Workflow:

Alice: tissue new --title "Memory Leak" → .tissue/issues/1.json
Worker reads issue, fixes code
Worker updates JSON status to resolved
Alice verifies, merges branch (code fix + closed issue)

Why this matters: No desync between issue tracker and code state.
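
The file-only workflow can be sketched with plain JSON reads and writes. The field names (`id`, `title`, `status`) and the sequential numbering are assumptions for illustration, not tissue's actual schema.

```python
import json
import tempfile
from pathlib import Path


def new_issue(repo: Path, title: str) -> Path:
    """Create the next numbered issue file under .tissue/issues/."""
    issues = repo / ".tissue" / "issues"
    issues.mkdir(parents=True, exist_ok=True)
    issue_id = len(list(issues.glob("*.json"))) + 1
    path = issues / f"{issue_id}.json"
    record = {"id": issue_id, "title": title, "status": "open"}  # assumed fields
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def resolve_issue(path: Path) -> None:
    """Worker side: mark the issue resolved by editing the file in place."""
    issue = json.loads(path.read_text(encoding="utf-8"))
    issue["status"] = "resolved"
    path.write_text(json.dumps(issue, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    issue_path = new_issue(Path(tmp), "Memory Leak")
    resolve_issue(issue_path)
    issue = json.loads(issue_path.read_text(encoding="utf-8"))
    relative = issue_path.relative_to(tmp).as_posix()
```

Because the status change is an ordinary file edit, it lands in the same commit as the code fix.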

6. Sandboxing & Security

Wasm Sandboxing (Browser-style)

Tools like Pyodide run agent-generated code in WebAssembly sandbox:

No access to the host OS filesystem unless explicitly granted
NVIDIA uses this approach for data visualization code

Docker Playgrounds

Agent runs in disposable container, not user's shell:

Commands trapped and executed in isolation
Permission scoping: read/write only in project directory
Read-only access to rest of system
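
The permission scoping could be approximated with standard `docker run` flags. This is a sketch under assumptions: the image name, the `/workspace` mount point, and the helper are hypothetical, and "rest of system" here means the container's own filesystem.

```python
from typing import List


def sandboxed_argv(project_dir: str, image: str, command: List[str]) -> List[str]:
    """Build a docker command that confines writes to the project directory."""
    return [
        "docker", "run", "--rm",            # disposable container
        "--read-only",                      # container root filesystem is read-only
        "-v", f"{project_dir}:/workspace",  # project directory is the only writable mount
        "-w", "/workspace",
        image, *command,
    ]


argv = sandboxed_argv("/home/dev/project", "python:3.12-slim", ["python", "-m", "pytest"])
```

An orchestrator would run this list with `subprocess.run(argv)` instead of executing the agent's command in the user's shell.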

Recommended Stack (2025-2026)

Based on research, the "Golden Path" for multi-agent teams:

| Role | Tool | Notes |
|------|------|-------|
| Orchestrator | Maestro or custom | Manages high-level plan |
| Primary Coder | Claude Code | Via MCP for tool access |
| Reviewer/QA | Gemini Pro or Opus | "Watchful Inspector" role |
| State | Git + shared docs | `docs/arch.md`, worktrees |
| Issues | tissue or beads | File-based, git-native |
| Messaging | jwz | Async agent communication |

Key Insights for Our Cross-Agent Work

What Aligns With Our Direction

| Pattern | Our Equivalent | Status |
|---------|----------------|--------|
| Git-as-state | beads (.beads/ in repo) | ✅ Have |
| Machine-first issues | beads, tissue | ✅ Have |
| File-based coordination | SKILL.md, AGENTS.md | ✅ Have |
| Multi-model consensus | orch | ✅ Have |
| Adversarial review | alice pattern | 🔄 Researching |

Gaps to Address

| Gap | Pattern to Adopt |
|-----|------------------|
| Stop hook (Claude/Gemini only) | Orchestrator-enforced gate |
| Agent messaging | Consider jwz or build on beads |
| Sandbox for research agents | Docker/Wasm or OS-level |
| Circuit breakers | Semantic drift + three-strike |

Recommended Next Steps

Prototype orchestrator pattern - Central agent enforces review gate
Evaluate jwz - Could complement beads for transient state
Implement circuit breakers - Semantic drift detection
Sandbox research - Docker-based for research subagents

Sources

orch web research via gemini --websearch
GitHub: evil-mind-evil-sword/idle (alice pattern)
GitHub: evil-mind-evil-sword/tissue (machine-first issues)
Maestro CLI documentation
Google ADK event streams
Anthropic MCP specification
