Multi-Agent Footguns, Patterns, and Emerging Ideas
Status: Research synthesis
Date: 2026-01-10
Sources: HN discussions, Reddit, practitioner blogs, orch consensus
Footguns: Lessons Learned the Hard Way
Git & Branch Chaos
| Footgun | Description | Mitigation |
|---|---|---|
| Force-resolve conflicts | Agents rebase improperly, rewrite history, break CI | No direct git access for agents; orchestrator owns git operations |
| Stale branches | Agent works on outdated branch for hours | Frequent auto-rebase; version check before major edits |
| Recovery nightmare | Broken git state is hard to recover | Git bundles for checkpoints (SkillFS pattern); worktree isolation |
| Branch naming confusion | worker-id/task-id becomes misleading on reassignment | Use type/task-id; worker identity in commit author |
State & Database Issues
| Footgun | Description | Mitigation |
|---|---|---|
| Shared DB pollution | Agents debugging against mutated state, heisenbugs | Ephemeral namespaced DBs per branch; schema prefixes |
| Port conflicts | Multiple web servers on same port | Auto-increment ports; orchestrator manages allocation |
| Service duplication | 10 agents need 10 PostgreSQL/Redis instances | Container-per-worktree; or accept serialization |
| Feature flag races | Agents toggle flags in parallel | Namespace flags per agent/branch |
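
The "orchestrator manages allocation" mitigation for port conflicts could look roughly like the sketch below. `PortAllocator`, the base port 8000, and the range size are hypothetical names and values chosen for illustration; nothing here is taken from the actual orchestrator.

```python
class PortAllocator:
    """Hands out one port per worker from a fixed range (hypothetical helper)."""

    def __init__(self, base: int = 8000, size: int = 100) -> None:
        self._range = range(base, base + size)
        self._by_worker: dict[str, int] = {}

    def allocate(self, worker_id: str) -> int:
        # Re-asking for the same worker returns the same port (idempotent).
        if worker_id in self._by_worker:
            return self._by_worker[worker_id]
        in_use = set(self._by_worker.values())
        for port in self._range:
            if port not in in_use:
                self._by_worker[worker_id] = port
                return port
        raise RuntimeError("no free ports left in the configured range")

    def release(self, worker_id: str) -> None:
        # Releasing an unknown worker is a no-op.
        self._by_worker.pop(worker_id, None)
```

Keeping the table inside the orchestrator means agents never pick ports themselves, so two dev servers cannot race for the same one. This sketch does not check whether some unrelated process already holds a port.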
Coordination Failures
| Footgun | Description | Mitigation |
|---|---|---|
| State divergence | Each agent has different snapshot of reality | Single source of truth artifact; frequent rebase |
| Silent duplication | Two agents "fix" same bug differently | Central task ledger with explicit states; idempotent task IDs |
| Dependency deadlocks | A waits on B waits on A | Event-driven async; bounded time limits; no sync waits |
| Role collapse | Planner writes code; tester refactors | Narrow role boundaries; tool-level constraints |
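
Besides time limits, a task ledger can reject dependency deadlocks up front by checking the "waits on" graph for cycles. The following is a minimal sketch using depth-first search; the `waits_on` mapping and the function name are assumptions, not part of the existing design.

```python
from typing import Optional


def find_cycle(waits_on: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of task IDs, or None if acyclic."""
    visiting: set[str] = set()   # nodes on the current DFS path
    done: set[str] = set()       # nodes fully explored, known cycle-free
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        visiting.add(node)
        path.append(node)
        for dep in waits_on.get(node, []):
            if dep in visiting:
                # Back edge: the cycle is the path from dep to here, closed.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in waits_on:
        if start not in done:
            found = visit(start)
            if found:
                return found
    return None
```

For the "A waits on B waits on A" case, `find_cycle({"A": ["B"], "B": ["A"]})` returns `["A", "B", "A"]`, which the orchestrator could log or escalate before assigning either task.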
Human Bottlenecks
| Footgun | Description | Mitigation |
|---|---|---|
| Review overload | 10 agents = 10 partial PRs to reconcile | Review funnel: worker → arbiter agent → single synthesized PR |
| Context switching | Human juggling parallel agent outputs | Size limits per PR; "one story per PR" |
| Morale drain | Endless nit-picking, people disable agents | Pre-review by lint/style agents; humans see substantive deltas only |
Agent-Specific Issues
| Footgun | Description | Mitigation |
|---|---|---|
| Hallucinated packages | Reportedly up to 30% of suggested packages don't exist (rates vary by model) | Validate imports against known registries |
| Temporary fixes | Works in session, breaks in Docker | Require full env rebuild as acceptance test |
| Skill atrophy | Developers can't code without AI | Deliberate practice; understand what AI generates |
| Test/impl conspiracy | Brittle tests + brittle code pass together | Separate spec tests from impl tests; mutation testing |
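
As a sketch of the "validate imports against known registries" mitigation, the snippet below extracts top-level imports from Python source and reports any that are missing from an allowlist. The `known_packages` set stands in for a real registry lookup, and the sketch assumes import names match package names, which is not always true in practice.

```python
import ast


def unknown_imports(source: str, known_packages: set[str]) -> set[str]:
    """Return top-level imported names that are not in known_packages."""
    imported: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to local modules.
            if node.level == 0 and node.module:
                imported.add(node.module.split(".")[0])
    return imported - known_packages
```

A gate like this can run before any install step, so a hallucinated dependency fails fast instead of pulling in whatever happens to be registered under that name.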
Resource & Cost Issues
| Footgun | Description | Mitigation |
|---|---|---|
| Token blowups | Parallel agents saturate context/API limits | Hard budgets per agent; limit context sizes |
| Credit drain | AI fixing its own mistakes in loops | Circuit breakers; attempt limits |
| Timeout misreads | Rate limits interpreted as semantic failures | Structured error channels; retry with idempotency |
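
The circuit-breaker and structured-error mitigations can be combined: only semantic failures count toward the attempt limit, while rate limits are treated as transient. This is a minimal sketch; the exception classes, `CircuitBreaker`, and the default limit of 3 are all hypothetical.

```python
from dataclasses import dataclass


class RateLimited(Exception):
    """Transient: the provider asked us to slow down."""


class TaskFailed(Exception):
    """Semantic: the agent's output was wrong or its tests failed."""


@dataclass
class CircuitBreaker:
    max_failures: int = 3
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, error: Exception) -> str:
        """Return "retry" or "halt" for the given error."""
        if isinstance(error, RateLimited):
            # Rate limits are not the agent's fault; do not count them.
            return "retry"
        self.failures += 1
        return "halt" if self.is_open else "retry"
```

Once `record` returns `"halt"`, the orchestrator would stop spending credits on the task and escalate it instead of letting the agent loop on its own mistakes.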
Emerging Patterns (2026)
The "Rule of 4"
Research suggests effective team sizes top out at roughly 3-4 agents. Beyond that, communication overhead grows super-linearly (reported exponent 1.724), so the cost of coordination outpaces the value each extra agent adds.
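
To make the super-linear claim concrete, the sketch below assumes overhead scales as n^1.724 relative to a single agent. The power-law form is an assumption made for illustration; the text only gives the exponent.

```python
EXPONENT = 1.724  # exponent quoted above


def relative_overhead(agents: int) -> float:
    """Overhead relative to one agent, assuming overhead is proportional to n**EXPONENT."""
    return float(agents) ** EXPONENT


def overhead_per_agent(agents: int) -> float:
    """Overhead each agent carries on average under the same assumption."""
    return relative_overhead(agents) / agents
```

Under this assumption, four agents cost roughly 11x the overhead of one, while ten agents cost roughly 53x, so per-agent overhead nearly doubles between a 4-agent and a 10-agent team.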
Implication: Don't build 10-agent swarms. Build 3-4 specialized agents with clear boundaries.
Spec-Driven Development
Adopted by Kiro, Tessl, GitHub Spec Kit:
- requirements.md - what to build
- design.md - how to build it
- tasks.md - decomposed work items
Agents work from specs, not vague prompts. Specs are versioned; agents echo which version they used.
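
One way to represent a versioned spec and check the version an agent echoes back is sketched below. The `TaskSpec` fields mirror the three spec files; the class, field names, and helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    version: int
    requirements: str        # what to build
    design: str              # how to build it
    tasks: tuple[str, ...]   # decomposed work items


def is_result_current(spec: TaskSpec, echoed_version: int) -> bool:
    """True if the agent worked from the spec version that is current now."""
    return echoed_version == spec.version
```

A result that echoes an older version can be rejected or re-queued rather than reviewed, which avoids spending review time on work done against a superseded spec.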
Layered Coordination (Not Monolithic)
Instead of one complex orchestrator, compose independent layers:
- Configuration management
- Issue tracking (JSONL, merge-friendly)
- Atomic locking (PostgreSQL advisory locks)
- Filesystem isolation (git worktrees)
- Validation gates
- Enforcement rules
- Session protocols
Each layer is independently useful, and failures stay isolated to the layer where they occur.
PostgreSQL Advisory Locks for Claims
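Novel insight: session-level advisory locks are released automatically when the holding connection ends, so a crashed worker leaves no orphaned locks. They take roughly 1ms to acquire and involve no table writes, which makes them an elegant solution for distributed claim races.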
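```sql
-- task_id_hash: a bigint key derived from the task ID
-- Returns true if this session got the lock, false if another session holds it
SELECT pg_try_advisory_lock(task_id_hash);
-- Work...
SELECT pg_advisory_unlock(task_id_hash);
-- Or: connection dies → auto-released
```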
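
`pg_try_advisory_lock` takes a `bigint` key, so a string task ID has to be mapped to a signed 64-bit integer first. A minimal sketch of one such mapping follows; using SHA-256 and the function name `advisory_lock_key` are assumptions, and any stable hash would do.

```python
import hashlib


def advisory_lock_key(task_id: str) -> int:
    """Map a task ID to a stable signed 64-bit integer for use as a lock key."""
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    # First 8 bytes, interpreted as signed, fit PostgreSQL's bigint range.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)
```

Every worker must derive the key the same way, otherwise two workers could hold "the" lock for the same task under different keys. Hash collisions between distinct task IDs are possible but only cause a spurious "already claimed" result.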
Git Bundles for Checkpoints (SkillFS)
Every agent sandbox is a git repo. When a session ends, a git bundle is stored; a new session restores from that bundle and continues where the previous one left off. git log provides a complete audit trail.
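
The save and restore steps can be sketched with plain `git bundle` and `git clone` calls. This is not SkillFS's actual implementation; the function names and paths are hypothetical, and the sketch assumes `git` is on the PATH.

```python
import subprocess
from pathlib import Path


def save_checkpoint(repo: Path, bundle: Path) -> None:
    """Write every ref in repo into a single bundle file."""
    subprocess.run(
        ["git", "-C", str(repo), "bundle", "create", str(bundle.resolve()), "--all"],
        check=True,
    )


def restore_checkpoint(bundle: Path, dest: Path) -> None:
    """Recreate a working repo from a bundle by cloning it."""
    subprocess.run(["git", "clone", str(bundle.resolve()), str(dest)], check=True)
```

Because a bundle is an ordinary file, it can be stored wherever session artifacts already go, and the restored clone carries the full history for auditing.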
Hierarchical Over Flat Swarms
Instead of 100-agent flat swarms:
- Nested coordination structures
- Partition the communication graph
- Supervisor per sub-team
- Only supervisors talk to each other
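
A quick count of communication channels shows why partitioning helps. The sketch assumes a flat swarm where every agent can talk to every other, and a hierarchy where each worker talks only to its own supervisor; both are simplifications for illustration.

```python
def flat_channels(agents: int) -> int:
    """Pairwise channels when every agent can talk to every other agent."""
    return agents * (agents - 1) // 2


def hierarchical_channels(teams: int, workers_per_team: int) -> int:
    """Channels when workers talk only to their supervisor and supervisors talk to each other."""
    worker_links = teams * workers_per_team
    supervisor_links = teams * (teams - 1) // 2
    return worker_links + supervisor_links
```

With 100 agents, the flat layout has 4950 possible channels. Ten teams of one supervisor plus nine workers (also 100 agents) have 135 under these assumptions.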
Plan-and-Execute Cost Pattern
An expensive model creates the strategy; cheap models execute the steps. This can reduce costs by up to 90%, depending on how much of the work shifts to the cheaper model.
```
Orchestrator (Claude Opus) → Plan
Workers (Claude Haiku) → Execute steps
Reviewer (Claude Sonnet) → Validate
```
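
The saving depends on the price gap and on how much work moves to the cheap model. A rough cost model is sketched below; the token counts and prices in the usage note are made-up illustrative numbers, not real pricing, and the reviewer step is left out for brevity.

```python
def plan_and_execute_cost(plan_tokens: int, step_tokens: int, steps: int,
                          expensive_price: float, cheap_price: float) -> float:
    """Expensive model plans once; cheap model runs every step."""
    return plan_tokens * expensive_price + steps * step_tokens * cheap_price


def single_model_cost(plan_tokens: int, step_tokens: int, steps: int,
                      expensive_price: float) -> float:
    """Expensive model does both planning and execution."""
    return (plan_tokens + steps * step_tokens) * expensive_price


def savings_fraction(plan_tokens: int, step_tokens: int, steps: int,
                     expensive_price: float, cheap_price: float) -> float:
    """Fraction of cost saved by splitting planning and execution."""
    mixed = plan_and_execute_cost(plan_tokens, step_tokens, steps,
                                  expensive_price, cheap_price)
    single = single_model_cost(plan_tokens, step_tokens, steps, expensive_price)
    return 1 - mixed / single
```

For example, with a 2000-token plan, ten 4000-token steps, and a hypothetical 15:1 price ratio, `savings_fraction(2000, 4000, 10, 15.0, 1.0)` comes out at about 0.89. A smaller price gap or a larger plan shrinks the saving.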
Bounded Autonomy Spectrum
Progressive autonomy based on risk:
- Human in the loop - approve each action
- Human on the loop - monitor, intervene if needed
- Human out of the loop - fully autonomous
Match to task complexity and outcome criticality.
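
A policy along these lines could be encoded as a small lookup. The 1-5 scoring scale and the thresholds below are hypothetical and would need tuning per project.

```python
from enum import Enum


class Autonomy(Enum):
    IN_THE_LOOP = "approve each action"
    ON_THE_LOOP = "monitor, intervene if needed"
    OUT_OF_THE_LOOP = "fully autonomous"


def choose_autonomy(complexity: int, criticality: int) -> Autonomy:
    """Pick an autonomy level from 1 (low) to 5 (high) scores; thresholds are illustrative."""
    risk = max(complexity, criticality)  # the riskier dimension decides
    if risk >= 4:
        return Autonomy.IN_THE_LOOP
    if risk >= 2:
        return Autonomy.ON_THE_LOOP
    return Autonomy.OUT_OF_THE_LOOP
```

Taking the maximum of the two scores is a deliberately conservative choice: a simple task with a critical outcome still gets per-action approval.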
Best Practices Synthesis
From HN Discussions
- Well-scoped tasks with tight contracts - Not vague prompts
- Automated testing gates - Agents must pass before review
- 2-3 agents realistic - Not 10 parallel
- Exclusive ownership per module - One writer per concern
- Short-lived branches - Frequent merge to prevent drift
From orch Consensus
- Treat agents as untrusted workers - Not peers with full access
- Machine-readable contracts - JSON schema between roles
- Per-agent logs with correlation IDs - Distributed systems observability
- Guardrail agents - Security/policy checks on every diff
- Versioned task specs - Bump version → re-run affected agents
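
Per-agent logs with correlation IDs can be as simple as one JSON object per event that carries the same ID across every agent touching a task. The field names and helper functions below are assumptions for illustration.

```python
import json
import uuid


def new_correlation_id() -> str:
    """Create an ID once per task and pass it to every agent involved."""
    return uuid.uuid4().hex


def log_event(agent: str, correlation_id: str, event: str, **fields: object) -> str:
    """Build one machine-readable log line for an agent event."""
    record = {"agent": agent, "correlation_id": correlation_id, "event": event}
    record.update(fields)
    return json.dumps(record, sort_keys=True)
```

Filtering all agents' logs by one `correlation_id` then reconstructs the full history of a task, from assignment through review.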
From Practitioner Blogs
- Coordination ≠ isolation - Advisory locks (who works on what) + worktrees (how they work)
- JSONL for issues - One per line, deterministic merge rules
- Session protocols - Explicit start/close procedures
- Modular rules with includes - Template configuration
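
A deterministic merge rule for JSONL issue files might look like the sketch below: the newest `updated_at` wins per issue ID, ties are broken by comparing serialized content, and output is sorted by ID. The `id` and `updated_at` field names, and the use of sortable ISO-8601 timestamps, are assumptions.

```python
import json


def merge_issues(ours: str, theirs: str) -> str:
    """Merge two JSONL issue files; the result does not depend on argument order."""
    best: dict[str, tuple[str, str]] = {}
    for line in (ours + "\n" + theirs).splitlines():
        if not line.strip():
            continue
        issue = json.loads(line)
        canonical = json.dumps(issue, sort_keys=True)
        # Rank by timestamp first, then by content so ties resolve the same way on both sides.
        rank = (issue["updated_at"], canonical)
        if issue["id"] not in best or rank > best[issue["id"]]:
            best[issue["id"]] = rank
    return "".join(best[key][1] + "\n" for key in sorted(best))
```

Because both sides of a merge compute the same result, the rule can run as a merge driver without leaving conflict markers in the issue file.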
How This Applies to Our Design
Already Covered
| Pattern | Our Design |
|---|---|
| SQLite for coordination | ✅ bus.db with transactions |
| Git worktrees | ✅ branch-per-worker.md |
| State machine | ✅ worker-state-machine.md |
| Heartbeats/liveness | ✅ 10s interval in message-passing |
| Claim-check pattern | ✅ SQLite transactions |
| Task serialization | ✅ No uncommitted dependencies |
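
For comparison with the advisory-lock approach, claiming a task inside a SQLite transaction can be sketched as a conditional update. The `tasks` table and its `id` and `owner` columns are hypothetical and do not describe the actual bus.db schema.

```python
import sqlite3


def claim_task(db: sqlite3.Connection, task_id: str, worker_id: str) -> bool:
    """Atomically claim a task; returns False if it is already owned or missing."""
    with db:  # commits on success, rolls back if an exception is raised
        cursor = db.execute(
            "UPDATE tasks SET owner = ? WHERE id = ? AND owner IS NULL",
            (worker_id, task_id),
        )
    # Exactly one row changes only when the task was unclaimed.
    return cursor.rowcount == 1
```

Unlike an advisory lock, this claim survives a worker crash, so it needs to be paired with the heartbeat check to release tasks held by dead workers.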
Should Add
| Pattern | Gap | Action |
|---|---|---|
| Spec-driven tasks | Tasks are just titles | Add structured task specs (requirements, design, acceptance) |
| Role boundaries | Not enforced | Add tool-level constraints per agent type |
| Review funnel | Missing arbiter | Add synthesis step before human review |
| Versioned specs | Not tracked | Add version field to task assignments |
| Cost budgets | Not implemented | Add token/time budgets per agent |
| Correlation IDs | Partial (correlation_id) | Ensure end-to-end tracing |
Validate Our Decisions
| Decision | Validation |
|---|---|
| SQLite over JSONL | ✅ Confirmed - JSONL for issues only, SQLite for coordination |
| Orchestrator creates branches | ✅ Confirmed - reduces agent setup, enforces policy |
| 3-4 agents max | ✅ Aligns with "Rule of 4" research |
| Mandatory rebase | ✅ Confirmed - prevents stale branch drift |
| Escalate semantic conflicts | ✅ Confirmed - agents hallucinate resolutions |
Open Questions Surfaced
- PostgreSQL advisory locks vs SQLite? - Do we need Postgres, or is SQLite sufficient?
- Git bundles for checkpoints? - Should we adopt SkillFS pattern?
- Spec files per task? - How structured should task specs be?
- Arbiter/synthesis agent? - Add to architecture before human review?
- Token budgets? - How to enforce across different agent types?
Sources
HN Discussions
- Superset: 10 Parallel Coding Agents
- Desktop App for Parallel Agentic Dev
- SkillFS: Git-backed Sandboxes
- Zenflow: Agent Orchestration
- Git Worktree for Parallel Dev
Blogs & Articles
- Building a Multi-Agent Development Workflow
- The Real Struggle with AI Coding Agents
- Why AI Coding Tools Don't Work For Me
- Microsoft: Multi-Agent Systems at Scale
- LangChain: How and When to Build Multi-Agent