# Multi-Agent Footguns, Patterns, and Emerging Ideas

**Status**: Research synthesis

**Date**: 2026-01-10

**Sources**: HN discussions, Reddit, practitioner blogs, orch consensus

## Footguns: Lessons Learned the Hard Way

### Git & Branch Chaos

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Force-resolve conflicts** | Agents rebase improperly, rewrite history, and break CI | No direct git access for agents; the orchestrator owns git operations |
| **Stale branches** | An agent works on an outdated branch for hours | Frequent auto-rebase; version check before major edits |
| **Recovery nightmare** | A broken git state is hard to recover | Git bundles for checkpoints (SkillFS pattern); worktree isolation |
| **Branch naming confusion** | `worker-id/task-id` becomes misleading on reassignment | Use `type/task-id`; record worker identity in the commit author |

### State & Database Issues

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Shared DB pollution** | Agents debug against mutated state, producing heisenbugs | Ephemeral namespaced DBs per branch; schema prefixes |
| **Port conflicts** | Multiple web servers try to bind the same port | Auto-increment ports; the orchestrator manages allocation |
| **Service duplication** | 10 agents need 10 PostgreSQL/Redis instances | Container per worktree, or accept serialization |
| **Feature flag races** | Agents toggle flags in parallel | Namespace flags per agent/branch |

### Coordination Failures

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **State divergence** | Each agent has a different snapshot of reality | Single source-of-truth artifact; frequent rebase |
| **Silent duplication** | Two agents "fix" the same bug differently | Central task ledger with explicit states; idempotent task IDs |
| **Dependency deadlocks** | A waits on B, which waits on A | Event-driven async; bounded time limits; no synchronous waits |
| **Role collapse** | Planner writes code; tester refactors | Narrow role boundaries; tool-level constraints |

### Human Bottlenecks

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Review overload** | 10 agents = 10 partial PRs to reconcile | Review funnel: worker → arbiter agent → single synthesized PR |
| **Context switching** | A human juggles parallel agent outputs | Size limits per PR; "one story per PR" |
| **Morale drain** | Endless nitpicking leads people to disable agents | Pre-review by lint/style agents; humans see only substantive deltas |

### Agent-Specific Issues

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Hallucinated packages** | A reported ~30% of suggested packages don't exist | Validate imports against known registries |
| **Temporary fixes** | Works in the session, breaks in Docker | Require a full environment rebuild as an acceptance test |
| **Skill atrophy** | Developers can't code without AI | Deliberate practice; understand what the AI generates |
| **Test/impl conspiracy** | Brittle tests and brittle code pass together | Separate spec tests from impl tests; mutation testing |

### Resource & Cost Issues

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Token blowups** | Parallel agents saturate context/API limits | Hard budgets per agent; limit context sizes |
| **Credit drain** | AI burns credits fixing its own mistakes in loops | Circuit breakers; attempt limits |
| **Timeout misreads** | Rate limits misinterpreted as semantic failures | Structured error channels; retries with idempotency |

## Emerging Patterns (2026)

### The "Rule of 4"

Research suggests that effective team sizes top out at ~3-4 agents. Beyond this, communication overhead grows super-linearly (exponent 1.724), and the cost of coordination outpaces the value added.

**Implication**: Don't build 10-agent swarms. Build 3-4 specialized agents with clear boundaries.
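A toy model makes the "super-linear" claim concrete. This is a minimal sketch, assuming total coordination overhead scales as `c * n**1.724` with the exponent quoted above; the constant `c` and the helper names `coordination_overhead` and `pairwise_channels` are hypothetical, so only ratios between team sizes are meaningful. Pairwise channels in a fully connected (flat) team are shown alongside as a familiar reference point.

```python
"""Toy model of the "Rule of 4": coordination overhead growing as n**1.724.

Assumption: overhead is modeled as c * n**ALPHA using the exponent cited in
the research summary. The constant c is arbitrary; compare ratios only.
"""

ALPHA = 1.724  # super-linear exponent quoted in the research summary


def coordination_overhead(n_agents: int, c: float = 1.0) -> float:
    """Modeled total coordination cost for a team of n_agents (hypothetical)."""
    return c * n_agents ** ALPHA


def pairwise_channels(n_agents: int) -> int:
    """Distinct agent-to-agent links in a fully connected (flat) team."""
    return n_agents * (n_agents - 1) // 2


if __name__ == "__main__":
    for n in (2, 3, 4, 6, 10):
        per_agent = coordination_overhead(n) / n  # grows as n**0.724
        print(f"{n:>2} agents: overhead={coordination_overhead(n):6.1f} "
              f"per-agent={per_agent:5.2f} channels={pairwise_channels(n)}")

    # Scaling from 4 to 10 agents: headcount x2.5, modeled overhead ~x4.85
    ratio = coordination_overhead(10) / coordination_overhead(4)
    print(f"4 -> 10 agents: headcount x{10 / 4:.1f}, overhead x{ratio:.2f}")
```

Under this model, growing a team from 4 to 10 agents multiplies headcount by 2.5 but modeled overhead by roughly 4.85, and the flat-team channel count jumps from 6 to 45. That arithmetic is one way to read the recommendation for small, specialized teams.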
### Spec-Driven Development

Adopted by Kiro, Tessl, and GitHub Spec Kit:

- `requirements.md` - what to build
- `design.md` - how to build it
- `tasks.md` - decomposed work items

Agents work from specs, not vague prompts. Specs are versioned, and agents echo which version they used.

### Layered Coordination (Not Monolithic)

Instead of one complex orchestrator, compose independent layers:

1. Configuration management
2. Issue tracking (JSONL, merge-friendly)
3. Atomic locking (PostgreSQL advisory locks)
4. Filesystem isolation (git worktrees)
5. Validation gates
6. Enforcement rules
7. Session protocols

Each layer is independently useful, and failures stay isolated.

### PostgreSQL Advisory Locks for Claims

Novel insight: advisory locks are released automatically when the holding session ends, including on a crash, so there are no orphaned locks. They take ~1ms and require no table writes, which makes them an elegant solution for distributed claim races.

```sql
-- 'task-123' is an illustrative task ID; hashtext() maps it to an integer lock key.
-- Non-blocking claim: returns true if this session now holds the lock, false otherwise.
SELECT pg_try_advisory_lock(hashtext('task-123'));
-- ... do the work while holding the session-level lock ...
SELECT pg_advisory_unlock(hashtext('task-123'));
-- If the connection dies before unlocking, PostgreSQL releases the lock automatically.
```

### Git Bundles for Checkpoints (SkillFS)

Every agent sandbox is a git repo. When a session ends, a git bundle is stored; a new session restores from the bundle and continues where the previous one left off. `git log` provides a complete audit trail.

### Hierarchical Over Flat Swarms

Instead of 100-agent flat swarms:

- Nested coordination structures
- Partition the communication graph
- Supervisor per sub-team
- Only supervisors talk to each other

### Plan-and-Execute Cost Pattern

An expensive model creates the strategy; cheap models execute the steps. This can reportedly reduce costs by up to 90%.

```
Orchestrator (Claude Opus) → Plan
Workers (Claude Haiku) → Execute steps
Reviewer (Claude Sonnet) → Validate
```

### Bounded Autonomy Spectrum

Progressive autonomy based on risk:

1. **Human in the loop** - approve each action
2. **Human on the loop** - monitor, intervene if needed
3. **Human out of the loop** - fully autonomous

Match the level of autonomy to task complexity and outcome criticality.

## Best Practices Synthesis

### From HN Discussions

1. **Well-scoped tasks with tight contracts** - Not vague prompts
2. **Automated testing gates** - Agents must pass before review
3. **2-3 agents is realistic** - Not 10 in parallel
4. **Exclusive ownership per module** - One writer per concern
5. **Short-lived branches** - Frequent merges to prevent drift

### From orch Consensus

1. **Treat agents as untrusted workers** - Not peers with full access
2. **Machine-readable contracts** - JSON schema between roles
3. **Per-agent logs with correlation IDs** - Distributed systems observability
4. **Guardrail agents** - Security/policy checks on every diff
5. **Versioned task specs** - Bump version → re-run affected agents

### From Practitioner Blogs

1. **Coordination ≠ isolation** - Advisory locks (who works on what) + worktrees (how they work)
2. **JSONL for issues** - One issue per line, deterministic merge rules
3. **Session protocols** - Explicit start/close procedures
4. **Modular rules with includes** - Template configuration

## How This Applies to Our Design

### Already Covered

| Pattern | Our Design |
|---------|------------|
| SQLite for coordination | ✅ bus.db with transactions |
| Git worktrees | ✅ branch-per-worker.md |
| State machine | ✅ worker-state-machine.md |
| Heartbeats/liveness | ✅ 10s interval in message-passing |
| Claim-check pattern | ✅ SQLite transactions |
| Task serialization | ✅ No uncommitted dependencies |

### Should Add

| Pattern | Gap | Action |
|---------|-----|--------|
| Spec-driven tasks | Tasks are just titles | Add structured task specs (requirements, design, acceptance) |
| Role boundaries | Not enforced | Add tool-level constraints per agent type |
| Review funnel | Missing arbiter | Add a synthesis step before human review |
| Versioned specs | Not tracked | Add a version field to task assignments |
| Cost budgets | Not implemented | Add token/time budgets per agent |
| Correlation IDs | Partial (correlation_id) | Ensure end-to-end tracing |

### Validate Our Decisions

| Decision | Validation |
|----------|------------|
| SQLite over JSONL | ✅ Confirmed - JSONL for issues only, SQLite for coordination |
| Orchestrator creates branches | ✅ Confirmed - reduces agent setup, enforces policy |
| 3-4 agents max | ✅ Aligns with "Rule of 4" research |
| Mandatory rebase | ✅ Confirmed - prevents stale branch drift |
| Escalate semantic conflicts | ✅ Confirmed - agents hallucinate resolutions |

## Open Questions Surfaced

1. **PostgreSQL advisory locks vs SQLite?** - Do we need Postgres, or is SQLite sufficient?
2. **Git bundles for checkpoints?** - Should we adopt the SkillFS pattern?
3. **Spec files per task?** - How structured should task specs be?
4. **Arbiter/synthesis agent?** - Should we add one to the architecture before human review?
5. **Token budgets?** - How do we enforce them across different agent types?

## Sources

### HN Discussions

- [Superset: 10 Parallel Coding Agents](https://news.ycombinator.com/item?id=46368739)
- [Desktop App for Parallel Agentic Dev](https://news.ycombinator.com/item?id=46027947)
- [SkillFS: Git-backed Sandboxes](https://news.ycombinator.com/item?id=46543093)
- [Zenflow: Agent Orchestration](https://news.ycombinator.com/item?id=46290617)
- [Git Worktree for Parallel Dev](https://news.ycombinator.com/item?id=46510462)

### Blogs & Articles

- [Building a Multi-Agent Development Workflow](https://itsgg.com/blog/2026/01/08/building-a-multi-agent-development-workflow/)
- [The Real Struggle with AI Coding Agents](https://www.smiansh.com/blogs/the-real-struggle-with-ai-coding-agents-and-how-to-overcome-it/)
- [Why AI Coding Tools Don't Work For Me](https://blog.miguelgrinberg.com/post/why-generative-ai-coding-tools-and-agents-do-not-work-for-me)
- [Microsoft: Multi-Agent Systems at Scale](https://devblogs.microsoft.com/ise/multi-agent-systems-at-scale/)
- [LangChain: How and When to Build Multi-Agent](https://blog.langchain.com/how-and-when-to-build-multi-agent-systems/)

### Research & Analysis

- [VentureBeat: More Agents Isn't Better](https://venturebeat.com/orchestration/research-shows-more-agents-isnt-a-reliable-path-to-better-enterprise-ai)
- [Deloitte: AI Agent Orchestration](https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2026/ai-agent-orchestration.html)
- [10 Things Developers Want from Agentic IDEs](https://redmonk.com/kholterhoff/2025/12/22/10-things-developers-want-from-their-agentic-ides-in-2025/)