
# Multi-Agent Footguns, Patterns, and Emerging Ideas

**Status**: Research synthesis
**Date**: 2026-01-10
**Sources**: HN discussions, Reddit, practitioner blogs, orch consensus

## Footguns: Lessons Learned the Hard Way

### Git & Branch Chaos

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Force-resolve conflicts** | Agents rebase improperly, rewrite history, break CI | No direct git access for agents; orchestrator owns git operations |
| **Stale branches** | Agent works on outdated branch for hours | Frequent auto-rebase; version check before major edits |
| **Recovery nightmare** | Broken git state is hard to recover | Git bundles for checkpoints (SkillFS pattern); worktree isolation |
| **Branch naming confusion** | `worker-id/task-id` becomes misleading on reassignment | Use `type/task-id`; worker identity in commit author |

### State & Database Issues

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Shared DB pollution** | Agents debug against state mutated by other agents, producing heisenbugs | Ephemeral namespaced DBs per branch; schema prefixes |
| **Port conflicts** | Multiple web servers on same port | Auto-increment ports; orchestrator manages allocation |
| **Service duplication** | 10 agents need 10 PostgreSQL/Redis instances | One container set per worktree, or accept serialized access to shared services |
| **Feature flag races** | Agents toggle flags in parallel | Namespace flags per agent/branch |
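The "orchestrator manages allocation" mitigation for port conflicts can be sketched as a small lease table. This is a minimal sketch, assuming one port per worktree drawn from a hypothetical range (8100-8199); `PortAllocator` and `port_is_free` are illustrative names, not part of any existing tool.

```python
import socket
import threading
from typing import Callable


def port_is_free(port: int) -> bool:
    """Probe by binding on localhost; a failure means something already holds it."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(("127.0.0.1", port))
    except OSError:
        return False
    return True


class PortAllocator:
    """Hypothetical orchestrator-owned port leases, one per worktree."""

    def __init__(self, start: int = 8100, end: int = 8199,
                 is_free: Callable[[int], bool] = port_is_free) -> None:
        self._free = list(range(start, end + 1))
        self._leases: dict[str, int] = {}
        self._is_free = is_free
        self._lock = threading.Lock()

    def acquire(self, worktree: str) -> int:
        with self._lock:
            # Idempotent: the same worktree always gets its existing lease back.
            if worktree in self._leases:
                return self._leases[worktree]
            while self._free:
                port = self._free.pop(0)
                # Ports found busy (held outside the orchestrator) are dropped from the pool.
                if self._is_free(port):
                    self._leases[worktree] = port
                    return port
            raise RuntimeError("port range exhausted")

    def release(self, worktree: str) -> None:
        with self._lock:
            port = self._leases.pop(worktree, None)
            if port is not None:
                self._free.append(port)
```

The probe only lowers the odds of a clash: another process can still grab the port between the check and the server starting, so the orchestrator should treat a bind failure at startup as "re-acquire", not as an agent error.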

### Coordination Failures

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **State divergence** | Each agent works from a different snapshot of reality | Single source-of-truth artifact; frequent rebase |
| **Silent duplication** | Two agents "fix" same bug differently | Central task ledger with explicit states; idempotent task IDs |
| **Dependency deadlocks** | A waits on B while B waits on A | Event-driven async; bounded timeouts; no synchronous waits |
| **Role collapse** | Planner writes code; tester refactors | Narrow role boundaries; tool-level constraints |
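A central task ledger with explicit states and idempotent IDs can be built on SQLite alone. The sketch below assumes a hypothetical `tasks` table (not the actual `bus.db` schema): a primary key plus `INSERT OR IGNORE` makes re-submission a no-op, and a conditional `UPDATE` makes claiming atomic, so two agents cannot both take the same bug.

```python
import sqlite3

# Hypothetical ledger schema for illustration; not the real bus.db layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    task_id   TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    state     TEXT NOT NULL DEFAULT 'open'
              CHECK (state IN ('open', 'claimed', 'done')),
    worker_id TEXT
)
"""


def add_task(db: sqlite3.Connection, task_id: str, title: str) -> None:
    # The primary key plus INSERT OR IGNORE makes re-submitting a task a no-op.
    with db:
        db.execute("INSERT OR IGNORE INTO tasks (task_id, title) VALUES (?, ?)",
                   (task_id, title))


def claim(db: sqlite3.Connection, task_id: str, worker_id: str) -> bool:
    # Conditional UPDATE: only one worker can move a task out of 'open';
    # everyone else sees rowcount 0 and knows the task is taken.
    with db:
        cur = db.execute(
            "UPDATE tasks SET state = 'claimed', worker_id = ? "
            "WHERE task_id = ? AND state = 'open'",
            (worker_id, task_id),
        )
    return cur.rowcount == 1


def complete(db: sqlite3.Connection, task_id: str, worker_id: str) -> bool:
    # Only the worker that holds the claim may close the task.
    with db:
        cur = db.execute(
            "UPDATE tasks SET state = 'done' "
            "WHERE task_id = ? AND state = 'claimed' AND worker_id = ?",
            (task_id, worker_id),
        )
    return cur.rowcount == 1
```

Because SQLite serializes writers, the same pattern holds when each agent process opens its own connection to a shared database file.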

### Human Bottlenecks

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Review overload** | 10 agents = 10 partial PRs to reconcile | Review funnel: worker → arbiter agent → single synthesized PR |
| **Context switching** | Humans juggle parallel agent outputs | Size limits per PR; "one story per PR" |
| **Morale drain** | Endless nit-picking, people disable agents | Pre-review by lint/style agents; humans see substantive deltas only |

### Agent-Specific Issues

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Hallucinated packages** | Agents suggest packages that don't exist (reportedly up to 30% of suggestions) | Validate imports against known registries |
| **Temporary fixes** | Fix works in the agent's session but breaks in a clean Docker build | Require a full environment rebuild as an acceptance test |
| **Skill atrophy** | Developers can't code without AI | Deliberate practice; understand what AI generates |
| **Test/impl conspiracy** | Brittle tests + brittle code pass together | Separate spec tests from impl tests; mutation testing |

### Resource & Cost Issues

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Token blowups** | Parallel agents saturate context/API limits | Hard budgets per agent; limit context sizes |
| **Credit drain** | AI fixing its own mistakes in loops | Circuit breakers; attempt limits |
| **Timeout misreads** | Rate limits interpreted as semantic failures | Structured error channels; retry with idempotency |
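The three mitigations above fit together in one retry loop: a hard token budget, an attempt limit that trips a circuit breaker, and a separate error channel so rate limits are retried without being counted as semantic failures. A minimal sketch; `AgentBudget`, `RateLimited`, and the `step()` contract are hypothetical, and `step` is assumed to be idempotent so retries are safe.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(Exception):
    """Circuit breaker tripped: stop the agent and escalate."""


class RateLimited(Exception):
    """Transient provider error: retry it, but never count it as a failed attempt."""


@dataclass
class AgentBudget:
    max_tokens: int
    max_attempts: int
    tokens_used: int = 0
    failed_attempts: int = 0

    def charge(self, tokens: int) -> None:
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token budget of {self.max_tokens} exceeded")

    def record_failure(self) -> None:
        self.failed_attempts += 1
        if self.failed_attempts >= self.max_attempts:
            raise BudgetExceeded(f"{self.failed_attempts} failed attempts")


def run_with_breaker(step: Callable[[], tuple[bool, int]], budget: AgentBudget,
                     max_rate_limit_retries: int = 3) -> bool:
    """Retry an idempotent step until it succeeds or the budget trips.

    step() returns (succeeded, tokens_spent) or raises RateLimited.
    """
    rate_limit_hits = 0
    while True:
        try:
            ok, tokens = step()
        except RateLimited:
            # Infrastructure error, not a semantic one: tracked on its own counter.
            rate_limit_hits += 1
            if rate_limit_hits > max_rate_limit_retries:
                raise
            continue  # a real loop would back off before retrying
        budget.charge(tokens)
        if ok:
            return True
        budget.record_failure()
```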

## Emerging Patterns (2026)

### The "Rule of 4"

Research reports that effective team sizes top out at ~3-4 agents. Beyond that, communication overhead grows super-linearly (a reported scaling exponent of 1.724), so the cost of coordination outpaces the value each added agent contributes.

**Implication**: Don't build 10-agent swarms. Build 3-4 specialized agents with clear boundaries.
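To make the exponent concrete, the arithmetic below reads it as overhead(n) proportional to n^1.724 (an interpretation of the quoted figure, not a formula taken from the underlying study) and compares team sizes against a 4-agent baseline, alongside the raw count of pairwise channels.

```python
# Assumption: coordination overhead(n) is proportional to n ** 1.724.
EXPONENT = 1.724


def relative_overhead(n: int, baseline: int = 4) -> float:
    """Coordination overhead of n agents relative to a baseline team size."""
    return (n / baseline) ** EXPONENT


def pairwise_channels(n: int) -> int:
    """Distinct agent-to-agent links if everyone can talk to everyone."""
    return n * (n - 1) // 2


for n in (2, 3, 4, 6, 10):
    print(f"{n:>2} agents: {relative_overhead(n):5.2f}x overhead, "
          f"{pairwise_channels(n):>2} channels")
```

Under this reading, going from 4 to 10 agents multiplies headcount by 2.5 but overhead by roughly 4.85, and pairwise channels grow from 6 to 45.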

### Spec-Driven Development

Adopted in some form by Kiro, Tessl, and GitHub Spec Kit. A typical three-file layout:
- `requirements.md` - what to build
- `design.md` - how to build it
- `tasks.md` - decomposed work items

Agents work from specs, not vague prompts. Specs are versioned; agents echo which version they used.
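Versioning plus echoing gives the orchestrator a cheap staleness check: any edit bumps the version, and a result built against an older version is flagged for a re-run. A minimal sketch; `TaskSpec`, `AgentResult`, and their fields are hypothetical stand-ins for the three spec files.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaskSpec:
    # Hypothetical fields mirroring requirements.md / design.md / tasks.md.
    task_id: str
    version: int
    requirements: str
    design: str
    acceptance: tuple[str, ...]


@dataclass(frozen=True)
class AgentResult:
    task_id: str
    spec_version: int  # the agent echoes the spec version it worked from
    summary: str


def revise(spec: TaskSpec, **changes) -> TaskSpec:
    """Any edit to a spec produces a new version number."""
    return replace(spec, version=spec.version + 1, **changes)


def is_current(result: AgentResult, spec: TaskSpec) -> bool:
    """False means the result was built against a stale spec and needs a re-run."""
    if result.task_id != spec.task_id:
        raise ValueError("result does not belong to this task")
    return result.spec_version == spec.version
```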

### Layered Coordination (Not Monolithic)

Instead of one complex orchestrator, compose independent layers:
1. Configuration management
2. Issue tracking (JSONL, merge-friendly)
3. Atomic locking (PostgreSQL advisory locks)
4. Filesystem isolation (git worktrees)
5. Validation gates
6. Enforcement rules
7. Session protocols

Each layer is useful on its own, and a failure in one layer stays isolated from the others.

### PostgreSQL Advisory Locks for Claims

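Key insight: session-level advisory locks are released automatically when the holding connection ends, so a crashed agent leaves no orphaned locks. They reportedly take ~1 ms to acquire and involve no table writes, which makes them an elegant fix for distributed claim races.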

```sql
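-- Non-blocking: returns true if this session got the lock, false if another session holds it.
-- :task_id_hash is a placeholder for a bigint key derived from the task ID.
SELECT pg_try_advisory_lock(:task_id_hash);
-- ... do the work only if the call above returned true ...
SELECT pg_advisory_unlock(:task_id_hash);
-- If the connection dies first, PostgreSQL releases the lock automatically.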
```
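The single-argument advisory lock functions take a `bigint`, so every worker must map a task ID string to the same signed 64-bit key. A minimal sketch, assuming BLAKE2b truncated to 8 bytes as the (swapped-in) hash; Python's built-in `hash()` would not work, because string hashing is salted per process.

```python
import hashlib


def advisory_lock_key(task_id: str) -> int:
    """Map a task ID to a stable signed 64-bit key for pg_try_advisory_lock.

    Built-in hash() is randomized per process for strings, so two workers
    would compute different keys for the same task; a digest does not.
    """
    digest = hashlib.blake2b(task_id.encode("utf-8"), digest_size=8).digest()
    # PostgreSQL bigint is signed, so decode the 8 bytes as a signed integer.
    return int.from_bytes(digest, "big", signed=True)
```

With psycopg, for example, the key is passed as an ordinary parameter: `cur.execute("SELECT pg_try_advisory_lock(%s)", (advisory_lock_key("T-42"),))`.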

### Git Bundles for Checkpoints (SkillFS)

Every agent sandbox is a git repo. When a session ends, the repo is saved as a git bundle; the next session restores from that bundle and continues where the last one left off. `git log` provides a complete audit trail.
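The checkpoint/restore cycle maps onto two standard git commands: `git bundle create <file> --all` packs every ref with its history into one file, and a bundle can be cloned like any remote. A minimal sketch; the helper names and sandbox paths are hypothetical, and this is not SkillFS's actual implementation.

```python
import subprocess
from pathlib import Path


def checkpoint_cmd(sandbox: Path, bundle: Path) -> list[str]:
    # Pack every ref in the sandbox repo, with full history, into one file.
    return ["git", "-C", str(sandbox), "bundle", "create", str(bundle), "--all"]


def restore_cmd(bundle: Path, sandbox: Path) -> list[str]:
    # A bundle can be cloned like any other remote.
    return ["git", "clone", str(bundle), str(sandbox)]


def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)


# Session lifecycle (hypothetical paths):
#   end of session:  run(checkpoint_cmd(Path("/sandboxes/T-42"), Path("/bundles/T-42.bundle")))
#   next session:    run(restore_cmd(Path("/bundles/T-42.bundle"), Path("/sandboxes/T-42-r2")))
```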

### Hierarchical Over Flat Swarms

Instead of 100-agent flat swarms:
- Nested coordination structures
- Partition the communication graph
- Supervisor per sub-team
- Only supervisors talk to each other
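Partitioning pays off because it replaces a complete graph with stars plus a small supervisor mesh. The count below assumes one particular topology (workers talk only to their own supervisor; supervisors form a complete graph) and counts links only, not message volume.

```python
def flat_channels(n: int) -> int:
    # Everyone may talk to everyone: a complete graph on n agents.
    return n * (n - 1) // 2


def hierarchical_channels(n_workers: int, team_size: int) -> int:
    # Assumed topology: each worker links only to its team supervisor (a star),
    # and the supervisors form a complete graph among themselves.
    teams = -(-n_workers // team_size)  # ceiling division
    return n_workers + teams * (teams - 1) // 2


for n, k in ((12, 4), (100, 4), (100, 10)):
    print(f"{n} workers, teams of {k}: flat {flat_channels(n)}, "
          f"hierarchical {hierarchical_channels(n, k)}")
```

For 100 workers the flat swarm has 4,950 links versus 400 with teams of 4 and 145 with teams of 10; with many small teams the supervisor mesh itself dominates, which is the argument for nesting another level.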

### Plan-and-Execute Cost Pattern

An expensive model creates the plan; cheaper models execute the individual steps. Reported cost reductions reach 90%.

```
Orchestrator (Claude Opus) → Plan
Workers (Claude Haiku) → Execute steps
Reviewer (Claude Sonnet) → Validate
```
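Where the savings come from is easiest to see with a worked example. The prices below are illustrative placeholders, not real list prices for any model; the point is that savings depend on the price ratio between tiers and on the share of tokens moved to the cheap tier.

```python
# Placeholder prices in $ per million tokens: illustrative only, NOT real list prices.
PRICE_PER_MTOK = {"orchestrator": 15.0, "worker": 1.0, "reviewer": 3.0}


def run_cost(tokens_by_role: dict[str, int]) -> float:
    """Dollar cost of a run given how many tokens each role consumed."""
    return sum(PRICE_PER_MTOK[role] * tokens / 1_000_000
               for role, tokens in tokens_by_role.items())


# Same 1M-token job, either entirely on the expensive tier or split by role.
single_model = run_cost({"orchestrator": 1_000_000})
plan_and_execute = run_cost({"orchestrator": 50_000, "worker": 900_000, "reviewer": 50_000})
print(f"savings: {1 - plan_and_execute / single_model:.0%}")
```

With these placeholder numbers the split run costs $1.80 instead of $15.00, an 88% saving; if workers need many more tokens than a single strong model would, the saving shrinks accordingly.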

### Bounded Autonomy Spectrum

Progressive autonomy based on risk:
1. **Human in the loop** - approve each action
2. **Human on the loop** - monitor, intervene if needed
3. **Human out of the loop** - fully autonomous

Match to task complexity and outcome criticality.
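One way to make the matching explicit is a small policy function. This is a sketch of a single possible policy, assuming complexity and criticality are each rated on a hypothetical 1-3 scale and that the riskier rating decides.

```python
from enum import Enum


class Autonomy(Enum):
    HUMAN_IN_THE_LOOP = "approve each action"
    HUMAN_ON_THE_LOOP = "monitor, intervene if needed"
    HUMAN_OUT_OF_THE_LOOP = "fully autonomous"


def choose_autonomy(complexity: int, criticality: int) -> Autonomy:
    """Pick an autonomy level from two ratings on a hypothetical 1-3 scale.

    The riskier rating decides: a simple task with a critical outcome
    still gets a human in the loop.
    """
    for rating in (complexity, criticality):
        if rating not in (1, 2, 3):
            raise ValueError(f"rating must be 1, 2, or 3, got {rating}")
    risk = max(complexity, criticality)
    return {
        1: Autonomy.HUMAN_OUT_OF_THE_LOOP,
        2: Autonomy.HUMAN_ON_THE_LOOP,
        3: Autonomy.HUMAN_IN_THE_LOOP,
    }[risk]
```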

## Best Practices Synthesis

### From HN Discussions

1. **Well-scoped tasks with tight contracts** - Not vague prompts
2. **Automated testing gates** - Agents must pass before review
3. **2-3 agents realistic** - Not 10 parallel
4. **Exclusive ownership per module** - One writer per concern
5. **Short-lived branches** - Frequent merge to prevent drift

### From orch Consensus

1. **Treat agents as untrusted workers** - Not peers with full access
2. **Machine-readable contracts** - JSON schema between roles
3. **Per-agent logs with correlation IDs** - Distributed systems observability
4. **Guardrail agents** - Security/policy checks on every diff
5. **Versioned task specs** - Bump version → re-run affected agents
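Machine-readable contracts and correlation IDs reinforce each other: the ID is minted once per assignment, is a required contract field, and is copied into every log line so one task can be traced across agents. A minimal sketch; the field set is hypothetical, and a plain field-to-type map stands in for a real JSON Schema to keep it dependency-free.

```python
import json
import uuid

# Hypothetical planner -> worker contract; a real system would use JSON Schema.
TASK_ASSIGNMENT = {
    "correlation_id": str,
    "task_id": str,
    "spec_version": int,
    "role": str,
}


def validate(message: dict, contract: dict = TASK_ASSIGNMENT) -> list[str]:
    """Return a list of contract violations (empty means valid)."""
    errors = []
    for name, expected in contract.items():
        if name not in message:
            errors.append(f"missing field: {name}")
        # No field here is boolean, and bool subclasses int, so reject bools outright.
        elif not isinstance(message[name], expected) or isinstance(message[name], bool):
            errors.append(f"{name}: expected {expected.__name__}")
    return errors


def new_assignment(task_id: str, spec_version: int, role: str) -> dict:
    # One correlation ID per assignment, copied into every downstream message.
    return {"correlation_id": uuid.uuid4().hex, "task_id": task_id,
            "spec_version": spec_version, "role": role}


def log_line(agent: str, correlation_id: str, event: str, **fields) -> str:
    """One structured log record; join on correlation_id across agents."""
    return json.dumps({"agent": agent, "correlation_id": correlation_id,
                       "event": event, **fields}, sort_keys=True)
```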

### From Practitioner Blogs

1. **Coordination ≠ isolation** - Advisory locks (who works on what) + worktrees (how they work)
2. **JSONL for issues** - One issue per line, deterministic merge rules
3. **Session protocols** - Explicit start/close procedures
4. **Modular rules with includes** - Template configuration
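"Deterministic merge rules" for JSONL means two branches' issue files always combine to the same result, regardless of which side is "ours". A minimal sketch under assumed rules (each record has a unique `id` and an integer `updated_at`; the newer record wins; ties break on canonical JSON; output sorted by `id`):

```python
import json


def _rank(issue: dict) -> tuple:
    # Newer wins; canonical JSON breaks ties so the merge is order-independent.
    return (issue["updated_at"], json.dumps(issue, sort_keys=True))


def merge_jsonl(ours: str, theirs: str) -> str:
    """Deterministically merge two JSONL issue files (hypothetical rules)."""
    merged: dict[str, dict] = {}
    for text in (ours, theirs):
        for line in text.splitlines():
            if not line.strip():
                continue
            issue = json.loads(line)
            current = merged.get(issue["id"])
            if current is None or _rank(issue) > _rank(current):
                merged[issue["id"]] = issue
    # Sorted ids and sorted keys give byte-identical output for identical input.
    return "".join(json.dumps(merged[k], sort_keys=True) + "\n" for k in sorted(merged))
```

Because the result does not depend on argument order, a function like this could be wrapped as a custom git merge driver for the issues file.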

## How This Applies to Our Design

### Already Covered

| Pattern | Our Design |
|---------|------------|
| SQLite for coordination | ✅ bus.db with transactions |
| Git worktrees | ✅ branch-per-worker.md |
| State machine | ✅ worker-state-machine.md |
| Heartbeats/liveness | ✅ 10s interval in message-passing |
| Claim-check pattern | ✅ SQLite transactions |
| Task serialization | ✅ No uncommitted dependencies |

### Should Add

| Pattern | Gap | Action |
|---------|-----|--------|
| Spec-driven tasks | Tasks are just titles | Add structured task specs (requirements, design, acceptance) |
| Role boundaries | Not enforced | Add tool-level constraints per agent type |
| Review funnel | Missing arbiter | Add synthesis step before human review |
| Versioned specs | Not tracked | Add version field to task assignments |
| Cost budgets | Not implemented | Add token/time budgets per agent |
| Correlation IDs | Partial (correlation_id) | Ensure end-to-end tracing |

### Validate Our Decisions

| Decision | Validation |
|----------|------------|
| SQLite over JSONL | ✅ Confirmed - JSONL for issues only, SQLite for coordination |
| Orchestrator creates branches | ✅ Confirmed - reduces agent setup, enforces policy |
| 3-4 agents max | ✅ Aligns with "Rule of 4" research |
| Mandatory rebase | ✅ Confirmed - prevents stale branch drift |
| Escalate semantic conflicts | ✅ Confirmed - agents hallucinate resolutions |

## Open Questions Surfaced

1. **PostgreSQL advisory locks vs SQLite?** - Do we need Postgres, or is SQLite sufficient?
2. **Git bundles for checkpoints?** - Should we adopt SkillFS pattern?
3. **Spec files per task?** - How structured should task specs be?
4. **Arbiter/synthesis agent?** - Add to architecture before human review?
5. **Token budgets?** - How to enforce across different agent types?

## Sources

### HN Discussions
- [Superset: 10 Parallel Coding Agents](https://news.ycombinator.com/item?id=46368739)
- [Desktop App for Parallel Agentic Dev](https://news.ycombinator.com/item?id=46027947)
- [SkillFS: Git-backed Sandboxes](https://news.ycombinator.com/item?id=46543093)
- [Zenflow: Agent Orchestration](https://news.ycombinator.com/item?id=46290617)
- [Git Worktree for Parallel Dev](https://news.ycombinator.com/item?id=46510462)

### Blogs & Articles
- [Building a Multi-Agent Development Workflow](https://itsgg.com/blog/2026/01/08/building-a-multi-agent-development-workflow/)
- [The Real Struggle with AI Coding Agents](https://www.smiansh.com/blogs/the-real-struggle-with-ai-coding-agents-and-how-to-overcome-it/)
- [Why AI Coding Tools Don't Work For Me](https://blog.miguelgrinberg.com/post/why-generative-ai-coding-tools-and-agents-do-not-work-for-me)
- [Microsoft: Multi-Agent Systems at Scale](https://devblogs.microsoft.com/ise/multi-agent-systems-at-scale/)
- [LangChain: How and When to Build Multi-Agent](https://blog.langchain.com/how-and-when-to-build-multi-agent-systems/)

### Research & Analysis
- [VentureBeat: More Agents Isn't Better](https://venturebeat.com/orchestration/research-shows-more-agents-isnt-a-reliable-path-to-better-enterprise-ai)
- [Deloitte: AI Agent Orchestration](https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2026/ai-agent-orchestration.html)
- [10 Things Developers Want from Agentic IDEs](https://redmonk.com/kholterhoff/2025/12/22/10-things-developers-want-from-their-agentic-ides-in-2025/)