# Multi-Agent Footguns, Patterns, and Emerging Ideas

**Status**: Research synthesis
**Date**: 2026-01-10
**Sources**: HN discussions, Reddit, practitioner blogs, orch consensus

## Footguns: Lessons Learned the Hard Way

### Git & Branch Chaos

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Force-resolve conflicts** | Agents rebase improperly, rewrite history, break CI | No direct git access for agents; orchestrator owns git operations |
| **Stale branches** | Agent works on outdated branch for hours | Frequent auto-rebase; version check before major edits |
| **Recovery nightmare** | Broken git state is hard to recover | Git bundles for checkpoints (SkillFS pattern); worktree isolation |
| **Branch naming confusion** | `worker-id/task-id` becomes misleading on reassignment | Use `type/task-id`; worker identity in commit author |

### State & Database Issues

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Shared DB pollution** | Agents debug against state mutated by other agents, producing heisenbugs | Ephemeral namespaced DBs per branch; schema prefixes |
| **Port conflicts** | Multiple web servers try to bind the same port | Auto-increment ports; orchestrator manages allocation |
| **Service duplication** | 10 agents need 10 PostgreSQL/Redis instances | Container-per-worktree; or accept serialization |
| **Feature flag races** | Agents toggle flags in parallel | Namespace flags per agent/branch |
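
The port-conflict mitigation above (auto-increment ports, with the orchestrator owning allocation) could be sketched roughly as follows. `PortAllocator`, the base port 3000, and the range size are hypothetical names and values chosen for illustration; they are not part of the existing design.

```python
class PortAllocator:
    """Hypothetical orchestrator-side allocator: one port per worker."""

    def __init__(self, base: int = 3000, count: int = 100) -> None:
        self._free = list(range(base, base + count))  # lowest port first
        self._by_worker: dict[str, int] = {}

    def acquire(self, worker_id: str) -> int:
        # Idempotent: a worker that asks twice gets the same port back.
        if worker_id in self._by_worker:
            return self._by_worker[worker_id]
        if not self._free:
            raise RuntimeError("no free ports left in the configured range")
        port = self._free.pop(0)
        self._by_worker[worker_id] = port
        return port

    def release(self, worker_id: str) -> None:
        # Return the port to the pool so a later worker can reuse it.
        port = self._by_worker.pop(worker_id, None)
        if port is not None:
            self._free.append(port)
            self._free.sort()
```

This only tracks ports the orchestrator itself hands out; it does not check whether some unrelated process already holds a port, so a real allocator would likely also need a bind check.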

### Coordination Failures

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **State divergence** | Each agent has a different snapshot of reality | Single source of truth artifact; frequent rebase |
| **Silent duplication** | Two agents "fix" the same bug differently | Central task ledger with explicit states; idempotent task IDs |
| **Dependency deadlocks** | A waits on B, which waits on A | Event-driven async; bounded time limits; no sync waits |
| **Role collapse** | Planner writes code; tester refactors | Narrow role boundaries; tool-level constraints |
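
As a rough illustration of the "central task ledger with idempotent task IDs" mitigation, here is a minimal sketch using Python's built-in `sqlite3`. The `tasks` table, its columns, and the `open`/`claimed` state names are assumptions made for this example, not the actual bus.db schema.

```python
import sqlite3

# Hypothetical schema: one row per task, keyed by a stable task ID.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS tasks ("
    " task_id TEXT PRIMARY KEY,"
    " state TEXT NOT NULL DEFAULT 'open',"
    " worker TEXT)"
)


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_task(conn: sqlite3.Connection, task_id: str) -> bool:
    """Insert a task once; re-submitting the same ID is a no-op."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO tasks (task_id) VALUES (?)", (task_id,)
        )
    return cur.rowcount == 1


def claim_task(conn: sqlite3.Connection, task_id: str, worker: str) -> bool:
    """Claim a task atomically; only the first claimant gets True."""
    with conn:
        cur = conn.execute(
            "UPDATE tasks SET worker = ?, state = 'claimed' "
            "WHERE task_id = ? AND worker IS NULL",
            (worker, task_id),
        )
    return cur.rowcount == 1
```

Because the claim is a single conditional `UPDATE`, two agents racing for the same task cannot both see success, which is what prevents silent duplication.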

### Human Bottlenecks

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Review overload** | 10 agents = 10 partial PRs to reconcile | Review funnel: worker → arbiter agent → single synthesized PR |
| **Context switching** | Human juggles parallel agent outputs | Size limits per PR; "one story per PR" |
| **Morale drain** | Endless nit-picking leads people to disable agents | Pre-review by lint/style agents; humans see substantive deltas only |

### Agent-Specific Issues

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Hallucinated packages** | Reportedly up to ~30% of suggested packages don't exist | Validate imports against known registries |
| **Temporary fixes** | Works in session, breaks in Docker | Require full env rebuild as acceptance test |
| **Skill atrophy** | Developers can't code without AI | Deliberate practice; understand what AI generates |
| **Test/impl conspiracy** | Brittle tests + brittle code pass together | Separate spec tests from impl tests; mutation testing |
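
One possible shape for the "validate imports against known registries" check, sketched for Python sources only. The `known` set stands in for whatever registry or allowlist lookup the orchestrator would really use; that lookup is an assumption and is not shown here.

```python
import ast


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level module names that a Python source file imports."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as `from . import x`
            names.add(node.module.split(".")[0])
    return names


def unknown_imports(source: str, known: set[str]) -> set[str]:
    """Return imported modules that are not in the known set."""
    return imported_top_level_modules(source) - known
```

Import names and registry package names do not always match, so a real check would need a mapping between the two before flagging a package as hallucinated.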

### Resource & Cost Issues

| Footgun | Description | Mitigation |
|---------|-------------|------------|
| **Token blowups** | Parallel agents saturate context/API limits | Hard budgets per agent; limit context sizes |
| **Credit drain** | AI fixing its own mistakes in loops | Circuit breakers; attempt limits |
| **Timeout misreads** | Rate limits interpreted as semantic failures | Structured error channels; retry with idempotency |
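
The "circuit breakers; attempt limits" mitigation for credit drain might look something like the sketch below. `AttemptLimiter`, `CircuitOpen`, and the default of three attempts are hypothetical choices for illustration.

```python
from dataclasses import dataclass


class CircuitOpen(Exception):
    """Raised when an agent has used up its attempts for a task."""


@dataclass
class AttemptLimiter:
    max_attempts: int = 3  # hypothetical default
    attempts: int = 0

    def run(self, action):
        """Call action() until it succeeds or the attempt budget is spent."""
        last_error = None
        while self.attempts < self.max_attempts:
            self.attempts += 1
            try:
                return action()
            except Exception as exc:  # any failure counts as one attempt
                last_error = exc
        raise CircuitOpen(f"gave up after {self.attempts} attempts") from last_error
```

Once `CircuitOpen` is raised, the orchestrator can escalate to a human instead of letting the agent keep spending credits on its own mistakes.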

## Emerging Patterns (2026)

### The "Rule of 4"

Research suggests that effective team sizes are limited to about 3-4 agents. Beyond this, communication overhead grows super-linearly (reported exponent 1.724), so the cost of coordination outpaces the value each extra agent adds.

**Implication**: Don't build 10-agent swarms. Build 3-4 specialized agents with clear boundaries.
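
To get a feel for what that exponent implies, the following sketch assumes overhead scales as `n ** 1.724` with the number of agents `n`. That functional form is an assumption for illustration; the text above only quotes the exponent.

```python
EXPONENT = 1.724  # figure quoted above; the power-law form is an assumption


def relative_overhead(n_agents: int, baseline: int = 4) -> float:
    """Overhead of n_agents relative to a baseline team, under n ** EXPONENT."""
    return (n_agents / baseline) ** EXPONENT
```

Under this assumption, going from 4 to 10 agents (2.5 times the agents) gives `relative_overhead(10)` of roughly 4.85 times the coordination overhead.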

### Spec-Driven Development

Adopted by Kiro, Tessl, and GitHub Spec Kit:
- `requirements.md` - what to build
- `design.md` - how to build it
- `tasks.md` - decomposed work items

Agents work from specs, not vague prompts. Specs are versioned, and each agent reports which spec version it used.
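
A versioned spec and the "echo the version" check could be modeled along these lines. `TaskSpec`, `AgentResult`, and their fields are hypothetical; they loosely mirror the three spec files listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    version: int
    requirements: str       # what to build
    design: str             # how to build it
    tasks: tuple[str, ...]  # decomposed work items


@dataclass(frozen=True)
class AgentResult:
    task_id: str
    spec_version: int  # the version the agent says it worked from
    summary: str


def result_is_current(spec: TaskSpec, result: AgentResult) -> bool:
    """True only if the result was produced against the current spec version."""
    return result.task_id == spec.task_id and result.spec_version == spec.version
```

A result that echoes an older version can then be rejected or re-run rather than merged.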

### Layered Coordination (Not Monolithic)

Instead of one complex orchestrator, compose independent layers:
1. Configuration management
2. Issue tracking (JSONL, merge-friendly)
3. Atomic locking (PostgreSQL advisory locks)
4. Filesystem isolation (git worktrees)
5. Validation gates
6. Enforcement rules
7. Session protocols

Each layer is independently useful, and failures stay isolated.
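
### PostgreSQL Advisory Locks for Claims

Novel insight: session-level advisory locks are released automatically when the holding connection dies (no orphaned locks), take roughly 1 ms, and involve no table writes. This makes them an elegant fit for distributed claim races.

```sql
-- task_id_hash is a placeholder for a bigint key derived from the task ID.
-- Returns true if this session now holds the claim, false if another one does.
SELECT pg_try_advisory_lock(task_id_hash);
-- Work...
SELECT pg_advisory_unlock(task_id_hash);
-- Or: connection dies → auto-released
```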
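
`pg_try_advisory_lock` takes a `bigint` key, so the `task_id_hash` placeholder has to come from somewhere. One way to derive it, sketched here as an assumption rather than the project's actual scheme, is to truncate a SHA-256 digest of the task ID to a signed 64-bit integer.

```python
import hashlib


def advisory_lock_key(task_id: str) -> int:
    """Map a task ID to a stable signed 64-bit integer (fits a Postgres bigint)."""
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)
```

The result can be passed as a query parameter in place of `task_id_hash`. Distinct task IDs can in principle collide on the same key, which would only make one claim wait on an unrelated one, not corrupt state.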

### Git Bundles for Checkpoints (SkillFS)

Every agent sandbox is a git repo. When a session ends, the repo is stored as a git bundle; a new session restores from the bundle and continues where the last one left off. `git log` provides a complete audit trail.
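
The checkpoint and restore steps map onto two standard git commands, `git bundle create` and `git clone`. The sketch below only builds the command lines and offers a thin runner; the function names and paths are hypothetical, and this is not a description of how SkillFS itself is implemented.

```python
import subprocess
from pathlib import Path
from typing import Optional


def checkpoint_cmd(bundle: Path) -> list[str]:
    # `git bundle create <file> --all` packs every ref into a single file.
    return ["git", "bundle", "create", str(bundle), "--all"]


def restore_cmd(bundle: Path, dest: Path) -> list[str]:
    # A bundle file can be cloned from as if it were a remote repository.
    return ["git", "clone", str(bundle), str(dest)]


def run(cmd: list[str], cwd: Optional[Path] = None) -> None:
    """Run a git command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, cwd=cwd, check=True)
```

At session end the orchestrator would call `run(checkpoint_cmd(bundle), cwd=sandbox)`; a new session would call `run(restore_cmd(bundle, new_sandbox))`.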

### Hierarchical Over Flat Swarms

Instead of 100-agent flat swarms:
- Nested coordination structures
- Partition the communication graph
- Supervisor per sub-team
- Only supervisors talk to each other
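
The effect of partitioning the communication graph can be estimated by counting possible links. This sketch assumes every pair can talk in a flat swarm, while in the hierarchy each worker talks only to its supervisor and supervisors form a full mesh; the team size is a hypothetical parameter.

```python
import math


def flat_links(n_agents: int) -> int:
    """Pairwise links when every agent can talk to every other agent."""
    return n_agents * (n_agents - 1) // 2


def hierarchical_links(n_workers: int, team_size: int) -> int:
    """Links when workers talk only to their supervisor and supervisors talk to each other."""
    teams = math.ceil(n_workers / team_size)
    worker_links = n_workers  # one link per worker, to its own supervisor
    supervisor_links = teams * (teams - 1) // 2
    return worker_links + supervisor_links
```

For 100 workers, a flat swarm has 4,950 possible links, while teams of 4 (adding 25 supervisors) have 400.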

### Plan-and-Execute Cost Pattern

An expensive model creates the strategy; cheap models execute the steps. This can reduce costs by as much as 90%.

```
Orchestrator (Claude Opus) → Plan
Workers (Claude Haiku) → Execute steps
Reviewer (Claude Sonnet) → Validate
```
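
How much this saves depends on the price gap between models and on how much of the token volume is planning. The sketch below uses a deliberately simple cost model that ignores the reviewer; all prices and token counts passed to it are illustrative inputs, not real pricing.

```python
def plan_and_execute_savings(
    plan_tokens: int,
    exec_tokens: int,
    expensive_price: float,
    cheap_price: float,
) -> float:
    """Fraction of cost saved versus running everything on the expensive model."""
    baseline = (plan_tokens + exec_tokens) * expensive_price
    mixed = plan_tokens * expensive_price + exec_tokens * cheap_price
    return 1 - mixed / baseline
```

With a hypothetical 15:1 price ratio and 5% of tokens spent on planning, `plan_and_execute_savings(10_000, 190_000, 15.0, 1.0)` comes out at about 0.89, which is in the neighborhood of the figure quoted above.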

### Bounded Autonomy Spectrum

Progressive autonomy based on risk:
1. **Human in the loop** - approve each action
2. **Human on the loop** - monitor, intervene if needed
3. **Human out of the loop** - fully autonomous

Match the level to task complexity and outcome criticality.
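
One way to make that matching explicit is a small policy function. The 1-5 scoring scale and the thresholds below are hypothetical; only the three autonomy levels come from the list above.

```python
from enum import Enum


class Autonomy(Enum):
    IN_THE_LOOP = "human approves each action"
    ON_THE_LOOP = "human monitors, intervenes if needed"
    OUT_OF_THE_LOOP = "fully autonomous"


def autonomy_for(complexity: int, criticality: int) -> Autonomy:
    """Pick a level from 1 (low) to 5 (high) scores; thresholds are illustrative."""
    risk = max(complexity, criticality)  # the riskier dimension decides
    if risk >= 4:
        return Autonomy.IN_THE_LOOP
    if risk >= 2:
        return Autonomy.ON_THE_LOOP
    return Autonomy.OUT_OF_THE_LOOP
```

Taking the maximum of the two scores means a simple task with a critical outcome still gets per-action approval.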

## Best Practices Synthesis

### From HN Discussions

1. **Well-scoped tasks with tight contracts** - Not vague prompts
2. **Automated testing gates** - Agents must pass before review
3. **2-3 agents is realistic** - Not 10 in parallel
4. **Exclusive ownership per module** - One writer per concern
5. **Short-lived branches** - Frequent merge to prevent drift

### From orch Consensus

1. **Treat agents as untrusted workers** - Not peers with full access
2. **Machine-readable contracts** - JSON schema between roles
3. **Per-agent logs with correlation IDs** - Distributed systems observability
4. **Guardrail agents** - Security/policy checks on every diff
5. **Versioned task specs** - Bump version → re-run affected agents
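
For the per-agent logs with correlation IDs, a structured log line might be produced as in this sketch. The field names (`ts`, `agent`, `correlation_id`, `event`) are assumptions for illustration, not an agreed contract.

```python
import json
import uuid
from datetime import datetime, timezone


def new_correlation_id() -> str:
    """Generate an ID that follows one task across every agent that touches it."""
    return uuid.uuid4().hex


def log_record(agent_id: str, correlation_id: str, event: str, **fields) -> str:
    """Build one JSON log line; extra keyword fields are merged into the record."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent_id,
        "correlation_id": correlation_id,
        "event": event,
        **fields,
    }
    return json.dumps(record, sort_keys=True)
```

Filtering all agents' logs by a single `correlation_id` then reconstructs the full history of one task.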

### From Practitioner Blogs

1. **Coordination ≠ isolation** - Advisory locks (who works on what) + worktrees (how they work)
2. **JSONL for issues** - One issue per line, deterministic merge rules
3. **Session protocols** - Explicit start/close procedures
4. **Modular rules with includes** - Template configuration
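
A deterministic merge rule for JSONL issues could be as simple as "latest `updated_at` wins, output sorted by `id`". The sketch below assumes each issue is a JSON object with hypothetical `id` and `updated_at` (ISO 8601 string) fields.

```python
import json


def parse_jsonl(text: str) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def _rank(issue: dict) -> tuple:
    # Newer timestamp wins; the serialized form breaks ties deterministically.
    return (issue["updated_at"], json.dumps(issue, sort_keys=True))


def merge_issues(ours: list[dict], theirs: list[dict]) -> list[dict]:
    """Merge two issue lists so the result does not depend on argument order."""
    merged: dict = {}
    for issue in [*ours, *theirs]:
        current = merged.get(issue["id"])
        if current is None or _rank(issue) > _rank(current):
            merged[issue["id"]] = issue
    return [merged[key] for key in sorted(merged)]


def dump_jsonl(issues: list[dict]) -> str:
    return "".join(json.dumps(issue, sort_keys=True) + "\n" for issue in issues)
```

Because the result is the same whichever side is "ours", two agents merging the same pair of files produce byte-identical output.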

## How This Applies to Our Design

### Already Covered

| Pattern | Our Design |
|---------|------------|
| SQLite for coordination | ✅ bus.db with transactions |
| Git worktrees | ✅ branch-per-worker.md |
| State machine | ✅ worker-state-machine.md |
| Heartbeats/liveness | ✅ 10s interval in message-passing |
| Claim-check pattern | ✅ SQLite transactions |
| Task serialization | ✅ No uncommitted dependencies |
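
The heartbeat row above implies a liveness rule on the reading side. A minimal sketch, assuming a worker counts as stale after a few missed beats: the 10-second interval comes from the table, while the three-missed-beats threshold is a hypothetical value.

```python
HEARTBEAT_INTERVAL_S = 10.0  # interval from the design
MISSED_BEATS_ALLOWED = 3     # hypothetical threshold


def is_stale(
    last_heartbeat: float,
    now: float,
    interval: float = HEARTBEAT_INTERVAL_S,
    missed: int = MISSED_BEATS_ALLOWED,
) -> bool:
    """True if more than `missed` heartbeat intervals have passed without a beat."""
    return (now - last_heartbeat) > interval * missed
```

Both timestamps are plain seconds, for example from `time.monotonic()` or a Unix timestamp stored in the bus database.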

### Should Add

| Pattern | Gap | Action |
|---------|-----|--------|
| Spec-driven tasks | Tasks are just titles | Add structured task specs (requirements, design, acceptance) |
| Role boundaries | Not enforced | Add tool-level constraints per agent type |
| Review funnel | Missing arbiter | Add synthesis step before human review |
| Versioned specs | Not tracked | Add version field to task assignments |
| Cost budgets | Not implemented | Add token/time budgets per agent |
| Correlation IDs | Partial (`correlation_id`) | Ensure end-to-end tracing |
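
The "token/time budgets per agent" action could start from something like this sketch. `AgentBudget`, `BudgetExceeded`, and the idea of charging tokens before each model call are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


class BudgetExceeded(Exception):
    """Raised when an agent would exceed its token or time budget."""


@dataclass
class AgentBudget:
    max_tokens: int
    max_seconds: float
    tokens_used: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int, now: Optional[float] = None) -> None:
        """Record token usage, refusing the charge if either budget is exhausted."""
        now = time.monotonic() if now is None else now
        if now - self.started_at > self.max_seconds:
            raise BudgetExceeded("time budget exhausted")
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token budget exhausted")
        self.tokens_used += tokens
```

The `now` parameter exists mainly so the time check can be exercised deterministically; in normal use it is left unset.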

### Validate Our Decisions

| Decision | Validation |
|----------|------------|
| SQLite over JSONL | ✅ Confirmed - JSONL for issues only, SQLite for coordination |
| Orchestrator creates branches | ✅ Confirmed - reduces agent setup, enforces policy |
| 3-4 agents max | ✅ Aligns with "Rule of 4" research |
| Mandatory rebase | ✅ Confirmed - prevents stale branch drift |
| Escalate semantic conflicts | ✅ Confirmed - agents hallucinate resolutions |

## Open Questions Surfaced

1. **PostgreSQL advisory locks vs SQLite?** - Do we need Postgres, or is SQLite sufficient?
2. **Git bundles for checkpoints?** - Should we adopt the SkillFS pattern?
3. **Spec files per task?** - How structured should task specs be?
4. **Arbiter/synthesis agent?** - Should we add one to the architecture before human review?
5. **Token budgets?** - How do we enforce them across different agent types?

## Sources

### HN Discussions

- [Superset: 10 Parallel Coding Agents](https://news.ycombinator.com/item?id=46368739)
- [Desktop App for Parallel Agentic Dev](https://news.ycombinator.com/item?id=46027947)
- [SkillFS: Git-backed Sandboxes](https://news.ycombinator.com/item?id=46543093)
- [Zenflow: Agent Orchestration](https://news.ycombinator.com/item?id=46290617)
- [Git Worktree for Parallel Dev](https://news.ycombinator.com/item?id=46510462)

### Blogs & Articles

- [Building a Multi-Agent Development Workflow](https://itsgg.com/blog/2026/01/08/building-a-multi-agent-development-workflow/)
- [The Real Struggle with AI Coding Agents](https://www.smiansh.com/blogs/the-real-struggle-with-ai-coding-agents-and-how-to-overcome-it/)
- [Why AI Coding Tools Don't Work For Me](https://blog.miguelgrinberg.com/post/why-generative-ai-coding-tools-and-agents-do-not-work-for-me)
- [Microsoft: Multi-Agent Systems at Scale](https://devblogs.microsoft.com/ise/multi-agent-systems-at-scale/)
- [LangChain: How and When to Build Multi-Agent](https://blog.langchain.com/how-and-when-to-build-multi-agent-systems/)

### Research & Analysis

- [VentureBeat: More Agents Isn't Better](https://venturebeat.com/orchestration/research-shows-more-agents-isnt-a-reliable-path-to-better-enterprise-ai)
- [Deloitte: AI Agent Orchestration](https://www.deloitte.com/us/en/insights/industry/technology/technology-media-and-telecom-predictions/2026/ai-agent-orchestration.html)
- [10 Things Developers Want from Agentic IDEs](https://redmonk.com/kholterhoff/2025/12/22/10-things-developers-want-from-their-agentic-ides-in-2025/)