dan/skills

dan 1c66d019bd feat: add worker CLI scaffold in Nim

Multi-agent coordination CLI with SQLite message bus:
- State machine: ASSIGNED -> WORKING -> IN_REVIEW -> APPROVED -> COMPLETED
- Commands: spawn, start, done, approve, merge, cancel, fail, heartbeat
- SQLite WAL mode, dedicated heartbeat thread, channel-based IPC
- cligen for CLI, tiny_sqlite for DB, ORC memory management

Design docs for branch-per-worker, state machine, message passing,
and human observability patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-10 18:47:47 -08:00

10 KiB


Multi-Agent Footguns, Patterns, and Emerging Ideas

Status: Research synthesis
Date: 2026-01-10
Sources: HN discussions, Reddit, practitioner blogs, orch consensus

Footguns: Lessons Learned the Hard Way

Git & Branch Chaos

Footgun	Description	Mitigation
Force-resolve conflicts	Agents rebase improperly, rewrite history, break CI	No direct git access for agents; orchestrator owns git operations
Stale branches	Agent works on outdated branch for hours	Frequent auto-rebase; version check before major edits
Recovery nightmare	Broken git state is hard to recover	Git bundles for checkpoints (SkillFS pattern); worktree isolation
Branch naming confusion	`worker-id/task-id` becomes misleading on reassignment	Use `type/task-id`; worker identity in commit author
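The sketch below shows the `type/task-id` naming mitigation. The branch name encodes only the kind of work and the task, so it stays accurate if the task is reassigned. The worker's identity goes into the commit author through git's standard `--author` option. The allowed task types, the helper names, and the placeholder `agents.invalid` email domain are hypothetical.

```python
import re

# Hypothetical set of task types; the doc only fixes the `type/task-id` shape.
TASK_TYPES = {"feat", "fix", "refactor", "docs", "test"}
TASK_ID_RE = re.compile(r"^[a-z0-9][a-z0-9-]*$")


def branch_name(task_type: str, task_id: str) -> str:
    """Build a branch name that does not mention the worker, so reassignment is safe."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    if not TASK_ID_RE.match(task_id):
        raise ValueError(f"invalid task id: {task_id!r}")
    return f"{task_type}/{task_id}"


def commit_argv(worker_id: str, message: str) -> list[str]:
    """Record which worker made a change in the commit author, not the branch."""
    # The email domain is a placeholder (.invalid is a reserved TLD).
    author = f"{worker_id} <{worker_id}@agents.invalid>"
    return ["git", "commit", f"--author={author}", "-m", message]


print(branch_name("fix", "task-42"))  # fix/task-42
print(commit_argv("worker-3", "Fix login redirect"))
```

In this scheme the orchestrator, not the agent, would run the resulting command, which keeps with the "orchestrator owns git operations" mitigation.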

State & Database Issues

Footgun	Description	Mitigation
Shared DB pollution	Agents debug against state mutated by other agents, producing heisenbugs	Ephemeral namespaced DBs per branch; schema prefixes
Port conflicts	Multiple web servers on same port	Auto-increment ports; orchestrator manages allocation
Service duplication	10 agents need 10 PostgreSQL/Redis instances	Container per worktree, or accept serialized access
Feature flag races	Agents toggle flags in parallel	Namespace flags per agent/branch
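Several of these mitigations depend on the orchestrator handing out and namespacing shared resources. The sketch below shows one way to do that. `PortAllocator`, the base port, and the `branch__resource` prefix format are all hypothetical. The allocator only tracks its own assignments and does not probe whether the OS port is actually free.

```python
from typing import Dict


class PortAllocator:
    """Orchestrator-owned allocation: each worker gets a unique port from a range."""

    def __init__(self, base: int = 8100, limit: int = 100):
        self.base = base
        self.limit = limit
        self.assigned: Dict[str, int] = {}

    def allocate(self, worker_id: str) -> int:
        # Idempotent: asking again for the same worker returns the same port.
        if worker_id in self.assigned:
            return self.assigned[worker_id]
        in_use = set(self.assigned.values())
        for port in range(self.base, self.base + self.limit):
            if port not in in_use:
                self.assigned[worker_id] = port
                return port
        raise RuntimeError("no free ports in range")

    def release(self, worker_id: str) -> None:
        self.assigned.pop(worker_id, None)


def namespaced(resource: str, branch: str) -> str:
    """Prefix a DB schema or feature-flag name with a sanitized branch name."""
    safe = "".join(c if c.isalnum() else "_" for c in branch.lower())
    return f"{safe}__{resource}"


ports = PortAllocator()
print(ports.allocate("worker-1"))  # 8100
print(ports.allocate("worker-2"))  # 8101
print(namespaced("users", "fix/task-42"))  # fix_task_42__users
```

Released ports are reused lowest-first, so a long-running orchestrator does not walk off the end of the range.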

Coordination Failures

Footgun	Description	Mitigation
State divergence	Each agent works from a different snapshot of reality	Single source of truth artifact; frequent rebase
Silent duplication	Two agents "fix" same bug differently	Central task ledger with explicit states; idempotent task IDs
Dependency deadlocks	A waits on B, which waits on A	Event-driven async; bounded time limits; no sync waits
Role collapse	Planner writes code; tester refactors	Narrow role boundaries; tool-level constraints
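Idempotent task IDs can be sketched by deriving the ID from a normalized description of the work. Two agents filing the same bug then collide on the ledger's primary key instead of creating duplicates. The normalization rule, the `t-` prefix, and the `ledger` table are hypothetical. The collision is handled by SQLite's standard `INSERT OR IGNORE`.

```python
import hashlib
import sqlite3


def task_id(kind: str, target: str) -> str:
    """Deterministic ID: same kind + same (whitespace/case-normalized) target -> same ID."""
    normalized = f"{kind.strip().lower()}:{' '.join(target.lower().split())}"
    return "t-" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (id TEXT PRIMARY KEY, title TEXT, state TEXT NOT NULL)")


def file_task(kind: str, target: str) -> bool:
    """Return True if this call created the task, False if it was already filed."""
    tid = task_id(kind, target)
    with conn:  # commits on success
        cur = conn.execute(
            "INSERT OR IGNORE INTO ledger (id, title, state) VALUES (?, ?, 'ASSIGNED')",
            (tid, target),
        )
    return cur.rowcount == 1  # 0 when the row already existed and the insert was ignored


print(file_task("fix", "Login redirect loop"))       # True
print(file_task("fix", "  login   REDIRECT loop"))   # False: same normalized task
```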

Human Bottlenecks

Footgun	Description	Mitigation
Review overload	10 agents = 10 partial PRs to reconcile	Review funnel: worker → arbiter agent → single synthesized PR
Context switching	Human juggling parallel agent outputs	Size limits per PR; "one story per PR"
Morale drain	Endless nit-picking, people disable agents	Pre-review by lint/style agents; humans see substantive deltas only
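PR size limits are easy to enforce mechanically before a human sees anything. The sketch below reads the tab-separated output of `git diff --numstat` (added, deleted, path; binary files show `-` for both counts) and returns reasons to reject. The thresholds and function name are hypothetical.

```python
# Hypothetical limits; tune per team.
MAX_FILES = 15
MAX_CHANGED_LINES = 400


def check_pr_size(numstat: str) -> list:
    """Parse `git diff --numstat` output and return reasons to reject the PR (empty = OK)."""
    files = 0
    changed = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        files += 1
        # Binary files are reported as "-\t-\t<path>": count the file, not lines.
        if added != "-":
            changed += int(added) + int(deleted)
    problems = []
    if files > MAX_FILES:
        problems.append(f"{files} files changed (limit {MAX_FILES})")
    if changed > MAX_CHANGED_LINES:
        problems.append(f"{changed} lines changed (limit {MAX_CHANGED_LINES})")
    return problems


print(check_pr_size("10\t2\tsrc/a.py\n-\t-\tlogo.png\n"))  # []
print(check_pr_size("500\t10\tsrc/b.py\n"))  # ['510 lines changed (limit 400)']
```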

Agent-Specific Issues

Footgun	Description	Mitigation
Hallucinated packages	30% of suggested packages don't exist	Validate imports against known registries
Temporary fixes	Works in session, breaks in Docker	Require full env rebuild as acceptance test
Skill atrophy	Developers can't code without AI	Deliberate practice; understand what AI generates
Test/impl conspiracy	Brittle tests + brittle code pass together	Separate spec tests from impl tests; mutation testing

Resource & Cost Issues

Footgun	Description	Mitigation
Token blowups	Parallel agents saturate context/API limits	Hard budgets per agent; limit context sizes
Credit drain	Agents burn credits looping on fixes to their own mistakes	Circuit breakers; attempt limits
Timeout misreads	Rate limits interpreted as semantic failures	Structured error channels; retry with idempotency
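The three mitigations above can share one small per-agent object: a hard token budget, and a circuit breaker that trips after repeated failures. Rate-limit errors arrive on a separate channel, so they are not counted as evidence that the agent is stuck. `AgentBudget`, its fields, and the default of three consecutive failures are hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    pass


@dataclass
class AgentBudget:
    """Hard per-agent token budget plus a consecutive-failure circuit breaker."""
    max_tokens: int
    max_consecutive_failures: int = 3
    tokens_used: int = 0
    consecutive_failures: int = 0

    def charge(self, tokens: int) -> None:
        # Refuse the call up front rather than overspending and noticing later.
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget {self.max_tokens} exhausted")
        self.tokens_used += tokens

    def record(self, ok: bool, rate_limited: bool = False) -> None:
        # Rate limits are transport errors, not semantic failures,
        # so they neither reset nor advance the breaker.
        if ok:
            self.consecutive_failures = 0
        elif not rate_limited:
            self.consecutive_failures += 1

    @property
    def tripped(self) -> bool:
        return self.consecutive_failures >= self.max_consecutive_failures


budget = AgentBudget(max_tokens=10_000)
budget.charge(4_000)
for ok, rate_limited in [(False, False), (False, True), (False, False), (False, False)]:
    budget.record(ok, rate_limited)
print(budget.tripped)  # True: three real failures; the rate limit was not counted
```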

Emerging Patterns (2026)

The "Rule of 4"

Research suggests that effective team sizes top out at about 3-4 agents. Beyond that, communication overhead grows super-linearly with team size (reported exponent 1.724), so the cost of coordination outpaces the value each additional agent adds.

Implication: Don't build 10-agent swarms. Build 3-4 specialized agents with clear boundaries.
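The effect of a 1.724 exponent is easier to see with numbers. The sketch below assumes overhead is exactly proportional to n^1.724 with a unit constant. That is a simplification of the reported scaling, not a fitted model. Under it, the overhead per agent (n^0.724) keeps rising as the team grows.

```python
EXPONENT = 1.724  # reported super-linear exponent for communication overhead


def overhead(n_agents: int) -> float:
    """Relative coordination overhead, modeled (by assumption) as n ** EXPONENT."""
    return n_agents ** EXPONENT


for n in (2, 3, 4, 6, 10):
    total = overhead(n)
    # Per-agent overhead is n ** 0.724, so each extra agent costs more than the last.
    print(f"{n:>2} agents: overhead {total:6.1f}, per agent {total / n:4.2f}")
```

Going from 4 to 10 agents multiplies headcount by 2.5 but, under this model, overhead by about 4.9 (2.5^1.724).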

Spec-Driven Development

Adopted by Kiro, Tessl, and GitHub Spec Kit:

requirements.md - what to build
design.md - how to build it
tasks.md - decomposed work items

Agents work from specs, not vague prompts. Specs are versioned; agents echo which version they used.
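Echoing the spec version turns "did the agent build the right thing?" into a simple comparison. The sketch below models a spec with the three parts listed above and an agent result that records which version it used. The orchestrator flags results built against an older version for re-running. The dataclass and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    version: int
    requirements: str        # requirements.md: what to build
    design: str              # design.md: how to build it
    tasks: Tuple[str, ...]   # tasks.md: decomposed work items


@dataclass(frozen=True)
class AgentResult:
    task_id: str
    spec_version: int  # the agent echoes the spec version it worked from
    summary: str


def is_stale(result: AgentResult, current: TaskSpec) -> bool:
    """A result built against an older spec version must be re-run."""
    if result.task_id != current.task_id:
        raise ValueError("result does not belong to this task")
    return result.spec_version < current.version


spec_v2 = TaskSpec("task-42", 2, "Export CSV", "Stream rows", ("add endpoint", "add test"))
print(is_stale(AgentResult("task-42", 1, "done"), spec_v2))  # True: re-run needed
```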

Layered Coordination (Not Monolithic)

Instead of one complex orchestrator, compose independent layers:

Configuration management
Issue tracking (JSONL, merge-friendly)
Atomic locking (PostgreSQL advisory locks)
Filesystem isolation (git worktrees)
Validation gates
Enforcement rules
Session protocols

Each layer is useful on its own, and a failure in one layer stays isolated from the others.
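Isolating failures can be sketched as running each layer as an independent gate and reporting per layer. A crash in one gate becomes a finding for that gate instead of aborting the whole check. The three gates and the shape of the `change` dict are hypothetical stand-ins for the locking, isolation, and validation layers listed above.

```python
from typing import Callable, Dict, List

Gate = Callable[[dict], List[str]]  # a layer returns a list of problems


def run_layers(change: dict, layers: Dict[str, Gate]) -> Dict[str, List[str]]:
    """Run every layer independently; one layer raising does not stop the others."""
    report = {}
    for name, gate in layers.items():
        try:
            report[name] = gate(change)
        except Exception as exc:  # contain the failure to the layer that raised
            report[name] = [f"layer error: {exc}"]
    return report


def locking(change: dict) -> List[str]:
    return [] if change.get("lock_held") else ["task lock not held"]


def isolation(change: dict) -> List[str]:
    wt = change.get("worktree", "")
    return [] if wt.startswith("/worktrees/") else [f"not in a worktree: {wt!r}"]


def validation(change: dict) -> List[str]:
    return [] if change["tests_passed"] else ["tests failing"]  # KeyError if missing


report = run_layers(
    {"lock_held": True, "worktree": "/worktrees/fix-42"},
    {"locking": locking, "isolation": isolation, "validation": validation},
)
print(report)  # locking and isolation pass; validation reports its own error
```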

PostgreSQL Advisory Locks for Claims

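Novel insight: session-level advisory locks are released automatically when the holding connection ends, so a crashed worker leaves no orphaned lock. They reportedly take around 1 ms and require no table writes. That makes them an elegant fit for distributed claim races.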

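```sql
-- task_id_hash: a bigint key derived from the task ID (placeholder).
-- Returns true if this session acquired the lock, false if another session holds it.
SELECT pg_try_advisory_lock(task_id_hash);
-- ... do the work only if the call above returned true ...
SELECT pg_advisory_unlock(task_id_hash);
-- If the connection dies first, PostgreSQL releases the session-level lock automatically.
```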
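`pg_try_advisory_lock` takes a `bigint`, so every worker has to map a task ID to the same 64-bit key. Python's built-in `hash()` cannot do this because it is randomized per process. The sketch below truncates a BLAKE2b digest to 8 bytes and reads it as a signed integer. The function name is hypothetical, and the `%s` placeholders assume a DB-API driver using the "format" paramstyle (as psycopg does).

```python
import hashlib


def advisory_key(task_id: str) -> int:
    """Map a task ID to a stable signed 64-bit key usable as a PostgreSQL bigint."""
    digest = hashlib.blake2b(task_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


# Parameterized statements; the key is passed as a query parameter.
TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(%s)"
UNLOCK_SQL = "SELECT pg_advisory_unlock(%s)"

print(advisory_key("task-42"))  # same value in every process and on every machine
```

Distinct task IDs can in principle collide on a 64-bit key. The only consequence is that two unrelated tasks briefly block each other, which is an acceptable failure mode for claims.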

Git Bundles for Checkpoints (SkillFS)

Every agent sandbox is a git repo. When a session ends, the repo is stored as a git bundle; the next session restores from that bundle and continues where the last one left off. git log provides a complete audit trail.

Hierarchical Over Flat Swarms

Instead of 100-agent flat swarms:

Nested coordination structures
Partition the communication graph
Supervisor per sub-team
Only supervisors talk to each other
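Partitioning the communication graph pays off quickly, which a channel count makes concrete. The sketch below assumes each sub-team is a star around its supervisor and that supervisors form a full mesh. The function names are illustrative.

```python
def flat_channels(n: int) -> int:
    """Flat swarm: every agent can talk to every other agent."""
    return n * (n - 1) // 2


def hierarchical_channels(teams: int, team_size: int) -> int:
    """Each team is a star around its supervisor; supervisors form a full mesh."""
    intra = teams * (team_size - 1)   # supervisor <-> each team member
    inter = teams * (teams - 1) // 2  # supervisor <-> supervisor
    return intra + inter


print(flat_channels(100))             # 4950
print(hierarchical_channels(10, 10))  # 135
```

The same 100 agents go from 4,950 possible channels to 135, and no single agent has to track more than one team plus the other supervisors.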

Plan-and-Execute Cost Pattern

An expensive model creates the strategy; cheap models execute the steps. This can reportedly cut costs by up to 90%.

Orchestrator (Claude Opus) → Plan
Workers (Claude Haiku) → Execute steps
Reviewer (Claude Sonnet) → Validate
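Where the savings come from is plain arithmetic: most tokens are spent executing steps, so moving that work to the cheap tier dominates. The sketch below uses made-up relative prices and a made-up token split. Neither is real pricing or measured data; the only assumption that matters is that the planning tier costs far more per token than the execution tier.

```python
# Hypothetical relative prices per 1K tokens (NOT real list prices).
PRICE = {"opus": 1.00, "sonnet": 0.20, "haiku": 0.05}


def cost(usage: dict) -> float:
    """usage maps model tier -> thousands of tokens consumed."""
    return sum(PRICE[model] * ktok for model, ktok in usage.items())


# One hypothetical workload: 5K planning, 95K execution, 10K review tokens.
all_opus = cost({"opus": 5 + 95 + 10})
split = cost({"opus": 5, "haiku": 95, "sonnet": 10})
print(f"savings: {1 - split / all_opus:.0%}")  # savings: 89%
```

The result is sensitive to the price ratio and to how much of the work is execution. The 90% figure above should be read as an upper range under favorable splits.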

Bounded Autonomy Spectrum

Progressive autonomy based on risk:

Human in the loop - approve each action
Human on the loop - monitor, intervene if needed
Human out of the loop - fully autonomous

Match the level to task complexity and outcome criticality.
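One way to make "match to task complexity and outcome criticality" operational is a small policy function. It scores both dimensions and lets the riskier one decide the oversight level. The 1-3 scale, the thresholds, and the names are hypothetical; the three levels are the ones listed above.

```python
from enum import Enum


class Autonomy(Enum):
    IN_THE_LOOP = "approve each action"
    ON_THE_LOOP = "monitor, intervene if needed"
    OUT_OF_THE_LOOP = "fully autonomous"


def autonomy_for(complexity: int, criticality: int) -> Autonomy:
    """Pick an oversight level from 1-3 scores (hypothetical scale and thresholds)."""
    for score in (complexity, criticality):
        if not 1 <= score <= 3:
            raise ValueError("scores must be 1, 2, or 3")
    risk = max(complexity, criticality)  # the riskier dimension dominates
    if risk == 3:
        return Autonomy.IN_THE_LOOP
    if risk == 2:
        return Autonomy.ON_THE_LOOP
    return Autonomy.OUT_OF_THE_LOOP


print(autonomy_for(1, 3).value)  # approve each action
```

Taking the maximum means a simple change to a critical system still gets a human in the loop.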

Best Practices Synthesis

From HN Discussions

Well-scoped tasks with tight contracts - Not vague prompts
Automated testing gates - Agents must pass before review
2-3 agents realistic - Not 10 parallel
Exclusive ownership per module - One writer per concern
Short-lived branches - Frequent merge to prevent drift
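"Exclusive ownership per module" can be checked mechanically on every diff. The sketch below maps path prefixes to the single agent allowed to write there, with the longest prefix winning. It returns any changed paths the submitting agent does not own. The ownership map and agent names are hypothetical.

```python
from typing import List, Optional

# Hypothetical ownership map: path prefix -> the one agent allowed to write there.
OWNERS = {
    "src/api/": "worker-1",
    "src/db/": "worker-2",
    "docs/": "worker-3",
}


def owner_of(path: str) -> Optional[str]:
    # Longest matching prefix wins, so nested modules can have their own owner.
    matches = [p for p in OWNERS if path.startswith(p)]
    return OWNERS[max(matches, key=len)] if matches else None


def ownership_violations(agent: str, changed_paths: List[str]) -> List[str]:
    """Paths in a diff that the agent does not own (unowned paths count as violations)."""
    return [p for p in changed_paths if owner_of(p) != agent]


print(ownership_violations("worker-1", ["src/api/routes.py", "src/db/schema.sql", "README.md"]))
# ['src/db/schema.sql', 'README.md']
```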

From orch Consensus

Treat agents as untrusted workers - Not peers with full access
Machine-readable contracts - JSON schema between roles
Per-agent logs with correlation IDs - Distributed systems observability
Guardrail agents - Security/policy checks on every diff
Versioned task specs - Bump version → re-run affected agents
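Two of these practices fit together naturally: a machine-readable contract for messages between roles, and a correlation ID carried in every message and every log line. The sketch below uses a stdlib check of required fields and types as a stand-in for full JSON Schema validation. It uses `logging.LoggerAdapter` to stamp each log record. The contract fields and the logger setup are hypothetical.

```python
import json
import logging
import uuid

# Hypothetical worker -> reviewer contract: required fields and their types.
CONTRACT = {"correlation_id": str, "task_id": str, "spec_version": int, "status": str}


def validate(message: dict) -> list:
    """Return contract violations (empty list = valid). Not a full JSON Schema check."""
    errors = []
    for field, typ in CONTRACT.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], typ):
            errors.append(f"{field}: expected {typ.__name__}")
    return errors


# A dedicated logger so the custom format never sees records lacking correlation_id.
logger = logging.getLogger("agents")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[%(correlation_id)s] %(agent)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False


def agent_logger(agent: str, correlation_id: str) -> logging.LoggerAdapter:
    """Every record logged through the adapter carries the agent name and correlation ID."""
    return logging.LoggerAdapter(logger, {"agent": agent, "correlation_id": correlation_id})


cid = str(uuid.uuid4())
msg = {"correlation_id": cid, "task_id": "task-42", "spec_version": 3, "status": "done"}
if not validate(msg):
    agent_logger("worker-1", cid).info("sending %s", json.dumps(msg))
```

Grepping every agent's log for one correlation ID then reconstructs the full path of a task across roles.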

From Practitioner Blogs

Coordination ≠ isolation - Advisory locks (who works on what) + worktrees (how they work)
JSONL for issues - One issue per line, deterministic merge rules
Session protocols - Explicit start/close procedures
Modular rules with includes - Template configuration
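JSONL is merge-friendly only if the merge rule is deterministic. The sketch below assumes each line is an issue with an `id` and an `updated_at`. On conflict, the newer record wins and ties go to "ours". The output is sorted by id with sorted keys, so the same inputs always produce the same bytes. These rules and field names are illustrative assumptions, not the rules of any specific tool.

```python
import json


def merge_jsonl(ours: str, theirs: str) -> str:
    """Deterministically merge two JSONL issue files (hypothetical rules, see above)."""
    merged = {}
    for side in (theirs, ours):  # ours is applied second, so it wins ties
        for line in side.splitlines():
            if not line.strip():
                continue
            rec = json.loads(line)
            prev = merged.get(rec["id"])
            if prev is None or rec["updated_at"] >= prev["updated_at"]:
                merged[rec["id"]] = rec
    # Stable output: one record per line, ordered by id, keys sorted.
    return "".join(json.dumps(merged[k], sort_keys=True) + "\n" for k in sorted(merged))


ours = '{"id": "i1", "updated_at": 2, "title": "A"}\n{"id": "i3", "updated_at": 1, "title": "C"}\n'
theirs = '{"id": "i1", "updated_at": 5, "title": "A2"}\n{"id": "i2", "updated_at": 1, "title": "B"}\n'
print(merge_jsonl(ours, theirs), end="")
```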

How This Applies to Our Design

Already Covered

Pattern	Our Design
SQLite for coordination	✅ bus.db with transactions
Git worktrees	✅ branch-per-worker.md
State machine	✅ worker-state-machine.md
Heartbeats/liveness	✅ 10s interval in message-passing
Claim-check pattern	✅ SQLite transactions
Task serialization	✅ No uncommitted dependencies
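The claim-check row relies on SQLite transactions. The core trick is a conditional UPDATE whose WHERE clause names the expected current state, so check-and-set happens in one atomic statement. The sketch below applies it to the happy path of the worker state machine (ASSIGNED -> WORKING -> IN_REVIEW -> APPROVED -> COMPLETED). The `tasks` table layout and `advance` helper are hypothetical; bus.db's actual schema is not shown here, and cancel/fail transitions are omitted.

```python
import sqlite3

# Happy path of the worker state machine; cancel/fail transitions omitted.
NEXT_STATE = {
    "ASSIGNED": "WORKING",
    "WORKING": "IN_REVIEW",
    "IN_REVIEW": "APPROVED",
    "APPROVED": "COMPLETED",
}


def advance(conn: sqlite3.Connection, task_id: str, worker: str, expected: str) -> bool:
    """Move a task one step forward only if it is still in `expected` and owned by `worker`.

    SQLite serializes writers, so when two callers race, only the first UPDATE
    matches the row; the second sees the new state and affects 0 rows.
    """
    new_state = NEXT_STATE.get(expected)
    if new_state is None:
        raise ValueError(f"no forward transition from {expected}")
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE tasks SET state = ? WHERE id = ? AND worker = ? AND state = ?",
            (new_state, task_id, worker, expected),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, worker TEXT, state TEXT)")
conn.execute("INSERT INTO tasks VALUES ('task-42', 'worker-1', 'ASSIGNED')")
conn.commit()
print(advance(conn, "task-42", "worker-1", "ASSIGNED"))  # True: now WORKING
print(advance(conn, "task-42", "worker-1", "ASSIGNED"))  # False: already moved on
```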

Should Add

Pattern	Gap	Action
Spec-driven tasks	Tasks are just titles	Add structured task specs (requirements, design, acceptance)
Role boundaries	Not enforced	Add tool-level constraints per agent type
Review funnel	Missing arbiter	Add synthesis step before human review
Versioned specs	Not tracked	Add version field to task assignments
Cost budgets	Not implemented	Add token/time budgets per agent
Correlation IDs	Partial (correlation_id)	Ensure end-to-end tracing

Validate Our Decisions

Decision	Validation
SQLite over JSONL	✅ Confirmed - JSONL for issues only, SQLite for coordination
Orchestrator creates branches	✅ Confirmed - reduces agent setup, enforces policy
3-4 agents max	✅ Aligns with "Rule of 4" research
Mandatory rebase	✅ Confirmed - prevents stale branch drift
Escalate semantic conflicts	✅ Confirmed - agents hallucinate resolutions

Open Questions Surfaced

PostgreSQL advisory locks vs SQLite? - Do we need Postgres, or is SQLite sufficient?
Git bundles for checkpoints? - Should we adopt SkillFS pattern?
Spec files per task? - How structured should task specs be?
Arbiter/synthesis agent? - Add to architecture before human review?
Token budgets? - How to enforce across different agent types?
