Multi-agent coordination CLI with SQLite message bus:
- State machine: ASSIGNED -> WORKING -> IN_REVIEW -> APPROVED -> COMPLETED
- Commands: spawn, start, done, approve, merge, cancel, fail, heartbeat
- SQLite WAL mode, dedicated heartbeat thread, channel-based IPC
- cligen for CLI, tiny_sqlite for DB, ORC memory management

Design docs for branch-per-worker, state machine, message passing, and human observability patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Multi-Agent MVP Scope
Status: Draft (v3 - Nim implementation)
Goal: Define the minimal viable set of primitives to run 2-3 worker agents coordinated by a human-attended orchestrator.
Language: Nim (ORC, cligen, tiny_sqlite)
Changelog
- v3: Nim implementation decision (single binary, fast startup, compiled)
- v2: Fixed BLOCKERs from orch spec-review (resolved open questions, added rejection workflow, added failure scenarios)
Current Design Docs
| Doc | Status | Content |
|---|---|---|
| worker-state-machine.md | ✅ Complete | 8 states, transitions, file schema |
| message-passing-layer.md | ✅ v4 | SQLite bus, Nim heartbeat thread, channels |
| worker-cli-primitives.md | ✅ v3 | Nim CLI with cligen, state machine |
| human-observability.md | ✅ Complete | Status dashboard, watch mode, stale detection |
| branch-per-worker.md | ✅ Complete | Worktrees, integration branch, rebase protocol |
| multi-agent-footguns-and-patterns.md | ✅ Complete | Research synthesis, validated decisions |
| message-passing-comparison.md | ✅ Complete | Beads/Tissue comparison, SQLite rationale |
Task Triage for MVP
Tier 1: Essential for MVP (Must Have)
These are required to run the basic loop: assign → work → review → merge.
| Bead | Task | Rationale |
|---|---|---|
| skills-sse | Worker CLI commands | Core interface for orchestrator and agents |
| skills-4oj | Worker state machine | Already designed, needs implementation |
| skills-ms5 | Message passing layer | Already designed (SQLite), needs implementation |
| skills-roq | Branch-per-worker isolation | Already designed, needs implementation |
| skills-byq | Integrate review-gate | review-gate exists, needs wiring into the worker flow |
| skills-yak | Human observability (status) | Human needs to see what's happening |
| NEW | Agent system prompt | LLM needs tool definitions for worker commands |
Tier 2: Important but Can Defer
| Bead | Task | Why Defer |
|---|---|---|
| skills-0y9 | Structured task specs | Can start with simple task descriptions |
| skills-4a2 | Role boundaries | Trust-based initially, add constraints later |
| skills-31y | Review funnel/arbiter | Works with 2-3 agents; needed at scale |
| skills-zf6 | Evidence artifacts | Can use simple JSON initially |
| skills-1jc | Stuck detection | Monitor manually first (stale detection in MVP) |
Tier 3: Nice to Have (Post-MVP)
| Bead | Task | Why Later |
|---|---|---|
| skills-1qz | Token budgets | Manual monitoring first |
| skills-5ji | Ephemeral namespaced envs | Single-project MVP |
| skills-7n4 | Rollback strategy | Manual rollback first |
| skills-8ak | Git bundle checkpoints | Worktrees sufficient |
| skills-r62 | Role + Veto pattern | Simple approve/reject first |
| skills-udu | Cross-agent compatibility | Single agent type first (Claude) |
| skills-sh6 | OpenHands research | Research complete |
| skills-yc6 | Document findings | Research captured |
MVP Feature Set
Commands
# Orchestrator commands (human runs)
worker spawn <task-id> [--description "..."] # Create branch, worktree, assign task
worker status [--watch] # Dashboard of all workers
worker approve <task-id> # IN_REVIEW → APPROVED
worker request-changes <task-id> [feedback] # IN_REVIEW → WORKING (rejection, with feedback)
worker merge <task-id> # APPROVED → COMPLETED
worker cancel <task-id> # * → FAILED (abort)
# Worker commands (agent runs from worktree)
worker start # ASSIGNED → WORKING
worker done [--skip-rebase] # WORKING → IN_REVIEW (includes rebase)
worker heartbeat # Liveness signal (via background thread)
worker fail <reason> # WORKING → FAILED
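The state annotations on the commands above amount to a transition table. The following is a minimal Python sketch of that table, for illustration only: the MVP itself targets Nim, `TRANSITIONS` and `apply_command` are hypothetical names, and reading `*` in `worker cancel` as "any non-terminal state" is an assumption. CONFLICTED and any other states defined in worker-state-machine.md are omitted here.

```python
from enum import Enum


class State(Enum):
    ASSIGNED = "ASSIGNED"
    WORKING = "WORKING"
    IN_REVIEW = "IN_REVIEW"
    APPROVED = "APPROVED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# (command, from_state) -> to_state, copied from the command comments.
TRANSITIONS = {
    ("start", State.ASSIGNED): State.WORKING,
    ("done", State.WORKING): State.IN_REVIEW,
    ("approve", State.IN_REVIEW): State.APPROVED,
    ("request-changes", State.IN_REVIEW): State.WORKING,
    ("merge", State.APPROVED): State.COMPLETED,
    ("fail", State.WORKING): State.FAILED,
}

TERMINAL = {State.COMPLETED, State.FAILED}


def apply_command(state: State, command: str) -> State:
    """Return the next state, or raise ValueError for an illegal transition."""
    if command == "cancel":
        # Assumption: "* -> FAILED" excludes states that are already terminal.
        if state in TERMINAL:
            raise ValueError(f"cannot cancel a worker in state {state.value}")
        return State.FAILED
    try:
        return TRANSITIONS[(command, state)]
    except KeyError:
        raise ValueError(
            f"{command!r} is not allowed in state {state.value}"
        ) from None
```

Keeping the guard in one lookup table means every subcommand can reject an out-of-order call (for example `merge` before `approve`) with the same error path.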
Data Flow
HAPPY PATH:
1. Human: worker spawn skills-abc
→ Creates feat/skills-abc branch
→ Creates worktrees/skills-abc
→ Publishes task_assign message
→ State: ASSIGNED
2. Agent: worker start
→ Publishes state_change (ASSIGNED → WORKING)
→ Starts HeartbeatThread (background)
→ Begins work
3. Agent: worker done
→ Runs git rebase origin/integration
→ Pushes branch
→ Publishes review_request
→ State: IN_REVIEW
4. Human: worker approve skills-abc
→ Publishes review_approved
→ State: APPROVED
5. Human: worker merge skills-abc
→ Merges to integration (retry loop for contention)
→ Cleans up branch/worktree
→ State: COMPLETED
REJECTION PATH:
4b. Human: worker request-changes skills-abc "Fix error handling"
→ Publishes changes_requested
→ State: WORKING
→ Agent resumes work, returns to step 3
CONFLICT PATH:
3b. Agent: worker done (rebase fails)
→ Rebase conflict detected, left in progress
→ State: CONFLICTED
→ Agent resolves conflicts, runs: git rebase --continue
→ Agent: worker done --skip-rebase
→ State: IN_REVIEW
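Step 3 and the conflict path can be sketched together. This Python sketch is illustrative rather than the planned Nim code: `worker_done` and the injectable `run` parameter are hypothetical, the `git fetch` before the rebase and the `--force-with-lease` push are assumptions not stated in this document, and any non-zero exit from `git rebase` is treated as a conflict.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_git(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


def worker_done(skip_rebase: bool = False, run: Runner = run_git) -> str:
    """Return the state the worker should move to after `worker done`."""
    if not skip_rebase:
        # Assumption: refresh origin/integration before rebasing onto it.
        if run(["git", "fetch", "origin"]) != 0:
            raise RuntimeError("git fetch failed")
        if run(["git", "rebase", "origin/integration"]) != 0:
            # Leave the rebase in progress so the agent can resolve it,
            # run `git rebase --continue`, then `worker done --skip-rebase`.
            return "CONFLICTED"
    # Assumption: a rebased branch may need a forced (lease-checked) push.
    if run(["git", "push", "--force-with-lease", "origin", "HEAD"]) != 0:
        raise RuntimeError("git push failed")
    return "IN_REVIEW"
```

Passing the runner as a parameter keeps the branching logic testable without a real repository; a fake runner that fails only on `rebase` exercises the CONFLICTED path.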
Directory Structure
project/
├── .worker-state/
│   ├── bus.db                # SQLite message bus (source of truth)
│   ├── bus.jsonl             # Debug export (derived)
│   ├── blobs/                # Large payloads (content-addressable)
│   └── workers/
│       └── skills-abc.json   # Worker state cache (derived from DB)
├── worktrees/                # Git worktrees (gitignored)
│   └── skills-abc/
│       └── .worker-ctx.json  # Static context for this worker
└── .git/
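The `blobs/` directory is described only as content-addressable. One common way to get that property is to name each file after a hash of its contents. The Python sketch below assumes SHA-256 and a flat directory layout; neither choice is specified in this document, and `put_blob` / `get_blob` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def put_blob(blob_dir: Path, payload: bytes) -> str:
    """Store payload under its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(payload).hexdigest()
    blob_dir.mkdir(parents=True, exist_ok=True)
    path = blob_dir / digest
    if not path.exists():  # identical content is stored only once
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(payload)
        tmp.replace(path)  # rename into place on the same filesystem
    return digest


def get_blob(blob_dir: Path, digest: str) -> bytes:
    """Read a blob back and verify it still matches its digest."""
    data = (blob_dir / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("blob content does not match its digest")
    return data
```

A message on the bus would then carry only the digest, keeping rows in `bus.db` small.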
Implementation Order
Prerequisites
# Nim dependencies
nimble install tiny_sqlite cligen jsony
Download SQLite amalgamation for static linking:
curl -O https://sqlite.org/2024/sqlite-amalgamation-3450000.zip
unzip sqlite-amalgamation-3450000.zip
cp sqlite-amalgamation-*/sqlite3.c src/libs/
Build Steps
1. Project setup
   - Create src/worker.nimble with dependencies
   - Create src/config.nims with build flags (--mm:orc, --threads:on)
   - Set up static SQLite compilation
2. Message bus (skills-ms5)
   - src/worker/db.nim - SQLite schema, connection setup
   - src/worker/bus.nim - publish/poll/ack functions
   - Dedicated heartbeat thread with channels
3. Worker state (skills-4oj)
   - src/worker/state.nim - State enum, transition guards
   - src/worker/types.nim - Shared types
   - Compare-and-set with BEGIN IMMEDIATE
4. Branch primitives (skills-roq)
   - src/worker/git.nim - Worktree create/remove (osproc)
   - Rebase with conflict detection
   - Merge with retry loop
5. CLI commands (skills-sse)
   - src/worker.nim - cligen dispatchMulti
   - All subcommands: spawn, status, start, done, approve, request-changes, merge, cancel, fail, heartbeat
   - Background heartbeat thread
6. review-gate integration (skills-byq)
   - review-gate calls worker approve / worker request-changes
   - Stop hook checks worker state from bus.db
7. Status dashboard (skills-yak)
   - worker status with table output
   - Stale detection from heartbeats table
   - --watch mode for real-time updates
8. Agent system prompt
   - Tool definitions for worker commands
   - Context about worktree location, task description
   - Instructions for heartbeat, done, conflict handling
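The "compare-and-set with BEGIN IMMEDIATE" item in step 3 can be illustrated with a short sketch. It uses Python's standard `sqlite3` module rather than Nim's tiny_sqlite, and the `workers` table with `task_id` and `state` columns is a hypothetical schema chosen for the example, not the one in message-passing-layer.md.

```python
import sqlite3


def open_bus(path: str) -> sqlite3.Connection:
    # isolation_level=None turns off implicit transactions, so the
    # BEGIN IMMEDIATE / COMMIT statements below are the only ones issued.
    conn = sqlite3.connect(path, isolation_level=None, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workers ("
        " task_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn


def compare_and_set(
    conn: sqlite3.Connection, task_id: str, expected: str, new: str
) -> bool:
    """Move task_id from `expected` to `new`; return False if it moved already."""
    # BEGIN IMMEDIATE takes the write lock up front, so no other writer
    # can change the row between the SELECT and the UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT state FROM workers WHERE task_id = ?", (task_id,)
        ).fetchone()
        ok = row is not None and row[0] == expected
        if ok:
            conn.execute(
                "UPDATE workers SET state = ? WHERE task_id = ?",
                (new, task_id),
            )
        conn.execute("COMMIT")
        return ok
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

With this shape, two processes racing to run `worker approve` on the same task cannot both succeed: the second one sees the state has already changed and gets `False`.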
Compilation
nim c -d:release --mm:orc --threads:on src/worker.nim
# Output: single static binary ~2-3MB
Success Criteria
MVP is complete when:
Happy Path
1. Can spawn a worker with worker spawn <task>
2. Worker appears in worker status dashboard with state and heartbeat
3. Agent can signal worker start and worker done
4. Heartbeats track agent liveness (stale detection after 30s)
5. worker approve transitions to APPROVED
6. worker merge completes the cycle
7. All state persists across session restarts
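Because STALE is computed for display rather than stored, the dashboard can derive it from the last heartbeat timestamp at render time. A minimal Python sketch follows; `display_status` is a hypothetical helper, the 30-second default comes from the criterion above, and limiting staleness to WORKING and CONFLICTED (the states in which an agent is expected to be running) is an assumption.

```python
import time
from typing import Optional


def display_status(
    state: str,
    last_heartbeat: Optional[float],
    now: Optional[float] = None,
    stale_after: float = 30.0,
) -> str:
    """Return the state to show in `worker status`, flagging stale workers."""
    if now is None:
        now = time.time()
    # Assumption: only states with a live agent can go stale.
    if state not in ("WORKING", "CONFLICTED"):
        return state
    if last_heartbeat is None or now - last_heartbeat > stale_after:
        return f"{state} (STALE)"
    return state
```

The threshold is a parameter so the same function serves a short liveness window or a longer timeout warning.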
Failure Scenarios
8. Rebase conflict detected → state CONFLICTED, rebase left in progress
9. Agent timeout (no heartbeat 2+ min) → status shows STALE warning
10. worker request-changes returns to WORKING with feedback
11. worker cancel aborts any state → FAILED
12. Concurrent merge attempts handled (retry loop succeeds)
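The retry loop behind the concurrent-merge scenario could look like the following Python sketch. It is a generic bounded retry with exponential backoff, not the planned Nim code: `merge_with_retry`, the attempt count, and the delays are assumptions, and `try_merge` stands in for whatever single merge attempt the implementation performs (returning `False` when another worker updated integration first).

```python
import time
from typing import Callable


def merge_with_retry(
    try_merge: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_merge until it succeeds or attempts run out."""
    for attempt in range(attempts):
        if try_merge():
            return True
        if attempt < attempts - 1:
            # Back off 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    return False
```

Returning `False` after the last attempt, rather than looping forever, lets `worker merge` report the contention to the human orchestrator.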
Non-Goals for MVP
- Multiple orchestrators
- Cross-machine coordination
- Automatic conflict resolution (human intervenes)
- Token budgeting
- Structured task specs (simple descriptions)
- Arbiter agents
- Database isolation per worker
Resolved Questions
| Question | Decision | Rationale |
|---|---|---|
| Language | Nim | Single binary, fast startup, compiled, Python-like syntax |
| CLI framework | cligen | Auto-generates from proc signatures |
| SQLite wrapper | tiny_sqlite | Better than stdlib, RAII, prepared statements |
| Memory management | ORC | Handles cycles, deterministic destruction |
| Static linking | SQLite amalgamation | Single binary, no system dependencies |
| Source of truth | SQLite only | JSON files are derived caches; DB is authoritative |
| Heartbeat | Dedicated thread + channels | Thread keeps its own SQLite connection; channels pass messages instead of sharing state |
| Integration branch | Require exists | Human creates integration before first spawn |
| review-gate | Calls worker CLI | review-gate approve → worker approve |
| STALE state | Computed for display | Not a persistent state; derived from heartbeat age |
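The heartbeat decision (a dedicated thread, a channel for control messages, and a connection owned by that thread) can be sketched in Python, with `queue.Queue` standing in for a Nim channel. The `heartbeats` table layout, the 10-second default interval, and the function names are assumptions made for the example.

```python
import queue
import sqlite3
import threading
import time


def heartbeat_loop(
    db_path: str, task_id: str, stop: "queue.Queue[str]", interval: float
) -> None:
    # The thread opens its own connection instead of sharing the caller's.
    conn = sqlite3.connect(db_path, timeout=5.0)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS heartbeats ("
                " task_id TEXT PRIMARY KEY, ts REAL NOT NULL)"
            )
        while True:
            with conn:  # commits the heartbeat row on success
                conn.execute(
                    "INSERT OR REPLACE INTO heartbeats (task_id, ts)"
                    " VALUES (?, ?)",
                    (task_id, time.time()),
                )
            try:
                stop.get(timeout=interval)  # any message means "stop"
                return
            except queue.Empty:
                continue  # interval elapsed, write the next heartbeat
    finally:
        conn.close()


def start_heartbeat(db_path: str, task_id: str, interval: float = 10.0):
    """Start the background thread; put any message on `stop` to end it."""
    stop: "queue.Queue[str]" = queue.Queue()
    thread = threading.Thread(
        target=heartbeat_loop,
        args=(db_path, task_id, stop, interval),
        daemon=True,
    )
    thread.start()
    return thread, stop
```

Waiting on the queue with a timeout doubles as the sleep between heartbeats, so a stop message is handled immediately instead of after the current interval.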
Nim Dependencies
| Package | Purpose |
|---|---|
| tiny_sqlite | SQLite wrapper with RAII |
| cligen | CLI subcommand generation |
| jsony | Fast JSON parsing (optional) |
| stdlib osproc | Git subprocess operations |
| stdlib channels | Thread communication |
| stdlib times | Epoch timestamps |
Spec Review Resolution
| Issue | Resolution |
|---|---|
| Missing rejection workflow | Added request-changes command and path in data flow |
| Agent system prompt missing | Added to Tier 1 implementation order |
| Source of truth confusion | Clarified SQLite primary, JSON derived |
| Test scenarios missing | Added failure scenarios 8-12 to success criteria |
| Heartbeat mechanism | Dedicated thread with own SQLite connection |
| Review-gate integration | Clarified review-gate calls worker CLI |
| Language choice | Nim for single binary, fast startup |