Multi-agent coordination CLI with SQLite message bus:

- State machine: ASSIGNED -> WORKING -> IN_REVIEW -> APPROVED -> COMPLETED
- Commands: spawn, start, done, approve, merge, cancel, fail, heartbeat
- SQLite WAL mode, dedicated heartbeat thread, channel-based IPC
- cligen for CLI, tiny_sqlite for DB, ORC memory management

Design docs for branch-per-worker, state machine, message passing, and human observability patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

# Multi-Agent MVP Scope

**Status**: Draft (v3 - Nim implementation)
**Goal**: Define the minimal viable set of primitives to run 2-3 worker agents coordinated by a human-attended orchestrator.
**Language**: Nim (ORC, cligen, tiny_sqlite)

## Changelog

- **v3**: Nim implementation decision (single binary, fast startup, compiled)
- **v2**: Fixed BLOCKERs from orch spec-review (resolved open questions, added rejection workflow, added failure scenarios)

## Current Design Docs

| Doc | Status | Content |
|-----|--------|---------|
| `worker-state-machine.md` | ✅ Complete | 8 states, transitions, file schema |
| `message-passing-layer.md` | ✅ v4 | SQLite bus, Nim heartbeat thread, channels |
| `worker-cli-primitives.md` | ✅ v3 | Nim CLI with cligen, state machine |
| `human-observability.md` | ✅ Complete | Status dashboard, watch mode, stale detection |
| `branch-per-worker.md` | ✅ Complete | Worktrees, integration branch, rebase protocol |
| `multi-agent-footguns-and-patterns.md` | ✅ Complete | Research synthesis, validated decisions |
| `message-passing-comparison.md` | ✅ Complete | Beads/Tissue comparison, SQLite rationale |

## Task Triage for MVP

### Tier 1: Essential for MVP (Must Have)

These are required to run the basic loop: assign → work → review → merge.

| Bead | Task | Rationale |
|------|------|-----------|
| `skills-sse` | Worker CLI commands | Core interface for orchestrator and agents |
| `skills-4oj` | Worker state machine | Already designed, need implementation |
| `skills-ms5` | Message passing layer | Already designed (SQLite), need implementation |
| `skills-roq` | Branch-per-worker isolation | Already designed, need implementation |
| `skills-byq` | Integrate review-gate | Have review-gate, need to wire to worker flow |
| `skills-yak` | Human observability (status) | Human needs to see what's happening |
| **NEW** | Agent system prompt | LLM needs tool definitions for worker commands |

### Tier 2: Important but Can Defer

| Bead | Task | Why Defer |
|------|------|-----------|
| `skills-0y9` | Structured task specs | Can start with simple task descriptions |
| `skills-4a2` | Role boundaries | Trust-based initially, add constraints later |
| `skills-31y` | Review funnel/arbiter | Works with 2-3 agents; needed at scale |
| `skills-zf6` | Evidence artifacts | Can use simple JSON initially |
| `skills-1jc` | Stuck detection | Monitor manually first (stale detection in MVP) |

### Tier 3: Nice to Have (Post-MVP)

| Bead | Task | Why Later |
|------|------|-----------|
| `skills-1qz` | Token budgets | Manual monitoring first |
| `skills-5ji` | Ephemeral namespaced envs | Single-project MVP |
| `skills-7n4` | Rollback strategy | Manual rollback first |
| `skills-8ak` | Git bundle checkpoints | Worktrees sufficient |
| `skills-r62` | Role + Veto pattern | Simple approve/reject first |
| `skills-udu` | Cross-agent compatibility | Single agent type first (Claude) |
| `skills-sh6` | OpenHands research | Research complete |
| `skills-yc6` | Document findings | Research captured |

## MVP Feature Set

### Commands

```bash
# Orchestrator commands (human runs)
worker spawn <task-id> [--description "..."]  # Create branch, worktree, assign task
worker status [--watch]                        # Dashboard of all workers
worker approve <task-id>                       # IN_REVIEW → APPROVED
worker request-changes <task-id> [feedback]    # IN_REVIEW → WORKING (rejection)
worker merge <task-id>                         # APPROVED → COMPLETED
worker cancel <task-id>                        # * → FAILED (abort)

# Worker commands (agent runs from worktree)
worker start                                   # ASSIGNED → WORKING
worker done [--skip-rebase]                    # WORKING → IN_REVIEW (rebases unless --skip-rebase)
worker heartbeat                               # Liveness signal (via background thread)
worker fail <reason>                           # WORKING → FAILED
```

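As a rough illustration of how a few of these subcommands could map onto cligen, the sketch below registers three procs with `dispatchMulti`. The proc bodies are placeholders, and the choice to take `<task-id>` through a `seq[string]` positional parameter is an assumption rather than the design recorded in `worker-cli-primitives.md`.

```nim
# Sketch only: bodies are placeholders, not the real command logic.
import cligen

proc start() =
  ## ASSIGNED -> WORKING; run by the agent from its worktree.
  echo "start: ASSIGNED -> WORKING"

proc done(skipRebase = false) =
  ## WORKING -> IN_REVIEW; rebases first unless the flag is set.
  echo "done: skipRebase=", skipRebase

proc approve(taskId: seq[string]) =
  ## IN_REVIEW -> APPROVED; the task id arrives as a positional argument.
  if taskId.len != 1:
    quit("usage: worker approve <task-id>", 2)
  echo "approve: ", taskId[0]

when isMainModule:
  # Each bracketed entry becomes one subcommand of the `worker` binary.
  dispatchMulti(
    [start],
    [done, help = {"skipRebase": "skip the rebase after resolving conflicts"}],
    [approve])
```

cligen derives option names from parameter names, so `skipRebase` is the source of the `--skip-rebase` flag; the exact spellings it accepts are worth confirming against the cligen documentation.
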
### Data Flow

```
HAPPY PATH:

1. Human: worker spawn skills-abc
   → Creates feat/skills-abc branch
   → Creates worktrees/skills-abc
   → Publishes task_assign message
   → State: ASSIGNED

2. Agent: worker start
   → Publishes state_change (ASSIGNED → WORKING)
   → Starts HeartbeatThread (background)
   → Begins work

3. Agent: worker done
   → Runs git rebase origin/integration
   → Pushes branch
   → Publishes review_request
   → State: IN_REVIEW

4. Human: worker approve skills-abc
   → Publishes review_approved
   → State: APPROVED

5. Human: worker merge skills-abc
   → Merges to integration (retry loop for contention)
   → Cleans up branch/worktree
   → State: COMPLETED

REJECTION PATH:

4b. Human: worker request-changes skills-abc "Fix error handling"
   → Publishes changes_requested
   → State: WORKING
   → Agent resumes work, returns to step 3

CONFLICT PATH:

3b. Agent: worker done (rebase fails)
   → Rebase conflict detected, left in progress
   → State: CONFLICTED
   → Agent resolves conflicts, runs: git rebase --continue
   → Agent: worker done --skip-rebase
   → State: IN_REVIEW
```

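The three paths above imply a small transition table. The following is a minimal sketch of a guard built only from the states named in this document (`worker-state-machine.md` lists 8 states, so this is partial). The `ws` prefixes and proc names are illustrative, and treating COMPLETED and FAILED as terminal is an assumption.

```nim
type
  WorkerState = enum
    wsAssigned = "ASSIGNED"
    wsWorking = "WORKING"
    wsConflicted = "CONFLICTED"
    wsInReview = "IN_REVIEW"
    wsApproved = "APPROVED"
    wsCompleted = "COMPLETED"
    wsFailed = "FAILED"

proc allowedNext(s: WorkerState): set[WorkerState] =
  ## Transitions read off the happy, rejection, and conflict paths.
  ## Assumption: cancel may move any non-terminal state to FAILED.
  case s
  of wsAssigned: result = {wsWorking, wsFailed}
  of wsWorking: result = {wsInReview, wsConflicted, wsFailed}
  of wsConflicted: result = {wsInReview, wsFailed}
  of wsInReview: result = {wsApproved, wsWorking, wsFailed}
  of wsApproved: result = {wsCompleted, wsFailed}
  of wsCompleted, wsFailed: result = {}

proc canTransition(src, dst: WorkerState): bool =
  ## Guard to call before publishing a state_change message.
  dst in allowedNext(src)

when isMainModule:
  echo canTransition(wsInReview, wsWorking)   # rejection path: true
  echo canTransition(wsAssigned, wsInReview)  # skips WORKING: false
```
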
### Directory Structure

```
project/
├── .worker-state/
│   ├── bus.db                # SQLite message bus (source of truth)
│   ├── bus.jsonl             # Debug export (derived)
│   ├── blobs/                # Large payloads (content-addressable)
│   └── workers/
│       └── skills-abc.json   # Worker state cache (derived from DB)
├── worktrees/                # Git worktrees (gitignored)
│   └── skills-abc/
│       └── .worker-ctx.json  # Static context for this worker
└── .git/
```

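Every per-worker path in this layout follows from the task id. A small helper along these lines could keep the naming in one place; `WorkerPaths` and `pathsFor` are hypothetical names, and the `feat/` branch prefix is taken from the data flow above.

```nim
import std/os

type
  WorkerPaths = object
    branch: string      # git branch for the task
    worktree: string    # checkout the agent works in
    stateCache: string  # derived JSON cache; bus.db stays authoritative
    ctxFile: string     # static context written at spawn time

proc pathsFor(projectRoot, taskId: string): WorkerPaths =
  ## Derives all per-worker locations from the project root and task id.
  let wt = projectRoot / "worktrees" / taskId
  WorkerPaths(
    branch: "feat/" & taskId,
    worktree: wt,
    stateCache: projectRoot / ".worker-state" / "workers" / (taskId & ".json"),
    ctxFile: wt / ".worker-ctx.json")

when isMainModule:
  let p = pathsFor("project", "skills-abc")
  echo p.branch
  echo p.stateCache
```
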
## Implementation Order

### Prerequisites

```bash
# Nim dependencies
nimble install tiny_sqlite cligen jsony
```

Download the SQLite amalgamation for static linking:
```bash
curl -O https://sqlite.org/2024/sqlite-amalgamation-3450000.zip
unzip sqlite-amalgamation-3450000.zip
mkdir -p src/libs   # make sure the target directory exists
cp sqlite-amalgamation-*/sqlite3.c src/libs/
```

### Build Steps

1. **Project setup**
   - Create `src/worker.nimble` with dependencies
   - Create `src/config.nims` with build flags (--mm:orc, --threads:on)
   - Set up static SQLite compilation

2. **Message bus** (`skills-ms5`)
   - `src/worker/db.nim` - SQLite schema, connection setup
   - `src/worker/bus.nim` - publish/poll/ack functions
   - Dedicated heartbeat thread with channels

3. **Worker state** (`skills-4oj`)
   - `src/worker/state.nim` - State enum, transition guards
   - `src/worker/types.nim` - Shared types
   - Compare-and-set with BEGIN IMMEDIATE

4. **Branch primitives** (`skills-roq`)
   - `src/worker/git.nim` - Worktree create/remove (osproc)
   - Rebase with conflict detection
   - Merge with retry loop

5. **CLI commands** (`skills-sse`)
   - `src/worker.nim` - cligen dispatchMulti
   - All subcommands: spawn, status, start, done, approve, request-changes, merge, cancel, fail, heartbeat
   - Background heartbeat thread

6. **review-gate integration** (`skills-byq`)
   - review-gate calls `worker approve` / `worker request-changes`
   - Stop hook checks worker state from bus.db

7. **Status dashboard** (`skills-yak`)
   - `worker status` with table output
   - Stale detection from heartbeats table
   - `--watch` mode for real-time updates

8. **Agent system prompt**
   - Tool definitions for worker commands
   - Context about worktree location, task description
   - Instructions for heartbeat, done, conflict handling

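Step 3 mentions compare-and-set with `BEGIN IMMEDIATE`. One way this might look with tiny_sqlite is sketched below. The `workers` table and its `task_id` and `state` columns are hypothetical (the real schema lives in `message-passing-layer.md`), and the tiny_sqlite calls used here (`openDatabase`, `execScript`, `exec`, `changes`) should be checked against the installed version.

```nim
import tiny_sqlite

proc casState(db: DbConn; taskId, expected, next: string): bool =
  ## Moves taskId from `expected` to `next`.
  ## Returns false when another writer changed the state first.
  db.exec("BEGIN IMMEDIATE")  # take the write lock up front
  try:
    db.exec("UPDATE workers SET state = ? WHERE task_id = ? AND state = ?",
            next, taskId, expected)
    result = db.changes == 1  # zero rows means the expectation was stale
    db.exec("COMMIT")
  except CatchableError:
    db.exec("ROLLBACK")
    raise

when isMainModule:
  let db = openDatabase(":memory:")
  # Hypothetical minimal schema, for illustration only.
  db.execScript("CREATE TABLE workers (task_id TEXT PRIMARY KEY, state TEXT NOT NULL);")
  db.exec("INSERT INTO workers VALUES (?, ?)", "skills-abc", "ASSIGNED")
  echo casState(db, "skills-abc", "ASSIGNED", "WORKING")  # first writer wins
  echo casState(db, "skills-abc", "ASSIGNED", "WORKING")  # expectation now stale
  db.close()
```

Taking the write lock at `BEGIN IMMEDIATE` matters most when the guard has to read other rows before updating, since a deferred transaction could fail with a busy error when it upgrades from a read lock to a write lock.
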
### Compilation

```bash
nim c -d:release --mm:orc --threads:on src/worker.nim
# Output: single static binary ~2-3MB
```

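The same flags can live in the `src/config.nims` file from the project setup step, so they do not have to be repeated on every build. A minimal sketch, assuming the file sits next to `worker.nim` so that Nim picks it up as the project config:

```nim
# src/config.nims (sketch): mirrors the flags passed on the command line above
switch("mm", "orc")      # ORC memory management
switch("threads", "on")  # needed for the heartbeat thread and Channel
```

With this in place, `nim c -d:release src/worker.nim` should be equivalent to the full command. Static SQLite linking is not covered here.
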
## Success Criteria

MVP is complete when:

### Happy Path
1. [ ] Can spawn a worker with `worker spawn <task>`
2. [ ] Worker appears in `worker status` dashboard with state and heartbeat
3. [ ] Agent can signal `worker start` and `worker done`
4. [ ] Heartbeats track agent liveness (stale detection after 30s)
5. [ ] `worker approve` transitions to APPROVED
6. [ ] `worker merge` completes the cycle
7. [ ] All state persists across session restarts

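Stale detection reduces to comparing heartbeat age against a threshold. The sketch below keeps the threshold as a parameter, since this document mentions both 30s (criterion 4) and 2+ minutes (failure scenario 9). `isStale` and `displayState` are illustrative names, and flagging only WORKING workers is an assumption.

```nim
import std/times

proc isStale(lastBeat, now: float; thresholdSec = 30.0): bool =
  ## True when the last heartbeat (epoch seconds) is older than the threshold.
  now - lastBeat > thresholdSec

proc displayState(stored: string; lastBeat, now: float;
                  thresholdSec = 30.0): string =
  ## STALE is computed for display only; the stored state is never rewritten.
  if stored == "WORKING" and isStale(lastBeat, now, thresholdSec):
    result = "STALE"
  else:
    result = stored

when isMainModule:
  let now = epochTime()
  echo displayState("WORKING", now - 45.0, now)                        # STALE
  echo displayState("WORKING", now - 45.0, now, thresholdSec = 120.0)  # WORKING
  echo displayState("IN_REVIEW", now - 45.0, now)                      # IN_REVIEW
```
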
### Failure Scenarios
8. [ ] Rebase conflict detected → state CONFLICTED, rebase left in progress
9. [ ] Agent timeout (no heartbeat 2+ min) → status shows STALE warning
10. [ ] `worker request-changes` returns to WORKING with feedback
11. [ ] `worker cancel` aborts any state → FAILED
12. [ ] Concurrent merge attempts handled (retry loop succeeds)

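Scenarios 8 and 12 could be approached with `osproc`, as in the sketch below. It runs the rebase, checks whether git left it paused, and wraps any contended operation in a bounded retry. All proc names are hypothetical, and raising on a non-conflict rebase failure is an assumption about the desired behavior.

```nim
import std/[os, osproc, strutils]

proc git(workDir: string; args: varargs[string]): tuple[output: string, exitCode: int] =
  ## Runs git in workDir and captures combined stdout/stderr.
  execCmdEx("git " & quoteShellCommand(args), workingDir = workDir)

proc rebaseInProgress(workDir: string): bool =
  ## git keeps a rebase-merge or rebase-apply directory while a rebase is paused.
  for name in ["rebase-merge", "rebase-apply"]:
    let (path, code) = git(workDir, "rev-parse", "--git-path", name)
    if code == 0:
      let p = path.strip()
      let full = if p.isAbsolute: p else: workDir / p
      if dirExists(full):
        return true

proc rebaseOntoIntegration(workDir: string): string =
  ## Returns the next state name; a conflicted rebase is left in progress.
  let (output, code) = git(workDir, "rebase", "origin/integration")
  if code == 0:
    result = "IN_REVIEW"
  elif rebaseInProgress(workDir):
    result = "CONFLICTED"
  else:
    raise newException(IOError, "rebase failed: " & output)

proc withRetry(attempts, delayMs: int; op: proc (): bool): bool =
  ## Calls op until it succeeds or attempts run out, sleeping between tries.
  for i in 1 .. attempts:
    if op():
      return true
    if i < attempts:
      sleep(delayMs)
```

The merge step would pass its own closure to `withRetry`, for example one that attempts the merge into `integration` and returns whether git exited cleanly. The attempt count and delay are tuning choices this document does not specify.
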
## Non-Goals for MVP

- Multiple orchestrators
- Cross-machine coordination
- Automatic conflict resolution (human intervenes)
- Token budgeting
- Structured task specs (simple descriptions)
- Arbiter agents
- Database isolation per worker

## Resolved Questions

| Question | Decision | Rationale |
|----------|----------|-----------|
| Language | **Nim** | Single binary, fast startup, compiled, Python-like syntax |
| CLI framework | **cligen** | Auto-generates from proc signatures |
| SQLite wrapper | **tiny_sqlite** | Better than stdlib, RAII, prepared statements |
| Memory management | **ORC** | Handles cycles, deterministic destruction |
| Static linking | **SQLite amalgamation** | Single binary, no system dependencies |
| Source of truth | **SQLite only** | JSON files are derived caches; DB is authoritative |
| Heartbeat | **Dedicated thread + channels** | Thread owns its own SQLite connection; channels avoid sharing GC-managed objects across threads |
| Integration branch | **Require exists** | Human creates `integration` before first spawn |
| review-gate | **Calls worker CLI** | `review-gate approve` → `worker approve` |
| STALE state | **Computed for display** | Not a persistent state; derived from heartbeat age |

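The heartbeat decision can be pictured with Nim's built-in `Channel`. In this sketch the main thread sends a stop command over one channel while the heartbeat thread reports each beat over another; that second channel stands in for the SQLite write the real thread would make on its own connection. All names are hypothetical, and the program must be compiled with `--threads:on`.

```nim
import std/[os, times]

type
  HbCommand = enum
    hbStop

var
  commands: Channel[HbCommand]  # main thread -> heartbeat thread
  beats: Channel[float]         # heartbeat thread -> main (stand-in for a DB write)

proc heartbeatLoop(intervalMs: int) {.thread.} =
  ## A real worker would open its own SQLite connection here
  ## and write one heartbeat row per iteration.
  while true:
    let cmd = commands.tryRecv()
    if cmd.dataAvailable and cmd.msg == hbStop:
      break
    beats.send(epochTime())
    sleep(intervalMs)

when isMainModule:
  commands.open()
  beats.open()
  var worker: Thread[int]
  createThread(worker, heartbeatLoop, 50)
  sleep(180)                # pretend to do some work
  commands.send(hbStop)     # ask the thread to exit
  joinThread(worker)
  echo "beats recorded: ", beats.peek()
  commands.close()
  beats.close()
```

Only plain values cross the channels, so no GC-managed objects end up shared between the two threads.
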
## Nim Dependencies

| Package | Purpose |
|---------|---------|
| `tiny_sqlite` | SQLite wrapper with RAII |
| `cligen` | CLI subcommand generation |
| `jsony` | Fast JSON parsing (optional) |
| stdlib `osproc` | Git subprocess operations |
| stdlib `channels` | Thread communication |
| stdlib `times` | Epoch timestamps |

## Spec Review Resolution

| Issue | Resolution |
|-------|------------|
| Missing rejection workflow | Added `request-changes` command and path in data flow |
| Agent system prompt missing | Added to Tier 1 implementation order |
| Source of truth confusion | Clarified SQLite primary, JSON derived |
| Test scenarios missing | Added failure scenarios 8-12 to success criteria |
| Heartbeat mechanism | Dedicated thread with own SQLite connection |
| Review-gate integration | Clarified review-gate calls worker CLI |
| Language choice | Nim for single binary, fast startup |