skills/docs/design/mvp-scope.md
dan 1c66d019bd feat: add worker CLI scaffold in Nim
Multi-agent coordination CLI with SQLite message bus:
- State machine: ASSIGNED -> WORKING -> IN_REVIEW -> APPROVED -> COMPLETED
- Commands: spawn, start, done, approve, merge, cancel, fail, heartbeat
- SQLite WAL mode, dedicated heartbeat thread, channel-based IPC
- cligen for CLI, tiny_sqlite for DB, ORC memory management

Design docs for branch-per-worker, state machine, message passing,
and human observability patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 18:47:47 -08:00

Multi-Agent MVP Scope

Status: Draft (v3 - Nim implementation)
Goal: Define the minimal viable set of primitives to run 2-3 worker agents coordinated by a human-attended orchestrator.
Language: Nim (ORC, cligen, tiny_sqlite)

Changelog

  • v3: Nim implementation decision (single binary, fast startup, compiled)
  • v2: Fixed BLOCKERs from orch spec-review (resolved open questions, added rejection workflow, added failure scenarios)

Current Design Docs

| Doc | Status | Content |
|-----|--------|---------|
| worker-state-machine.md | Complete | 8 states, transitions, file schema |
| message-passing-layer.md | v4 | SQLite bus, Nim heartbeat thread, channels |
| worker-cli-primitives.md | v3 | Nim CLI with cligen, state machine |
| human-observability.md | Complete | Status dashboard, watch mode, stale detection |
| branch-per-worker.md | Complete | Worktrees, integration branch, rebase protocol |
| multi-agent-footguns-and-patterns.md | Complete | Research synthesis, validated decisions |
| message-passing-comparison.md | Complete | Beads/Tissue comparison, SQLite rationale |

Task Triage for MVP

Tier 1: Essential for MVP (Must Have)

These are required to run the basic loop: assign → work → review → merge.

| Bead | Task | Rationale |
|------|------|-----------|
| skills-sse | Worker CLI commands | Core interface for orchestrator and agents |
| skills-4oj | Worker state machine | Already designed, needs implementation |
| skills-ms5 | Message passing layer | Already designed (SQLite), needs implementation |
| skills-roq | Branch-per-worker isolation | Already designed, needs implementation |
| skills-byq | Integrate review-gate | review-gate exists, needs wiring into the worker flow |
| skills-yak | Human observability (status) | Human needs to see what's happening |
| NEW | Agent system prompt | LLM needs tool definitions for worker commands |

Tier 2: Important but Can Defer

| Bead | Task | Why Defer |
|------|------|-----------|
| skills-0y9 | Structured task specs | Can start with simple task descriptions |
| skills-4a2 | Role boundaries | Trust-based initially, add constraints later |
| skills-31y | Review funnel/arbiter | Works with 2-3 agents; needed at scale |
| skills-zf6 | Evidence artifacts | Can use simple JSON initially |
| skills-1jc | Stuck detection | Monitor manually first (stale detection in MVP) |

Tier 3: Nice to Have (Post-MVP)

| Bead | Task | Why Later |
|------|------|-----------|
| skills-1qz | Token budgets | Manual monitoring first |
| skills-5ji | Ephemeral namespaced envs | Single-project MVP |
| skills-7n4 | Rollback strategy | Manual rollback first |
| skills-8ak | Git bundle checkpoints | Worktrees sufficient |
| skills-r62 | Role + Veto pattern | Simple approve/reject first |
| skills-udu | Cross-agent compatibility | Single agent type first (Claude) |
| skills-sh6 | OpenHands research | Research complete |
| skills-yc6 | Document findings | Research captured |

MVP Feature Set

Commands

# Orchestrator commands (human runs)
worker spawn <task-id> [--description "..."]  # Create branch, worktree, assign task
worker status [--watch]                        # Dashboard of all workers
worker approve <task-id>                       # IN_REVIEW → APPROVED
worker request-changes <task-id> [feedback]    # IN_REVIEW → WORKING (rejection)
worker merge <task-id>                         # APPROVED → COMPLETED
worker cancel <task-id>                        # * → FAILED (abort)

# Worker commands (agent runs from worktree)
worker start                               # ASSIGNED → WORKING
worker done [--skip-rebase]                # WORKING → IN_REVIEW (includes rebase)
worker heartbeat                           # Liveness signal (via background thread)
worker fail <reason>                       # WORKING → FAILED

Data Flow

HAPPY PATH:

1. Human: worker spawn skills-abc
   → Creates feat/skills-abc branch
   → Creates worktrees/skills-abc
   → Publishes task_assign message
   → State: ASSIGNED

2. Agent: worker start
   → Publishes state_change (ASSIGNED → WORKING)
   → Starts HeartbeatThread (background)
   → Begins work

3. Agent: worker done
   → Runs git rebase origin/integration
   → Pushes branch
   → Publishes review_request
   → State: IN_REVIEW

4. Human: worker approve skills-abc
   → Publishes review_approved
   → State: APPROVED

5. Human: worker merge skills-abc
   → Merges to integration (retry loop for contention)
   → Cleans up branch/worktree
   → State: COMPLETED

REJECTION PATH:

4b. Human: worker request-changes skills-abc "Fix error handling"
    → Publishes changes_requested
    → State: WORKING
    → Agent resumes work, returns to step 3

CONFLICT PATH:

3b. Agent: worker done (rebase fails)
    → Rebase conflict detected, left in progress
    → State: CONFLICTED
    → Agent resolves conflicts, runs: git rebase --continue
    → Agent: worker done --skip-rebase
    → State: IN_REVIEW
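The transitions in the three paths above can be written down as a guard function. The following is a minimal sketch, not the contents of src/worker/state.nim: the names `WorkerState`, `allowedNext`, and `canTransition` are hypothetical, only the seven persistent states named in this document are modeled, and it assumes `worker cancel` applies to non-terminal states only.

```nim
type
  WorkerState = enum
    wsAssigned = "ASSIGNED"
    wsWorking = "WORKING"
    wsConflicted = "CONFLICTED"
    wsInReview = "IN_REVIEW"
    wsApproved = "APPROVED"
    wsCompleted = "COMPLETED"
    wsFailed = "FAILED"

proc allowedNext(s: WorkerState): set[WorkerState] =
  ## States reachable from `s` in one step, per the data flow above.
  ## Assumption: cancel (* -> FAILED) is allowed from non-terminal states only.
  case s
  of wsAssigned: result = {wsWorking, wsFailed}                  # start / cancel
  of wsWorking: result = {wsInReview, wsConflicted, wsFailed}    # done / rebase conflict / fail
  of wsConflicted: result = {wsInReview, wsFailed}               # done --skip-rebase / cancel
  of wsInReview: result = {wsApproved, wsWorking, wsFailed}      # approve / request-changes / cancel
  of wsApproved: result = {wsCompleted, wsFailed}                # merge / cancel
  of wsCompleted, wsFailed: result = {}                          # terminal

proc canTransition(src, dst: WorkerState): bool =
  ## True when `src -> dst` is a legal single step.
  dst in allowedNext(src)

when isMainModule:
  echo canTransition(wsInReview, wsWorking)   # true (rejection path)
  echo canTransition(wsCompleted, wsFailed)   # false (terminal state)
```

A guard like this would run inside the same transaction that reads the current state, so that the check and the write cannot be separated by a concurrent update.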

Directory Structure

project/
├── .worker-state/
│   ├── bus.db                    # SQLite message bus (source of truth)
│   ├── bus.jsonl                 # Debug export (derived)
│   ├── blobs/                    # Large payloads (content-addressable)
│   └── workers/
│       └── skills-abc.json       # Worker state cache (derived from DB)
├── worktrees/                    # Git worktrees (gitignored)
│   └── skills-abc/
│       └── .worker-ctx.json      # Static context for this worker
└── .git/

Implementation Order

Prerequisites

# Nim dependencies
nimble install tiny_sqlite cligen jsony

Download SQLite amalgamation for static linking:

curl -O https://sqlite.org/2024/sqlite-amalgamation-3450000.zip
unzip sqlite-amalgamation-3450000.zip
mkdir -p src/libs
cp sqlite-amalgamation-*/sqlite3.c src/libs/

Build Steps

  1. Project setup

    • Create src/worker.nimble with dependencies
    • Create src/config.nims with build flags (--mm:orc, --threads:on)
    • Set up static SQLite compilation
  2. Message bus (skills-ms5)

    • src/worker/db.nim - SQLite schema, connection setup
    • src/worker/bus.nim - publish/poll/ack functions
    • Dedicated heartbeat thread with channels
  3. Worker state (skills-4oj)

    • src/worker/state.nim - State enum, transition guards
    • src/worker/types.nim - Shared types
    • Compare-and-set with BEGIN IMMEDIATE
  4. Branch primitives (skills-roq)

    • src/worker/git.nim - Worktree create/remove (osproc)
    • Rebase with conflict detection
    • Merge with retry loop
  5. CLI commands (skills-sse)

    • src/worker.nim - cligen dispatchMulti
    • All subcommands: spawn, status, start, done, approve, request-changes, merge, cancel, fail, heartbeat
    • Background heartbeat thread
  6. review-gate integration (skills-byq)

    • review-gate calls worker approve / worker request-changes
    • Stop hook checks worker state from bus.db
  7. Status dashboard (skills-yak)

    • worker status with table output
    • Stale detection from heartbeats table
    • --watch mode for real-time updates
  8. Agent system prompt

    • Tool definitions for worker commands
    • Context about worktree location, task description
    • Instructions for heartbeat, done, conflict handling
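The "merge with retry loop" item in step 4 could be built on a small generic helper. This is a sketch under stated assumptions: `retry`, `gitOk`, and `mergeWithRetry` are hypothetical names, the linear backoff is an arbitrary choice, and the exact `git merge` invocation is illustrative because this document does not specify it.

```nim
import std/[os, osproc]

proc retry(attempts, baseDelayMs: int; op: proc (): bool): bool =
  ## Calls `op` up to `attempts` times, sleeping a little longer after each failure.
  for i in 1 .. attempts:
    if op():
      return true
    if i < attempts:
      sleep(baseDelayMs * i)   # linear backoff: 1x, 2x, 3x ...
  result = false

proc gitOk(workDir: string; args: openArray[string]): bool =
  ## Runs `git <args>` in `workDir`; true when git exits with status 0.
  let cmd = "git " & quoteShellCommand(args)
  execCmdEx(cmd, workingDir = workDir).exitCode == 0

proc mergeWithRetry(repoDir, branch: string): bool =
  ## Hypothetical merge step: retries while another merge holds the branch.
  ## The merge flags here are an assumption, not the documented protocol.
  retry(3, 200, proc (): bool = gitOk(repoDir, ["merge", "--no-ff", branch]))
```

Keeping the retry policy separate from the git call makes the loop testable without a repository, and the same helper could wrap other contended operations.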

Compilation

nim c -d:release --mm:orc --threads:on src/worker.nim
# Output: single static binary ~2-3MB
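The memory-management and threading flags could also live in src/config.nims, as step 1 of the build suggests, so they are not repeated on every invocation. A minimal sketch, assuming the file sits next to src/worker.nim (static SQLite settings are left out because this document does not spell them out):

```nim
# src/config.nims (sketch): flags mirrored from the compile command above
switch("mm", "orc")       # ORC memory management
switch("threads", "on")   # needed for the heartbeat thread and channels
```

With this in place, `nim c -d:release src/worker.nim` should pick up both flags from the project directory.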

Success Criteria

MVP is complete when:

Happy Path

  1. Can spawn a worker with worker spawn <task>
  2. Worker appears in worker status dashboard with state and heartbeat
  3. Agent can signal worker start and worker done
  4. Heartbeats track agent liveness (stale detection after 30s)
  5. worker approve transitions to APPROVED
  6. worker merge completes the cycle
  7. All state persists across session restarts

Failure Scenarios

  1. Rebase conflict detected → state CONFLICTED, rebase left in progress
  2. Agent timeout (no heartbeat 2+ min) → status shows STALE warning
  3. worker request-changes returns to WORKING with feedback
  4. worker cancel aborts any state → FAILED
  5. Concurrent merge attempts handled (retry loop succeeds)
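Because STALE is derived from heartbeat age rather than stored, the check reduces to a pure function of two timestamps. The sketch below uses hypothetical names (`Liveness`, `liveness`) and leaves the threshold as a parameter, since the criteria above cite both 30 s and 2+ min.

```nim
import std/times

type
  Liveness = enum
    lvAlive = "ALIVE"
    lvStale = "STALE"

proc liveness(lastBeat, current: Time; staleAfter: Duration): Liveness =
  ## STALE when the last heartbeat is older than `staleAfter`; never persisted.
  if current - lastBeat > staleAfter: lvStale else: lvAlive

when isMainModule:
  let threshold = initDuration(seconds = 30)
  echo liveness(fromUnix(1000), fromUnix(1020), threshold)   # ALIVE
  echo liveness(fromUnix(1000), fromUnix(1031), threshold)   # STALE
```

Passing the current time in explicitly, instead of calling `getTime()` inside the function, keeps the dashboard logic deterministic under test.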

Non-Goals for MVP

  • Multiple orchestrators
  • Cross-machine coordination
  • Automatic conflict resolution (human intervenes)
  • Token budgeting
  • Structured task specs (simple descriptions)
  • Arbiter agents
  • Database isolation per worker

Resolved Questions

| Question | Decision | Rationale |
|----------|----------|-----------|
| Language | Nim | Single binary, fast startup, compiled, Python-like syntax |
| CLI framework | cligen | Auto-generates from proc signatures |
| SQLite wrapper | tiny_sqlite | Better than stdlib, RAII, prepared statements |
| Memory management | ORC | Handles cycles, deterministic destruction |
| Static linking | SQLite amalgamation | Single binary, no system dependencies |
| Source of truth | SQLite only | JSON files are derived caches; DB is authoritative |
| Heartbeat | Dedicated thread + channels | Nim threads don't share memory |
| Integration branch | Require exists | Human creates integration before first spawn |
| review-gate | Calls worker CLI | review-gate approve → worker approve |
| STALE state | Computed for display | Not a persistent state; derived from heartbeat age |
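The "dedicated thread + channels" decision can be illustrated with the built-in `Channel` type. This is a sketch, not the project's heartbeat code: `HeartbeatCmd`, `heartbeatLoop`, and `runHeartbeats` are hypothetical names, and the thread reports timestamps over a channel as a stand-in for writing to the heartbeats table. It assumes compilation with `--threads:on`.

```nim
import std/[os, times]

type
  HeartbeatCmd = enum
    hbStop

var
  cmdChan: Channel[HeartbeatCmd]  # main thread -> heartbeat thread
  beatChan: Channel[int64]        # heartbeat thread -> main thread (epoch seconds)

proc heartbeatLoop(intervalMs: int) {.thread.} =
  ## Emits one timestamp per interval until a stop command arrives.
  while true:
    let cmd = cmdChan.tryRecv()
    if cmd.dataAvailable and cmd.msg == hbStop:
      break
    beatChan.send(getTime().toUnix())
    sleep(intervalMs)

proc runHeartbeats(intervalMs, durationMs: int): int =
  ## Runs the heartbeat thread for `durationMs`; returns how many beats arrived.
  cmdChan.open()
  beatChan.open()
  var worker: Thread[int]
  createThread(worker, heartbeatLoop, intervalMs)
  sleep(durationMs)
  cmdChan.send(hbStop)
  joinThread(worker)
  while beatChan.tryRecv().dataAvailable:
    inc result
  cmdChan.close()
  beatChan.close()

when isMainModule:
  echo "beats received: ", runHeartbeats(50, 220)
```

Only plain values (an enum and an `int64`) cross the thread boundary here, which is the property the channel-based design relies on.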

Nim Dependencies

| Package | Purpose |
|---------|---------|
| tiny_sqlite | SQLite wrapper with RAII |
| cligen | CLI subcommand generation |
| jsony | Fast JSON parsing (optional) |
| stdlib osproc | Git subprocess operations |
| stdlib channels | Thread communication |
| stdlib times | Epoch timestamps |

Spec Review Resolution

| Issue | Resolution |
|-------|------------|
| Missing rejection workflow | Added request-changes command and path in data flow |
| Agent system prompt missing | Added to Tier 1 and to the implementation order |
| Source of truth confusion | Clarified SQLite primary, JSON derived |
| Test scenarios missing | Added failure scenarios 1-5 to success criteria |
| Heartbeat mechanism | Dedicated thread with own SQLite connection |
| Review-gate integration | Clarified review-gate calls worker CLI |
| Language choice | Nim for single binary, fast startup |