Multi-agent coordination CLI with SQLite message bus:
- State machine: ASSIGNED -> WORKING -> IN_REVIEW -> APPROVED -> COMPLETED
- Commands: spawn, start, done, approve, merge, cancel, fail, heartbeat
- SQLite WAL mode, dedicated heartbeat thread, channel-based IPC
- cligen for CLI, tiny_sqlite for DB, ORC memory management

Design docs for branch-per-worker, state machine, message passing, and human observability patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Branch-per-Worker Isolation Design
Status: Draft
Bead: skills-roq
Epic: skills-s6y (Multi-agent orchestration: Lego brick architecture)
Overview
This document defines the git branching strategy for multi-agent coordination. Each worker operates in an isolated git worktree on a dedicated branch, with mandatory rebase before review.
Design Principles
- Orchestrator controls branch lifecycle - Creates, assigns, cleans up
- Worktrees for parallelism - Each worker gets isolated directory
- Integration branch as staging - Buffer before main
- SQLite = process truth, Git = code truth - Don't duplicate state
- Mandatory rebase - Fresh base before review (consensus requirement)
Key Decisions
Branch Naming: type/task-id
Decision: Use type/task-id (e.g., feat/skills-abc, fix/skills-xyz)
Rationale (2/3 consensus):
- Branch describes work, not worker
- Survives task reassignment (if Claude fails, Gemini can continue)
- Worker identity in commit author:
Author: claude-3.5 <agent@bot>
Rejected alternative: worker-id/task-id - becomes misleading on reassignment
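A minimal sketch of how an orchestrator might build and validate names under this convention. The BRANCH_TYPES allow-list, the task-id pattern, and both helper names are assumptions for illustration; only the type/task-id shape and the agent@bot author example come from the decision above.

```python
import re

# Assumed allow-list: the document only shows "feat" and "fix" as branch types.
BRANCH_TYPES = {"feat", "fix"}

# Assumed task-id shape: alphanumeric chunks joined by hyphens (skills-abc, T-101).
TASK_ID_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def branch_name(branch_type: str, task_id: str) -> str:
    """Build a type/task-id branch name; the worker identity is deliberately absent."""
    if branch_type not in BRANCH_TYPES:
        raise ValueError(f"unknown branch type: {branch_type!r}")
    if not TASK_ID_RE.fullmatch(task_id):
        raise ValueError(f"malformed task id: {task_id!r}")
    return f"{branch_type}/{task_id}"


def commit_author(agent_id: str) -> str:
    """Format the commit author so the worker stays visible in history."""
    return f"{agent_id} <agent@bot>"


print(branch_name("feat", "skills-abc"))
print(commit_author("claude-3.5"))
```

Because the worker appears only in the author field, reassigning a task changes who commits next but never the branch name.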
Worktrees vs Checkout
Decision: Git worktrees (parallel directories)
Rationale (3/3 consensus):
- git checkout updates the working directory globally
- If Worker A checks out while Worker B writes, corruption is possible
- Worktrees share the .git object database but isolate the filesystem
- Maps cleanly to "one worker = one workspace"
/project/
├── .git/ # Shared object database
├── worktrees/
│ ├── skills-abc/ # Worker 1's worktree
│ │ └── (full working copy)
│ └── skills-xyz/ # Worker 2's worktree
│ └── (full working copy)
└── (main working copy) # For orchestrator
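The orchestrator can check which worker owns which directory by reading the output of git worktree list --porcelain. The sketch below parses that format into a path-to-branch map; the function name and the SAMPLE text are hand-written for illustration, not captured from a real repository.

```python
from typing import Dict, Optional


def parse_worktrees(porcelain: str) -> Dict[str, str]:
    """Map worktree path -> branch name from `git worktree list --porcelain` text.

    Worktrees with a detached HEAD have no "branch" line and are omitted.
    """
    result: Dict[str, str] = {}
    path: Optional[str] = None
    prefix = "refs/heads/"
    for line in porcelain.splitlines():
        if line.startswith("worktree "):
            path = line[len("worktree "):]
        elif line.startswith("branch ") and path is not None:
            ref = line[len("branch "):]
            result[path] = ref[len(prefix):] if ref.startswith(prefix) else ref
        elif not line:
            path = None  # a blank line ends one worktree record
    return result


# Illustrative sample in the porcelain layout (hashes are placeholders).
SAMPLE = """\
worktree /project
HEAD 1111111111111111111111111111111111111111
branch refs/heads/integration

worktree /project/worktrees/skills-abc
HEAD 2222222222222222222222222222222222222222
branch refs/heads/feat/skills-abc
"""

for wt_path, branch in parse_worktrees(SAMPLE).items():
    print(wt_path, "->", branch)
```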
Integration Branch
Decision: Use rolling integration branch as staging before main
Rationale (3/3 consensus):
- AI agents introduce subtle regressions
- Integration branch = "demilitarized zone" for combined testing
- Human review before promoting to main
- Allows batching of related changes
main ─────────────────────────●─────────────
↑
integration ────●────●────●────●────●
↑ ↑ ↑ ↑
feat/T-101 ────● │ │ │
feat/T-102 ─────────● │ │
fix/T-103 ──────────────● │
feat/T-104 ───────────────────●
Conflict Handling
Decision: Worker resolves trivial conflicts, escalates semantic conflicts
Rationale (2/3 consensus - flash-or, gpt):
- Blanket "never resolve" is safe but slows throughput
- Mechanical conflicts (formatting, imports, non-overlapping) are safe
- Logic conflicts require human judgment
Rules:
def handle_rebase_conflict(conflict_info):
    if is_trivial_conflict(conflict_info):
        # Trivial: resolve automatically, then verify with tests
        resolve_mechanically()
        run_tests()
        if tests_pass():
            continue_rebase()
        else:
            abort_and_escalate()
    else:
        # Semantic: always escalate
        git_rebase_abort()
        set_state(CONFLICTED)
        notify_orchestrator()
Trivial conflict criteria:
- Only whitespace/formatting changes
- Import statement ordering
- Non-overlapping edits in same file
- Fewer than N lines changed in conflict region
Escalate if:
- Conflict touches core logic
- Conflict spans multiple files
- Test failures after resolution
- Uncertain about correctness
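One possible reading of the two lists above, as a sketch. It assumes escalation rules take precedence over the trivial criteria, models only the whitespace and import-ordering cases (non-overlapping edits are normally merged by git without a conflict), and picks 10 for the unspecified N. ConflictInfo and its fields are hypothetical; how they get populated is left open.

```python
from dataclasses import dataclass
from typing import List

MAX_TRIVIAL_LINES = 10  # Assumed value for the document's unspecified "N".


@dataclass
class ConflictInfo:
    """Hypothetical summary of a rebase conflict, filled in by the worker."""
    files: List[str]
    conflict_lines: int            # lines inside conflict regions
    whitespace_only: bool = False
    import_order_only: bool = False
    touches_core_logic: bool = False


def is_trivial_conflict(info: ConflictInfo) -> bool:
    # Escalation rules are checked first so they always win.
    if info.touches_core_logic or len(info.files) != 1:
        return False
    if info.conflict_lines >= MAX_TRIVIAL_LINES:
        return False
    # Anything not positively identified as mechanical counts as "uncertain".
    return info.whitespace_only or info.import_order_only


print(is_trivial_conflict(ConflictInfo(["a.py"], 3, whitespace_only=True)))
print(is_trivial_conflict(ConflictInfo(["a.py", "b.py"], 3, whitespace_only=True)))
```

The "test failures after resolution" rule is not covered here; it applies after resolution, as in handle_rebase_conflict above.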
State Machine Mapping
Decision: SQLite is process truth, Git is code truth
Rationale (2/3 consensus - gemini, gpt):
- Don't encode state in Git (tags, notes) - causes sync issues
- Observable Git signals already exist:
| Worker State | Git Observable |
|---|---|
| ASSIGNED | Branch exists, worktree created |
| WORKING | New commits appearing |
| IN_REVIEW | Branch pushed, PR opened (or flag in SQLite) |
| APPROVED | PR approved |
| COMPLETED | Merged to integration/main |
| CONFLICTED | Rebase aborted, no new commits |
Link via task-id: Commit trailers connect the two:
feat: implement user authentication

Task: skills-abc
Agent: claude-3.5
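To join a commit back to its SQLite task record, the orchestrator has to read these trailers. The sketch below is a simplified stand-in for git interpret-trailers: it treats the last paragraph of the message as the trailer block, so the trailers must be separated from the subject by a blank line. The function name is an assumption.

```python
from typing import Dict


def parse_trailers(message: str) -> Dict[str, str]:
    """Return Key: value trailers from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return {}  # subject only: no trailer block
    trailers: Dict[str, str] = {}
    for line in paragraphs[-1].strip().splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key or " " in key:
            return {}  # last paragraph is prose, not trailers
        trailers[key] = value.strip()
    return trailers


MESSAGE = "feat: implement user authentication\n\nTask: skills-abc\nAgent: claude-3.5\n"
print(parse_trailers(MESSAGE))
```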
Cross-Worker Dependencies
Decision: Strict serialization - don't depend on another worker's unmerged work
Rationale (3/3 consensus):
- "Speculative execution" creates house of cards
- If A's code rejected, B's work becomes invalid
- Cheaper to wait than waste tokens on orphaned code
Pattern for parallel work on related features:
- Orchestrator creates epic branch: epic/auth-system
- Both workers branch from epic: feat/T-101, feat/T-102
- Workers rebase onto epic, not main
- Epic merged to integration when all tasks complete
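The epic pattern changes two things for the orchestrator: which ref a worker rebases onto, and when the epic may be merged. A small sketch of both, with hypothetical helper names and the state names used elsewhere in this document:

```python
from typing import Dict, Optional


def rebase_target(epic_branch: Optional[str] = None) -> str:
    """Workers in an epic rebase onto the epic; everyone else onto integration."""
    return f"origin/{epic_branch}" if epic_branch else "origin/integration"


def epic_ready_to_merge(task_states: Dict[str, str]) -> bool:
    """An epic merges to integration only when every one of its tasks is COMPLETED."""
    return bool(task_states) and all(s == "COMPLETED" for s in task_states.values())


print(rebase_target("epic/auth-system"))
print(epic_ready_to_merge({"T-101": "COMPLETED", "T-102": "IN_REVIEW"}))
```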
Branch Cleanup
Decision: Delete after merge, archive failures
Rationale (3/3 consensus):
- Prevent branch bloat
- Archive failures for post-mortem analysis
# On successful merge
# (remove the worktree first: git refuses to delete a branch that is checked out in a worktree)
git worktree remove worktrees/T-101
git branch -d feat/T-101
# On failure/abandonment
git branch -m feat/T-101 archive/T-101-$(date +%Y%m%d)
git worktree remove worktrees/T-101
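A sketch of how the orchestrator could generate these commands rather than hard-code them. The helper names are assumptions, and the functions only build argument lists; running them is left to the caller. On the merge path the worktree is removed before the branch, since git will not delete a branch that is still checked out in a worktree.

```python
from datetime import date
from typing import List, Optional


def archive_branch_name(branch: str, on: Optional[date] = None) -> str:
    """feat/T-101 -> archive/T-101-YYYYMMDD (date defaults to today)."""
    _, _, task_id = branch.partition("/")
    if not task_id:
        raise ValueError(f"expected type/task-id, got {branch!r}")
    stamp = (on or date.today()).strftime("%Y%m%d")
    return f"archive/{task_id}-{stamp}"


def cleanup_commands(branch: str, merged: bool,
                     on: Optional[date] = None) -> List[List[str]]:
    """Return the git commands for cleanup after a merge or a failure."""
    _, _, task_id = branch.partition("/")
    remove = ["git", "worktree", "remove", f"worktrees/{task_id}"]
    if merged:
        return [remove, ["git", "branch", "-d", branch]]
    rename = ["git", "branch", "-m", branch, archive_branch_name(branch, on)]
    return [rename, remove]


for cmd in cleanup_commands("feat/T-101", merged=False, on=date(2025, 1, 15)):
    print(" ".join(cmd))
```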
Workflow
1. Task Assignment
Orchestrator prepares workspace:
# 1. Fetch latest
git fetch origin
# 2. Create branch from integration
git branch feat/$TASK_ID origin/integration
# 3. Create worktree
git worktree add worktrees/$TASK_ID feat/$TASK_ID
# 4. Update SQLite
publish(db, 'orchestrator', {
    'type': 'task_assign',
    'to': worker_id,
    'correlation_id': task_id,
    'payload': {
        'branch': f'feat/{task_id}',
        'worktree': f'worktrees/{task_id}'
    }
})
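This document does not define publish(). The sketch below shows one way it could append to a SQLite table, using Python's standard sqlite3 module. The messages schema, open_bus(), and the column names are all assumptions; the in-memory database is for illustration only, whereas a real bus would be a file such as .worker-state/bus.db.

```python
import json
import sqlite3
import time


def open_bus(path: str = ":memory:") -> sqlite3.Connection:
    """Open the bus and create the (assumed) messages table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,
               sender TEXT NOT NULL,
               recipient TEXT,
               type TEXT NOT NULL,
               correlation_id TEXT,
               payload TEXT NOT NULL
           )"""
    )
    return db


def publish(db: sqlite3.Connection, sender: str, message: dict) -> int:
    """Append one message and return its row id."""
    with db:  # commits on success, rolls back on exception
        cur = db.execute(
            "INSERT INTO messages (ts, sender, recipient, type, correlation_id, payload)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (
                time.time(),
                sender,
                message.get("to"),
                message["type"],
                message.get("correlation_id"),
                json.dumps(message.get("payload", {})),
            ),
        )
    return cur.lastrowid


db = open_bus()
task_id = "skills-abc"
msg_id = publish(db, "orchestrator", {
    "type": "task_assign",
    "to": "worker-auth",
    "correlation_id": task_id,
    "payload": {"branch": f"feat/{task_id}", "worktree": f"worktrees/{task_id}"},
})
print(msg_id)
```

Storing the payload as JSON text keeps the table schema stable while message types evolve, at the cost of not being able to index payload fields directly.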
2. Worker Starts
Worker receives assignment:
cd worktrees/$TASK_ID
# Confirm environment
git status
git log --oneline -3
# Begin work...
3. Worker Commits
During work:
git add -A
git commit -m "feat: implement feature X

Task: $TASK_ID
Agent: $AGENT_ID"
Update SQLite:
publish(db, agent_id, {
    'type': 'state_change',
    'correlation_id': task_id,
    'payload': {'from': 'ASSIGNED', 'to': 'WORKING'}
})
4. Pre-Review Rebase (Mandatory)
Before requesting review:
# 1. Fetch latest integration
git fetch origin integration
# 2. Attempt rebase and handle the result
if git rebase origin/integration; then
    # Success - push and request review
    git push -u origin "feat/$TASK_ID"
    # Update SQLite: IN_REVIEW
else
    # Conflict - check if trivial
    if is_trivial_conflict; then
        resolve_and_continue
    else
        git rebase --abort
        # Update SQLite: CONFLICTED
    fi
fi
5. Review
Review happens (human or review-gate):
# Check review state
review = get_review_state(task_id)
if review['decision'] == 'approved':
    publish(db, 'reviewer', {
        'type': 'review_result',
        'correlation_id': task_id,
        'payload': {'decision': 'approved'}
    })
elif review['decision'] == 'changes_requested':
    publish(db, 'reviewer', {
        'type': 'review_result',
        'correlation_id': task_id,
        'payload': {
            'decision': 'changes_requested',
            'feedback': review['comments']
        }
    })
    # Worker returns to WORKING state
6. Merge
On approval:
# Orchestrator merges to integration
git checkout integration
git merge --no-ff feat/$TASK_ID -m "Merge feat/$TASK_ID: $TITLE"
git push origin integration
# Cleanup (remove the worktree first: git refuses to delete a branch that is checked out in a worktree)
git worktree remove worktrees/$TASK_ID
git branch -d feat/$TASK_ID
git push origin --delete feat/$TASK_ID
7. Promote to Main
Periodically (or per-task):
# When integration is green
git checkout main
git merge --ff-only integration
git push origin main
Directory Structure
/project/
├── .git/
├── .worker-state/
│ ├── bus.db # SQLite message bus
│ └── workers/
│ └── worker-auth.json
├── worktrees/ # Worker worktrees (gitignored)
│ ├── skills-abc/
│ └── skills-xyz/
└── (main working copy)
Add to .gitignore:
worktrees/
Conflict Resolution Script
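#!/bin/bash
# scripts/try-rebase.sh
# Usage: try-rebase.sh TASK_ID [TARGET_BRANCH]
# Exit codes: 0 = rebased, 1 = worktree not found, 2 = CONFLICTED
TASK_ID=$1
TARGET_BRANCH=${2:-origin/integration}

cd "worktrees/$TASK_ID" || exit 1
git fetch origin

# Attempt rebase
if git rebase "$TARGET_BRANCH"; then
    echo "Rebase successful"
    exit 0
fi

# Check conflict severity (grep -c . counts non-empty lines, so an empty list gives 0)
CONFLICT_FILES=$(git diff --name-only --diff-filter=U)
CONFLICT_COUNT=$(printf '%s\n' "$CONFLICT_FILES" | grep -c .)

# Trivial (approximation): exactly one conflicted file
if [ "$CONFLICT_COUNT" -eq 1 ]; then
    # During a rebase, --theirs is the worker's commit being replayed and --ours is
    # the target branch, so this keeps the worker's version of the file and drops
    # the target branch's side of the conflict. Only safe for whitespace/formatting.
    while IFS= read -r file; do
        if git checkout --theirs -- "$file" 2>/dev/null; then
            git add -- "$file"
        else
            echo "Cannot auto-resolve: $file"
            git rebase --abort
            exit 2 # CONFLICTED
        fi
    done <<< "$CONFLICT_FILES"

    # GIT_EDITOR=true keeps git from opening an editor for the commit message
    if GIT_EDITOR=true git rebase --continue; then
        echo "Auto-resolved trivial conflicts"
        exit 0
    fi
fi

# Non-trivial: abort and escalate
git rebase --abort
echo "Conflict requires human intervention"
exit 2 # CONFLICTED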
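The script reports its result only through its exit code, so the caller has to translate that into a worker state. A sketch of that translation, assuming exit 0 means the branch is rebased and ready to push for review, exit 2 means CONFLICTED, and anything else is an error. The function names are hypothetical.

```python
import subprocess

# Exit codes taken from the script above; any other code is treated as an error.
REBASE_EXIT_STATES = {0: "IN_REVIEW", 2: "CONFLICTED"}


def state_for_exit_code(code: int) -> str:
    """Translate a try-rebase.sh exit code into the next worker state."""
    try:
        return REBASE_EXIT_STATES[code]
    except KeyError:
        raise ValueError(f"unexpected try-rebase exit code: {code}") from None


def try_rebase(task_id: str, script: str = "scripts/try-rebase.sh") -> str:
    """Run the script for one task and return the resulting state."""
    completed = subprocess.run([script, task_id])
    return state_for_exit_code(completed.returncode)


print(state_for_exit_code(0))
print(state_for_exit_code(2))
```

A successful rebase still has to be pushed before the IN_REVIEW state is published, as in step 4 of the workflow.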
Integration with State Machine
| State | Git Action | SQLite Message |
|---|---|---|
| IDLE → ASSIGNED | Branch + worktree created | task_assign |
| ASSIGNED → WORKING | First commit | state_change |
| WORKING → IN_REVIEW | Rebase success + push | review_request |
| WORKING → CONFLICTED | Rebase failed | state_change + escalate |
| IN_REVIEW → APPROVED | Review passes | review_result |
| IN_REVIEW → WORKING | Changes requested | review_result |
| APPROVED → COMPLETED | Merged | task_done |
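The table can be enforced directly as a lookup, so that an illegal transition is rejected before anything is published. A minimal sketch: the dictionary mirrors the rows above, and the function name is an assumption.

```python
from typing import Dict, Tuple

# (from_state, to_state) -> SQLite message type, one entry per table row.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("IDLE", "ASSIGNED"): "task_assign",
    ("ASSIGNED", "WORKING"): "state_change",
    ("WORKING", "IN_REVIEW"): "review_request",
    ("WORKING", "CONFLICTED"): "state_change",
    ("IN_REVIEW", "APPROVED"): "review_result",
    ("IN_REVIEW", "WORKING"): "review_result",
    ("APPROVED", "COMPLETED"): "task_done",
}


def message_type_for(from_state: str, to_state: str) -> str:
    """Return the message type for a legal transition, or raise ValueError."""
    try:
        return TRANSITIONS[(from_state, to_state)]
    except KeyError:
        raise ValueError(f"illegal transition: {from_state} -> {to_state}") from None


print(message_type_for("WORKING", "IN_REVIEW"))
```

The table defines no exit from CONFLICTED, so the sketch rejects every transition out of it; how a conflicted task resumes is left to the escalation path.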
Open Questions
- Worktree location: ./worktrees/ or /tmp/worktrees/?
- Integration → main cadence: Per-task, hourly, daily, manual?
- Epic branches: How complex should the epic workflow be?
- Failed branch retention: How long to keep archived branches?
References
- OpenHands: https://docs.openhands.dev/sdk/guides/iterative-refinement
- Gastown worktrees: https://github.com/steveyegge/gastown
- Git worktrees: https://git-scm.com/docs/git-worktree