skills/docs/design/branch-per-worker.md
dan 1c66d019bd feat: add worker CLI scaffold in Nim
Multi-agent coordination CLI with SQLite message bus:
- State machine: ASSIGNED -> WORKING -> IN_REVIEW -> APPROVED -> COMPLETED
- Commands: spawn, start, done, approve, merge, cancel, fail, heartbeat
- SQLite WAL mode, dedicated heartbeat thread, channel-based IPC
- cligen for CLI, tiny_sqlite for DB, ORC memory management

Design docs for branch-per-worker, state machine, message passing,
and human observability patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 18:47:47 -08:00


# Branch-per-Worker Isolation Design
**Status**: Draft
**Bead**: skills-roq
**Epic**: skills-s6y (Multi-agent orchestration: Lego brick architecture)
## Overview
This document defines the git branching strategy for multi-agent coordination. Each worker operates in an isolated git worktree on a dedicated branch, with mandatory rebase before review.
## Design Principles
1. **Orchestrator controls branch lifecycle** - Creates, assigns, cleans up
2. **Worktrees for parallelism** - Each worker gets isolated directory
3. **Integration branch as staging** - Buffer before main
4. **SQLite = process truth, Git = code truth** - Don't duplicate state
5. **Mandatory rebase** - Fresh base before review (consensus requirement)
## Key Decisions
### Branch Naming: `type/task-id`
**Decision**: Use `type/task-id` (e.g., `feat/skills-abc`, `fix/skills-xyz`)
**Rationale** (2/3 consensus):
- Branch describes *work*, not *worker*
- Survives task reassignment (if Claude fails, Gemini can continue)
- Worker identity in commit author: `Author: claude-3.5 <agent@bot>`
**Rejected alternative**: `worker-id/task-id` - becomes misleading on reassignment
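A minimal sketch of how the naming rule could be enforced when the orchestrator creates a branch. The allowed type set and the task-id pattern are assumptions inferred from the examples in this document (`feat/skills-abc`, `fix/T-103`); `branch_name` is a hypothetical helper, not part of the worker CLI.

```python
import re

# Assumed branch types: the document only shows `feat` and `fix` for task branches.
ALLOWED_TYPES = {"feat", "fix"}

# Assumed task-id shape, e.g. "skills-abc" or "T-101".
TASK_ID_RE = re.compile(r"^[A-Za-z0-9]+(-[A-Za-z0-9]+)+$")


def branch_name(task_type: str, task_id: str) -> str:
    """Return `type/task-id` after validating both parts."""
    if task_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown branch type: {task_type!r}")
    if not TASK_ID_RE.match(task_id):
        raise ValueError(f"malformed task id: {task_id!r}")
    return f"{task_type}/{task_id}"
```

Because the name carries no worker identity, reassigning a task leaves the result of `branch_name` unchanged.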
### Worktrees vs Checkout
**Decision**: Git worktrees (parallel directories)
**Rationale** (3/3 consensus):
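- `git checkout` switches the one shared working directory, so every process using it sees the change
- If Worker A checks out a branch while Worker B is writing files, B's edits can be clobbered or end up on the wrong branch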
- Worktrees share `.git` object database, isolate filesystem
- Maps cleanly to "one worker = one workspace"
```
/project/
├── .git/                    # Shared object database
├── worktrees/
│   ├── skills-abc/          # Worker 1's worktree
│   │   └── (full working copy)
│   └── skills-xyz/          # Worker 2's worktree
│       └── (full working copy)
└── (main working copy)      # For orchestrator
```
### Integration Branch
**Decision**: Use rolling `integration` branch as staging before `main`
**Rationale** (3/3 consensus):
- AI agents introduce subtle regressions
- Integration branch = "demilitarized zone" for combined testing
- Human review before promoting to main
- Allows batching of related changes
```
main         ─────────────────────────●─────────────
integration  ────●────●────●────●────●
                 ↑    ↑    ↑    ↑
feat/T-101   ────●    │    │    │
feat/T-102   ─────────●    │    │
fix/T-103    ──────────────●    │
feat/T-104   ───────────────────●
```
### Conflict Handling
**Decision**: Worker resolves trivial conflicts, escalates semantic conflicts
**Rationale** (2/3 consensus - flash-or, gpt):
- Blanket "never resolve" is safe but slows throughput
- Mechanical conflicts (formatting, imports, non-overlapping) are safe
- Logic conflicts require human judgment
**Rules**:
```python
# Pseudocode: the helper functions are placeholders
def handle_rebase_conflict(conflict_info):
    if is_trivial_conflict(conflict_info):
        # Trivial: resolve automatically, then verify with tests
        resolve_mechanically()
        run_tests()
        if tests_pass():
            continue_rebase()
        else:
            abort_and_escalate()
    else:
        # Semantic: always escalate
        git_rebase_abort()
        set_state(CONFLICTED)
        notify_orchestrator()
```
**Trivial conflict criteria**:
- Only whitespace/formatting changes
- Import statement ordering
- Non-overlapping edits in same file
- Fewer than N lines changed in the conflict region
**Escalate if**:
- Conflict touches core logic
- Conflict spans multiple files
- Test failures after resolution
- Uncertain about correctness
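The criteria above could be approximated as follows. This is a sketch under stated assumptions: `ConflictInfo` is a hypothetical structure holding the two sides of a single conflict region, `MAX_CONFLICT_LINES` stands in for the unspecified N, and import detection only recognizes Python-style `import`/`from` lines. The "non-overlapping edits" criterion is not modeled here.

```python
from dataclasses import dataclass

MAX_CONFLICT_LINES = 10  # assumed value for the document's unspecified "N"


@dataclass
class ConflictInfo:
    files: list[str]  # paths that contain conflict markers
    ours: str         # one side of the conflict region
    theirs: str       # the other side of the conflict region


def _normalize(text: str) -> list[str]:
    """Collapse runs of whitespace and drop blank lines."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def _is_import(line: str) -> bool:
    return line.startswith(("import ", "from "))


def is_trivial_conflict(info: ConflictInfo) -> bool:
    # Conflicts spanning multiple files always escalate.
    if len(info.files) != 1:
        return False
    longest = max(len(info.ours.splitlines()), len(info.theirs.splitlines()))
    if longest >= MAX_CONFLICT_LINES:
        return False
    ours, theirs = _normalize(info.ours), _normalize(info.theirs)
    # Whitespace/formatting only: both sides are equal once normalized.
    if ours == theirs:
        return True
    # Import ordering only: same import lines, different order.
    return sorted(ours) == sorted(theirs) and all(_is_import(l) for l in ours)
```

Anything the function cannot positively classify as mechanical falls through to `False`, which matches the "uncertain about correctness" escalation rule.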
### State Machine Mapping
**Decision**: SQLite is process truth, Git is code truth
**Rationale** (2/3 consensus - gemini, gpt):
- Don't encode state in Git (tags, notes) - causes sync issues
- Observable Git signals already exist:
| Worker State | Git Observable |
|--------------|----------------|
| ASSIGNED | Branch exists, worktree created |
| WORKING | New commits appearing |
| IN_REVIEW | Branch pushed, PR opened (or flag in SQLite) |
| APPROVED | PR approved |
| COMPLETED | Merged to integration/main |
| CONFLICTED | Rebase aborted, no new commits |
**Link via task-id**: Commit trailers connect the two:
```
feat: implement user authentication

Task: skills-abc
Agent: claude-3.5
```
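A simplified sketch of how the orchestrator might read these trailers back out of a commit message. It assumes the trailer block is the last paragraph and is separated from the subject by a blank line, which Git requires for trailers. `parse_trailers` is a hypothetical helper; Git's own parser is available as `git interpret-trailers --parse`.

```python
def parse_trailers(message: str) -> dict[str, str]:
    """Return `Key: value` pairs from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        # No blank line after the subject, so there is no trailer block.
        return {}
    trailers: dict[str, str] = {}
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(": ")
        if not sep or " " in key:
            # The last paragraph is prose, not a trailer block.
            return {}
        trailers[key] = value.strip()
    return trailers
```

The `Task` value is the join key: it matches the `correlation_id` used on the SQLite side.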
### Cross-Worker Dependencies
**Decision**: Strict serialization - don't depend on uncommitted work
**Rationale** (3/3 consensus):
- "Speculative execution" creates house of cards
- If A's code rejected, B's work becomes invalid
- Cheaper to wait than waste tokens on orphaned code
**Pattern for parallel work on related features**:
1. Orchestrator creates epic branch: `epic/auth-system`
2. Both workers branch from epic: `feat/T-101`, `feat/T-102`
3. Workers rebase onto epic, not main
4. Epic merged to integration when all tasks complete
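The pattern above can be sketched as two small decisions. Both function names are hypothetical, the `origin/` prefix assumes the epic branch is pushed to the remote, and "complete" is assumed to mean the task has reached `COMPLETED` after being merged into the epic branch.

```python
from typing import Optional

INTEGRATION = "origin/integration"


def rebase_target(epic: Optional[str]) -> str:
    """Tasks inside an epic rebase onto the epic branch; all others onto integration."""
    return f"origin/epic/{epic}" if epic else INTEGRATION


def epic_ready_to_merge(task_states: dict[str, str]) -> bool:
    """An epic merges to integration only when every task in it is COMPLETED."""
    return bool(task_states) and all(s == "COMPLETED" for s in task_states.values())
```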
### Branch Cleanup
**Decision**: Delete after merge, archive failures
**Rationale** (3/3 consensus):
- Prevent branch bloat
- Archive failures for post-mortem analysis
```bash
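# On successful merge (remove the worktree first: Git refuses to delete
# a branch that is still checked out in a worktree)
git worktree remove worktrees/T-101
git branch -d feat/T-101
# On failure/abandonment
git worktree remove worktrees/T-101
git branch -m feat/T-101 archive/T-101-$(date +%Y%m%d)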
```
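If the orchestrator plans these commands programmatically, the logic might look like the sketch below. `cleanup_commands` is a hypothetical helper that only builds argument lists and runs nothing. It removes the worktree before touching the branch, since Git refuses to delete a branch that is still checked out in a worktree.

```python
from datetime import date


def cleanup_commands(branch: str, task_id: str, merged: bool, today: date) -> list[list[str]]:
    """Build the git commands for cleaning up a finished or abandoned task."""
    worktree = f"worktrees/{task_id}"
    commands = [["git", "worktree", "remove", worktree]]
    if merged:
        # Merged work: the branch is no longer needed.
        commands.append(["git", "branch", "-d", branch])
    else:
        # Failed or abandoned work: keep the commits under archive/ for post-mortem.
        archive = f"archive/{task_id}-{today:%Y%m%d}"
        commands.append(["git", "branch", "-m", branch, archive])
    return commands
```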
## Workflow
### 1. Task Assignment
Orchestrator prepares workspace:
```bash
# 1. Fetch latest
git fetch origin
# 2. Create branch from integration
git branch feat/$TASK_ID origin/integration
# 3. Create worktree
git worktree add worktrees/$TASK_ID feat/$TASK_ID
```
Then record the assignment on the message bus:
```python
publish(db, 'orchestrator', {
    'type': 'task_assign',
    'to': worker_id,
    'correlation_id': task_id,
    'payload': {
        'branch': f'feat/{task_id}',
        'worktree': f'worktrees/{task_id}'
    }
})
```
### 2. Worker Starts
Worker receives assignment:
```bash
cd worktrees/$TASK_ID
# Confirm environment
git status
git log --oneline -3
# Begin work...
```
### 3. Worker Commits
During work:
```bash
git add -A
git commit -m "feat: implement feature X

Task: $TASK_ID
Agent: $AGENT_ID"
```
Update SQLite:
```python
publish(db, agent_id, {
    'type': 'state_change',
    'correlation_id': task_id,
    'payload': {'from': 'ASSIGNED', 'to': 'WORKING'}
})
```
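`publish` is defined in the message-passing design, not here. For orientation, a minimal sketch of what such a function could look like with Python's standard `sqlite3` module follows. The `messages` table, its columns, and the `open_bus` and `current_state` helpers are all assumptions made for illustration.

```python
import json
import sqlite3
import time


def open_bus(path: str = ":memory:") -> sqlite3.Connection:
    """Open the bus database and create the (assumed) messages table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,
               sender TEXT NOT NULL,
               type TEXT NOT NULL,
               correlation_id TEXT,
               payload TEXT NOT NULL
           )"""
    )
    return db


def publish(db: sqlite3.Connection, sender: str, message: dict) -> int:
    """Append one message to the bus and return its row id."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT INTO messages (ts, sender, type, correlation_id, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (
                time.time(),
                sender,
                message["type"],
                message.get("correlation_id"),
                json.dumps(message.get("payload", {})),
            ),
        )
    return cur.lastrowid


def current_state(db: sqlite3.Connection, task_id: str):
    """Return the target of the latest state_change for a task, or None."""
    row = db.execute(
        "SELECT payload FROM messages "
        "WHERE type = 'state_change' AND correlation_id = ? "
        "ORDER BY id DESC LIMIT 1",
        (task_id,),
    ).fetchone()
    return json.loads(row[0])["to"] if row else None
```

Keeping the log append-only means the current state is derived from the latest `state_change` row rather than stored twice, in line with the "don't duplicate state" principle.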
### 4. Pre-Review Rebase (Mandatory)
Before requesting review:
```bash
# 1. Fetch latest integration
git fetch origin integration
# 2. Attempt rebase and handle the result
#    (is_trivial_conflict and resolve_and_continue are placeholders)
if git rebase origin/integration; then
    # Success - push and request review
    git push -u origin "feat/$TASK_ID"
    # Update SQLite: IN_REVIEW
elif is_trivial_conflict; then
    # Conflict, but trivial - resolve and finish the rebase
    resolve_and_continue
else
    # Conflict, not trivial - abort and escalate
    git rebase --abort
    # Update SQLite: CONFLICTED
fi
```
### 5. Review
Review happens (human or review-gate):
```python
# Check review state
review = get_review_state(task_id)
if review['decision'] == 'approved':
    publish(db, 'reviewer', {
        'type': 'review_result',
        'correlation_id': task_id,
        'payload': {'decision': 'approved'}
    })
elif review['decision'] == 'changes_requested':
    publish(db, 'reviewer', {
        'type': 'review_result',
        'correlation_id': task_id,
        'payload': {
            'decision': 'changes_requested',
            'feedback': review['comments']
        }
    })
    # Worker returns to WORKING state
```
### 6. Merge
On approval:
```bash
# Orchestrator merges to integration
git checkout integration
git merge --no-ff feat/$TASK_ID -m "Merge feat/$TASK_ID: $TITLE"
git push origin integration
# Cleanup (worktree first: a branch checked out in a worktree cannot be deleted)
git worktree remove worktrees/$TASK_ID
git branch -d feat/$TASK_ID
git push origin --delete feat/$TASK_ID
```
### 7. Promote to Main
Periodically (or per-task):
```bash
# When integration is green
git checkout main
git merge --ff-only integration
git push origin main
```
## Directory Structure
```
/project/
├── .git/
├── .worker-state/
│   ├── bus.db               # SQLite message bus
│   └── workers/
│       └── worker-auth.json
├── worktrees/               # Worker worktrees (gitignored)
│   ├── skills-abc/
│   └── skills-xyz/
└── (main working copy)
```
Add to `.gitignore`:
```
worktrees/
```
## Conflict Resolution Script
```bash
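#!/bin/bash
# scripts/try-rebase.sh
# Exit codes: 0 = rebased, 1 = usage/setup error, 2 = CONFLICTED
TASK_ID=$1
TARGET_BRANCH=${2:-origin/integration}
if [ -z "$TASK_ID" ]; then
    echo "usage: try-rebase.sh TASK_ID [TARGET_BRANCH]" >&2
    exit 1
fi
cd "worktrees/$TASK_ID" || exit 1
git fetch origin || exit 1
# Attempt rebase
if git rebase "$TARGET_BRANCH"; then
    echo "Rebase successful"
    exit 0
fi
# Check conflict severity (grep -c . counts non-empty lines, so no conflicts = 0)
CONFLICT_FILES=$(git diff --name-only --diff-filter=U)
CONFLICT_COUNT=$(printf '%s\n' "$CONFLICT_FILES" | grep -c .)
# Trivial: single conflicted file (multi-file conflicts always escalate)
if [ "$CONFLICT_COUNT" -eq 1 ]; then
    file=$CONFLICT_FILES
    # NOTE: during a rebase, --theirs is the worker's commit being replayed,
    # so this keeps the worker's version of the whole file and drops the
    # target branch's edits to it. Only safe for whitespace/formatting conflicts.
    if git checkout --theirs -- "$file" 2>/dev/null; then
        git add -- "$file"
    else
        echo "Cannot auto-resolve: $file"
        git rebase --abort
        exit 2 # CONFLICTED
    fi
    # GIT_EDITOR=true keeps --continue from opening an editor
    if GIT_EDITOR=true git rebase --continue; then
        echo "Auto-resolved trivial conflicts"
        exit 0
    fi
fi
# Non-trivial: abort and escalate
git rebase --abort
echo "Conflict requires human intervention"
exit 2 # CONFLICTED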
```
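A caller has to translate the script's exit code into a worker state. The sketch below shows one way to do that. `run_try_rebase` and `outcome_for` are hypothetical names, and treating any exit code other than 0 or 2 as an error is an assumption.

```python
import subprocess

# Exit codes used by scripts/try-rebase.sh
REBASED = 0
CONFLICTED = 2


def run_try_rebase(task_id: str, target: str = "origin/integration") -> int:
    """Run the rebase script from the project root and return its exit code."""
    result = subprocess.run(["scripts/try-rebase.sh", task_id, target])
    return result.returncode


def outcome_for(exit_code: int) -> tuple[str, str]:
    """Map an exit code to (next worker state, follow-up action)."""
    if exit_code == REBASED:
        return ("IN_REVIEW", "push branch and request review")
    if exit_code == CONFLICTED:
        return ("CONFLICTED", "escalate to orchestrator")
    raise RuntimeError(f"try-rebase.sh failed unexpectedly (exit {exit_code})")
```

Usage would be along the lines of `state, action = outcome_for(run_try_rebase("skills-abc"))`, followed by a `state_change` message on the bus.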
## Integration with State Machine
| State | Git Action | SQLite Message |
|-------|------------|----------------|
| IDLE → ASSIGNED | Branch + worktree created | `task_assign` |
| ASSIGNED → WORKING | First commit | `state_change` |
| WORKING → IN_REVIEW | Rebase succeeds, then push | `review_request` |
| WORKING → CONFLICTED | Rebase failed | `state_change` + `escalate` |
| IN_REVIEW → APPROVED | Review passes | `review_result` |
| IN_REVIEW → WORKING | Changes requested | `review_result` |
| APPROVED → COMPLETED | Merged | `task_done` |
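The table above translates directly into a lookup that rejects transitions it does not list. This is a sketch only: `TRANSITIONS` and `messages_for` are hypothetical names, and the actual state machine lives in the worker CLI.

```python
# (from_state, to_state) -> SQLite message types emitted for that transition
TRANSITIONS = {
    ("IDLE", "ASSIGNED"): ("task_assign",),
    ("ASSIGNED", "WORKING"): ("state_change",),
    ("WORKING", "IN_REVIEW"): ("review_request",),
    ("WORKING", "CONFLICTED"): ("state_change", "escalate"),
    ("IN_REVIEW", "APPROVED"): ("review_result",),
    ("IN_REVIEW", "WORKING"): ("review_result",),
    ("APPROVED", "COMPLETED"): ("task_done",),
}


def messages_for(current: str, new: str) -> tuple[str, ...]:
    """Return the message types for a transition, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(current, new)]
    except KeyError:
        raise ValueError(f"illegal transition: {current} -> {new}") from None
```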
## Open Questions
1. **Worktree location**: `./worktrees/` or `/tmp/worktrees/`?
2. **Integration → main cadence**: Per-task, hourly, daily, manual?
3. **Epic branches**: How complex should the epic workflow be?
4. **Failed branch retention**: How long to keep archived branches?
## References
- OpenHands: https://docs.openhands.dev/sdk/guides/iterative-refinement
- Gastown worktrees: https://github.com/steveyegge/gastown
- Git worktrees: https://git-scm.com/docs/git-worktree