# Branch-per-Worker Isolation Design

**Status**: Draft
**Bead**: skills-roq
**Epic**: skills-s6y (Multi-agent orchestration: Lego brick architecture)

## Overview

This document defines the git branching strategy for multi-agent coordination. Each worker operates in an isolated git worktree on a dedicated branch, with mandatory rebase before review.

## Design Principles

1. **Orchestrator controls branch lifecycle** - Creates, assigns, cleans up
2. **Worktrees for parallelism** - Each worker gets an isolated directory
3. **Integration branch as staging** - Buffer before main
4. **SQLite = process truth, Git = code truth** - Don't duplicate state
5. **Mandatory rebase** - Fresh base before review (consensus requirement)

## Key Decisions

### Branch Naming: `type/task-id`

**Decision**: Use `type/task-id` (e.g., `feat/skills-abc`, `fix/skills-xyz`)

**Rationale** (2/3 consensus):
- Branch describes *work*, not *worker*
- Survives task reassignment (if Claude fails, Gemini can continue)
- Worker identity lives in the commit author: `Author: claude-3.5 <agent@bot>`

**Rejected alternative**: `worker-id/task-id` - becomes misleading on reassignment

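The naming rule can be sketched as a small helper. This is a minimal illustration, not part of the design: `branch_name`, `commit_author`, `ALLOWED_TYPES`, and the task-id pattern are hypothetical names and assumptions, with the allowed types taken from the two examples above.

```python
import re

# Assumption: only the two types shown in the examples above are allowed.
ALLOWED_TYPES = {"feat", "fix"}
# Assumption: task ids are alphanumeric groups joined by hyphens (e.g. skills-abc).
TASK_ID_RE = re.compile(r"[A-Za-z0-9]+(-[A-Za-z0-9]+)*")


def branch_name(work_type: str, task_id: str) -> str:
    """Build a `type/task-id` branch name; the worker is deliberately absent."""
    if work_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown work type: {work_type!r}")
    if not TASK_ID_RE.fullmatch(task_id):
        raise ValueError(f"malformed task id: {task_id!r}")
    return f"{work_type}/{task_id}"


def commit_author(agent_id: str, email: str = "agent@bot") -> str:
    """Worker identity goes into the commit author, not the branch name."""
    return f"{agent_id} <{email}>"
```

Because the worker never appears in the branch name, reassigning `feat/skills-abc` from one agent to another changes only the author on later commits.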
### Worktrees vs Checkout

**Decision**: Git worktrees (parallel directories)

**Rationale** (3/3 consensus):
- `git checkout` switches the branch for the entire working directory, so every worker sharing that directory is affected
- If Worker A checks out a branch while Worker B is writing files, B's work can be corrupted
- Worktrees share the `.git` object database but isolate the filesystem
- Maps cleanly to "one worker = one workspace"

```
/project/
├── .git/                       # Shared object database
├── worktrees/
│   ├── skills-abc/             # Worker 1's worktree
│   │   └── (full working copy)
│   └── skills-xyz/             # Worker 2's worktree
│       └── (full working copy)
└── (main working copy)         # For orchestrator
```

### Integration Branch

**Decision**: Use a rolling `integration` branch as staging before `main`

**Rationale** (3/3 consensus):
- AI agents can introduce subtle regressions
- Integration branch = "demilitarized zone" for combined testing
- Human review before promoting to main
- Allows batching of related changes

```
main         ────────────────────────●─────────────
                                     ↑
integration  ────●────●────●────●────●
                 ↑    ↑    ↑    ↑
feat/T-101   ────●    │    │    │
feat/T-102   ─────────●    │    │
fix/T-103    ──────────────●    │
feat/T-104   ───────────────────●
```

### Conflict Handling

**Decision**: Worker resolves trivial conflicts, escalates semantic conflicts

**Rationale** (2/3 consensus - flash-or, gpt):
- Blanket "never resolve" is safe but slows throughput
- Mechanical conflicts (formatting, imports, non-overlapping) are safe
- Logic conflicts require human judgment

**Rules**:
```python
# Pseudocode: the helper functions are placeholders for worker-side logic.
def handle_rebase_conflict(conflict_info):
    # Trivial: resolve automatically
    if is_trivial_conflict(conflict_info):
        resolve_mechanically()
        run_tests()
        if tests_pass():
            continue_rebase()
        else:
            abort_and_escalate()

    # Semantic: always escalate
    else:
        git_rebase_abort()
        set_state(CONFLICTED)
        notify_orchestrator()
```

**Trivial conflict criteria**:
- Only whitespace/formatting changes
- Import statement ordering
- Non-overlapping edits in the same file
- Fewer than N lines changed in the conflict region

**Escalate if**:
- Conflict touches core logic
- Conflict spans multiple files
- Tests fail after resolution
- Worker is uncertain about correctness

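One way the criteria above could be checked is sketched below. It is an illustration under stated assumptions, not the worker's actual logic: `ConflictRegion`, `MAX_CONFLICT_LINES` (a stand-in for the unspecified N), and the helper names are hypothetical, and only the whitespace, import-ordering, size, and single-file rules are covered.

```python
from dataclasses import dataclass

MAX_CONFLICT_LINES = 10  # hypothetical value for the unspecified "N"


@dataclass
class ConflictRegion:
    path: str           # file containing the conflict
    ours: list[str]     # lines from one side of the conflict markers
    theirs: list[str]   # lines from the other side


def _normalize(lines: list[str]) -> list[str]:
    """Collapse whitespace runs and drop blank lines."""
    return [" ".join(line.split()) for line in lines if line.strip()]


def _is_import(line: str) -> bool:
    return line.startswith("import ") or line.startswith("from ")


def is_whitespace_only(region: ConflictRegion) -> bool:
    return _normalize(region.ours) == _normalize(region.theirs)


def is_import_reorder(region: ConflictRegion) -> bool:
    ours, theirs = _normalize(region.ours), _normalize(region.theirs)
    if not ours or not all(_is_import(line) for line in ours + theirs):
        return False
    return sorted(ours) == sorted(theirs)


def is_trivial_conflict(regions: list[ConflictRegion]) -> bool:
    if not regions:
        return False  # nothing to judge: do not claim triviality
    if len({r.path for r in regions}) > 1:
        return False  # spans multiple files: escalate
    for r in regions:
        if len(r.ours) + len(r.theirs) >= MAX_CONFLICT_LINES:
            return False
        if not (is_whitespace_only(r) or is_import_reorder(r)):
            return False
    return True
```

The function returns `False` whenever it cannot positively classify a region, which matches the "uncertain about correctness" escalation rule.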
### State Machine Mapping

**Decision**: SQLite is process truth, Git is code truth

**Rationale** (2/3 consensus - gemini, gpt):
- Don't encode state in Git (tags, notes) - causes sync issues
- Observable Git signals already exist:

| Worker State | Git Observable |
|--------------|----------------|
| ASSIGNED | Branch exists, worktree created |
| WORKING | New commits appearing |
| IN_REVIEW | Branch pushed, PR opened (or flag in SQLite) |
| APPROVED | PR approved |
| COMPLETED | Merged to integration/main |
| CONFLICTED | Rebase aborted, no new commits |

**Link via task-id**: Commit trailers connect the two:
```
feat: implement user authentication

Task: skills-abc
Agent: claude-3.5
```

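To show how the trailers can be read back when correlating a commit with its SQLite task, here is a minimal parser sketch. `parse_trailers` is a hypothetical helper; it only handles the simple `Key: value` form shown above, and `git interpret-trailers --parse` is the more robust route for real messages.

```python
import re


def parse_trailers(message: str) -> dict[str, str]:
    """Return the `Key: value` lines from the last paragraph of a commit message."""
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if len(paragraphs) < 2:
        return {}  # subject line only: no trailer block
    trailers = {}
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(": ")
        if not sep or " " in key:
            return {}  # last paragraph is prose, not trailers
        trailers[key] = value.strip()
    return trailers


message = """feat: implement user authentication

Task: skills-abc
Agent: claude-3.5
"""
trailers = parse_trailers(message)
```

For the message above, `trailers["Task"]` yields the task id that SQLite messages carry as `correlation_id`.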
### Cross-Worker Dependencies

**Decision**: Strict serialization - don't depend on uncommitted work

**Rationale** (3/3 consensus):
- "Speculative execution" creates a house of cards
- If A's code is rejected, B's work becomes invalid
- Cheaper to wait than to waste tokens on orphaned code

**Pattern for parallel work on related features**:
1. Orchestrator creates epic branch: `epic/auth-system`
2. Both workers branch from epic: `feat/T-101`, `feat/T-102`
3. Workers rebase onto epic, not main
4. Epic is merged to integration when all tasks complete

### Branch Cleanup

**Decision**: Delete after merge, archive failures

**Rationale** (3/3 consensus):
- Prevent branch bloat
- Archive failures for post-mortem analysis

```bash
# On successful merge
# (remove the worktree first: git refuses to delete a branch that is
# still checked out in a worktree)
git worktree remove worktrees/T-101
git branch -d feat/T-101

# On failure/abandonment
git worktree remove worktrees/T-101
git branch -m feat/T-101 archive/T-101-$(date +%Y%m%d)
```

## Workflow

### 1. Task Assignment

Orchestrator prepares workspace:

```bash
# 1. Fetch latest
git fetch origin

# 2. Create branch from integration
git branch feat/$TASK_ID origin/integration

# 3. Create worktree
git worktree add worktrees/$TASK_ID feat/$TASK_ID
```

Update SQLite:
```python
# 4. Publish the assignment
publish(db, 'orchestrator', {
    'type': 'task_assign',
    'to': worker_id,
    'correlation_id': task_id,
    'payload': {
        'branch': f'feat/{task_id}',
        'worktree': f'worktrees/{task_id}'
    }
})
```

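The workflow calls `publish(db, sender, message)` without defining it. A minimal sketch of what such a function might look like is below, using Python's standard `sqlite3` module. The `messages` table layout, the `open_bus` helper, and the returned row id are assumptions made for illustration; the real bus schema may differ.

```python
import json
import sqlite3
import time


def open_bus(path: str = ":memory:") -> sqlite3.Connection:
    """Open the message bus and create the (assumed) messages table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               ts REAL NOT NULL,
               sender TEXT NOT NULL,
               recipient TEXT,
               type TEXT NOT NULL,
               correlation_id TEXT,
               payload TEXT NOT NULL
           )"""
    )
    return db


def publish(db: sqlite3.Connection, sender: str, message: dict) -> int:
    """Append one message to the bus and return its row id."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT INTO messages (ts, sender, recipient, type, correlation_id, payload) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (
                time.time(),
                sender,
                message.get("to"),
                message["type"],
                message.get("correlation_id"),
                json.dumps(message.get("payload", {})),
            ),
        )
    return cur.lastrowid


db = open_bus()
msg_id = publish(db, "orchestrator", {
    "type": "task_assign",
    "to": "worker-auth",
    "correlation_id": "skills-abc",
    "payload": {"branch": "feat/skills-abc", "worktree": "worktrees/skills-abc"},
})
```

Storing the payload as JSON text keeps the table schema stable while message types evolve; `correlation_id` is kept as its own column so all messages for one task can be queried directly.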
### 2. Worker Starts

Worker receives assignment:

```bash
cd worktrees/$TASK_ID

# Confirm environment
git status
git log --oneline -3

# Begin work...
```

### 3. Worker Commits

During work:

```bash
git add -A
git commit -m "feat: implement feature X

Task: $TASK_ID
Agent: $AGENT_ID"
```

Update SQLite:
```python
publish(db, agent_id, {
    'type': 'state_change',
    'correlation_id': task_id,
    'payload': {'from': 'ASSIGNED', 'to': 'WORKING'}
})
```

### 4. Pre-Review Rebase (Mandatory)

Before requesting review:

```bash
# 1. Fetch latest integration
git fetch origin integration

# 2. Attempt rebase and handle the result
if git rebase origin/integration; then
    # Success - push and request review
    # (a re-push after a later rebase rewrites history and needs --force-with-lease)
    git push -u origin feat/$TASK_ID
    # Update SQLite: IN_REVIEW
else
    # Conflict - check if trivial
    # (is_trivial_conflict and resolve_and_continue are placeholders)
    if is_trivial_conflict; then
        resolve_and_continue
    else
        git rebase --abort
        # Update SQLite: CONFLICTED
    fi
fi
```

### 5. Review

Review happens (human or review-gate):

```python
# Check review state
review = get_review_state(task_id)

if review['decision'] == 'approved':
    publish(db, 'reviewer', {
        'type': 'review_result',
        'correlation_id': task_id,
        'payload': {'decision': 'approved'}
    })
elif review['decision'] == 'changes_requested':
    publish(db, 'reviewer', {
        'type': 'review_result',
        'correlation_id': task_id,
        'payload': {
            'decision': 'changes_requested',
            'feedback': review['comments']
        }
    })
    # Worker returns to WORKING state
```

### 6. Merge

On approval:

```bash
# Orchestrator merges to integration
git checkout integration
git merge --no-ff feat/$TASK_ID -m "Merge feat/$TASK_ID: $TITLE"
git push origin integration

# Cleanup (remove the worktree first: git refuses to delete a branch
# that is still checked out in a worktree)
git worktree remove worktrees/$TASK_ID
git branch -d feat/$TASK_ID
git push origin --delete feat/$TASK_ID
```

### 7. Promote to Main

Periodically (or per-task):

```bash
# When integration is green
git checkout main
git merge --ff-only integration
git push origin main
```

## Directory Structure

```
/project/
├── .git/
├── .worker-state/
│   ├── bus.db                  # SQLite message bus
│   └── workers/
│       └── worker-auth.json
├── worktrees/                  # Worker worktrees (gitignored)
│   ├── skills-abc/
│   └── skills-xyz/
└── (main working copy)
```

Add to `.gitignore`:
```
worktrees/
```

## Conflict Resolution Script

```bash
#!/bin/bash
# scripts/try-rebase.sh
# Exit codes: 0 = rebased, 1 = usage/setup error, 2 = CONFLICTED

TASK_ID=${1:?usage: try-rebase.sh TASK_ID [TARGET_BRANCH]}
TARGET_BRANCH=${2:-origin/integration}

cd "worktrees/$TASK_ID" || exit 1

git fetch origin

# Attempt rebase
if git rebase "$TARGET_BRANCH"; then
    echo "Rebase successful"
    exit 0
fi

# Check conflict severity
CONFLICT_FILES=$(git diff --name-only --diff-filter=U)
# grep -c counts non-empty lines, so an empty list yields 0 (wc -l would report 1)
CONFLICT_COUNT=$(printf '%s\n' "$CONFLICT_FILES" | grep -c .)

# Trivial: single file (multi-file conflicts are escalated)
if [ "$CONFLICT_COUNT" -eq 1 ]; then
    # NOTE: during a rebase, --theirs is the worker's commit being replayed.
    # Taking it wholesale drops the target branch's side of the file, so this
    # is only safe once the conflict is known to be whitespace/formatting.
    while IFS= read -r file; do
        if git checkout --theirs -- "$file" 2>/dev/null; then
            git add -- "$file"
        else
            echo "Cannot auto-resolve: $file"
            git rebase --abort
            exit 2  # CONFLICTED
        fi
    done <<< "$CONFLICT_FILES"

    # GIT_EDITOR=true keeps --continue from opening an editor
    if GIT_EDITOR=true git rebase --continue; then
        echo "Auto-resolved trivial conflicts"
        exit 0
    fi
fi

# Non-trivial: abort and escalate
git rebase --abort
echo "Conflict requires human intervention"
exit 2  # CONFLICTED
```

## Integration with State Machine

| Transition | Git Action | SQLite Message |
|------------|------------|----------------|
| IDLE → ASSIGNED | Branch + worktree created | `task_assign` |
| ASSIGNED → WORKING | First commit | `state_change` |
| WORKING → IN_REVIEW | Rebase succeeds, branch pushed | `review_request` |
| WORKING → CONFLICTED | Rebase failed | `state_change` + `escalate` |
| IN_REVIEW → APPROVED | Review passes | `review_result` |
| IN_REVIEW → WORKING | Changes requested | `review_result` |
| APPROVED → COMPLETED | Merged | `task_done` |

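The table can be encoded directly as a lookup so that illegal transitions are rejected before any message is published. This is a sketch only: the `State` enum, `TRANSITIONS`, and `messages_for` are hypothetical names, and the entries simply mirror the rows above.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    ASSIGNED = "ASSIGNED"
    WORKING = "WORKING"
    IN_REVIEW = "IN_REVIEW"
    APPROVED = "APPROVED"
    COMPLETED = "COMPLETED"
    CONFLICTED = "CONFLICTED"


# (from, to) -> SQLite message types emitted for that transition
TRANSITIONS = {
    (State.IDLE, State.ASSIGNED): ["task_assign"],
    (State.ASSIGNED, State.WORKING): ["state_change"],
    (State.WORKING, State.IN_REVIEW): ["review_request"],
    (State.WORKING, State.CONFLICTED): ["state_change", "escalate"],
    (State.IN_REVIEW, State.APPROVED): ["review_result"],
    (State.IN_REVIEW, State.WORKING): ["review_result"],
    (State.APPROVED, State.COMPLETED): ["task_done"],
}


def messages_for(current: State, target: State) -> list[str]:
    """Return the message types for a legal transition, or raise ValueError."""
    try:
        return TRANSITIONS[(current, target)]
    except KeyError:
        raise ValueError(
            f"illegal transition: {current.value} -> {target.value}"
        ) from None
```

For example, `messages_for(State.WORKING, State.CONFLICTED)` returns both `state_change` and `escalate`, while `messages_for(State.ASSIGNED, State.COMPLETED)` raises because the table has no such row.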
## Open Questions

1. **Worktree location**: `./worktrees/` or `/tmp/worktrees/`?
2. **Integration → main cadence**: Per-task, hourly, daily, manual?
3. **Epic branches**: How complex should the epic workflow be?
4. **Failed branch retention**: How long to keep archived branches?

## References

- OpenHands: https://docs.openhands.dev/sdk/guides/iterative-refinement
- Gastown worktrees: https://github.com/steveyegge/gastown
- Git worktrees: https://git-scm.com/docs/git-worktree