skills/docs/design/branch-per-worker.md
dan 1c66d019bd feat: add worker CLI scaffold in Nim
Multi-agent coordination CLI with SQLite message bus:
- State machine: ASSIGNED -> WORKING -> IN_REVIEW -> APPROVED -> COMPLETED
- Commands: spawn, start, done, approve, merge, cancel, fail, heartbeat
- SQLite WAL mode, dedicated heartbeat thread, channel-based IPC
- cligen for CLI, tiny_sqlite for DB, ORC memory management

Design docs for branch-per-worker, state machine, message passing,
and human observability patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 18:47:47 -08:00


Branch-per-Worker Isolation Design

Status: Draft
Bead: skills-roq
Epic: skills-s6y (Multi-agent orchestration: Lego brick architecture)

Overview

This document defines the git branching strategy for multi-agent coordination. Each worker operates in an isolated git worktree on a dedicated branch, with mandatory rebase before review.

Design Principles

  1. Orchestrator controls branch lifecycle - creates, assigns, and cleans up branches
  2. Worktrees for parallelism - each worker gets an isolated directory
  3. Integration branch as staging - a buffer before main
  4. SQLite = process truth, Git = code truth - don't duplicate state
  5. Mandatory rebase - a fresh base before review (consensus requirement)

Key Decisions

Branch Naming: type/task-id

Decision: Use type/task-id (e.g., feat/skills-abc, fix/skills-xyz)

Rationale (2/3 consensus):

  • Branch describes work, not worker
  • Survives task reassignment (if Claude fails, Gemini can continue)
  • Worker identity is recorded in the commit author, e.g. Author: claude-3.5 <agent@bot>

Rejected alternative: worker-id/task-id - becomes misleading on reassignment
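A minimal Python sketch of the naming rule, assuming branch types are limited to a small hypothetical set (feat, fix) and that task ids never contain a slash. Because the worker is absent from the name, recovering the task id from a branch works unchanged after reassignment. branch_name() and parse_branch() are illustrative helpers, not part of the worker CLI.

```python
ALLOWED_TYPES = {"feat", "fix"}  # hypothetical set; extend as needed


def branch_name(kind: str, task_id: str) -> str:
    """Build a type/task-id branch name, e.g. feat/skills-abc."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unknown branch type: {kind!r}")
    if not task_id or "/" in task_id:
        raise ValueError(f"invalid task id: {task_id!r}")
    return f"{kind}/{task_id}"


def parse_branch(name: str) -> tuple[str, str]:
    """Split a type/task-id branch name back into (type, task_id)."""
    kind, sep, task_id = name.partition("/")
    if not sep or kind not in ALLOWED_TYPES or not task_id or "/" in task_id:
        raise ValueError(f"not a type/task-id branch: {name!r}")
    return kind, task_id
```

With this rule, parse_branch("claude/skills-abc") raises, which is exactly the rejected worker-id/task-id form.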

Worktrees vs Checkout

Decision: Git worktrees (parallel directories)

Rationale (3/3 consensus):

  • git checkout switches the files of the one shared working directory
  • If Worker A checks out while Worker B is writing, corruption is possible
  • Worktrees share the .git object database but isolate each worker's files
  • Maps cleanly to "one worker = one workspace"

/project/
├── .git/                    # Shared object database
├── worktrees/
│   ├── skills-abc/          # Worker 1's worktree
│   │   └── (full working copy)
│   └── skills-xyz/          # Worker 2's worktree
│       └── (full working copy)
└── (main working copy)      # For orchestrator

Integration Branch

Decision: Use rolling integration branch as staging before main

Rationale (3/3 consensus):

  • AI agents can introduce subtle regressions
  • Integration branch = "demilitarized zone" for combined testing
  • Human review before promoting to main
  • Allows batching of related changes

main          ─────────────────────────●─────────────
                                       ↑
integration   ────●────●────●────●────●
                  ↑    ↑    ↑    ↑
feat/T-101   ────●    │    │    │
feat/T-102   ─────────●    │    │
fix/T-103    ──────────────●    │
feat/T-104   ───────────────────●

Conflict Handling

Decision: Worker resolves trivial conflicts, escalates semantic conflicts

Rationale (2/3 consensus - flash-or, gpt):

  • Blanket "never resolve" is safe but slows throughput
  • Mechanical conflicts (formatting, import order, non-overlapping edits) are safe to resolve
  • Logic conflicts require human judgment

Rules:

def handle_rebase_conflict(conflict_info):
    # Semantic: always escalate
    if not is_trivial_conflict(conflict_info):
        escalate()
        return

    # Trivial: resolve mechanically, then verify before continuing
    resolve_mechanically(conflict_info)
    if run_tests():  # True when the test suite passes
        continue_rebase()
    else:
        escalate()  # resolution broke something: treat as semantic


def escalate():
    git_rebase_abort()
    set_state(CONFLICTED)
    notify_orchestrator()

Trivial conflict criteria:

  • Only whitespace/formatting changes
  • Import statement ordering
  • Non-overlapping edits in the same file
  • Fewer than N lines changed in the conflict region

Escalate if:

  • Conflict touches core logic
  • Conflict spans multiple files
  • Test failures after resolution
  • Uncertain about correctness
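The mechanical criteria above can be checked directly from the conflict markers git writes into a file. A minimal Python sketch, assuming: the caller passes a dict mapping each conflicted path (e.g. from `git diff --name-only --diff-filter=U`) to its contents; N is a hypothetical MAX_CONFLICT_LINES = 10; and "import ordering" means Python-style import lines. It covers whitespace-only hunks, reordered imports, the N-line limit, and the single-file rule; "touches core logic" and "uncertain" stay judgment calls, and non-overlapping edits normally merge without markers at all.

```python
from dataclasses import dataclass, field

MAX_CONFLICT_LINES = 10  # hypothetical value for "N"


@dataclass
class Hunk:
    ours: list[str] = field(default_factory=list)
    theirs: list[str] = field(default_factory=list)


def parse_hunks(text: str) -> list[Hunk]:
    """Extract conflict hunks from a file containing git conflict markers."""
    hunks: list[Hunk] = []
    hunk, side = None, None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            hunk, side = Hunk(), "ours"
        elif line.startswith("|||||||") and side == "ours":
            side = "base"  # diff3-style base section: ignored here
        elif line.startswith("=======") and side in ("ours", "base"):
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            hunks.append(hunk)
            hunk, side = None, None
        elif side == "ours":
            hunk.ours.append(line)
        elif side == "theirs":
            hunk.theirs.append(line)
    return hunks


def _normalize(lines: list[str]) -> list[str]:
    # Collapse runs of whitespace and drop blank lines.
    return [" ".join(l.split()) for l in lines if l.strip()]


def _is_import(line: str) -> bool:
    return line.startswith(("import ", "from "))


def hunk_is_trivial(h: Hunk) -> bool:
    if len(h.ours) + len(h.theirs) > MAX_CONFLICT_LINES:
        return False
    ours, theirs = _normalize(h.ours), _normalize(h.theirs)
    if ours == theirs:
        return True  # whitespace/formatting only
    if all(_is_import(l) for l in ours + theirs):
        return sorted(ours) == sorted(theirs)  # import ordering only
    return False


def is_trivial_conflict(conflicted: dict[str, str]) -> bool:
    """conflicted maps each conflicted path to its contents (with markers)."""
    if len(conflicted) != 1:
        return False  # spans multiple files (or none): escalate
    (text,) = conflicted.values()
    hunks = parse_hunks(text)
    return bool(hunks) and all(hunk_is_trivial(h) for h in hunks)
```

Any hunk that fails these checks makes the whole conflict non-trivial, so the default outcome is escalation.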

State Machine Mapping

Decision: SQLite is process truth, Git is code truth

Rationale (2/3 consensus - gemini, gpt):

  • Don't encode state in Git (tags, notes) - causes sync issues
  • Observable Git signals already exist:

Worker State   Git Observable
ASSIGNED       Branch exists, worktree created
WORKING        New commits appearing
IN_REVIEW      Branch pushed, PR opened (or flag in SQLite)
APPROVED       PR approved
COMPLETED      Merged to integration/main
CONFLICTED     Rebase aborted, no new commits

Link via task-id: commit trailers tie each commit back to its task in SQLite:

feat: implement user authentication

Task: skills-abc
Agent: claude-3.5
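A minimal Python sketch of writing and reading the Task:/Agent: trailers, assuming the trailer block is the last paragraph of the message and uses simple "Key: value" lines. build_commit_message() and parse_trailers() are hypothetical helpers; git's own parser (`git interpret-trailers --parse`, or `git log --format='%(trailers:key=Task,valueonly)'`) handles more edge cases and is the safer choice in production.

```python
def build_commit_message(subject: str, task_id: str, agent_id: str, body: str = "") -> str:
    """Compose a commit message that ends in Task:/Agent: trailers."""
    parts = [subject]
    if body:
        parts.append(body)
    parts.append(f"Task: {task_id}\nAgent: {agent_id}")
    return "\n\n".join(parts) + "\n"


def parse_trailers(message: str) -> dict[str, str]:
    """Read 'Key: value' lines from the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return {}  # subject only: no trailer block
    trailers = {}
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            return {}  # last paragraph is prose, not trailers
        trailers[key.strip()] = value.strip()
    return trailers
```

parse_trailers() on the example message above yields {"Task": "skills-abc", "Agent": "claude-3.5"}, which is enough to join a commit to its SQLite task row.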

Cross-Worker Dependencies

Decision: Strict serialization - don't build on another worker's unmerged work

Rationale (3/3 consensus):

  • "Speculative execution" creates a house of cards
  • If A's code is rejected, B's work becomes invalid
  • Cheaper to wait than to waste tokens on orphaned code

Pattern for parallel work on related features:

  1. Orchestrator creates an epic branch: epic/auth-system
  2. Both workers branch from the epic: feat/T-101, feat/T-102
  3. Workers rebase onto the epic branch rather than integration
  4. The epic is merged to integration once all its tasks complete

Branch Cleanup

Decision: Delete after merge, archive failures

Rationale (3/3 consensus):

  • Prevent branch bloat
  • Archive failures for post-mortem analysis
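
# On successful merge (remove the worktree first: git refuses to
# delete a branch that is still checked out in a worktree)
git worktree remove worktrees/T-101
git branch -d feat/T-101

# On failure/abandonment
git worktree remove worktrees/T-101
git branch -m feat/T-101 archive/T-101-$(date +%Y%m%d)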
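Cleanup order matters because git will not delete a branch that is still checked out in a worktree. A small Python sketch that emits the cleanup commands in a safe order, following the archive/<task>-YYYYMMDD convention above; cleanup_commands() and its parameters are hypothetical, and the caller would run each argv list with subprocess.run(cmd, check=True) from the main working copy.

```python
from datetime import date
from typing import Optional


def cleanup_commands(task_id: str, kind: str = "feat", merged: bool = True,
                     today: Optional[date] = None) -> list[list[str]]:
    """Return git argv lists for cleaning up a finished task, in order."""
    branch = f"{kind}/{task_id}"
    # The worktree goes first: git refuses to delete a branch that is
    # still checked out in a worktree.
    cmds = [["git", "worktree", "remove", f"worktrees/{task_id}"]]
    if merged:
        cmds.append(["git", "branch", "-d", branch])
    else:
        # Keep failed work for post-mortem under a dated archive name.
        stamp = (today or date.today()).strftime("%Y%m%d")
        cmds.append(["git", "branch", "-m", branch, f"archive/{task_id}-{stamp}"])
    return cmds
```

Note that `git worktree remove` refuses a worktree with uncommitted or untracked changes unless given `--force`, which discards them; for abandoned tasks, commit any work worth keeping to the branch before archiving it.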

Workflow

1. Task Assignment

Orchestrator prepares workspace:

# 1. Fetch latest
git fetch origin

# 2. Create branch from integration
git branch feat/$TASK_ID origin/integration

# 3. Create worktree
git worktree add worktrees/$TASK_ID feat/$TASK_ID

Then record the assignment in SQLite:

publish(db, 'orchestrator', {
    'type': 'task_assign',
    'to': worker_id,
    'correlation_id': task_id,
    'payload': {
        'branch': f'feat/{task_id}',
        'worktree': f'worktrees/{task_id}'
    }
})
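publish() is used throughout but never defined in this document. A minimal sketch of what it might look like on Python's standard sqlite3 module, with the bus in WAL mode; the messages table, its columns, and open_bus() are assumptions for illustration, not the schema used by the worker CLI. The message's "to" key is stored in a recipient column to sidestep the SQL keyword TO.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    ts             REAL NOT NULL,
    sender         TEXT NOT NULL,
    type           TEXT NOT NULL,
    recipient      TEXT,
    correlation_id TEXT,
    payload        TEXT NOT NULL DEFAULT '{}'
)
"""


def open_bus(path: str) -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
    db.execute(SCHEMA)
    return db


def publish(db: sqlite3.Connection, sender: str, msg: dict) -> int:
    """Append one message to the bus and return its row id."""
    with db:  # commits on success, rolls back on exception
        cur = db.execute(
            "INSERT INTO messages (ts, sender, type, recipient, correlation_id, payload)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (time.time(), sender, msg["type"], msg.get("to"),
             msg.get("correlation_id"), json.dumps(msg.get("payload", {}))),
        )
    return cur.lastrowid
```

Keeping every message as an append-only row means the task history for a correlation_id can be replayed later, which suits the observability goal.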

2. Worker Starts

Worker receives assignment:

cd worktrees/$TASK_ID

# Confirm environment
git status
git log --oneline -3

# Begin work...

3. Worker Commits

During work:

git add -A
git commit -m "feat: implement feature X

Task: $TASK_ID
Agent: $AGENT_ID"

After the first commit, record the ASSIGNED → WORKING transition in SQLite:

publish(db, agent_id, {
    'type': 'state_change',
    'correlation_id': task_id,
    'payload': {'from': 'ASSIGNED', 'to': 'WORKING'}
})

4. Pre-Review Rebase (Mandatory)

Before requesting review:

# 1. Fetch latest integration
git fetch origin integration

# 2. Attempt rebase and handle the result
if git rebase origin/integration; then
    # Success - push and request review. --force-with-lease is needed
    # when an earlier review round already pushed this branch.
    git push --force-with-lease -u origin feat/$TASK_ID
    # Update SQLite: IN_REVIEW
else
    # Conflict - check if trivial
    if is_trivial_conflict; then
        resolve_and_continue
        # Then push and update SQLite as in the success case
    else
        git rebase --abort
        # Update SQLite: CONFLICTED
    fi
fi

5. Review

Review happens (human or review-gate):

# Check review state
review = get_review_state(task_id)

if review['decision'] == 'approved':
    publish(db, 'reviewer', {
        'type': 'review_result',
        'correlation_id': task_id,
        'payload': {'decision': 'approved'}
    })
elif review['decision'] == 'changes_requested':
    publish(db, 'reviewer', {
        'type': 'review_result',
        'correlation_id': task_id,
        'payload': {
            'decision': 'changes_requested',
            'feedback': review['comments']
        }
    })
    # Worker returns to WORKING state

6. Merge

On approval:

# Orchestrator merges to integration (in the main working copy)
git checkout integration
git pull --ff-only origin integration
git merge --no-ff feat/$TASK_ID -m "Merge feat/$TASK_ID: $TITLE"
git push origin integration

# Cleanup (remove the worktree before deleting its branch)
git worktree remove worktrees/$TASK_ID
git branch -d feat/$TASK_ID
git push origin --delete feat/$TASK_ID

7. Promote to Main

Periodically (or per-task):

# When integration is green
git checkout main
git merge --ff-only integration
git push origin main

Directory Structure

/project/
├── .git/
├── .worker-state/
│   ├── bus.db              # SQLite message bus
│   └── workers/
│       └── worker-auth.json
├── worktrees/              # Worker worktrees (gitignored)
│   ├── skills-abc/
│   └── skills-xyz/
└── (main working copy)

Add to .gitignore:

worktrees/

Conflict Resolution Script

#!/bin/bash
# scripts/try-rebase.sh
# Exit codes: 0 = rebased, 1 = setup error, 2 = CONFLICTED (escalate)

TASK_ID=$1
TARGET_BRANCH=${2:-origin/integration}

cd "worktrees/$TASK_ID" || exit 1

git fetch origin

# Attempt rebase
if git rebase "$TARGET_BRANCH"; then
    echo "Rebase successful"
    exit 0
fi

# Check conflict severity, then return to the pre-rebase state
CONFLICT_FILES=$(git diff --name-only --diff-filter=U)
CONFLICT_COUNT=$(printf '%s\n' "$CONFLICT_FILES" | grep -c .)
git rebase --abort

# Trivial candidate: a single conflicted file. Retry with a merge
# option that treats whitespace-only changes as unchanged; anything
# that still conflicts is not a formatting conflict.
if [ "$CONFLICT_COUNT" -eq 1 ]; then
    if git rebase -X ignore-space-change "$TARGET_BRANCH"; then
        echo "Auto-resolved whitespace-only conflict"
        exit 0
    fi
    git rebase --abort
fi

# Non-trivial: abort and escalate
echo "Conflict requires human intervention"
exit 2  # CONFLICTED

Integration with State Machine

Transition             Git Action                  SQLite Message
IDLE → ASSIGNED        Branch + worktree created   task_assign
ASSIGNED → WORKING     First commit                state_change
WORKING → IN_REVIEW    Push + rebase success       review_request
WORKING → CONFLICTED   Rebase failed               state_change + escalate
IN_REVIEW → APPROVED   Review passes               review_result
IN_REVIEW → WORKING    Changes requested           review_result
APPROVED → COMPLETED   Merged                      task_done
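One way to keep the SQLite side honest is to reject any transition the table does not list. A minimal Python sketch that transcribes the table above into a dictionary keyed by (from, to); the function and exception names are hypothetical, and CONFLICTED has no outgoing transition here because the table defines none.

```python
# (from_state, to_state) -> SQLite message type, transcribed from the table.
TRANSITIONS: dict[tuple[str, str], str] = {
    ("IDLE", "ASSIGNED"): "task_assign",
    ("ASSIGNED", "WORKING"): "state_change",
    ("WORKING", "IN_REVIEW"): "review_request",
    ("WORKING", "CONFLICTED"): "state_change",
    ("IN_REVIEW", "APPROVED"): "review_result",
    ("IN_REVIEW", "WORKING"): "review_result",
    ("APPROVED", "COMPLETED"): "task_done",
}


class InvalidTransition(ValueError):
    pass


def message_type_for(current: str, new: str) -> str:
    """Return the message type for a transition, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(current, new)]
    except KeyError:
        raise InvalidTransition(f"{current} -> {new} is not an allowed transition") from None


def next_states(current: str) -> list[str]:
    """List the states reachable in one step from `current`."""
    return sorted(new for (old, new) in TRANSITIONS if old == current)
```

Calling message_type_for() before every publish() turns an illegal jump such as WORKING → COMPLETED into an immediate error instead of a silently inconsistent bus.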

Open Questions

  1. Worktree location: ./worktrees/ or /tmp/worktrees/?
  2. Integration → main cadence: Per-task, hourly, daily, manual?
  3. Epic branches: How complex should the epic workflow be?
  4. Failed branch retention: How long to keep archived branches?

References