fix(worker): improve spawn reliability and add noFetch flag

- Change default base branch from origin/integration to main - Add --noFetch flag to skip git fetch (for offline/sandbox use) - Add try/except with rollback on spawn failure - Improve error message for missing review-gate - Add Codex auth.json symlink to use-skills.sh - Include worker orchestration AAR from 2026-01-13 Addresses pain points from worker-orchestration-aar-2026-01-13.md Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 09:29:36 -08:00 · 2026-01-15 09:29:36 -08:00 · 461c5ac148
parent d463fe58bb
commit 461c5ac148
5 changed files with 114 additions and 26 deletions
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@ -38,6 +38,7 @@
 {"id":"skills-3ja","title":"Design: Cross-agent quality gate architecture","description":"Design a quality gate pattern that works regardless of agent.\n\nRequirements:\n- Worker agent can be Claude, Gemini, OpenCode, etc.\n- Reviewer agent can be any capable model\n- Gate blocks completion until reviewer approves\n- Circuit breakers prevent infinite loops\n- Works in autonomous/unattended scenarios\n\nBuilding on alice/idle research (docs/research/idle-alice-quality-gate.md):\n- alice uses Claude hooks + jwz state\n- We need agent-agnostic equivalent\n\nConsiderations:\n- State management: jwz vs beads vs simple files\n- Enforcement: mechanical vs protocol-based\n- Reviewer selection: orch consensus vs single model\n- Activation: opt-in prefix vs context-based\n\nOutput: Architecture doc with component design","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-09T17:14:20.657906484-08:00","created_by":"dan","updated_at":"2026-01-09T19:33:36.694607649-08:00","closed_at":"2026-01-09T19:33:36.694607649-08:00","close_reason":"Consolidated into skills-8sj"}
 {"id":"skills-3o7","title":"Fix ai-skills.nix missing sha256 hash","description":"modules/ai-skills.nix:16 has empty sha256 placeholder for opencode-skills npm package. Either get actual hash or remove/comment out the incomplete fetchFromNpm approach.","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-11-30T11:58:24.404929863-08:00","updated_at":"2025-11-30T12:12:39.372107348-08:00","closed_at":"2025-11-30T12:12:39.372107348-08:00"}
 {"id":"skills-3uv9","title":"Consider logging cleanup failures in removeWorktree/removeBranch","description":"[ERROR] LOW git.nim:55,60,62 - Cleanup operations ignore failures. May leave orphaned resources. Consider logging failures for debugging.","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-10T19:52:14.792134512-08:00","created_by":"dan","updated_at":"2026-01-10T20:37:04.764769727-08:00","closed_at":"2026-01-10T20:37:04.764769727-08:00","close_reason":"Implemented consistent error handling strategy"}
+{"id":"skills-475o","title":"use-skills.sh: redundant file existence check before symlink","description":"## Source\nCode review of uncommitted changes (2026-01-15)\n\n## Finding\n[SECURITY] LOW `bin/use-skills.sh:20-25`\n\nTOCTOU race between `-e` check and `ln -sf`. Unlikely to be exploitable in practice (single-user context), but the check is redundant since `ln -sf` is idempotent.\n\n## Suggestion\nRemove the redundant check - `ln -sf` handles existing files. Simplifies code and eliminates theoretical race.","status":"open","priority":3,"issue_type":"task","owner":"dan@delpad","created_at":"2026-01-15T09:28:20.001316946-08:00","created_by":"dan","updated_at":"2026-01-15T09:28:20.001316946-08:00"}
 {"id":"skills-4a2","title":"Design: Role boundaries with tool constraints","description":"Prevent role collapse footgun (planner writing code, tester refactoring). Implement tool-level constraints per agent type: some agents read-only, some propose patches only, only orchestrator commits. Reject outputs that violate role boundaries. From orch consensus and HN practitioner feedback.","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-10T15:41:00.12208959-08:00","created_by":"dan","updated_at":"2026-01-10T15:41:00.12208959-08:00","dependencies":[{"issue_id":"skills-4a2","depends_on_id":"skills-s6y","type":"blocks","created_at":"2026-01-10T15:41:00.123172375-08:00","created_by":"dan"}]}
 {"id":"skills-4dnt","title":"HQ: WIP limits and capacity management","description":"**Raised by:** gpt (primary), gemini\n\n**Problem:**\nHQ becomes a bottleneck if constantly spawning, reviewing, commenting, merging. Without limits, coordination overhead dominates. Can spawn too many workers and exhaust resources.\n\n**gpt:**\n\u003e \"HQ becomes a human-like project manager... this doesn't scale unless review time is small and predictable. Cap WIP (workers in WORKING) based on HQ review bandwidth; enforce WIP limits like Kanban. Don't spawn new workers if \u003eN in review or if HQ backlog exists.\"\n\n**gemini:**\n\u003e \"Dispatch: Spawn new work only if capacity allows.\"\n\n**Suggested fixes:**\n1. Max workers in WORKING limit\n2. Max open PRs / IN_REVIEW limit\n3. Prioritize by dependency chain + risk + expected review time\n4. Session budget: max workers, max retries, cooldown policy\n5. Rate limits on spawning","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-12T09:23:25.433134222-08:00","created_by":"dan","updated_at":"2026-01-12T09:23:25.433134222-08:00"}
 {"id":"skills-4fe","title":"Reframe: Epic around abstract layers (not tools)","description":"Reframe skills-hf1 epic around concepts, not implementations.\n\n## Abstract Layers\n\n| Layer | Concept | Purpose |\n|-------|---------|---------|\n| **Message Passing** | Async agent coordination | Session handoffs, status updates |\n| **Memory** | Persistent work items | Issues, dependencies, review state |\n| **Enforcement** | Quality gates | Block completion until approved |\n\n## Current Problem\nEpic references specific tools (jwz, beads, hooks) rather than concepts.\nMakes it hard to swap implementations.\n\n## Proposed Changes\n1. Update epic description with layer abstractions\n2. Define interface requirements for each layer\n3. Note current implementations as examples, not requirements\n\n## Deliverable\nUpdated epic with tool-agnostic framing","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-09T19:32:57.483804644-08:00","created_by":"dan","updated_at":"2026-01-09T19:34:10.530947162-08:00","closed_at":"2026-01-09T19:34:10.530947162-08:00","close_reason":"Epic skills-hf1 reframed around abstract layers"}
@ -147,6 +148,7 @@
 {"id":"skills-f2p","title":"Skills + Molecules Integration","description":"Integrate skills with beads molecules system.\n\nDesign work tracked in dotfiles (dotfiles-jjb).\n\nComponents:\n- Checklist support (lightweight skills)\n- Audit integration (bd audit for skill execution)\n- Skill frontmatter for triggers/tracking\n- Proto packaging alongside skills\n\nSee: ~/proj/dotfiles ADR work","status":"closed","priority":2,"issue_type":"epic","created_at":"2025-12-23T17:58:55.999438985-05:00","updated_at":"2025-12-23T19:22:38.577280129-05:00","closed_at":"2025-12-23T19:22:38.577280129-05:00","close_reason":"Superseded by skills-4u0 (migrated from dotfiles)","dependencies":[{"issue_id":"skills-f2p","depends_on_id":"skills-vpy","type":"blocks","created_at":"2025-12-23T17:59:17.976956454-05:00","created_by":"daemon"},{"issue_id":"skills-f2p","depends_on_id":"skills-u3d","type":"blocks","created_at":"2025-12-23T17:59:18.015216054-05:00","created_by":"daemon"}]}
 {"id":"skills-f8yd","title":"Extract column width constants for status table","description":"[EVOLVE] LOW worker.nim:84,104-108 - Column widths hardcoded (14,12,8,12,8). Extract to constants or compute dynamically.","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-10T20:12:12.129638606-08:00","created_by":"dan","updated_at":"2026-01-11T15:46:39.019717619-08:00","closed_at":"2026-01-11T15:46:39.019717619-08:00","close_reason":"Closed"}
 {"id":"skills-fdu","title":"Verify usage of BusJsonlPath, BlobsDir, WorkersDir constants","description":"[DEAD] LOW - Constants defined in types.nim:64-66 but may be unused. Verify usage in db.nim/state.nim, delete if unused.","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-10T18:50:54.020137275-08:00","created_by":"dan","updated_at":"2026-01-10T20:41:09.695978483-08:00","closed_at":"2026-01-10T20:41:09.695978483-08:00","close_reason":"Dead code cleanup complete"}
+{"id":"skills-fext","title":"worker/git.nim: default fromBranch inconsistent with worker.nim","description":"## Source\nCode review of uncommitted changes (2026-01-15)\n\n## Finding\n[SMELL] LOW `src/worker/git.nim:36`\n\nDefault `fromBranch` in git.nim is still \"origin/integration\" but worker.nim changed to \"main\". The git.nim default is now dead code since worker.nim always passes the value.\n\n## Suggestion\nEither keep defaults consistent (both \"main\") or remove default from git.nim since it's always called with explicit value.","status":"open","priority":3,"issue_type":"task","owner":"dan@delpad","created_at":"2026-01-15T09:28:25.408287349-08:00","created_by":"dan","updated_at":"2026-01-15T09:28:25.408287349-08:00"}
 {"id":"skills-fjo7","title":"Test HQ Workflow","status":"open","priority":2,"issue_type":"task","owner":"dan@delpad","created_at":"2026-01-12T21:02:24.034970739-08:00","created_by":"dan","updated_at":"2026-01-12T21:02:24.034970739-08:00"}
 {"id":"skills-fo3","title":"Compare WORKFLOWS.md with upstream","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-03T20:15:54.283175561-08:00","updated_at":"2025-12-03T20:19:28.897037199-08:00","closed_at":"2025-12-03T20:19:28.897037199-08:00","dependencies":[{"issue_id":"skills-fo3","depends_on_id":"skills-ebh","type":"discovered-from","created_at":"2025-12-03T20:15:54.286009672-08:00","created_by":"daemon","metadata":"{}"}]}
 {"id":"skills-fqu","title":"Research: Agent capability matrix","description":"Document what each agent can and cannot do for cross-agent design decisions.\n\nAgents to cover:\n- Claude Code (claude CLI)\n- Gemini (gemini CLI / AI Studio)\n- OpenCode\n- Codex (OpenAI)\n\nCapabilities to assess:\n- Hooks / lifecycle events\n- Subagent spawning\n- File system access (paths, restrictions)\n- CLI tool execution\n- State persistence\n- Context window / memory\n\nOutput: Matrix showing capability parity and gaps","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-09T17:14:20.541961958-08:00","created_by":"dan","updated_at":"2026-01-09T17:32:23.730556916-08:00","closed_at":"2026-01-09T17:32:23.730556916-08:00","close_reason":"Capability matrix complete: docs/research/agent-capability-matrix.md"}
@ -221,6 +223,7 @@
 {"id":"skills-qeh","title":"Add README.md for web-research skill","description":"web-research skill has SKILL.md and scripts but no README.md. AGENTS.md says README.md is for humans, contains installation instructions, usage examples, prerequisites.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-11-30T11:58:14.475647113-08:00","updated_at":"2025-12-28T22:37:48.339288261-05:00","closed_at":"2025-12-28T22:37:48.339288261-05:00","close_reason":"Added README.md with prerequisites, usage examples, and cross-references","dependencies":[{"issue_id":"skills-qeh","depends_on_id":"skills-vb5","type":"blocks","created_at":"2025-11-30T12:01:30.278784381-08:00","created_by":"daemon","metadata":"{}"}]}
 {"id":"skills-qekj","title":"Start heartbeat before state transition in start command","description":"[ERROR] MED worker.nim:202-206 - Heartbeat started after state transition. If heartbeat fails, worker is WORKING but not heartbeating. Start heartbeat before transition, or handle failure by reverting state.","status":"closed","priority":2,"issue_type":"bug","created_at":"2026-01-10T20:12:10.656162605-08:00","created_by":"dan","updated_at":"2026-01-10T20:55:02.327535804-08:00","closed_at":"2026-01-10T20:55:02.327535804-08:00","close_reason":"P2 bugs fixed"}
 {"id":"skills-qiq0","title":"Extract DefaultRemote and IntegrationBranch constants","description":"[EVOLVE] LOW git.nim - 'origin' remote and 'integration' branch hardcoded throughout (lines 40,66,67,93,96,97,109,133). Extract to constants in types.nim.","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-10T19:52:14.580188398-08:00","created_by":"dan","updated_at":"2026-01-10T20:32:28.36719341-08:00","closed_at":"2026-01-10T20:32:28.36719341-08:00","close_reason":"Created utils.nim with common helpers"}
+{"id":"skills-qjln","title":"worker spawn: duplicated success output block","description":"## Source\nCode review of uncommitted changes (2026-01-15)\n\n## Finding\n[SMELL] MED `src/worker.nim:40-52`\n\nSuccess message block duplicated with only one line different (review status). 8 identical echo lines repeated.\n\n## Suggestion\nExtract common output to a helper proc or use a single block with conditional review line.","status":"open","priority":3,"issue_type":"task","owner":"dan@delpad","created_at":"2026-01-15T09:28:14.216667646-08:00","created_by":"dan","updated_at":"2026-01-15T09:28:14.216667646-08:00"}
 {"id":"skills-qng9","title":"Agent capability benchmark harness","description":"**Status: Design/Brainstorming** - exploring approaches before building\n\n## Vision\nTest and benchmark agent capability on real software engineering tasks.\nEnable private evals on our actual workflows.\n\n## Key Questions (unresolved)\n1. What's the simplest thing that teaches us something?\n2. What's the orchestrator? CLI? Daemon? Just \"invoke claude with context\"?\n3. Where does task decomposition happen?\n4. How much infrastructure do we need vs. just trying things?\n\n## Approaches Considered\n\n### A) Full harness (designed, not built)\n- Scenario YAML schema (done: docs/specs/scenario-schema.md)\n- Verification pipeline: properties → LLM-judge → human\n- Scripted mode (integration) + Live mode (real agents)\n- Benchmarking dimensions\n- **Risk**: Over-engineered before we know what we need\n\n### B) Minimal spike (proposed)\n- Simple script: try-task.sh \"task description\" fixture/\n- Manually invoke Claude in worker context\n- See what happens, learn, iterate\n- **Benefit**: Fast learning, no premature abstraction\n\n### C) Middle ground\n- Start with B, grow toward A based on learnings\n\n## Artifacts Created (exploratory)\n- docs/specs/scenario-schema.md - YAML schema (may simplify)\n- tests/scenarios/{easy,medium,hard}/*.yaml - Example scenarios\n- tests/fixtures/ - Test fixture stubs\n\n## Next Step\nSpike: Actually try running Claude on a task in worker context.\nLearn what works, what breaks, what's needed.\n\n## Related\n- Worker CLI: src/worker.nim (built)\n- Review-gate: skills/review-gate/ (built)\n- Orchestrator: NOT BUILT (shape unknown)","status":"open","priority":2,"issue_type":"epic","created_at":"2026-01-11T16:19:22.737836269-08:00","created_by":"dan","updated_at":"2026-01-11T16:38:40.60324944-08:00"}
 {"id":"skills-qqaa","title":"worker CLI: Safe rebase handling for parallel workers","description":"**Raised by:** flash-or, gemini, gpt (all three)\n\n**Problem:**\nParallel workers branch from same master. When Worker A merges, Worker B is stale. LLMs are notoriously bad at git rebase - they hallucinate conflict resolutions or force push.\n\n**flash-or:**\n\u003e \"Mandatory 'worker rebase \u003cid\u003e' step after any merge to master. HQ should refuse to merge any branch that isn't functionally 'fast-forward' compatible.\"\n\n**gemini:**\n\u003e \"An LLM (Worker B) acts very poorly when asked to 'git rebase'. It often hallucinates conflict resolutions. The system needs an auto-rebase tool that fails safely. Do not ask the LLM to run 'git rebase -i'.\"\n\n**gpt:**\n\u003e \"Workers in long tasks will drift from master and incur conflicts, plus re-review churn. Require periodic rebases at a heartbeat interval or before marking IN_REVIEW.\"\n\n**Suggested fixes:**\n1. Pre-merge rebase requirement (verified by HQ)\n2. Auto-rebase tool that fails safely (no interactive rebase)\n3. Periodic rebase during long tasks\n4. HQ takes conflict resolution directly for complex cases\n5. \"Salvage mode\" - pull commits before canceling stale worker","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-12T09:20:38.348129207-08:00","created_by":"dan","updated_at":"2026-01-12T09:36:53.208834903-08:00","comments":[{"id":6,"issue_id":"skills-qqaa","author":"dan","text":"[RECLASSIFY:2026-01-12T09:36:53-08:00] Moved from HQ to worker CLI layer. \n\nThis is a worker lifecycle concern, not an HQ orchestration decision. The worker CLI should handle rebase safely - HQ just needs to know if it succeeded or failed.\n\nKey: worker done already does rebase. Issue is making it safer (no interactive rebase, fail-safe auto-rebase).","created_at":"2026-01-12T17:36:53Z"}]}
 {"id":"skills-r3k","title":"Extract helper for repetitive null-check pattern in poll()","description":"[SMELL] LOW db.nim:167-176 - Same null-check pattern repeated 5 times. Extract helper: proc optField[T](row, idx): Option[T]","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-10T18:52:40.828545508-08:00","created_by":"dan","updated_at":"2026-01-11T15:34:20.547557264-08:00","closed_at":"2026-01-11T15:34:20.547557264-08:00","close_reason":"Closed"}
@ -244,6 +247,7 @@
 {"id":"skills-uan","title":"worklog: merge Guidelines and Remember sections","description":"Guidelines (8 points) and Remember (6 points) sections overlap significantly - both emphasize comprehensiveness, future context, semantic compression. Consolidate into single principles list. Found by bloat lens review.","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-25T02:03:16.148596791-05:00","updated_at":"2025-12-27T10:05:51.527595332-05:00","closed_at":"2025-12-27T10:05:51.527595332-05:00","close_reason":"Closed"}
 {"id":"skills-udu","title":"Design: Cross-agent compatibility layer","description":"Make primitives work with Claude, Gemini, Codex, etc.\n\n## Challenge\nDifferent agents have different:\n- CLI interfaces (claude -p, gemini, codex)\n- Permission models\n- Hook support (Claude has Stop hooks, others don't)\n\n## Approach\nworker spawn abstracts the agent:\n  worker spawn --agent=claude \"task\"\n  worker spawn --agent=gemini \"task\"\n  worker spawn --agent=codex \"task\"\n\nEach agent adapter handles:\n- Command-line invocation\n- Output capture\n- Permission prompt detection\n- Completion detection\n\n## File-based coordination\nAll agents can read/write files.\n.worker-state/ is the universal interface.\nNo agent-specific hooks required for coordination.\n\n## Hook-enhanced (optional)\nClaude: Stop hook for hard gating\nOthers: Orchestrator polling for soft gating","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-10T12:14:51.639787315-08:00","created_by":"dan","updated_at":"2026-01-10T12:14:51.639787315-08:00","dependencies":[{"issue_id":"skills-udu","depends_on_id":"skills-s6y","type":"blocks","created_at":"2026-01-10T12:15:10.2649542-08:00","created_by":"dan"}],"comments":[{"id":16,"issue_id":"skills-udu","author":"dan","text":"[HQ:status:2026-01-12T13:52:39-08:00] Decided against --agent flag. Cross-agent instructions added directly to HQ SKILL.md instead. Simpler approach - HQ just runs bash commands per agent type. See 'Cross-Agent Compatibility' and 'Launch by Agent Type' sections.","created_at":"2026-01-12T21:52:39Z"}]}
 {"id":"skills-ut4","title":"Investigate: Sandbox for research-only subagents","description":"Can we ensure research/explore subagents run in a restricted sandbox?\n\n## Context\nWhen spawning subagents for research tasks (codebase exploration, web search, reading files), they should be read-only and sandboxed - no writes, no destructive commands.\n\n## Questions to Answer\n1. Does Claude Code Task tool support sandbox restrictions for subagents?\n2. Can we pass sandbox mode to Gemini CLI subagents?\n3. How does OpenCode's permission system work for subagents?\n4. Can Codex subagents inherit sandbox restrictions?\n\n## Desired Behavior\n- Research subagent can: Read, Grep, Glob, WebFetch, WebSearch\n- Research subagent cannot: Write, Edit, Bash (destructive), delete\n\n## Security Benefit\nPrevents research tasks from accidentally (or maliciously) modifying files or running destructive commands.\n\nRelated: Cross-agent quality gate architecture (skills-3ja)","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-09T17:31:05.49739394-08:00","created_by":"dan","updated_at":"2026-01-09T17:31:05.49739394-08:00"}
+{"id":"skills-ux6h","title":"worker spawn: error message lacks step context","description":"## Source\nCode review of uncommitted changes (2026-01-15)\n\n## Finding\n[ERROR] MED `src/worker.nim:54`\n\nError message only shows `e.msg`, losing stack trace and context about which step failed (worktree creation vs context file vs DB insert).\n\n## Suggestion\nAdd step context: \"Error during worktree creation: {e.msg}\" or similar. Consider logging full exception for debugging.","status":"open","priority":2,"issue_type":"bug","owner":"dan@delpad","created_at":"2026-01-15T09:28:09.159125356-08:00","created_by":"dan","updated_at":"2026-01-15T09:28:09.159125356-08:00"}
 {"id":"skills-uz4","title":"Compare RESUMABILITY.md with upstream","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-03T20:15:54.897754095-08:00","updated_at":"2025-12-03T20:19:29.384645842-08:00","closed_at":"2025-12-03T20:19:29.384645842-08:00","dependencies":[{"issue_id":"skills-uz4","depends_on_id":"skills-ebh","type":"discovered-from","created_at":"2025-12-03T20:15:54.899671178-08:00","created_by":"daemon","metadata":"{}"}]}
 {"id":"skills-v6p","title":"Move Message type from db.nim to types.nim","description":"[COUPLING] LOW db.nim:130-141 - Message type defined in db.nim but other types are in types.nim. Move for consistency.","status":"open","priority":4,"issue_type":"task","created_at":"2026-01-10T18:52:41.927231152-08:00","created_by":"dan","updated_at":"2026-01-10T18:52:41.927231152-08:00"}
 {"id":"skills-vb5","title":"Resolve web search design questions","description":"web_search_brainstorm.md has unanswered design questions: single smart skill vs explicit flags, specific sources priority, raw links vs summaries. Need user input to finalize web-search/web-research direction.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-11-30T11:58:33.482270742-08:00","updated_at":"2025-12-28T22:21:05.814118092-05:00","closed_at":"2025-12-28T22:21:05.814118092-05:00","close_reason":"Resolved: keep 2 skills, web-search for OpenCode only (Claude has built-in), web-research for both. Source filtering via WebSearch domains. Summaries by default."}
@ -268,6 +272,7 @@
 {"id":"skills-ybq","title":"Reorganize lens directory structure","description":"Current structure puts ops lenses as subdirectory of code-review lenses:\n\n```\n~/.config/lenses/           \u003c- code-review lenses\n~/.config/lenses/ops/       \u003c- ops-review lenses\n```\n\nThis is asymmetric. Consider:\n\nOption A: Separate top-level directories\n```\n~/.config/lenses/code-review/\n~/.config/lenses/ops-review/\n```\n\nOption B: Keep flat but with prefixes\n```\n~/.config/lenses/code-*.md\n~/.config/lenses/ops-*.md\n```\n\nOption C: Per-skill lens directories\n```\n~/.claude/skills/code-review/lenses/\n~/.claude/skills/ops-review/lenses/\n```\n\nRequires updating:\n- modules/ai-skills.nix (deployment paths)\n- skills/code-review/SKILL.md (expected paths)\n- skills/ops-review/SKILL.md (expected paths)","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-01T21:57:06.726997606-05:00","created_by":"dan","updated_at":"2026-01-02T00:24:53.647409845-05:00","closed_at":"2026-01-02T00:24:53.647409845-05:00","close_reason":"Reorganized lens directories: code-review → ~/.config/lenses/code/, ops-review → ~/.config/lenses/ops/. Updated ai-skills.nix, SKILL.md, and README references."}
 {"id":"skills-yc6","title":"Research: Document brainstorm findings","description":"Capture research findings in docs/research/ or docs/design/.\n\n## Sources to document\n1. orch consensus on permission patterns (sonar, gemini)\n2. orch brainstorm on creative patterns (flash-or, qwen, gpt, gemini)\n3. Gastown architecture analysis\n4. Steve Yegge Larry Wall/Perl critique (Lego vs pirate ships)\n5. LangGraph breakpoints pattern\n6. MetaGPT software company pattern\n7. Claude Code permission-based gating\n\n## Key patterns to document\n- Negative permission (exclusion-based)\n- Evidence artifacts (structured handoff)\n- Rubber Duck interrupt (stuck detection)\n- Role + Veto (some block, some do)\n- Circuit breakers (non-progress detection)\n- Capability Provenance Pipeline (GPT)\n\n## Output\ndocs/design/multi-agent-lego-architecture.md","notes":"Research complete. Created docs/design/multi-agent-footguns-and-patterns.md with synthesis of HN discussions, practitioner blogs, and orch consensus. Key findings: Rule of 4 (3-4 agents max), spec-driven development, layered coordination, PostgreSQL advisory locks pattern, git bundles for checkpoints. Validated our SQLite, worktree, and rebase decisions. Identified gaps: structured task specs, role boundaries, review funnel, token budgets.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-10T12:15:04.476532719-08:00","created_by":"dan","updated_at":"2026-01-10T15:34:24.496673317-08:00","dependencies":[{"issue_id":"skills-yc6","depends_on_id":"skills-s6y","type":"blocks","created_at":"2026-01-10T12:15:10.316852381-08:00","created_by":"dan"}]}
 {"id":"skills-yxv","title":"worklog: extract hardcoded path to variable","description":"SKILL.md repeats ~/.claude/skills/worklog/ path 4-5 times. Define SKILL_ROOT once, reference throughout. Found by bloat+smells lens review.","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-25T02:03:15.831699081-05:00","updated_at":"2025-12-27T10:05:51.532722628-05:00","closed_at":"2025-12-27T10:05:51.532722628-05:00","close_reason":"Closed"}
+{"id":"skills-yylq","title":"worker spawn: rollback may miss partially-created branches","description":"## Source\nCode review of uncommitted changes (2026-01-15)\n\n## Finding\n[ERROR] MED `src/worker.nim:56-62`\n\nRollback checks `worktree != \"\"` and `branch != \"\"` but these are only set AFTER createWorktree succeeds. If createWorktree fails mid-way (after branch created but before worktree), branch won't be cleaned up.\n\n## Evidence\nThe AAR noted \"Partial worktrees and branches were created without worker registry entries\" - this fix may not fully address that.\n\n## Suggestion\nMove variable assignment inside try block to track partial state, or have createWorktree handle its own rollback atomically.","status":"open","priority":2,"issue_type":"bug","owner":"dan@delpad","created_at":"2026-01-15T09:28:02.674685905-08:00","created_by":"dan","updated_at":"2026-01-15T09:28:02.674685905-08:00"}
 {"id":"skills-zf6","title":"Design: Evidence artifacts for review handoff","description":"Structured handoff between agents, not chat transcripts.\n\n## Pattern (from GPT brainstorm)\nDon't share chat transcripts between agents.\nShare evidence artifacts:\n- structured issue description\n- failing test output\n- minimal reproduction\n- proposed diff (patch)\n- reasoning trace summary (3 sentences max)\n\n## Implementation\nWorker completion writes to .worker-state/X.json:\n{\n  \"status\": \"needs_review\",\n  \"evidence\": {\n    \"summary\": \"Added rate limiting to auth endpoint\",\n    \"diff_file\": \".worker-state/X.diff\",\n    \"test_output\": \"...\",\n    \"reasoning\": \"Rate limiting needed per issue #123\"\n  }\n}\n\nReviewer reads evidence, not full transcript.\n\n## Benefits\n- Reduces cross-contamination of mistakes\n- Faster review (structured, not conversational)\n- Model-agnostic (any agent can produce/consume)","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-10T12:14:33.537487043-08:00","created_by":"dan","updated_at":"2026-01-10T12:14:33.537487043-08:00","dependencies":[{"issue_id":"skills-zf6","depends_on_id":"skills-s6y","type":"blocks","created_at":"2026-01-10T12:15:10.105913085-08:00","created_by":"dan"}]}
 {"id":"skills-zp5","title":"Create skills marketplace.json registry","description":"Central registry of all skills for plugin discovery. Follow emes marketplace pattern.","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-09T10:59:24.933190155-08:00","created_by":"dan","updated_at":"2026-01-09T11:21:19.452762097-08:00","closed_at":"2026-01-09T11:21:19.452762097-08:00","close_reason":"Created .claude-plugin/marketplace.json with orch as first plugin. More plugins added as skills are converted.","dependencies":[{"issue_id":"skills-zp5","depends_on_id":"skills-6x1","type":"blocks","created_at":"2026-01-09T10:59:33.223533468-08:00","created_by":"dan"}]}
 {"id":"skills-zws1","title":"Create hello-world script for spike test","status":"closed","priority":2,"issue_type":"task","owner":"dan@delpad","created_at":"2026-01-12T21:06:53.040848941-08:00","created_by":"dan","updated_at":"2026-01-12T21:12:40.790376387-08:00","closed_at":"2026-01-12T21:12:40.790376387-08:00","close_reason":"Closed"}
--- a/bin/use-skills.sh
+++ b/bin/use-skills.sh
@ -13,6 +13,18 @@

 set -euo pipefail

+# Default to per-repo Codex skills unless overridden by the caller.
+export CODEX_HOME="${CODEX_HOME:-$PWD/.codex}"
+
+# Ensure global auth is available in per-repo CODEX_HOME to prevent login resets
+if [[ -n "${CODEX_HOME:-}" && -f "$HOME/.codex/auth.json" ]]; then
+  mkdir -p "$CODEX_HOME"
+  # Only link if not already linked/present to avoid constant touching
+  if [[ ! -e "$CODEX_HOME/auth.json" ]]; then
+    ln -sf "$HOME/.codex/auth.json" "$CODEX_HOME/auth.json"
+  fi
+fi
+
 # Default repo - uses local git, override with SKILLS_REPO for remote
 SKILLS_REPO="${SKILLS_REPO:-git+file://$HOME/proj/skills}"

--- a/src/worker.nim
+++ b/src/worker.nim
@ -19,7 +19,7 @@ import worker/[types, db, state, git, context, heartbeat, utils, review]
 # -----------------------------------------------------------------------------

 proc spawn(taskId: string, description: string = "",
-           fromBranch: string = "origin/integration") =
+           fromBranch: string = "main", noFetch: bool = false) =
  ## Create a new worker workspace
  discard validateTaskId(taskId)
  let db = openBusDb()
@ -33,28 +33,41 @@ proc spawn(taskId: string, description: string = "",
    echo "  Worktree: ", existing.get.worktree
    quit(ExitSuccess)

-  # Create git worktree
-  let (branch, worktree) = createWorktree(taskId, fromBranch)
+  var branch, worktree: string
+  try:
+    # Create git worktree
+    (branch, worktree) = createWorktree(taskId, fromBranch, noFetch)

-  # Create context file
-  discard createWorkerContext(taskId, branch, worktree, description)
+    # Create context file
+    discard createWorkerContext(taskId, branch, worktree, description)

-  # Create worker in DB
-  discard db.createWorker(taskId, branch, worktree, description)
+    # Create worker in DB
+    discard db.createWorker(taskId, branch, worktree, description)

-  # Enable review-gate for this task
-  if enableReview(taskId):
-    echo "Created worker: ", taskId
-    echo "  Branch: ", branch
-    echo "  Worktree: ", worktree
-    echo "  State: ASSIGNED"
-    echo "  Review: enabled"
-  else:
-    echo "Created worker: ", taskId
-    echo "  Branch: ", branch
-    echo "  Worktree: ", worktree
-    echo "  State: ASSIGNED"
-    echo "  Review: (review-gate not available)"
+    # Enable review-gate for this task
+    if enableReview(taskId):
+      echo "Created worker: ", taskId
+      echo "  Branch: ", branch
+      echo "  Worktree: ", worktree
+      echo "  State: ASSIGNED"
+      echo "  Review: enabled"
+    else:
+      echo "Created worker: ", taskId
+      echo "  Branch: ", branch
+      echo "  Worktree: ", worktree
+      echo "  State: ASSIGNED"
+      echo "  Review: (review-gate not available - install 'review-gate' for review integration)"
+
+  except CatchableError as e:
+    echo "Error spawning worker: ", e.msg
+    # Rollback
+    if worktree != "":
+      echo "Rolling back worktree..."
+      removeWorktree(taskId)
+    if branch != "":
+      echo "Rolling back branch..."
+      removeBranch(taskId, remote = false)
+    quit(1)

 # Status table column widths
 const
@ -406,7 +419,8 @@ when isMainModule:
  dispatchMulti(
    [spawn, help = {"taskId": "Task identifier",
                    "description": "Task description",
-                    "fromBranch": "Base branch"}],
+                    "fromBranch": "Base branch",
+                    "noFetch": "Skip git fetch"}],
    [status, help = {"state": "Filter by state",
                     "stale": "Show only stale workers",
                     "json": "Output as JSON",
--- a/src/worker/git.nim
+++ b/src/worker/git.nim
@ -33,15 +33,16 @@ proc runGitCheck*(args: openArray[string], workDir = ""): string =
    raise newException(GitError, &"Git command failed ({code}): {output}")
  return output.strip()

-proc createWorktree*(taskId: string, fromBranch: string = "origin/integration"): tuple[branch, worktree: string] =
+proc createWorktree*(taskId: string, fromBranch: string = "origin/integration", noFetch: bool = false): tuple[branch, worktree: string] =
  ## Create a worktree for a task
  let branch = branchName(taskId)
  let worktree = worktreePath(taskId)

-  # Fetch latest (warn but continue on failure - may be offline)
-  let (fetchOut, fetchCode) = runGit(["fetch", "origin"])
-  if fetchCode != 0:
-    logWarn("createWorktree", "fetch failed, continuing with local state: " & fetchOut.strip())
+  if not noFetch:
+    # Fetch latest (warn but continue on failure - may be offline)
+    let (fetchOut, fetchCode) = runGit(["fetch", "origin"])
+    if fetchCode != 0:
+      logWarn("createWorktree", "fetch failed, continuing with local state: " & fetchOut.strip())

  # Create branch from base
  discard runGitCheck(["branch", branch, fromBranch])
--- a/worker-orchestration-aar-2026-01-13.md
+++ b/worker-orchestration-aar-2026-01-13.md
@ -0,0 +1,56 @@
+# Worker Orchestration AAR (2026-01-13)
+
+## Goal
+Parallelize three bd issues using `worker spawn` and background worker agents.
+
+## Scope
+- Issues: `talu-gsga`, `talu-jid5`, `talu-w8oq`
+- Base branch: `main`
+- Worker model: `sonnet-4.5`
+
+## Environment
+- Repo: `/home/dan/proj/talu`
+- Network: restricted sandbox (fetch requires escalation)
+- Tools: `worker`, `bd`, `git`
+
+## What Happened
+1. Ran `worker spawn` using positional args; command failed due to CLI syntax.
+2. Retried with `-t` and `-d` flags; `worker` attempted `git fetch` and failed due to network restriction.
+3. Retried with `-f main`; fetch still attempted and failed.
+4. Partial worktrees and branches were created without worker registry entries.
+5. Manually removed worktrees and deleted branches.
+6. Re-ran `worker spawn` with network escalation; workers created successfully.
+7. `review-gate` was not found, so review integration was disabled.
+8. Rendered worker prompts and launched background workers.
+
+## What Went Well
+- After network access, `worker spawn` created worktrees/branches reliably.
+- Prompt rendering and background worker launch were straightforward.
+
+## Pain Points
+- `worker spawn` always attempts `git fetch`, even when `--fromBranch` is local.
+- Default base branch is `origin/integration`, which is not present in this repo.
+- Spawn failures left behind branches and worktrees without worker registry state.
+- Missing `review-gate` produces warnings without guidance on setup.
+- Network access requirements are easy to miss during first-time use.
+
+## Impact
+- Time lost to retries and cleanup before workers could start.
+- Non-obvious failure modes and manual recovery steps.
+
+## Observed Errors
+- `spawn does not expect non-option arguments at "talu-gsga"`
+- `fatal: not a valid object name: 'origin/integration'`
+- `ssh: connect to host 192.168.1.108 port 2222: failure`
+- `WARN: enableReview: failed for <id>: review-gate not found`
+
+## Recommendations
+1. Allow `worker spawn` to skip `git fetch` when the base branch is local.
+2. Make the default base branch configurable or auto-detect a local main branch.
+3. Roll back branch/worktree on spawn failure to avoid manual cleanup.
+4. Improve error messaging to distinguish network vs branch-not-found.
+5. Provide setup guidance when `review-gate` is missing.
+
+## Questions for Worker Team
+- Can `worker spawn` be configured to avoid network fetches?
+- Is there a way to set a global default base branch (e.g., `main`)?