skills/worker-orchestration-aar-2026-01-13.md
dan 461c5ac148 fix(worker): improve spawn reliability and add noFetch flag
- Change default base branch from origin/integration to main
- Add --noFetch flag to skip git fetch (for offline/sandbox use)
- Add try/except with rollback on spawn failure
- Improve error message for missing review-gate
- Add Codex auth.json symlink to use-skills.sh
- Include worker orchestration AAR from 2026-01-13

Addresses pain points from worker-orchestration-aar-2026-01-13.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 09:29:36 -08:00


# Worker Orchestration AAR (2026-01-13)
## Goal
Parallelize three bd issues using `worker spawn` and background worker agents.
## Scope
- Issues: `talu-gsga`, `talu-jid5`, `talu-w8oq`
- Base branch: `main`
- Worker model: `sonnet-4.5`
## Environment
- Repo: `/home/dan/proj/talu`
- Network: restricted sandbox (fetch requires escalation)
- Tools: `worker`, `bd`, `git`
## What Happened
1. Ran `worker spawn` with positional arguments; the command failed because `spawn` does not accept non-option arguments.
2. Retried with `-t` and `-d` flags; `worker` attempted `git fetch` and failed due to network restriction.
3. Retried with `-f main`; `worker` still attempted the fetch, which failed again.
4. Partial worktrees and branches were created without worker registry entries.
5. Manually removed worktrees and deleted branches.
6. Re-ran `worker spawn` with network escalation; workers created successfully.
7. `review-gate` was not found, so review integration was disabled.
8. Rendered worker prompts and launched background workers.
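
The manual cleanup in step 5 can be sketched as below. This is a minimal illustration, not the `worker` tool's own code: `cleanup_commands` is a hypothetical helper, and the AAR does not record the actual worktree paths or branch names, so both are passed in as parameters. The helper only builds standard `git worktree remove`, `git branch -D`, and `git worktree prune` invocations so they can be reviewed before anything is run.

```python
from typing import List


def cleanup_commands(worktree_path: str, branch: str) -> List[List[str]]:
    """Build the git commands that remove one orphaned worktree and its branch.

    Hypothetical helper: nothing is executed here, the caller decides
    whether to print the commands or pass them to subprocess.run.
    """
    return [
        # --force also removes a worktree that has uncommitted changes.
        ["git", "worktree", "remove", "--force", worktree_path],
        # -D deletes the branch even if it was never merged.
        ["git", "branch", "-D", branch],
        # Drop stale administrative entries for worktrees that no longer exist.
        ["git", "worktree", "prune"],
    ]
```

Returning argument lists rather than running them keeps the sketch safe to try in a restricted sandbox; each list can be handed to `subprocess.run(cmd, check=True)` once the paths have been confirmed.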
## What Went Well
- Once network access was granted, `worker spawn` created worktrees and branches reliably.
- Prompt rendering and background worker launch were straightforward.
## Pain Points
- `worker spawn` always attempts `git fetch`, even when `--fromBranch` is local.
- Default base branch is `origin/integration`, which is not present in this repo.
- Spawn failures left behind branches and worktrees without worker registry state.
- A missing `review-gate` produces a warning with no guidance on setup.
- Network access requirements are easy to miss during first-time use.
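
The first two pain points come down to two small decisions: whether the base ref needs a fetch at all, and which base branch to fall back on. The sketch below shows one possible shape for those checks. Both function names are hypothetical and do not reflect how `worker` is implemented. The remote and branch lists are assumed to come from `git remote` and `git branch --format='%(refname:short)'`.

```python
from typing import Iterable, Optional


def needs_fetch(base_ref: str, remotes: Iterable[str]) -> bool:
    """Return True only when base_ref looks like a remote-tracking branch.

    Assumption: no local branch is itself named "<remote>/...".
    """
    # "origin/integration" starts with a known remote name; "main" does not.
    return any(base_ref.startswith(remote + "/") for remote in remotes)


def pick_base_branch(
    local_branches: Iterable[str],
    candidates: Iterable[str] = ("main", "master"),
) -> Optional[str]:
    """Return the first candidate branch that exists locally, or None."""
    existing = set(local_branches)
    for name in candidates:
        if name in existing:
            return name
    return None
```

With these checks, a spawn from a local `main` would never touch the network, and a repo without `origin/integration` would resolve to a branch that actually exists.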
## Impact
- Time lost to retries and cleanup before workers could start.
- Non-obvious failure modes and manual recovery steps.
## Observed Errors
- `spawn does not expect non-option arguments at "talu-gsga"`
- `fatal: not a valid object name: 'origin/integration'`
- `ssh: connect to host 192.168.1.108 port 2222: failure`
- `WARN: enableReview: failed for <id>: review-gate not found`
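
These four messages fall into distinct failure classes, which is what makes a clearer error report possible. The sketch below maps each observed message to a class by substring. The function name and the class labels are hypothetical, and matching on message text is an assumption made for illustration; a real implementation would more likely branch on exit codes or on which step failed.

```python
def classify_spawn_error(message: str) -> str:
    """Map an error line from this AAR to a coarse failure class."""
    text = message.lower()
    if "does not expect non-option arguments" in text:
        return "cli-usage"          # positional args passed to `worker spawn`
    if "not a valid object name" in text:
        return "branch-not-found"   # base ref missing in this repo
    if "ssh: connect to host" in text:
        return "network"            # fetch blocked by the sandbox
    if "review-gate not found" in text:
        return "missing-tool"       # review integration unavailable
    return "unknown"
```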
## Recommendations
1. Allow `worker spawn` to skip `git fetch` when the base branch is local.
2. Make the default base branch configurable or auto-detect a local main branch.
3. Roll back branch/worktree on spawn failure to avoid manual cleanup.
4. Improve error messages to distinguish network failures from branch-not-found errors.
5. Provide setup guidance when `review-gate` is missing.
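
Recommendation 3 can be illustrated with a generic do/undo pattern. The sketch below is a hedged illustration, not the `worker` implementation: `spawn_with_rollback` and the step names are hypothetical. Each step is paired with an undo action, and a failure undoes the completed steps in reverse order before re-raising the original error.

```python
from typing import Callable, List, Tuple

Action = Callable[[], None]


def spawn_with_rollback(steps: List[Tuple[Action, Action]]) -> None:
    """Run (do, undo) pairs in order; on failure, undo completed steps in reverse."""
    undo_stack: List[Action] = []
    try:
        for do, undo in steps:
            do()
            # Register the undo only after the step succeeded.
            undo_stack.append(undo)
    except Exception:
        for undo in reversed(undo_stack):
            try:
                undo()
            except Exception as cleanup_error:
                # Keep rolling back; report what could not be cleaned up.
                print(f"rollback step failed: {cleanup_error}")
        raise


# Demo with stand-in steps that only record what would have happened.
log: List[str] = []


def register_worker() -> None:
    raise RuntimeError("registry write failed")


try:
    spawn_with_rollback([
        (lambda: log.append("create branch"), lambda: log.append("delete branch")),
        (lambda: log.append("add worktree"), lambda: log.append("remove worktree")),
        (register_worker, lambda: None),
    ])
except RuntimeError:
    pass

print(log)
# ['create branch', 'add worktree', 'remove worktree', 'delete branch']
```

A step that fails partway through is not undone by this pattern, so each step should be small enough that a failure leaves nothing behind.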
## Questions for Worker Team
- Can `worker spawn` be configured to avoid network fetches?
- Is there a way to set a global default base branch (e.g., `main`)?