Covers: review-gate Stop hook fixes, circuit breaker, research on OpenHands/Gastown/JWZ patterns, epic creation, phased approach Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
#+TITLE: Multi-Agent "Lego Brick" Architecture Design and review-gate Prototype
#+DATE: 2026-01-10
#+KEYWORDS: multi-agent, review-gate, orchestration, stop-hook, circuit-breaker, gastown, openhands, jwz, worker-agents
#+COMMITS: 7
#+COMPRESSION_STATUS: uncompressed

* Session Summary
** Date: 2026-01-10
** Focus Area: Multi-agent coordination architecture design and review-gate quality gate implementation

This research and design session began as a fix for the review-gate Stop hook implementation and grew into a broader investigation of multi-agent coordination patterns. It culminated in a "Lego brick" architecture: simple, composable primitives for agent coordination that avoid the complexity of systems like Gastown.

* Accomplishments
- [X] Fixed review-gate Stop hook output format (exit code 2 + JSON to stderr)
- [X] Implemented circuit breaker pattern to prevent infinite Stop hook loops
- [X] Created comprehensive test harness (46 unit tests + integration test)
- [X] Researched multi-agent patterns from OpenHands, Gastown, JWZ, MetaGPT
- [X] Created epic skills-s6y "Multi-agent orchestration: Lego brick architecture"
- [X] Filed 11 design tasks covering all architecture components
- [X] Defined phased approach: Simple JSONL → JWZ-like → Talu/service layer
- [X] Analyzed Steve Yegge's Larry Wall critique - "Lego bricks vs pirate ships"
- [X] Compared our approach with OpenHands iterative refinement pattern
- [ ] Live Stop hook test (blocked by API credits, harness ready)

* Key Decisions
** Decision 1: Exit code 2 + JSON for Stop hook blocking
- Context: The initial implementation used exit code 1 with plain text, which did not block Claude Code
- Research revealed: Claude Code Stop hooks require exit code 2 with JSON on stderr containing "decision": "block" and a "reason" field
- Rationale: This is the documented pattern that causes Claude to continue with feedback
- Impact: review-gate now properly integrates with Claude Code Stop hooks

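A minimal sketch of the blocking output, assuming the format recorded in this decision (exit code 2, JSON on stderr). The names `json_escape` and `emit_block` are hypothetical and not taken from the review-gate script; the exact contract should be checked against the current Claude Code hooks documentation.

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical helpers: an illustration, not the actual review-gate code.

# Escape backslashes and double quotes so the reason is a valid JSON string.
# Control characters such as newlines are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Print the block decision as JSON on stderr and return 2.
emit_block() {
  local reason
  reason=$(json_escape "$1")
  printf '{"decision": "block", "reason": "%s"}\n' "$reason" >&2
  return 2
}
#+end_src

A hook script would call `emit_block "Review pending"` as its last step and exit with the returned status.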
** Decision 2: Circuit breaker with 3 attempts
- Context: Stop hook blocking caused infinite loop → stack overflow crash
- Options considered:
  1. Allow exit on stop_hook_active (soft gate, only blocks once)
  2. Track attempts, circuit breaker after N (hard gate with escape hatch)
- Rationale: Hard gate with circuit breaker gives agent multiple chances to spawn reviewer before giving up
- Implementation: .attempts file tracks blocks, trips at 3 (configurable via REVIEW_MAX_ATTEMPTS)

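The attempt tracking could look roughly like the following. Only the idea of an attempts file and the `REVIEW_MAX_ATTEMPTS` variable come from the notes; the function name `record_attempt`, the file path argument, and the return convention are assumptions for illustration.

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical sketch of the attempt counter, not the shipped implementation.

# Record one Stop hook attempt in the given file.
# Returns 0 while the gate should keep blocking, 1 once the breaker trips.
record_attempt() {
  local file=$1
  local max=${REVIEW_MAX_ATTEMPTS:-3}
  local count=0
  [ -f "$file" ] && count=$(cat "$file")
  count=$((count + 1))
  printf '%s\n' "$count" > "$file"
  [ "$count" -lt "$max" ]
}
#+end_src

With the default of 3, the first two calls return 0 (block) and the third returns 1 (breaker tripped, let the agent stop).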
** Decision 3: Lego bricks over Gastown complexity
- Context: Gastown (Steve Yegge) has Mayor, Polecats, Convoys, Hooks, Refineries, Witnesses - Kubernetes-level complexity
- Yegge's own critique of Larry Wall: "Perl excels at marketing glossy shortcuts while failing to provide fundamental compositional primitives"
- Rationale: We want generic Lego bricks, not specialized pirate ships
- Impact: Architecture defined as 6 composable primitives: worker spawn/status/merge, review-gate, worker stuck, worker veto

** Decision 4: Local filesystem for all coordination
- Context: JWZ uses JSONL + SQLite, Gastown uses git worktrees, OpenHands uses git branches + PRs
- Key insight: "Any agent that can read/write files can participate" - no network calls for coordination
- Rationale: Simplicity, cross-agent compatibility, no external dependencies
- Impact: .worker-state/ directory with JSONL files for message passing

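As an illustration of file-only coordination, here is a sketch of appending a message to a JSONL file. It assumes each file under `messages/` is the recipient's inbox, which the notes do not specify; `send_message` and the `WORKER_STATE_DIR` override are hypothetical, and the `data` field is left out for brevity.

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical sketch: append-only message passing through plain files.

# Append one message to the recipient's JSONL file.
# Arguments are assumed to be simple identifiers, so no JSON escaping is done.
send_message() {
  local to=$1 from=$2 type=$3
  local dir=${WORKER_STATE_DIR:-.worker-state}/messages
  mkdir -p "$dir"
  printf '{"ts": "%s", "from": "%s", "type": "%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$from" "$type" >> "$dir/$to.jsonl"
}
#+end_src

For example, `send_message orchestrator worker-auth done` appends one line to `.worker-state/messages/orchestrator.jsonl`.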
** Decision 5: Branch-per-worker for code isolation
- Context: OpenHands uses git branches, each agent on own branch, PRs to rolling branch
- Rationale: Clean isolation, easy rollback, familiar git workflow
- Impact: worker spawn creates branch, worker merge handles integration

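A rough sketch of the git side of this decision. The `worker/<name>` branch naming and both function names are assumptions; the real `worker spawn` and `worker merge` primitives were not yet implemented at the time of these notes.

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical sketch of branch-per-worker isolation using plain git.

# Create an isolated branch for a worker, starting from a base ref.
worker_branch_create() {
  local name=$1 base=${2:-HEAD}
  git branch "worker/$name" "$base"
}

# Merge a finished worker branch into the currently checked-out branch.
# --no-ff keeps a merge commit so the worker's changes can be reverted as a unit.
worker_branch_merge() {
  local name=$1
  git merge --no-ff --no-edit "worker/$name"
}
#+end_src

Whether the merge target is a rolling branch or main is still an open question, so the sketch simply merges into whatever branch is checked out.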
** Decision 6: Phased approach for message passing
- Context: JWZ adds SQLite cache, threading, identity system over basic JSONL
- Options:
  1. Simple JSONL now (cat | jq)
  2. JWZ-like features when needed (SQLite, threading)
  3. Talu service layer for rich querying (future)
- Rationale: Start simple, grow as needed - avoid premature complexity
- Impact: Phase 1 is just JSONL files, complexity added only when pain points emerge

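Phase 1 querying can stay at the "cat | jq" level. A small sketch, assuming `jq` is installed and that each line carries a `type` field as in the Phase 1 message format; `messages_of_type` is a hypothetical name.

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical sketch: filter a JSONL message file with jq.

# Print every message of the given type, one compact JSON object per line.
messages_of_type() {
  local file=$1 type=$2
  jq -c --arg t "$type" 'select(.type == $t)' "$file"
}
#+end_src

A call such as `messages_of_type .worker-state/messages/orchestrator.jsonl done` scans the whole file on every query, which is the pain point that would justify moving to a JWZ-style SQLite cache later.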
** Decision 7: Drop "Negative Permission" pattern
- Context: Brainstorm suggested "agent asks what NOT to do" pattern
- User feedback: "Not loving it"
- Rationale: Added complexity without clear benefit for our use case
- Impact: Closed skills-den, removed from architecture

* Problems & Solutions
| Problem | Solution | Learning |
|---------|----------|----------|
| Stop hook exit code 1 didn't block | Changed to exit code 2 + JSON stderr | Claude Code hooks need specific format |
| Infinite loop → stack overflow crash | Circuit breaker with attempt tracking | stop_hook_active flag exists but isn't enough |
| Can't test Stop hook mid-session | Created integration test harness | Hooks load at session start |
| API credits blocked live test | Harness detects and skips gracefully | Test harness must handle external failures |
| Message passing complexity unclear | Phased approach: simple → complex | Don't build JWZ-level features until needed |

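The "skips gracefully" row can be illustrated with a guard like the one below. Only the `SKIP_LIVE_TEST` variable appears in these notes; the function name and the check for a `claude` command on PATH are assumptions, and the harness's actual detection of exhausted API credits is not shown here.

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical sketch of a skip guard for the live integration test.

# Prints a reason and returns 0 when the live test should be skipped,
# returns 1 when it is safe to attempt it.
should_skip_live_test() {
  if [ "${SKIP_LIVE_TEST:-0}" = "1" ]; then
    echo "skip: SKIP_LIVE_TEST=1"
    return 0
  fi
  if ! command -v claude >/dev/null 2>&1; then
    echo "skip: claude CLI not found"
    return 0
  fi
  return 1
}
#+end_src

The point of the design is that an external failure produces a reported skip, not a failed test run.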
* Technical Details

** Code Changes
- Total files modified: 13
- Key files changed:
  - `skills/review-gate/scripts/review-gate` - Exit code 2, JSON output, circuit breaker
  - `skills/review-gate/tests/test-review-gate.sh` - Updated for new behavior (46 tests)
  - `skills/review-gate/tests/test-hook-integration.sh` - New integration test harness
  - `.claude/hooks.json` - Stop hook configuration
  - `.claude-plugin/marketplace.json` - Added review-gate skill

** Commands Used
#+begin_src bash
# Run unit tests
./skills/review-gate/tests/test-review-gate.sh

# Run integration test (skip live if no credits)
SKIP_LIVE_TEST=1 ./skills/review-gate/tests/test-hook-integration.sh

# Test hook output format
echo '{"stop_hook_active": true}' | ./skills/review-gate/scripts/review-gate check test-session

# Research with orch
orch consensus "multi-agent patterns..." sonar gemini --serial --websearch
orch consensus "creative architectures..." flash-or qwen gpt gemini --temperature 1.3 --mode brainstorm
#+end_src

** Architecture Notes
*** Lego Brick Architecture
#+begin_src
.worker-state/
├── worker-auth.json          # status, task, evidence
├── worker-auth.log           # output capture
├── messages/                 # message passing (append-only JSONL)
│   ├── orchestrator.jsonl
│   └── worker-auth.jsonl
└── reviews/
    └── worker-auth.json      # review state (pending/approved/rejected)
#+end_src

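A sketch of creating this layout for one worker. The function name and the `WORKER_STATE_DIR` override are hypothetical. The JSON status and review files are deliberately not written, because their schema is not spelled out in these notes.

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical sketch: create the .worker-state/ skeleton for one worker.

worker_state_init() {
  local name=$1
  local root=${WORKER_STATE_DIR:-.worker-state}
  mkdir -p "$root/messages" "$root/reviews"
  # ": >>" creates each file if missing without truncating existing content.
  : >> "$root/messages/orchestrator.jsonl"
  : >> "$root/messages/$name.jsonl"
  : >> "$root/$name.log"
}
#+end_src

Running `worker_state_init worker-auth` twice is harmless, since nothing is overwritten.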
*** Message Format (Phase 1)
#+begin_src json
{"ts": "2026-01-10T12:00:00Z", "from": "worker-auth", "type": "done", "data": {...}}
#+end_src

*** Stop Hook Flow
1. Claude finishes response → Stop hook fires
2. review-gate check → pending → exit 2 + JSON block
3. Claude continues with feedback (attempt 1)
4. Stop hook fires again (stop_hook_active=true) → still pending → block (attempt 2)
5. Repeat until approved or circuit breaker trips (attempt 3)

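The flow above can be traced with a small decision function. This is a sketch under assumptions: `gate_decision` is a hypothetical name, and treating "rejected" the same as "pending" is a guess, since the notes only describe the pending and approved cases.

#+begin_src bash
#!/usr/bin/env bash
# Hypothetical sketch of the per-event decision in the Stop hook flow.

# Decide what the hook should do for one Stop event.
#   $1: review state (pending | approved | rejected)
#   $2: number of blocks already issued for this session
#   $3: attempts before the breaker trips (default 3)
# Prints "allow", "block", or "tripped".
gate_decision() {
  local state=$1 attempts=$2 max=${3:-3}
  if [ "$state" = "approved" ]; then
    echo allow
  elif [ $((attempts + 1)) -ge "$max" ]; then
    echo tripped
  else
    echo block
  fi
}

# Walk the flow for a review that never gets approved.
for prior in 0 1 2; do
  echo "attempt $((prior + 1)): $(gate_decision pending "$prior")"
done
#+end_src

With the default limit, attempts 1 and 2 block and attempt 3 trips the breaker, matching steps 3 to 5.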
* Process and Workflow

** What Worked Well
- orch consensus with multiple models for research (sonar, gemini, gpt, qwen)
- Web research found key patterns (OpenHands iterative refinement, JWZ architecture)
- Incremental test development - unit tests first, then integration harness
- Beads for tracking design decisions and research findings
- Steve Yegge's Larry Wall critique provided perfect framing ("Lego vs pirate ships")

** What Was Challenging
- Stop hook testing impossible mid-session (hooks load at start)
- API credit limits blocked live integration test
- Initial confusion about exit codes (1 vs 2)
- Gastown documentation is sparse on implementation details
- JWZ repo is Zig-based, harder to quickly understand

* Learning and Insights

** Technical Insights
- Claude Code Stop hooks need a very specific format: exit 2 + JSON to stderr with decision/reason
- stop_hook_active flag indicates continuation - it must be handled to prevent loops
- OpenHands uses "iterative refinement": worker → critique → feedback loop
- JWZ adds SQLite cache over JSONL for queries when files get large
- Gastown uses git worktrees as "hooks" (different from Claude hooks)

** Process Insights
- Research before building saves rework (we changed exit code after research)
- "Start simple, grow as needed" applies to message passing complexity
- Cross-agent compatibility requires lowest-common-denominator interface (files)
- 80-90% automation is a realistic expectation (OpenHands quote)

** Architectural Insights
- External reviewer process (not inline spawning) prevents loops
- Git branches for code isolation, files for coordination
- Agents don't talk directly - they coordinate through artifacts
- Some agents "do", some agents "block" (veto pattern)

* Context for Future Work

** Open Questions
- How to handle permission requests from workers? (bubbling up)
- Rolling branch pattern vs direct merge to main?
- When does JSONL complexity warrant SQLite cache?
- How to detect semantic loops (not just repeated commands)?

** Next Steps
- Test Stop hook in fresh session (manual)
- Implement worker spawn/status primitives (skills-sse)
- Implement branch-per-worker isolation (skills-roq)
- Document research findings in docs/design/

** Related Work
- Epic: skills-s6y "Multi-agent orchestration: Lego brick architecture"
- Existing: review-gate CLI with 46 tests, circuit breaker
- Research: OpenHands, Gastown, JWZ, MetaGPT patterns
- Future: Talu (~/proj/talu) for graph-based coordination

** External Resources Consulted
- https://openhands.dev/blog/automating-massive-refactors-with-parallel-agents
- https://arxiv.org/abs/2511.03690 (OpenHands SDK paper)
- https://github.com/steveyegge/gastown
- https://github.com/evil-mind-evil-sword/jwz
- https://sites.google.com/site/steveyegge2/ancient-languages-perl (Larry Wall critique)
- https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04

* Raw Notes
** Brainstorm Highlights (from orch consensus)
- "Budget-Based Sandbox" / Entropy Ledger - action costs
- "Rubber Duck Interrupt" - force stuck agent to explain in 200 tokens
- "Abstraction Elevator" - stuck on syntax → force pseudocode
- "Evidence Artifacts" - structured handoff, not chat transcripts
- "Role + Veto" - some agents can only block, not do
- "Capability Provenance Pipeline" (GPT) - most practical composite pattern

** Yegge Quote (on Larry Wall)
"Perl excels at marketing glossy shortcuts for common tasks while failing to provide fundamental compositional primitives. This trades genuine power for 'fast results.'"

Applied to Gastown: It's the pirate ship, not the Lego bricks.

** OpenHands Key Quote
"Don't expect 100% automation—tasks are 80-90% automatable. You need a human who understands full context."

* Session Metrics
- Commits made: 7
- Files touched: 13
- Lines added/removed: +1224/-2
- Tests added: 46 (unit) + integration harness
- Tests passing: 46/46