From 9efc9aaa9fafd00fb626b2e3cf4b852485c4c59a Mon Sep 17 00:00:00 2001 From: dan Date: Fri, 23 Jan 2026 10:14:16 -0800 Subject: [PATCH] docs: add worklog for ralph-wiggum iteration counter fix --- .beads/issues.jsonl | 3 +- ...1-23-ralph-wiggum-iteration-counter-fix.md | 147 ++++++++++++++++++ 2 files changed, 149 insertions(+), 1 deletion(-) create mode 100644 docs/worklogs/2026-01-23-ralph-wiggum-iteration-counter-fix.md diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl index 5fbe891..4ae431b 100644 --- a/.beads/issues.jsonl +++ b/.beads/issues.jsonl @@ -133,6 +133,7 @@ {"id":"skills-al5","title":"Consider repo-setup-verification skill","description":"The dotfiles repo has a repo-setup-prompt.md verification checklist that could become a skill.\n\n**Source**: ~/proj/dotfiles/docs/repo-setup-prompt.md\n\n**What it does**:\n- Verifies .envrc has use_api_keys and skills loading\n- Checks .skills manifest exists with appropriate skills\n- Optionally checks beads setup\n- Verifies API keys are loaded\n\n**As a skill it could**:\n- Be invoked to audit any repo's agent setup\n- Offer to fix missing pieces\n- Provide consistent onboarding for new repos\n\n**Questions**:\n- Is this better as a skill vs a slash command?\n- Should it auto-fix or just report?\n- Does it belong in skills repo or dotfiles?","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-06T12:38:32.561337354-08:00","updated_at":"2025-12-28T22:22:57.639520516-05:00","closed_at":"2025-12-28T22:22:57.639520516-05:00","close_reason":"Decided: keep as prompt doc in dotfiles, not a skill. Claude can read it when asked. No wrapper benefit, and it's dotfiles-specific setup (not general skill). ai-tools-doctor handles version checking separately."} {"id":"skills-ankb","title":"Define Intent/Approach/Work workflow","description":"Document the three-phase workflow for structured beads.\n\n## Phases\n\n### Intent (what)\n- **Problem statement** - what's broken or missing?\n- **Context** - what code/docs were read? Include file hashes/versions if long-running\n- **What could go wrong?** - edge cases, failure modes (AI is happy-path oriented)\n- **Proposed solution** - high-level, not technical\n- **Constraints** - requirements, limits, must-haves\n\n### Approach (how)\n- **Technical approach** - how we'll solve it\n- **Interface contracts** - define types/signatures BEFORE logic\n- **Rejected alternatives** - what we considered and why not (prevents AI defaulting to generic)\n- **Verification plan** - how will we prove success? Prefer automated tests\n- **Side effects** - security, performance, auth implications\n- **Rollback plan** - how to undo if it fails in production\n- **Non-goals** - what we will NOT touch\n- **Files to change**\n- Dependencies and risks\n\n### Work (do)\n- **Pre-flight check** - \"I have all info needed. Missing: [X]\"\n- Concrete steps as checkboxes\n- Each item small and testable\n- First step often: write the failing test (red test)\n- Commit after 1-2 checkboxes (atomic commits)\n- Check off as completed\n\n### Review (verify)\n- Human reviews diff against Intent\n- Raw test/compiler output required (no \"trust me it works\")\n- Definition of Done checklist:\n - [ ] Meets Intent?\n - [ ] Follows Approach?\n - [ ] Verification passed?\n - [ ] Scaffolding removed? (debug prints, commented code, unused imports)\n - [ ] Linter passes?\n - [ ] Docs updated if needed?\n\n---\n\n## Workflow Mechanics\n\n### Human Gates\nHuman-in-the-loop at every phase transition:\n```\nIntent → [approve] → Approach → [approve] → Work → [execute] → Review → [done]\n```\n\n**Anti-rubber-stamp**: Don't just click approve. Add a critique, constraint, or question to prove engagement. If you can't find anything, you didn't read it.\n\n### Context Anchoring\nDuring Work phase, re-inject Intent + Approach summary as system prompt. Don't rely on chat history alone - AI will drift by step 5.\n\n### Pivot Protocol\nWhen Work reveals Approach was wrong:\n1. **Stop** - don't hack around it\n2. **Diagnose** - AI summarizes WHY it failed\n3. **Learn** - failure reason becomes negative constraint in Intent (\"Do NOT use X because Y\")\n4. **Revert** - return to Approach phase\n5. **Revise** - update Approach with new constraint\n\nIf changing \u003e2 lines of Approach to make Work succeed → trigger Pivot.\n\n### Complexity Promotion\nWhen a Work item grows complex mid-implementation:\n- **Promote it** to its own bead with Intent/Approach/Work\n- Original checkbox becomes reference to new bead\n- **Depth limit**: Max 2 levels of nesting. Deeper = human intervention required.\n\n### When to Use Structure\n**Rule of Three** (+ security):\n- Affects \u003e3 files\n- Introduces \u003e1 new dependency\n- Changes existing interface/API\n- **Involves security, auth, or data persistence** (even if 1 file)\n- Requires reasoning, not just recall\n\nIf any true → use full structure.\n\n---\n\n## Example: Human Gate in Action\n\n### Scene: AI proposes Intent\n\n**AI**: \n```\n## Intent\n**Problem**: Users report slow dashboard load times\n**Context**: Read src/pages/Dashboard.tsx, src/api/metrics.ts\n**Solution**: Add caching to metrics API\n**Constraints**: Must invalidate on data change\n```\n\n**Human (anti-rubber-stamp)**:\n\u003e \"What's the current load time? What's our target? Also, did you check if the bottleneck is actually the API vs rendering?\"\n\n**AI revises Intent**:\n```\n**Problem**: Dashboard loads in 4.2s, target is \u003c1s. Profiling shows 3.8s spent in /api/metrics\n**Context**: Read src/pages/Dashboard.tsx, src/api/metrics.ts, ran Chrome DevTools profile\n...\n```\n\n**Human**: \"Approved. Proceed to Approach.\"\n\n---\n\n## Example: Pivot Protocol in Action\n\n### Scene: Work reveals Approach was wrong\n\n**Approach said**: Use Redis INCR for rate limiting\n**Work step 3**: Implement sliding window\n\n**AI hits blocker**:\n\u003e \"Redis INCR doesn't support sliding window natively. The MULTI/EXEC approach adds 15ms latency, violating our 5ms constraint.\"\n\n**Pivot triggered**:\n\n1. **Stop**: AI halts, doesn't try to hack around it\n2. **Diagnose**: \"Redis INCR is fixed-window only. Sliding window requires Lua script or sorted sets, both add latency.\"\n3. **Learn**: Add to Intent constraints: \"Do NOT use Redis INCR - doesn't support sliding window within latency budget\"\n4. **Revert**: Return to Approach\n5. **Revise**: New Approach uses Redis sorted sets with pipeline, or considers in-memory with sync\n\n---\n\n## Example: Complexity Promotion\n\n### Scene: Work item grows complex\n\n**Original Work list**:\n```\n- [ ] Add user authentication\n- [ ] Add rate limiting \n- [ ] Add logging\n```\n\n**During \"Add user authentication\"**:\n\u003e This needs JWT setup, session management, password hashing, login/logout endpoints, middleware, and tests. Too big for a checkbox.\n\n**Promote**:\n```\n- [ ] Add user authentication → See bead skills-xyz\n- [ ] Add rate limiting\n- [ ] Add logging\n```\n\n**New bead skills-xyz** gets full Intent/Approach/Work treatment.\n\n---\n\n## Meta-Insight\n\n\u003e \"This framework is a Context Compression algorithm. Intent compresses the Past, Approach compresses the Future, Work is the Decompression.\"\n\nThe critical success factor is **strictness of Human Gates**. Rigorous Approach review = magic Work phase. Rubber-stamp = hallucination engine.\n\n---\n\n## Deliverable\n- Document phase definitions ✓\n- Human gate requirements ✓\n- Anti-rubber-stamp guidance ✓\n- Context anchoring ✓\n- Pivot protocol with diagnose/learn ✓\n- Complexity promotion with depth limit ✓\n- Threshold heuristics ✓\n- Examples ✓","status":"closed","priority":2,"issue_type":"task","owner":"dan@delpad","created_at":"2026-01-18T08:13:57.556869846-08:00","created_by":"dan","updated_at":"2026-01-18T13:22:26.42620096-08:00","closed_at":"2026-01-18T13:22:26.42620096-08:00","close_reason":"Complete with phases, mechanics, templates, and examples","dependencies":[{"issue_id":"skills-ankb","depends_on_id":"skills-oh8m","type":"blocks","created_at":"2026-01-18T08:14:32.423457925-08:00","created_by":"dan"},{"issue_id":"skills-ankb","depends_on_id":"skills-ya44","type":"blocks","created_at":"2026-01-18T08:14:44.502821773-08:00","created_by":"dan"}]} {"id":"skills-audh","title":"Use parseEnum for heartbeat status instead of case statement","description":"[SMELL] LOW worker.nim:276-280 - Status string parsed with case statement with silent fallback. Use parseEnum or direct HeartbeatStatus input, error on invalid.","status":"closed","priority":4,"issue_type":"task","created_at":"2026-01-10T20:12:11.408603257-08:00","created_by":"dan","updated_at":"2026-01-11T15:46:39.025667838-08:00","closed_at":"2026-01-11T15:46:39.025667838-08:00","close_reason":"Closed"} +{"id":"skills-b6ww","title":"Add pi extension for commit hygiene and session cleanup","description":"## Problem\n\nEnding a session with uncommitted changes scattered across multiple repos is a recurring issue. Today's session example:\n\n- dotfiles: 48 uncommitted files, mixed concerns (modules, skills, scripts, worklogs)\n- skills: ~30 uncommitted files from multiple sessions (Jan 21 + Jan 22)\n- Changes from different logical units mixed together\n- Temp files, .bak files, node_modules, __pycache__ in untracked\n\nWe only caught this at the end when manually running `git status`. By then, the context of what-goes-with-what is fading.\n\n## Desired Behavior\n\n### During Session\n- Periodic nudge: \"You have N uncommitted changes across M repos\"\n- After significant milestones: \"Good stopping point - want to commit?\"\n- Awareness of dirty state in footer/status\n\n### End of Session\n- Before exit: inventory of uncommitted work\n- Guided commit flow: group changes by logical unit\n- Cleanup prompt: temp files, .bak, caches\n- Option to stash with descriptive message if not ready to commit\n\n### Commit Assistance\n- Suggest logical groupings based on file paths/types\n- Draft commit messages from recent conversation context\n- Warn about mixing unrelated changes\n- Check for secrets/large files before commit\n\n## Prior Art\n\nFrom today's pi extension research:\n\n- **git-checkpoint.ts** (official example): Creates stash checkpoints each turn, offers restore on fork\n- **dirty-repo-guard.ts** (official example): Warns if starting with uncommitted changes\n- **auto-commit-on-exit.ts** (official example): Commits on session end\n\nThese are pieces of the puzzle but not a complete solution.\n\n## Proposed Extension: session-hygiene\n\n```typescript\n// Footer widget showing dirty state\nctx.ui.setWidget(\"git-status\", [\"3 repos dirty: dotfiles(12) skills(8) talu(2)\"], { placement: \"belowEditor\" });\n\n// Periodic check (every N turns or on idle)\npi.on(\"turn_end\", async (event, ctx) =\u003e {\n const dirty = await checkDirtyRepos();\n if (dirty.totalFiles \u003e 20) {\n ctx.ui.notify(`${dirty.totalFiles} uncommitted files across ${dirty.repos.length} repos`, \"warning\");\n }\n});\n\n// End of session\npi.on(\"session_shutdown\", async (event, ctx) =\u003e {\n const dirty = await checkDirtyRepos();\n if (dirty.totalFiles \u003e 0) {\n const action = await ctx.ui.select(\"Uncommitted changes detected\", [\n \"Review and commit\",\n \"Stash all with message\",\n \"Exit anyway\"\n ]);\n // ...\n }\n});\n\n// Command for manual trigger\npi.registerCommand(\"commit\", {\n description: \"Guided commit flow for dirty repos\",\n handler: async (args, ctx) =\u003e {\n // Show dirty repos\n // Suggest logical groupings\n // Draft commit messages\n // Execute commits\n }\n});\n```\n\n## Grouping Heuristics\n\nSuggest groupings based on:\n- File path patterns (modules/, skills/, docs/)\n- File types (.nix, .md, .ts)\n- Modification time proximity\n- Conversation context (\"we were working on X\")\n\n## Multi-Repo Awareness\n\nNeed to track multiple repos:\n- Current working directory repo\n- Known project repos (dotfiles, skills, talu, etc.)\n- Recently touched repos this session\n\n## Cleanup Patterns\n\nAuto-suggest cleanup for:\n- `*.bak` files\n- `__pycache__/`\n- `node_modules/` in skill directories (should be in .gitignore)\n- Temp files in /tmp patterns\n- `.swp` files\n- Empty directories\n\n## Open Questions\n\n1. How to track \"recently touched repos\" across session?\n2. Should commits be automatic or always prompted?\n3. How aggressive should cleanup suggestions be?\n4. Integration with ralph loops (commit after each iteration?)\n\n## Success Criteria\n\n- [ ] Never end session with 48 uncommitted files unnoticed\n- [ ] Logical commit groupings suggested correctly 80%+ of time\n- [ ] No accidental commits of secrets or large files\n- [ ] Cleanup removes obvious temp files without data loss\n- [ ] Works across multiple repos in a session","status":"open","priority":2,"issue_type":"feature","owner":"dan@delpad","created_at":"2026-01-23T00:34:01.115488482-08:00","created_by":"dan","updated_at":"2026-01-23T00:34:01.115488482-08:00","labels":["hygiene","pi-extension","workflow"]} {"id":"skills-bcu","title":"Design doc-review skill","description":"# doc-review skill\n\nFight documentation drift with a non-interactive review process that generates patchfiles for human review.\n\n## Problem\n- No consistent documentation system across repos\n- Stale content accumulates\n- Structural inconsistencies (docs not optimized for agents)\n\n## Envisioned Workflow\n\n```bash\n# Phase 1: Generate patches (non-interactive, use spare credits, test models)\ndoc-review scan ~/proj/foo --model claude-sonnet --output /tmp/foo-patches/\n\n# Phase 2: Review patches (interactive session)\ncd ~/proj/foo\nclaude # human reviews patches, applies selectively\n```\n\n## Design Decisions Made\n\n- **Trigger**: Manual invocation (not CI). Use case includes burning extra LLM credits, testing models repeatably.\n- **Source of truth**: Style guide embedded in prompt template. Blessed defaults, overridable per-repo.\n- **Output**: Patchfiles for human review in interactive Claude session.\n- **Chunking**: Based on absolute size, not file count. Logical chunks easy for Claude to review.\n- **Scope detection**: Graph-based discovery starting from README.md or AGENTS.md, not glob-all-markdown.\n\n## Open Design Work\n\n### Agent-Friendly Doc Conventions (needs brainstorming)\nWhat makes docs agent-readable?\n- Explicit context (no \"as mentioned above\")\n- Clear section headers for navigation\n- Self-contained sections\n- Consistent terminology\n- Front-loaded summaries\n- ???\n\n### Prompt Content\nFull design round needed on:\n- What conventions to enforce\n- How to express them in prompt\n- Examples of \"good\" vs \"bad\"\n\n### Graph-Based Discovery\nHow does traversal work?\n- Parse links from README/AGENTS.md?\n- Follow relative markdown links?\n- Depth limit?\n\n## Skill Structure (tentative)\n```\nskills/doc-review/\n├── prompt.md # Core review instructions + style guide\n├── scan.sh # Orchestrates: find docs → invoke claude → emit patches\n└── README.md\n```\n\n## Out of Scope (for now)\n- Cross-repo standardization (broader than skills repo)\n- CI integration\n- Auto-apply without human review","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-04T14:01:43.305653729-08:00","updated_at":"2025-12-04T16:44:03.468118288-08:00","closed_at":"2025-12-04T16:44:03.468118288-08:00","dependencies":[{"issue_id":"skills-bcu","depends_on_id":"skills-1ig","type":"blocks","created_at":"2025-12-04T14:02:17.144414636-08:00","created_by":"daemon","metadata":"{}"},{"issue_id":"skills-bcu","depends_on_id":"skills-53k","type":"blocks","created_at":"2025-12-04T14:02:17.164968463-08:00","created_by":"daemon","metadata":"{}"}]} {"id":"skills-be3","title":"Define trace security and redaction policy","description":"Wisps will leak secrets without explicit policy.\n\nRequired:\n- Default-deny for env vars (allowlist: PROJECT, USER, etc.)\n- Redaction rules for sensitive fields\n- No file contents by default\n- Classification field: internal|secret|public\n\nImplementation:\n- redact: [\"env.AWS_SECRET_ACCESS_KEY\", \"inputs.token\"]\n- Sanitization before writing to disk\n- Block elevation if classification=secret\n\nFrom consensus: both models flagged as medium-high severity.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T19:49:31.041661947-05:00","updated_at":"2025-12-23T20:55:04.446363188-05:00","closed_at":"2025-12-23T20:55:04.446363188-05:00","close_reason":"ADRs revised with orch consensus feedback"} {"id":"skills-bhto","title":"worker CLI: spawn default branch crash on non-integration repo","status":"open","priority":3,"issue_type":"bug","owner":"dan@delpad","created_at":"2026-01-12T21:03:54.543321889-08:00","created_by":"dan","updated_at":"2026-01-12T21:03:54.543321889-08:00"} @@ -269,7 +270,7 @@ {"id":"skills-s5xl","title":"Add brave-search skill","description":"Create brave-search skill using Brave Search API as fallback to Claude web tools. Include scripts for search and content extraction, SKILL.md, README, and npm dependencies. Document setup (BRAVE_API_KEY + npm install).","status":"open","priority":2,"issue_type":"task","owner":"dan@delpad","created_at":"2026-01-20T22:02:57.319575306-08:00","created_by":"dan","updated_at":"2026-01-20T22:02:57.319575306-08:00"} {"id":"skills-s6y","title":"Multi-agent orchestration: Lego brick architecture","description":"Multi-agent orchestration: Lego brick architecture\n\nCoordinate 2-4 AI coding agents with human oversight.\n\nLanguage: Nim (ORC, cligen, tiny_sqlite) - see skills-q40\n\nCore components:\n- Worker state machine (skills-4oj)\n- Message passing layer (skills-ms5) \n- Branch-per-worker isolation (skills-roq)\n- Worker CLI primitives (skills-sse)\n- Human observability (skills-yak)\n- Review-gate integration (skills-byq)\n\nDesign docs: docs/design/\n- mvp-scope.md (v3)\n- message-passing-layer.md (v4)\n- worker-cli-primitives.md (v3)\n- worker-state-machine.md\n- branch-per-worker.md\n- human-observability.md","status":"closed","priority":1,"issue_type":"epic","created_at":"2026-01-10T12:14:16.141746066-08:00","created_by":"dan","updated_at":"2026-01-11T21:23:19.461560217-08:00","closed_at":"2026-01-11T21:23:19.461560217-08:00","close_reason":"MVP complete: worker CLI, state machine, review-gate, branch isolation all implemented. Architecture validated by orch consensus. Unblocking design/research tasks."} {"id":"skills-s92","title":"Add tests for config injection (deploy-skill.sh)","description":"File: bin/deploy-skill.sh (lines 112-137)\n\nCritical logic with NO test coverage:\n- Idempotency (running twice should be safe)\n- Correct brace matching in Nix\n- Syntax validity of injected config\n- Rollback on failure\n\nRisk: MEDIUM-HIGH - can break dotfiles Nix config\n\nFix:\n- Test idempotent injection\n- Validate Nix syntax after injection\n- Test with malformed input\n\nSeverity: MEDIUM","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-24T02:51:01.314513824-05:00","updated_at":"2026-01-06T16:29:18.728097676-08:00","closed_at":"2026-01-06T16:29:18.728097676-08:00","close_reason":"21 tests added covering idempotency, brace preservation, inject_home_file wrapper, edge cases"} -{"id":"skills-s984","title":"ralph-wiggum: iteration counter stuck at 1 due to hasPendingMessages guard","description":"The ralph_done tool's hasPendingMessages() check always triggers during normal agent operation, preventing iteration increment. See docs/work/2026-01-22-ralph-iteration-counter-bug.md for full analysis and proposed fixes.","status":"open","priority":2,"issue_type":"bug","owner":"dan@delpad","created_at":"2026-01-22T14:59:30.935988947-08:00","created_by":"dan","updated_at":"2026-01-22T14:59:30.935988947-08:00"} +{"id":"skills-s984","title":"ralph-wiggum: iteration counter stuck at 1 due to hasPendingMessages guard","description":"The ralph_done tool's hasPendingMessages() check always triggers during normal agent operation, preventing iteration increment. See docs/work/2026-01-22-ralph-iteration-counter-bug.md for full analysis and proposed fixes.","status":"closed","priority":2,"issue_type":"bug","owner":"dan@delpad","created_at":"2026-01-22T14:59:30.935988947-08:00","created_by":"dan","updated_at":"2026-01-23T09:59:52.406429535-08:00","closed_at":"2026-01-23T09:59:52.406429535-08:00","close_reason":"Fixed: iteration now increments before pending-messages check. If pending messages exist, state is saved but prompt delivery is deferred."} {"id":"skills-sh6","title":"Research: OpenHands iterative refinement pattern","description":"Document OpenHands SDK patterns for our architecture.\n\n## Iterative Refinement Loop\n1. Worker agent does work\n2. Critique agent evaluates (correctness, quality, completeness)\n3. If not good → worker tries again with feedback\n4. Repeat until standard met\n\n## Parallel Agent Orchestration\n- Git-based coordination (not direct communication)\n- Each agent works on own branch\n- PRs to intermediate 'rolling branch'\n- Human reviews and merges\n- Agents pull latest, handle conflicts\n\n## Key Quote\n'Don't expect 100% automation—tasks are 80-90% automatable.\nYou need a human who understands full context.'\n\n## Mapping to Our Architecture\n- Worker = their refactoring agent\n- Reviewer = their critique agent\n- review-gate = their quality threshold\n- Human orchestrator = their human on rolling branch\n\n## Sources\n- https://openhands.dev/blog/automating-massive-refactors-with-parallel-agents\n- https://arxiv.org/abs/2511.03690\n- https://docs.openhands.dev/sdk","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-10T12:24:02.368542878-08:00","created_by":"dan","updated_at":"2026-01-15T20:38:14.3084623-08:00","closed_at":"2026-01-15T20:38:14.3084623-08:00","close_reason":"Research captured in issue description. Mapping to our architecture documented.","dependencies":[{"issue_id":"skills-sh6","depends_on_id":"skills-s6y","type":"blocks","created_at":"2026-01-10T12:24:07.013388857-08:00","created_by":"dan"}]} {"id":"skills-sisi","title":"Extract MaxSummaryLen constant for description truncation","description":"[SMELL] LOW worker.nim:101 - Description truncated at magic number 30. Extract: const MaxSummaryLen = 30","status":"open","priority":4,"issue_type":"task","created_at":"2026-01-10T20:12:11.153123047-08:00","created_by":"dan","updated_at":"2026-01-10T20:12:11.153123047-08:00"} {"id":"skills-sse","title":"Design: worker spawn/status primitives","description":"Design: worker spawn/status primitives\n\nImplementation: Nim (cligen, tiny_sqlite)\nDesign doc: docs/design/worker-cli-primitives.md (v3)\n\nCommands:\n- worker spawn \u003ctask-id\u003e - Create workspace\n- worker status [--watch] - Dashboard\n- worker start/done - Agent signals\n- worker approve/request-changes - Review\n- worker merge - Complete cycle\n- worker cancel - Abort\n\nSee: skills-q40 for language decision","design":"docs/design/worker-cli-primitives.md","notes":"Design complete. Consensus from 4 models (gemini, gpt, qwen, sonar): (1) spawn prepares workspace only, doesn't start agent, (2) Python CLI, (3) all commands idempotent, (4) Worker ID = Task ID, (5) SQLite as state truth. Commands: spawn/status/merge (human), start/done/heartbeat (agent). Local .worker-ctx.json for context discovery. Hybrid approach for heartbeats.","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-10T12:14:33.115131833-08:00","created_by":"dan","updated_at":"2026-01-10T21:29:25.69091989-08:00","closed_at":"2026-01-10T21:29:25.69091989-08:00","close_reason":"Implemented in worker CLI - spawn, status, state machine, branch isolation all working","dependencies":[{"issue_id":"skills-sse","depends_on_id":"skills-s6y","type":"blocks","created_at":"2026-01-10T12:15:10.014285119-08:00","created_by":"dan"}]} diff --git a/docs/worklogs/2026-01-23-ralph-wiggum-iteration-counter-fix.md b/docs/worklogs/2026-01-23-ralph-wiggum-iteration-counter-fix.md new file mode 100644 index 0000000..0cec2b0 --- /dev/null +++ b/docs/worklogs/2026-01-23-ralph-wiggum-iteration-counter-fix.md @@ -0,0 +1,147 @@ +--- +title: "Fix ralph-wiggum iteration counter stuck at 1" +date: 2026-01-23 +keywords: [ralph-wiggum, pi-extension, iteration-counter, bug-fix, code-review] +commits: 0 +compression_status: uncompressed +--- + +# Session Summary + +**Date:** 2026-01-23 +**Focus Area:** Fixing ralph-wiggum extension iteration counter bug + +# Accomplishments + +- [x] Diagnosed root cause of iteration counter staying stuck at 1 +- [x] Fixed `ralph_done` tool to increment iteration before `hasPendingMessages()` check +- [x] Applied code review findings (4 issues fixed) +- [x] Closed issue skills-s984 + +# Key Decisions + +## Decision 1: Always increment iteration, defer prompt delivery + +- **Context:** The `hasPendingMessages()` guard was blocking iteration increment entirely, causing counter to stay at 1 +- **Options considered:** + 1. Remove the guard entirely - Risk of duplicate prompts + 2. Increment first, only skip prompt delivery if pending - Counter accurate, prompts still controlled + 3. Check for pending USER messages only - Would require API change + 4. Use re-entry flag - More complex, doesn't solve root cause +- **Rationale:** Option 2 keeps counter accurate while still preventing duplicate prompt delivery +- **Impact:** Iteration counter now reflects actual progress; UI stays accurate; reflection checkpoints work + +## Decision 2: Move reflection calculation before pending check + +- **Context:** If reflection was due but pending messages blocked, reflection was skipped entirely (not deferred) +- **Rationale:** Reflection state should be recorded regardless of prompt delivery +- **Impact:** `lastReflectionAt` is now set even when prompt is deferred; message indicates "(reflection pending)" + +# Problems & Solutions + +| Problem | Solution | Learning | +|---------|----------|----------| +| `hasPendingMessages()` always true during normal operation | Move increment before the check, save state in both branches | Guards can block more than intended; order matters | +| Reflection tracking skipped when prompt deferred | Calculate `needsReflection` before pending check, update `lastReflectionAt` in both paths | Side effects should happen before conditional returns | +| Error message "not active" was ambiguous | Split into two checks: null state vs wrong status, with specific messages | Better error messages help debugging | +| Magic expression `state.iteration - 1` repeated | Extract to `const completedIteration` | DRY applies to expressions too | + +# Technical Details + +## Code Changes + +- Total files modified: 1 +- Key files changed: + - `~/.pi/agent/extensions/ralph-wiggum/index.ts` - Fixed `ralph_done` tool execute function + +## Before (broken) + +```typescript +if (ctx.hasPendingMessages()) { + return { content: [{ type: "text", text: "Pending messages already queued. Skipping ralph_done." }], details: {} }; +} + +// Increment iteration - NEVER REACHED when pending messages exist +state.iteration++; +``` + +## After (fixed) + +```typescript +// Always increment iteration to keep counter accurate +state.iteration++; +const completedIteration = state.iteration - 1; + +// Check max iterations before anything else +if (state.maxIterations > 0 && state.iteration > state.maxIterations) { + completeLoop(ctx, state, ...); + return { ... }; +} + +// Calculate reflection status before pending check (so we don't skip it) +const needsReflection = state.reflectEvery > 0 && completedIteration % state.reflectEvery === 0; +if (needsReflection) state.lastReflectionAt = state.iteration; + +// If there are pending messages, record progress but skip prompt delivery +if (ctx.hasPendingMessages()) { + saveState(ctx, state); + updateUI(ctx); + const reflectNote = needsReflection ? " (reflection pending)" : ""; + return { + content: [{ type: "text", text: `Iteration ${completedIteration} complete.${reflectNote} Prompt deferred - pending messages.` }], + details: {}, + }; +} +``` + +## Code Review Findings Fixed + +1. **[ERROR] MED** - Split null/inactive state checks with specific error messages +2. **[SMELL] LOW** - Extracted `completedIteration` constant +3. **[SMELL] LOW** - Standardized message format ("complete" in both paths) +4. **[STRUCTURAL]** - Moved reflection calculation before pending-messages check + +# Process and Workflow + +## What Worked Well + +- Issue had detailed bug report with root cause analysis already done (docs/work/2026-01-22-ralph-iteration-counter-bug.md) +- Code review lenses caught additional issues beyond the primary fix +- Fixing all issues in one pass + +## What Was Challenging + +- Extension not in git repo (deployed separately), couldn't easily diff changes +- No TypeScript compiler readily available to validate syntax + +# Learning and Insights + +## Technical Insights + +- `ctx.hasPendingMessages()` returns true whenever tool responses are queued - essentially always during multi-tool agent operation +- Guards that block early can prevent important state updates from happening +- The order of operations matters: increment → check limits → calculate side effects → check guards → deliver + +## Architectural Insights + +- The "always save state, optionally defer prompt" pattern is cleaner than "guard blocks everything" +- Error messages should distinguish between different failure modes + +# Context for Future Work + +## Open Questions + +- Should there be a mechanism to force prompt delivery even with pending messages? +- Should deferred reflections trigger on the next non-deferred iteration? + +## Next Steps + +- Test the fix in a real ralph loop session +- Consider adding this pattern to pi extension documentation + +# Session Metrics + +- Commits made: 0 (not yet committed) +- Files touched: 1 +- Lines added/removed: ~30/~20 +- Issues closed: 1 (skills-s984)