docs: worklog for code-review skill creation and worklog cleanup

dan 2025-12-28 00:06:38 -05:00
parent 4b72e6fc2e
commit fb5e3af8e1

#+TITLE: Code Review Skill Creation and Worklog Cleanup
#+DATE: 2025-12-28
#+KEYWORDS: code-review, skill, worklog, refactoring, orch-consensus, lenses
#+COMMITS: 6
#+COMPRESSION_STATUS: uncompressed
* Session Summary
** Date: 2025-12-28 (Continuation from 2025-12-26 session)
** Focus Area: Creating /code-review skill, cleaning up worklog skill
* Accomplishments
- [X] Ran orch consensus on code-review workflow design (gpt + gemini, qwen was flaky)
- [X] Created /code-review skill based on consensus recommendations
- [X] Closed proto skills-fvc and 7 child tasks (replaced by skill)
- [X] Added code-review to dotfiles claudeCodeSkills deployment
- [X] Added code-review to delbaker .skills manifest
- [X] Holistic review of skills repo (50 open issues, 2 blocked epics)
- [X] Completed all 5 worklog cleanup tasks (127 -> 88 lines, -31%)
- [X] Tested updated extract-metrics.sh script
- [X] Ran code-review on updated worklog skill (clean - no issues worth filing)
- [X] Filed 5 issues in dotfiles from code-review of flake.nix
* Key Decisions
** Decision 1: Skill over Proto for code-review workflow
- Context: Had both lenses (prompts) and a beads proto (skills-fvc) for code review
- Options considered:
1. Keep proto as workflow orchestrator - unused, adds complexity
2. Create Claude Code skill as entrypoint - matches actual usage pattern
3. Ad-hoc documentation only - too loose
- Rationale: GPT and Gemini both agreed a skill is the right abstraction. The proto was never actually used (bd pour/wisp commands).
- Impact: Simpler mental model - /code-review is the entrypoint, lenses are prompts it uses
** Decision 2: Interactive by default for code-review
- Context: How much automation for issue filing?
- Options considered:
1. Full automation - file all findings automatically
2. Interactive - present findings, ask before filing
3. Report only - never file, just output
- Rationale: Both models recommended interactive. Prevents issue spam, keeps human in loop.
- Impact: The skill presents a findings summary, then asks "which findings to file?" (sketched below)
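A minimal sketch of that interactive gate, assuming bash, plain-string findings, and a =bd create=-style filing command (the real skill drives this conversationally rather than via a script):
#+begin_src bash
# Hypothetical illustration of the interactive filing gate: present each
# finding, ask before filing, skip anything the reviewer declines.
# The `bd create` call here is an assumption, not the skill's actual command.
findings=("flake.nix: unpinned input" "flake.nix: dead overlay")
for f in "${findings[@]}"; do
  read -rp "File issue for: ${f}? [y/N] " ans
  [[ "$ans" == [yY] ]] && bd create "$f"
done
#+end_src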
** Decision 3: Consolidate worklog skill aggressively
- Context: 5 cleanup tasks from earlier lens review
- Rationale: Quick wins, reduce maintenance burden, test the lens -> issue -> fix cycle
- Impact: 127 -> 88 lines (-31%), cleaner skill prompt
* Problems & Solutions
| Problem | Solution | Learning |
|---------|----------|----------|
| Orch consensus with qwen hanging | Kill and retry with gpt + gemini only | qwen has reliability issues on long prompts |
| Orch consensus timing out | Run models separately with orch chat, synthesize manually | Parallel queries work, consensus command buffers until all complete |
| Proto tasks polluting other repos | Close proto, use skill instead | molecules.jsonl cross-repo loading needs work (bd-k2wg) |
| extract-metrics.sh not showing branch/status | Added BRANCH and STATUS output to script (sketch after this table) | Script was metrics-focused, now includes full git context |
| Semantic compression references | Already removed when merging Guidelines/Remember | Sometimes cleanup tasks overlap |
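A hedged sketch of the branch/status additions to =extract-metrics.sh= (the real script may format this differently; see the "Git Context" rename in Raw Notes):
#+begin_src bash
# Sketch of the new "Git Context" output: branch name and working-tree
# status alongside the existing commit metrics. The git invocations are
# standard; the exact labels are an assumption.
echo "== Git Context =="
echo "BRANCH: $(git rev-parse --abbrev-ref HEAD)"
echo "STATUS: $(git status --porcelain | wc -l) uncommitted file(s)"
echo "COMMITS TODAY: $(git log --oneline --since=midnight | wc -l)"
#+end_src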
* Technical Details
** Code Changes
- Total files modified: 20
- Key files changed:
- =skills/code-review/SKILL.md= - New skill (120 lines)
- =skills/code-review/README.md= - Skill documentation
- =skills/code-review/lenses/*.md= - Bundled lens prompts
- =skills/worklog/SKILL.md= - Refactored (127 -> 88 lines)
- =skills/worklog/scripts/extract-metrics.sh= - Added branch/status output
- =modules/ai-skills.nix= - Added code-review to skills list
- =~/proj/dotfiles/home/claude.nix= - Added code-review to claudeCodeSkills
- =~/proj/delbaker/.skills= - Added code-review to manifest
** New Files Created
- =skills/code-review/SKILL.md= - Main skill prompt
- =skills/code-review/README.md= - Quick reference
- =skills/code-review/lenses/= - Bundled copies of lens prompts
** Commands Used
#+begin_src bash
# Orch consensus (failed with 3 models)
uv run orch consensus --temperature 1.0 "..." gemini gpt qwen3
# Orch chat (worked for individual models)
uv run orch chat "..." --model gpt --temperature 1.0
uv run orch chat "..." --model gemini --temperature 1.0
# Test updated extract-metrics script
./skills/worklog/scripts/extract-metrics.sh
# Update skills flake in dotfiles
cd ~/proj/dotfiles && nix flake update skills
#+end_src
** Architecture Notes
- Skill deployment: home-manager symlinks skills from nix store to ~/.claude/skills/
- Per-repo skills: .skills manifest + use-skills.sh creates repo-local symlinks (flow sketched below)
- Lenses bundled in skill but also deployed to ~/.config/lenses/ for direct orch use
- Proto/molecules layer deemed overhead - skill is simpler for this use case
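A sketch of that per-repo flow, assuming =.skills= is a newline-separated manifest of skill names (the real =use-skills.sh= may differ):
#+begin_src bash
# Hypothetical use-skills.sh: read skill names from the repo's .skills
# manifest and symlink each from the deployed ~/.claude/skills/ tree
# into a repo-local .claude/skills/ directory.
mkdir -p .claude/skills
while read -r skill; do
  [[ -z "$skill" || "$skill" == \#* ]] && continue  # skip blanks and comments
  ln -sfn "$HOME/.claude/skills/$skill" ".claude/skills/$skill"
done < .skills
#+end_src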
* Process and Workflow
** What Worked Well
- Orch consensus (when it worked) provided useful multi-model perspective
- Quick iteration: create skill -> deploy -> test on real target (dotfiles flake.nix)
- TodoWrite for tracking the 5 worklog tasks
- Beads for tracking issues and closing them as work completed
- Running code-review on recently modified code as validation
** What Was Challenging
- Orch reliability: qwen hanging, consensus command timing out
- Remote git server down throughout session (local commits only)
- Context recovery from previous session compaction
* Learning and Insights
** Technical Insights
- orch chat is more reliable than orch consensus for long prompts
- Skills are the right abstraction for Claude Code workflows - simpler than protos
- Shell script changes need home-manager rebuild to deploy
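For reference, one plausible deploy invocation (assuming a standalone home-manager flake in ~/proj/dotfiles; the flake-update step matches Commands Used above):
#+begin_src bash
# Pull the latest skills input, then rebuild so the symlinks under
# ~/.claude/skills/ point at the updated store paths.
cd ~/proj/dotfiles
nix flake update skills
home-manager switch --flake .
#+end_src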
** Process Insights
- Lens -> issue -> fix cycle works well for incremental cleanup
- Running multiple lenses finds overlapping issues (good for synthesis)
- Interactive review prevents over-filing low-value issues
** Architectural Insights
- Skills repo has 3 layers: skills (prompts), lenses (review prompts), workflows (protos)
- Lenses are a subset of skills conceptually - focused single-purpose prompts
- Proto/molecule layer adds complexity without proportional benefit currently
* Context for Future Work
** Open Questions
- Should lenses output JSON for structured parsing? (hypothetical shape sketched below)
- How to handle orch reliability issues (qwen, timeouts)?
- Should code-review skill use orch internally or leave it optional?
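If lenses did emit JSON, one hypothetical shape (not an agreed format) could be filtered with =jq= before the interactive gate:
#+begin_src bash
# Hypothetical lens output: one JSON object per lens run, with findings
# as an array so a driver can filter by severity. Field names are an
# assumption for illustration only.
cat <<'EOF' | jq '.findings[] | select(.severity == "high")'
{
  "lens": "security",
  "target": "flake.nix",
  "findings": [
    {"line": 42, "severity": "high", "summary": "example finding"},
    {"line": 7,  "severity": "low",  "summary": "example nit"}
  ]
}
EOF
#+end_src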
** Next Steps
- Run code-review on other skills (niri-window-capture has pending review)
- Worklog cleanup tasks: none remaining (j2a and njb both confirmed done this session)
- Address dotfiles issues filed this session (5 issues in flake.nix)
- Rebuild home-manager to deploy updated skills
** Related Work
- [[file:2025-12-26-multi-lens-code-review-workflow-testing.org][2025-12-26 Multi-Lens Code Review Testing]] - Created lenses, tested on orch
- [[file:2025-12-24-adr-revision-lsp-research-code-audit.org][2025-12-24 ADR Revision]] - Initial lens creation
- orch-loq: qwen empty responses bug (filed in orch repo)
- bd-k2wg: molecules.jsonl hierarchical loading (filed in beads repo)
* Raw Notes
- Session started from context recovery (previous session compacted)
- GPT recommendation: skill as entrypoint, orch for synthesis only, JSON output, interactive by default
- Gemini recommendation: consolidate into skills/, single agent explores, orch at end for filtering
- Both agreed: delete proto, make skill, interactive review
- Worklog cleanup tasks all from earlier lens review (2025-12-25)
- extract-metrics.sh output changed from "Session Metrics" to "Git Context"
** Orch Consensus Key Points
From GPT:
- Skill = primary workflow entrypoint
- Orch = synthesis/filtering only, not for running every lens
- JSON source of truth, markdown is rendering
- Repo-local beads storage to avoid cross-repo pollution
From Gemini:
- Rename lenses to skills (we kept them separate)
- Single agent explores, orch filters at end
- "Driver" pattern - human approves before filing
- Delete proto as unused complexity
** Commits This Session
1. feat: add /code-review skill with bundled lenses
2. docs: add code-review to skills list (ai-skills.nix)
3. feat: add code-review skill (dotfiles)
4. chore: update skills flake (dotfiles)
5. refactor(worklog): consolidate skill prompt
6. refactor(worklog): consolidate git commands into script
* Session Metrics
- Commits made: 6 (across skills and dotfiles repos)
- Files touched: 20
- Lines added/removed: +829/-70
- Issues filed: 5 (in dotfiles)
- Issues closed: 8 (proto) + 5 (worklog) = 13
- Tests added: 0