skills/docs/worklogs/2026-01-01-ops-review-skill-design-and-skeleton.org

#+TITLE: ops-review Skill Design, Orch Consensus Planning, and Skeleton Implementation
#+DATE: 2026-01-01
#+KEYWORDS: ops-review, skill-design, orch-consensus, lenses, infrastructure-review, nix, shell-safety, secrets
#+COMMITS: 0 (uncommitted work in progress)
#+COMPRESSION_STATUS: uncompressed

* Session Summary
** Date: 2026-01-01
** Focus Area: Designing and implementing the ops-review skill for infrastructure code analysis

* Accomplishments
- [X] Explored dotfiles and prox-setup repos to understand actual ops artifact landscape
- [X] Designed ops-review skill with 10 lenses across 3 phases
- [X] Ran orch consensus (sonar, flash-or, gemini, gpt) on initial plan
- [X] Incorporated consensus feedback: linter-first hybrid architecture, crisp lens boundaries
- [X] Created comprehensive plan.md in specs/ops-review/
- [X] Created bd epic (skills-9cu) with 14 child tasks, proper dependency graph
- [X] Built skill skeleton: SKILL.md, README.md, lenses/README.md
- [X] Drafted secrets.md lens with orch consensus review
- [X] Incorporated Nix store exposure, Docker layer persistence, CI masking feedback
- [X] Filed follow-up issue in dotfiles for gitleaks availability (dotfiles-x2m)
- [ ] Remaining Phase 1 lenses: shell-safety, blast-radius, privilege

* Key Decisions
** Decision 1: Linter-first hybrid architecture
- Context: How should ops-review analyze infrastructure code?
- Options considered:
  1. Pure LLM analysis - flexible but prone to syntax hallucinations
  2. Pure linter - deterministic but misses semantic issues
  3. Hybrid: linters run first, then the LLM interprets their output - deterministic syntax checks plus semantic review
- Rationale: All 4 consensus models agreed that LLMs hallucinate syntax but excel at understanding intent, so static tools catch syntax errors while the LLM looks for logic bugs.
- Impact: Each lens integrates with specific tools (shellcheck, statix, gitleaks)
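
A minimal sketch of the linter-first flow, assuming shellcheck as the linter for a shell-safety pass. The function names `lint_findings` and `build_review_prompt` and the prompt wording are hypothetical and are not taken from SKILL.md; only `shellcheck -f json` is a real invocation.

#+begin_src bash
#!/usr/bin/env bash

# Hypothetical helper: collect linter findings for one shell script.
# Falls back to an empty JSON list when shellcheck is not installed,
# so the LLM pass can still run.
lint_findings() {
  local file="$1"
  if command -v shellcheck >/dev/null 2>&1; then
    # shellcheck exits non-zero when it reports findings, so ignore the status.
    shellcheck -f json "$file" || true
  else
    echo "[]"
  fi
}

# Hypothetical helper: assemble the text handed to the LLM, linter output first.
build_review_prompt() {
  local file="$1" lens="$2"
  printf 'Lens: %s\nFile: %s\n' "$lens" "$file"
  printf 'Linter findings (JSON):\n%s\n' "$(lint_findings "$file")"
  printf 'Task: interpret the findings and look for logic issues the linter cannot see.\n'
}
#+end_src

Calling `build_review_prompt deploy.sh shell-safety` (with a hypothetical `deploy.sh`) would print the lens name, the file, and the raw linter JSON ahead of the instruction, which keeps syntax claims anchored to tool output rather than to the model.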

** Decision 2: 10 lenses across 3 phases
- Context: How many lenses and how to prioritize?
- Initial proposal: 8 lenses
- Consensus feedback: Add privilege (least-privilege) and supply-chain (pinning)
- Phase 1 (quick mode): secrets, shell-safety, blast-radius, privilege
- Phase 2: idempotency, supply-chain, observability
- Phase 3: nix-hygiene, resilience, orchestration
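
The phase grouping could drive lens selection roughly as follows. This is a sketch only: `lenses_for_phase` and `select_lenses` are hypothetical names, and running all three phases in order for a full review is an assumption.

#+begin_src bash
#!/usr/bin/env bash

# Hypothetical helper: print the lenses that belong to one phase.
lenses_for_phase() {
  case "$1" in
    1) echo "secrets shell-safety blast-radius privilege" ;;
    2) echo "idempotency supply-chain observability" ;;
    3) echo "nix-hygiene resilience orchestration" ;;
    *) echo "unknown phase: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: quick mode runs Phase 1 only, anything else runs all phases.
select_lenses() {
  local mode="${1:-full}" phase out=""
  if [ "$mode" = "quick" ]; then
    lenses_for_phase 1
    return
  fi
  for phase in 1 2 3; do
    # Join the per-phase lists with single spaces.
    out+="${out:+ }$(lenses_for_phase "$phase")"
  done
  echo "$out"
}
#+end_src

`select_lenses quick` prints the four Phase 1 lenses, while `select_lenses` with no argument prints all ten.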

** Decision 3: Crisp lens boundaries to avoid duplicate findings
- Problem: resilience/blast-radius/idempotency overlap
- Solution: Define ownership table
  - idempotency: safe re-run, convergence, atomic writes
  - resilience: runtime fault tolerance, timeouts, retries
  - blast-radius: change safety, dry-run, rollback
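
One way the ownership table could be enforced mechanically is to drop any finding reported by a lens that does not own the concern. The sketch below assumes findings arrive as lines of the form "<lens> <concern> <message>"; that format, the concern slugs, and both function names are hypothetical.

#+begin_src bash
#!/usr/bin/env bash

# Hypothetical lookup: which lens owns a concern, following the ownership table.
owning_lens() {
  case "$1" in
    safe-rerun|convergence|atomic-writes) echo "idempotency" ;;
    fault-tolerance|timeouts|retries)     echo "resilience" ;;
    change-safety|dry-run|rollback)       echo "blast-radius" ;;
    *)                                    echo "unowned" ;;
  esac
}

# Keep a finding only when the reporting lens owns the concern.
# Reads "<lens> <concern> <message...>" lines on stdin.
filter_findings() {
  local lens concern message
  while read -r lens concern message; do
    if [ "$(owning_lens "$concern")" = "$lens" ]; then
      printf '%s %s %s\n' "$lens" "$concern" "$message"
    fi
  done
}
#+end_src

With this filter, a missing-retry finding raised by both resilience and blast-radius would survive only once, under resilience.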

** Decision 4: Nix-specific checks as first-class concerns
- Context: Nix has a unique security model (the store is world-readable)
- Insight from consensus: Secrets in .nix strings become readable in /nix/store
- Added to secrets lens: explicit Nix store exposure check
- Remediation: sops-nix/agenix with runtime paths, not embedded strings
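
A rough, grep-based sketch of what a Nix store exposure check might look for: secret-looking attributes assigned a literal string in a .nix file. The function name, the keyword list, and the `/run/` exclusion are assumptions for illustration; the actual secrets lens may use different heuristics.

#+begin_src bash
#!/usr/bin/env bash

# Hypothetical check: list .nix lines that assign a string literal to a
# secret-looking attribute. String literals in .nix files end up in /nix/store.
find_embedded_nix_secrets() {
  local root="${1:-.}"
  local assign_re='(password|secret|token|api[_-]?key)[A-Za-z0-9_]*[[:space:]]*=[[:space:]]*"[^"]+"'
  # -r recurse, -n line numbers, -E extended regex, -i case-insensitive.
  # The second grep drops values that look like runtime paths (an assumption).
  grep -rnEi --include='*.nix' "$assign_re" "$root" | grep -v '= *"/run/'
}
#+end_src

A line such as `passwordFile = config.sops.secrets.db.path;` is not reported because the value is a reference resolved at runtime rather than a quoted string. The function returns non-zero when nothing is found, as grep does.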

* Problems & Solutions
| Problem | Solution | Learning |
|---------+----------+----------|
| Initial lens drafts too long (60+ lines) | Referenced existing code-review lenses (~45 lines) as the format target | Consistent format matters for usability |
| Overlapping lens scopes | Created "Crisp Boundaries" table in plan | Define ownership explicitly upfront |
| What lenses are actually needed? | Explored real repos (dotfiles, prox-setup) | Ground design in actual artifacts |
| False positive risk in secrets lens | Added explicit exemptions (Nix hashes, public keys) | Two-signal rule for generic matches |

* Technical Details

** Code Changes
- Total files created: 5
- Key files:
  - `specs/ops-review/plan.md` (261 lines) - Comprehensive design document
  - `skills/ops-review/SKILL.md` (188 lines) - Agent workflow instructions
  - `skills/ops-review/README.md` (96 lines) - User documentation
  - `skills/ops-review/lenses/README.md` (85 lines) - Lens index
  - `skills/ops-review/lenses/secrets.md` (53 lines) - First lens

** Commands Used
#+begin_src bash
# Explored actual infrastructure repos
# (via Task tool with Explore subagent)

# Ran multi-model consensus for plan review
uv run orch consensus "Review this ops-review skill design..." sonar flash-or gemini gpt

# Created bd epic with hierarchical children
bd create "ops-review skill" --type=epic -p 1 --description "..."
bd create "Lens: secrets" --parent skills-9cu -p 1 --deps skills-9cu.1

# Visualized dependency graph
bd graph skills-9cu

# Checked available work
bd ready
#+end_src

** Architecture Notes
- Skill follows code-review pattern: lenses as focused prompts
- Lenses deploy to ~/.config/lenses/ops/ via home-manager
- Quick mode (--quick) runs Phase 1 only for CI/pre-commit
- Cross-file awareness via grep-based reference mapping (source, imports)
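
A minimal sketch of grep-based reference mapping, limited to `source` and `.` lines in shell scripts. The function name and the "file -> target" output format are hypothetical, and Nix or Python imports are not handled here.

#+begin_src bash
#!/usr/bin/env bash

# Hypothetical helper: print "file -> sourced path" edges for *.sh files under a directory.
map_shell_sources() {
  local root="${1:-.}"
  # -r recurse, -H always print the file name, -o print only the match, -E extended regex.
  grep -rHoE --include='*.sh' '^[[:space:]]*(source|\.)[[:space:]]+[^[:space:];]+' "$root" \
    | sed -E 's/^([^:]+):[[:space:]]*(source|\.)[[:space:]]+/\1 -> /' \
    | sort
}
#+end_src

Each output line is one edge, so a lens reviewing a script can be handed the files it pulls in. Paths containing a colon would confuse the `sed` step, which is acceptable for a sketch but not for production use.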

* Process and Workflow

** What Worked Well
- Exploring real repos first grounded the design in actual needs
- orch consensus with 4 models surfaced gaps (Nix store, Docker layers)
- bd epic with --parent creates clean hierarchical structure
- Dependency graph visualization helped verify task ordering

** What Was Challenging
- Balancing lens completeness against the ~45-line target format
- Deciding which checks are linter-backed and which are LLM-primary
- Managing context across a long design session

* Learning and Insights

** Technical Insights
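- Nix store world-readability is a critical security consideration: a secret embedded in a .nix string ends up in /nix/store, where any local user can read it
- Docker ENV and ARG values remain recoverable from image metadata and build history even if a later layer unsets or deletes them
- CI log masking (::add-mask::) is often overlooked
- shellcheck, statix, and gitleaks can emit structured JSON output for integration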

** Process Insights
- orch consensus is valuable for pressure-testing designs
- Use high temperature for brainstorming and low temperature for analysis decisions
- bd hierarchical children (.1, .2, etc.) work well for epic breakdown

** Architectural Insights
- Linter-first hybrid is an emerging pattern (doc-review also uses this)
- Lens boundaries must be explicit to avoid duplicate findings
- Platform-specific remediation matters (sops-nix vs BuildKit secrets)

* Context for Future Work

** Open Questions
- Should ops-review have its own lens directory or share with code-review?
- How to handle cross-repo awareness (dotfiles uses sops, prox-setup uses passage)?
- Should we run linters in parallel before the LLM pass?

** Next Steps
- Complete Phase 1 lenses: shell-safety, blast-radius, privilege
- Integration: add to flake.nix, update ai-skills.nix
- Validation: test on dotfiles and prox-setup repos
- Ensure gitleaks available (dotfiles-x2m)

** Related Work
- [[file:2025-12-28-code-review-skill-creation-worklog-cleanup.org][Code Review Skill Creation]] - Original lens pattern
- [[file:2025-12-04-doc-review-skill-design.org][Doc-Review Skill Design]] - Hybrid architecture precedent
- [[file:2025-12-26-multi-lens-code-review-workflow-testing.org][Multi-Lens Code Review Testing]] - LLM-in-the-loop pattern

* Raw Notes
- Dotfiles repo: 100+ Nix modules, 90+ shell scripts, SOPS secrets, Gitea Actions
- Prox-setup repo: 88 Python scripts (Proxmox API), 41 shell scripts, Docker Compose
- Models consulted: sonar, flash-or, gemini, gpt (all 4 supported the design)
- Key insight from GPT: "Require two signals for MED/HIGH when not using known token format"
- All models emphasized: don't flag Nix hashes (sha256-, narHash, vendorHash)

* Session Metrics
- Commits made: 0 (work in progress)
- Files created: 5
- Lines added: ~683 (plan.md + skill files + lens)
- bd issues created: 16 (1 epic + 14 children + 1 in dotfiles)
- orch consensus runs: 2