#+TITLE: ops-review Skill Design, Orch Consensus Planning, and Skeleton Implementation
#+DATE: 2026-01-01
#+KEYWORDS: ops-review, skill-design, orch-consensus, lenses, infrastructure-review, nix, shell-safety, secrets
#+COMMITS: 0 (uncommitted work in progress)
#+COMPRESSION_STATUS: uncompressed

* Session Summary
** Date: 2026-01-01
** Focus Area: Designing and implementing the ops-review skill for infrastructure code analysis

* Accomplishments
- [X] Explored dotfiles and prox-setup repos to understand actual ops artifact landscape
- [X] Designed ops-review skill with 10 lenses across 3 phases
- [X] Ran orch consensus (sonar, flash-or, gemini, gpt) on initial plan
- [X] Incorporated consensus feedback: linter-first hybrid architecture, crisp lens boundaries
- [X] Created comprehensive plan.md in specs/ops-review/
- [X] Created bd epic (skills-9cu) with 14 child tasks, proper dependency graph
- [X] Built skill skeleton: SKILL.md, README.md, lenses/README.md
- [X] Drafted secrets.md lens with orch consensus review
- [X] Incorporated Nix store exposure, Docker layer persistence, CI masking feedback
- [X] Filed follow-up issue in dotfiles for gitleaks availability (dotfiles-x2m)
- [ ] Remaining Phase 1 lenses: shell-safety, blast-radius, privilege

* Key Decisions
** Decision 1: Linter-first hybrid architecture
- Context: How should ops-review analyze infrastructure code?
- Options considered:
  1. Pure LLM analysis - flexible but prone to syntax hallucinations
  2. Pure linter - deterministic but misses semantic issues
  3. Hybrid: linters first, LLM interprets - best of both
- Rationale: All 4 consensus models agreed LLMs hallucinate syntax but excel at understanding intent. Static tools catch syntax, LLM finds logic bugs.
- Impact: Each lens integrates with specific tools (shellcheck, statix, gitleaks)

** Decision 2: 10 lenses across 3 phases
- Context: How many lenses and how to prioritize?
- Initial proposal: 8 lenses
- Consensus feedback: Add privilege (least-privilege) and supply-chain (pinning)
- Phase 1 (quick mode): secrets, shell-safety, blast-radius, privilege
- Phase 2: idempotency, supply-chain, observability
- Phase 3: nix-hygiene, resilience, orchestration

** Decision 3: Crisp lens boundaries to avoid duplicate findings
- Problem: resilience/blast-radius/idempotency overlap
- Solution: Define ownership table
  - idempotency: safe re-run, convergence, atomic writes
  - resilience: runtime fault tolerance, timeouts, retries
  - blast-radius: change safety, dry-run, rollback

** Decision 4: Nix-specific checks as first-class concerns
- Context: Nix has unique security model (world-readable store)
- Insight from consensus: Secrets in .nix strings become readable in /nix/store
- Added to secrets lens: explicit Nix store exposure check
- Remediation: sops-nix/agenix with runtime paths, not embedded strings

* Problems & Solutions
| Problem | Solution | Learning |
|---------+----------+----------|
| Initial lens drafts too long (60+ lines) | Reference existing code-review lenses (~45 lines) | Consistent format matters for usability |
| Overlapping lens scopes | Created "Crisp Boundaries" table in plan | Define ownership explicitly upfront |
| What lenses are actually needed? | Explored real repos (dotfiles, prox-setup) | Ground design in actual artifacts |
| False positive risk in secrets lens | Added explicit exemptions (Nix hashes, public keys) | Two-signal rule for generic matches |

* Technical Details
** Code Changes
- Total files created: 5
- Key files:
  - `specs/ops-review/plan.md` (261 lines) - Comprehensive design document
  - `skills/ops-review/SKILL.md` (188 lines) - Agent workflow instructions
  - `skills/ops-review/README.md` (96 lines) - User documentation
  - `skills/ops-review/lenses/README.md` (85 lines) - Lens index
  - `skills/ops-review/lenses/secrets.md` (53 lines) - First lens

** Commands Used
#+begin_src bash
# Explored actual infrastructure repos
# (via Task tool with Explore subagent)

# Ran multi-model consensus for plan review
uv run orch consensus "Review this ops-review skill design..." sonar flash-or gemini gpt

# Created bd epic with hierarchical children
bd create "ops-review skill" --type=epic -p 1 --description "..."
bd create "Lens: secrets" --parent skills-9cu -p 1 --deps skills-9cu.1

# Visualized dependency graph
bd graph skills-9cu

# Checked available work
bd ready
#+end_src

** Architecture Notes
- Skill follows code-review pattern: lenses as focused prompts
- Lenses deploy to ~/.config/lenses/ops/ via home-manager
- Quick mode (--quick) runs Phase 1 only for CI/pre-commit
- Cross-file awareness via grep-based reference mapping (source, imports)

* Process and Workflow
** What Worked Well
- Exploring real repos first grounded the design in actual needs
- orch consensus with 4 models surfaced gaps (Nix store, Docker layers)
- bd epic with --parent creates clean hierarchical structure
- Dependency graph visualization helped verify task ordering

** What Was Challenging
- Balancing lens completeness with ~45 line target format
- Deciding which checks are linter-backed vs LLM-primary
- Managing context across long design session

* Learning and Insights
** Technical Insights
- Nix store world-readability is a critical security consideration
- Docker ENV/ARG persist in image layers even if later deleted
- CI masking (::add-mask::) is often overlooked
- shellcheck, statix, gitleaks provide structured JSON output for integration

** Process Insights
- orch consensus is valuable for pressure-testing designs
- High temp for brainstorming, low temp for analysis decisions
- bd hierarchical children (.1, .2, etc.) work well for epic breakdown

** Architectural Insights
- Linter-first hybrid is emerging pattern (doc-review also uses this)
- Lens boundaries must be explicit to avoid duplicate findings
- Platform-specific remediation matters (sops-nix vs BuildKit secrets)

* Context for Future Work
** Open Questions
- Should ops-review have its own lens directory or share with code-review?
- How to handle cross-repo awareness (dotfiles uses sops, prox-setup uses passage)?
- Should we run linters in parallel before LLM pass?

** Next Steps
- Complete Phase 1 lenses: shell-safety, blast-radius, privilege
- Integration: add to flake.nix, update ai-skills.nix
- Validation: test on dotfiles and prox-setup repos
- Ensure gitleaks available (dotfiles-x2m)

** Related Work
- [[file:2025-12-28-code-review-skill-creation-worklog-cleanup.org][Code Review Skill Creation]] - Original lens pattern
- [[file:2025-12-04-doc-review-skill-design.org][Doc-Review Skill Design]] - Hybrid architecture precedent
- [[file:2025-12-26-multi-lens-code-review-workflow-testing.org][Multi-Lens Code Review Testing]] - LLM-in-the-loop pattern

* Raw Notes
- Dotfiles repo: 100+ Nix modules, 90+ shell scripts, SOPS secrets, Gitea Actions
- Prox-setup repo: 88 Python scripts (Proxmox API), 41 shell scripts, Docker Compose
- Models consulted: sonar, flash-or, gemini, gpt (all 4 supported the design)
- Key insight from GPT: "Require two signals for MED/HIGH when not using known token format"
- All models emphasized: don't flag Nix hashes (sha256-, narHash, vendorHash)

* Session Metrics
- Commits made: 0 (work in progress)
- Files created: 5
- Lines added: ~683 (plan.md + skill files + lens)
- bd issues created: 16 (1 epic + 14 children + 1 in dotfiles)
- orch consensus runs: 2
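
* Appendix: Two-Signal Triage Sketch
The two-signal rule and the Nix-hash exemptions recorded in the Raw Notes can be illustrated with a small shell helper. This is only a sketch under stated assumptions: the function names (=matches=, =classify_line=), the severity labels, and every regex below are hypothetical and are not taken from secrets.md. The two "known token formats" used here (AWS access key IDs and GitHub personal access tokens) are examples chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper for a secrets lens. Every pattern below is an
# illustrative assumption, not the rule set from secrets.md.

# matches LINE REGEX: succeeds if LINE matches the extended regex.
matches() {
  printf '%s\n' "$1" | grep -Eq -- "$2"
}

# classify_line LINE: prints one of exempt, high, med, low, none.
classify_line() {
  local line=$1
  local signals=0

  # Exemptions first: Nix hashes and SSH public keys are not secrets.
  if matches "$line" 'sha256-|narHash|vendorHash|ssh-(ed25519|rsa) '; then
    echo exempt
  # A known token format is sufficient on its own.
  elif matches "$line" 'AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36}'; then
    echo high
  else
    # Signal 1: a secret-sounding key name.
    if matches "$line" '[Pp]assword|[Ss]ecret|[Tt]oken|[Aa]pi[_-]?[Kk]ey'; then
      signals=$((signals + 1))
    fi
    # Signal 2: a long opaque value (20+ base64-like characters).
    if matches "$line" '[A-Za-z0-9+/=]{20,}'; then
      signals=$((signals + 1))
    fi
    # Generic matches need both signals to rise above low severity.
    case $signals in
      2) echo med ;;
      1) echo low ;;
      *) echo none ;;
    esac
  fi
}
```

Under these assumptions, =classify_line 'password = "hunter2"'= prints =low= because only the key-name signal fires, while a line containing =sha256-= is reported as =exempt= before any signal is counted. Checking exemptions first is one way to keep Nix hashes from ever reaching the generic high-entropy check.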