skills/docs/worklogs/2026-01-01-ops-review-skill-design-and-skeleton.org
dan fb882a9434 feat: add ops-review skill with Phase 1 lenses
Multi-lens review skill for operational infrastructure (Nix, shell,
Docker, CI/CD). Modeled on code-review with linter-first hybrid
architecture.

Phase 1 lenses (core safety):
- secrets: credential exposure, Nix store, Docker layers, CI masking
- shell-safety: shellcheck-backed, temp files, guard snippets
- blast-radius: targeting/scoping, dry-run, rollback
- privilege: least-privilege, containers, systemd sandboxing

Design reviewed via orch consensus (sonar, flash-or, gemini, gpt).
Lenses deploy to ~/.config/lenses/ops/ via home-manager.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 17:36:24 -08:00


ops-review Skill Design, Orch Consensus Planning, and Skeleton Implementation

Session Summary

Date: 2026-01-01

Focus Area: Designing and implementing the ops-review skill for infrastructure code analysis

Accomplishments

  • Explored dotfiles and prox-setup repos to understand actual ops artifact landscape
  • Designed ops-review skill with 10 lenses across 3 phases
  • Ran orch consensus (sonar, flash-or, gemini, gpt) on initial plan
  • Incorporated consensus feedback: linter-first hybrid architecture, crisp lens boundaries
  • Created comprehensive plan.md in specs/ops-review/
  • Created bd epic (skills-9cu) with 14 child tasks, proper dependency graph
  • Built skill skeleton: SKILL.md, README.md, lenses/README.md
  • Drafted secrets.md lens with orch consensus review
  • Incorporated Nix store exposure, Docker layer persistence, CI masking feedback
  • Filed follow-up issue in dotfiles for gitleaks availability (dotfiles-x2m)
  • Not yet done: the remaining Phase 1 lenses (shell-safety, blast-radius, privilege)

Key Decisions

Decision 1: Linter-first hybrid architecture

  • Context: How should ops-review analyze infrastructure code?
  • Options considered:

    1. Pure LLM analysis - flexible but prone to syntax hallucinations
    2. Pure linter - deterministic but misses semantic issues
    3. Hybrid: linters first, LLM interprets - best of both
  • Rationale: All 4 consensus models agreed that LLMs hallucinate syntax but excel at understanding intent, so static tools catch syntax problems and the LLM looks for logic bugs.
  • Impact: Each lens integrates with specific tools (shellcheck, statix, gitleaks)
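
A minimal sketch of the linter-first flow for one lens, assuming shellcheck is on PATH. `shellcheck -f json` is a real flag that prints a JSON array of findings. The function names and the prompt layout are hypothetical, not taken from SKILL.md.

```python
import json
import subprocess


def run_shellcheck(path: str) -> list[dict]:
    """Run shellcheck on one script and return its findings as dicts.

    Assumes shellcheck is installed; `-f json` emits a JSON array.
    """
    proc = subprocess.run(
        ["shellcheck", "-f", "json", path],
        capture_output=True,
        text=True,
        check=False,  # shellcheck exits non-zero when it reports findings
    )
    return json.loads(proc.stdout or "[]")


def build_lens_prompt(lens_text: str, source: str, findings: list[dict]) -> str:
    """Put linter findings ahead of the source so the LLM interprets them
    instead of guessing at syntax (hypothetical prompt layout)."""
    lines = [
        f"- line {f['line']}: SC{f['code']} [{f['level']}] {f['message']}"
        for f in findings
    ]
    linter_section = "\n".join(lines) if lines else "(no linter findings)"
    return (
        f"{lens_text}\n\n"
        "Linter findings (treat as ground truth for syntax):\n"
        f"{linter_section}\n\n"
        f"Source under review:\n{source}\n"
    )
```

The split keeps the deterministic step (the linter run) separate from the interpretive step (the prompt), so the prompt builder can be exercised with canned findings.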

Decision 2: 10 lenses across 3 phases

  • Context: How many lenses and how to prioritize?
  • Initial proposal: 8 lenses
  • Consensus feedback: Add privilege (least-privilege) and supply-chain (pinning)
  • Phase 1 (quick mode): secrets, shell-safety, blast-radius, privilege
  • Phase 2: idempotency, supply-chain, observability
  • Phase 3: nix-hygiene, resilience, orchestration
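
The phase split can be sketched as a small lookup. The lens names and phase numbers come from the list above; the `PHASES` table and `select_lenses` helper are hypothetical names used only for illustration.

```python
# Lens names per phase, as listed in Decision 2.
PHASES = {
    1: ["secrets", "shell-safety", "blast-radius", "privilege"],
    2: ["idempotency", "supply-chain", "observability"],
    3: ["nix-hygiene", "resilience", "orchestration"],
}


def select_lenses(quick: bool = False) -> list[str]:
    """Return the lenses to run; quick mode stops after Phase 1."""
    max_phase = 1 if quick else max(PHASES)
    return [
        lens
        for phase in sorted(PHASES)
        if phase <= max_phase
        for lens in PHASES[phase]
    ]
```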

Decision 3: Crisp lens boundaries to avoid duplicate findings

  • Problem: resilience/blast-radius/idempotency overlap
  • Solution: Define ownership table

    • idempotency: safe re-run, convergence, atomic writes
    • resilience: runtime fault tolerance, timeouts, retries
    • blast-radius: change safety, dry-run, rollback
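
One way the ownership table could be enforced after the lenses run, as a sketch. The concern-to-lens mapping mirrors the three entries above; the finding shape (`lens`, `concern`, `where`) and the `drop_unowned` helper are assumptions, not the plan's actual data model.

```python
# Concern -> owning lens, copied from the ownership list above.
OWNER_BY_CONCERN = {
    "safe re-run": "idempotency",
    "convergence": "idempotency",
    "atomic writes": "idempotency",
    "runtime fault tolerance": "resilience",
    "timeouts": "resilience",
    "retries": "resilience",
    "change safety": "blast-radius",
    "dry-run": "blast-radius",
    "rollback": "blast-radius",
}


def drop_unowned(findings: list[dict]) -> list[dict]:
    """Keep a finding only if its lens owns the concern.

    Findings about concerns missing from the table are kept unchanged.
    """
    kept = []
    for finding in findings:
        owner = OWNER_BY_CONCERN.get(finding["concern"])
        if owner is None or owner == finding["lens"]:
            kept.append(finding)
    return kept
```

With this filter, a rollback issue reported by both resilience and blast-radius survives only once, under blast-radius.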

Decision 4: Nix-specific checks as first-class concerns

  • Context: Nix has a unique security model (world-readable store)
  • Insight from consensus: Secrets in .nix strings become readable in /nix/store
  • Added to secrets lens: explicit Nix store exposure check
  • Remediation: sops-nix/agenix with runtime paths, not embedded strings
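
A rough sketch of what a Nix store exposure check might look for: a secret-looking attribute assigned a literal string. The regex, the keyword list, and the `find_embedded_secrets` name are all assumptions. Treating values under `/run/` or containing `${...}` as runtime references is a heuristic, based on sops-nix and agenix decrypting under `/run/` by default.

```python
import re

# Attribute whose name looks secret-like, assigned a double-quoted literal.
# The keyword list is illustrative, not the lens's actual rule set.
SECRET_ATTR = re.compile(
    r'\b([\w.-]*(?:password|passwd|secret|token|api[_-]?key)[\w.-]*)\s*=\s*"([^"]+)"',
    re.IGNORECASE,
)


def find_embedded_secrets(nix_source: str) -> list[tuple[int, str]]:
    """Return (line number, attribute name) for literals that would land
    in the world-readable /nix/store."""
    hits = []
    for lineno, line in enumerate(nix_source.splitlines(), start=1):
        match = SECRET_ATTR.search(line)
        if match is None:
            continue
        value = match.group(2)
        # Heuristic: interpolations and /run/ paths point at runtime secrets.
        if "${" in value or value.startswith("/run/"):
            continue
        hits.append((lineno, match.group(1)))
    return hits
```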

Problems & Solutions

| Problem | Solution | Learning |
|---------+----------+----------|
| Initial lens drafts too long (60+ lines) | Reference existing code-review lenses (~45 lines) | Consistent format matters for usability |
| Overlapping lens scopes | Created "Crisp Boundaries" table in plan | Define ownership explicitly upfront |
| What lenses are actually needed? | Explored real repos (dotfiles, prox-setup) | Ground design in actual artifacts |
| False positive risk in secrets lens | Added explicit exemptions (Nix hashes, public keys) | Two-signal rule for generic matches |

Technical Details

Code Changes

  • Total files created: 5
  • Key files:

    • `specs/ops-review/plan.md` (261 lines) - Comprehensive design document
    • `skills/ops-review/SKILL.md` (188 lines) - Agent workflow instructions
    • `skills/ops-review/README.md` (96 lines) - User documentation
    • `skills/ops-review/lenses/README.md` (85 lines) - Lens index
    • `skills/ops-review/lenses/secrets.md` (53 lines) - First lens

Commands Used

# Explored actual infrastructure repos
# (via Task tool with Explore subagent)

# Ran multi-model consensus for plan review
uv run orch consensus "Review this ops-review skill design..." sonar flash-or gemini gpt

# Created bd epic with hierarchical children
bd create "ops-review skill" --type=epic -p 1 --description "..."
bd create "Lens: secrets" --parent skills-9cu -p 1 --deps skills-9cu.1

# Visualized dependency graph
bd graph skills-9cu

# Checked available work
bd ready

Architecture Notes

  • Skill follows code-review pattern: lenses as focused prompts
  • Lenses deploy to ~/.config/lenses/ops/ via home-manager
  • Quick mode (quick) runs Phase 1 only for CI/pre-commit
  • Cross-file awareness via grep-based reference mapping (source, imports)
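
The reference mapping could look roughly like the sketch below. It uses Python regexes in place of grep, and the patterns cover only shell `source` / `.` lines and relative Nix `import` paths. The `map_references` name and the choice of patterns are assumptions.

```python
import re

# `source file` or `. file` at the start of a shell line.
SHELL_SOURCE = re.compile(r'^\s*(?:source|\.)\s+["\']?([^\s"\';]+)', re.MULTILINE)
# `import ./relative/path` in Nix expressions.
NIX_IMPORT = re.compile(r'\bimport\s+(\.{1,2}/[^\s;)]+)')


def map_references(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file path to the paths it sources or imports.

    `files` maps a path to that file's text; nothing is read from disk.
    """
    refs = {}
    for path, text in files.items():
        pattern = NIX_IMPORT if path.endswith(".nix") else SHELL_SOURCE
        refs[path] = pattern.findall(text)
    return refs
```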

Process and Workflow

What Worked Well

  • Exploring real repos first grounded the design in actual needs
  • orch consensus with 4 models surfaced gaps (Nix store, Docker layers)
  • bd epic with parent creates clean hierarchical structure
  • Dependency graph visualization helped verify task ordering

What Was Challenging

  • Balancing lens completeness with ~45 line target format
  • Deciding which checks are linter-backed vs LLM-primary
  • Managing context across long design session

Learning and Insights

Technical Insights

  • Nix store world-readability is a critical security consideration
  • Docker ENV/ARG persist in image layers even if later deleted
  • CI masking (::add-mask::) is often overlooked
  • shellcheck, statix, gitleaks provide structured JSON output for integration
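
To illustrate the Docker point, a minimal sketch that flags `ENV` and `ARG` instructions whose variable name looks secret-like. The keyword list and the `flag_dockerfile_secrets` name are assumptions. It checks only the first variable on each line and ignores line continuations.

```python
import re

# Secret-looking variable names; illustrative keywords only.
SECRET_NAME = re.compile(r"(PASSWORD|PASSWD|SECRET|TOKEN|API_?KEY)", re.IGNORECASE)
# ENV or ARG instruction followed by the first variable name.
ENV_OR_ARG = re.compile(r"^\s*(ENV|ARG)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def flag_dockerfile_secrets(dockerfile: str) -> list[tuple[int, str, str]]:
    """Return (line number, instruction, variable) for suspicious ENV/ARG lines."""
    findings = []
    for lineno, line in enumerate(dockerfile.splitlines(), start=1):
        match = ENV_OR_ARG.match(line)
        if match and SECRET_NAME.search(match.group(2)):
            findings.append((lineno, match.group(1).upper(), match.group(2)))
    return findings
```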

Process Insights

  • orch consensus is valuable for pressure-testing designs
  • High temperature for brainstorming, low temperature for analysis decisions
  • bd hierarchical children (.1, .2, etc.) work well for epic breakdown

Architectural Insights

  • Linter-first hybrid is an emerging pattern (doc-review also uses this)
  • Lens boundaries must be explicit to avoid duplicate findings
  • Platform-specific remediation matters (sops-nix vs BuildKit secrets)

Context for Future Work

Open Questions

  • Should ops-review have its own lens directory or share with code-review?
  • How to handle cross-repo awareness (dotfiles uses sops, prox-setup uses passage)?
  • Should we run linters in parallel before LLM pass?

Next Steps

  • Complete Phase 1 lenses: shell-safety, blast-radius, privilege
  • Integration: add to flake.nix, update ai-skills.nix
  • Validation: test on dotfiles and prox-setup repos
  • Ensure gitleaks available (dotfiles-x2m)

Related Work

Raw Notes

  • Dotfiles repo: 100+ Nix modules, 90+ shell scripts, SOPS secrets, Gitea Actions
  • Prox-setup repo: 88 Python scripts (Proxmox API), 41 shell scripts, Docker Compose
  • Models consulted: sonar, flash-or, gemini, gpt (all 4 supported the design)
  • Key insight from GPT: "Require two signals for MED/HIGH when not using known token format"
  • All models emphasized: don't flag Nix hashes (sha256-, narHash, vendorHash)
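
The two-signal rule and the hash exemptions can be sketched as a small classifier. Everything here is an assumption made for illustration: the two known token formats (GitHub `ghp_` tokens, AWS `AKIA` key IDs), the two signals (secret-like name, long quoted random-looking value), the severity labels, and the `classify` name. The secrets lens may define these differently.

```python
import re

# Exemptions from the notes above: Nix hashes and public keys.
EXEMPT = re.compile(r"sha256-|narHash|vendorHash|ssh-ed25519 |ssh-rsa |BEGIN PUBLIC KEY")
# Known token formats (illustrative subset): one match is enough on its own.
KNOWN_TOKEN = re.compile(r"\b(?:ghp_[A-Za-z0-9]{36}|AKIA[0-9A-Z]{16})\b")
# Signal 1: secret-looking name. Signal 2: long quoted opaque value.
SECRET_NAME = re.compile(r"password|passwd|secret|token|api[_-]?key", re.IGNORECASE)
RANDOM_VALUE = re.compile(r"""["'][A-Za-z0-9+/=_-]{20,}["']""")


def classify(line: str) -> str:
    """Return IGNORE, LOW, MED, or HIGH for one line of config."""
    if EXEMPT.search(line):
        return "IGNORE"
    if KNOWN_TOKEN.search(line):
        return "HIGH"
    signals = sum(
        [bool(SECRET_NAME.search(line)), bool(RANDOM_VALUE.search(line))]
    )
    if signals >= 2:
        return "MED"
    return "LOW" if signals == 1 else "IGNORE"
```

Checking exemptions first is a deliberate trade-off in this sketch: a Nix hash never gets flagged, at the cost of skipping a real secret that happens to share a line with one.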

Session Metrics

  • Commits made: 0 (work in progress)
  • Files created: 5
  • Lines added: ~683 (plan.md + skill files + lens)
  • bd issues created: 16 (1 epic + 14 children + 1 in dotfiles)
  • orch consensus runs: 2