skills/specs/ops-review/plan.md
dan fb882a9434 feat: add ops-review skill with Phase 1 lenses
Multi-lens review skill for operational infrastructure (Nix, shell,
Docker, CI/CD). Modeled on code-review with linter-first hybrid
architecture.

Phase 1 lenses (core safety):
- secrets: credential exposure, Nix store, Docker layers, CI masking
- shell-safety: shellcheck-backed, temp files, guard snippets
- blast-radius: targeting/scoping, dry-run, rollback
- privilege: least-privilege, containers, systemd sandboxing

Design reviewed via orch consensus (sonar, flash-or, gemini, gpt).
Lenses deploy to ~/.config/lenses/ops/ via home-manager.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 17:36:24 -08:00

ops-review Skill Design

A multi-lens review skill for operational infrastructure, modeled on code-review.

Problem Statement

Ops artifacts (Nix configs, shell scripts, Python automation, Docker Compose, CI/CD) accumulate technical debt and security issues just like application code. Unlike application code, they rarely get systematic review.

Target Artifacts

Based on actual infrastructure in dotfiles and prox-setup:

| Category | Examples |
| --- | --- |
| Nix/NixOS | flake.nix, modules/*.nix, home-manager configs |
| Shell Scripts | bin/*.sh, setup_*.sh, fix_*.sh, deploy.sh |
| Python Automation | Proxmox API scripts, multi-stage deployments |
| Container Configs | docker-compose.yml, Dockerfile |
| CI/CD | .gitea/workflows/*.yml, .github/actions/*.yml |
| Service Configs | systemd units, Ory configs, SOPS files |

Architecture: Linter-First Hybrid

Consensus from model review: Use deterministic tools as primary signals, LLM for interpretation and semantic analysis.

Stage 1: Static Tools (fast, deterministic)
├── shellcheck for shell scripts
├── statix + deadnix for Nix
├── hadolint for Dockerfiles
└── yamllint for YAML configs

Stage 2: LLM Analysis (semantic, contextual)
├── Interprets tool output in context
├── Finds logic bugs tools miss
├── Synthesizes cross-file issues
└── Suggests actionable fixes

Why: LLMs hallucinate syntax but excel at understanding intent and impact. Tools catch syntax but miss semantics.
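
As a rough illustration of Stage 1, the sketch below maps a file path to the static tools listed above. The function name `linters_for` and the suffix table are assumptions made for this example, not part of the skill; it only decides which tools to run and does not invoke them.

```python
from pathlib import PurePath

# Hypothetical suffix table built from the Stage 1 tool list above.
LINTERS_BY_SUFFIX = {
    ".sh": ["shellcheck"],
    ".nix": ["statix", "deadnix"],
    ".yml": ["yamllint"],
    ".yaml": ["yamllint"],
}


def linters_for(path: str) -> list[str]:
    """Return the static tools to run on `path` before any LLM pass."""
    p = PurePath(path)
    # Dockerfiles are recognised by name (Dockerfile, Dockerfile.dev, ...), not by suffix.
    if p.name == "Dockerfile" or p.name.startswith("Dockerfile."):
        return ["hadolint"]
    # Unknown file types get no Stage 1 tools and go straight to the LLM stage.
    return list(LINTERS_BY_SUFFIX.get(p.suffix, []))
```

Keeping this mapping as plain data makes it easy to see which artifact types have deterministic coverage and which rely on the LLM alone.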

Proposed Lenses (10 total)

Core Safety (Phase 1)

1. secrets

Focus: Credential hygiene

  • Hardcoded secrets, API keys, tokens
  • SOPS config issues
  • Secrets in logs or error messages
  • Secrets passed via CLI args (visible in process list)
  • Missing encryption for sensitive data
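
A deterministic pre-pass for the first bullet could look roughly like the sketch below. The three patterns and the name `scan_for_secrets` are illustrative assumptions; a real lens would need a much broader pattern set and would still rely on the LLM to judge context.

```python
import re

# Illustrative patterns only; names and coverage are assumptions for this sketch.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A quoted literal of 8+ characters assigned to a secret-looking name.
    # Values starting with `$` (variable or command substitution) are skipped.
    "hardcoded-assignment": re.compile(
        r"(?i)(password|passwd|secret|token|api_key)\s*[:=]\s*['\"][^'\"$]{8,}['\"]"
    ),
}


def scan_for_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```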

2. shell-safety

Focus: Shell script robustness (backed by shellcheck)

  • Missing set -euo pipefail
  • Unquoted variables (SC2086)
  • Unsafe command substitution
  • Missing error handling
  • Hardcoded paths that should be parameters
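
Since this lens is backed by shellcheck, one plausible first step is to turn its machine-readable output into ordered lines for the LLM stage. The sketch below assumes the input is the array produced by `shellcheck --format=json`, with `file`, `line`, `level`, `code`, and `message` fields; the function name is hypothetical.

```python
import json

# shellcheck severity levels, most severe first.
LEVEL_RANK = {"error": 0, "warning": 1, "info": 2, "style": 3}


def summarize_shellcheck(json_text: str) -> list[str]:
    """Turn shellcheck JSON results into one summary line each, worst first."""
    results = json.loads(json_text)
    # Unknown levels sort last; ties are ordered by file and line.
    results.sort(key=lambda r: (LEVEL_RANK.get(r["level"], 99), r["file"], r["line"]))
    return [
        f"{r['file']}:{r['line']} SC{r['code']} ({r['level']}): {r['message']}"
        for r in results
    ]
```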

3. blast-radius

Focus: Change safety and risk containment

  • Destructive operations without confirmation
  • Missing dry-run mode
  • No rollback strategy
  • Bulk operations without batching
  • Missing pre-flight checks
  • No canary/progressive approach

4. privilege

Focus: Least privilege violations

  • Unnecessary sudo/root usage
  • Containers running as root
  • Overly permissive file modes (chmod 777)
  • Missing capability drops
  • Docker socket mounting
  • systemd units without sandboxing (ProtectSystem, PrivateTmp)
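
The last bullet lends itself to a simple deterministic check. The sketch below parses a unit file with Python's `configparser` and reports which of the two directives named above are absent from `[Service]`. The function name and the choice of exactly these two directives are assumptions, and the parser only handles straightforward unit files (no drop-ins, no unusual continuation lines).

```python
import configparser

# Directives named in the lens description; a fuller list is an open design choice.
EXPECTED_SANDBOXING = ("ProtectSystem", "PrivateTmp")


def missing_sandboxing(unit_text: str) -> list[str]:
    """Return the expected sandboxing directives absent from the [Service] section."""
    # strict=False tolerates repeated keys; interpolation=None keeps `%i`-style specifiers literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # preserve directive case instead of lowercasing
    parser.read_string(unit_text)
    if not parser.has_section("Service"):
        return []  # not a service unit, so nothing to check here
    return [d for d in EXPECTED_SANDBOXING if not parser.has_option("Service", d)]
```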

Reliability (Phase 2)

5. idempotency

Focus: Safe re-execution and convergence

  • Scripts that break on re-run
  • Missing existence checks (create-if-not-exists)
  • Non-atomic operations (partial failure states)
  • Check-then-act race conditions
  • Missing cleanup on failure
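
To make the target pattern concrete, here is a minimal sketch of a write that is safe to re-run: it creates the directory only if needed, writes to a temporary file, and renames it into place so readers never see a partial file. The helper name `atomic_write` is hypothetical, and this is one common approach rather than something the lens prescribes.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Write `data` to `path` so that re-runs and partial failures are safe."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)  # create-if-not-exists, no error on re-run
    # The temp file lives in the target directory so the rename stays on one filesystem.
    # Note: mkstemp creates the file with mode 0600; adjust if other permissions are needed.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomically replaces any existing file
    except BaseException:
        os.unlink(tmp)  # cleanup on failure: leave no stray temp file behind
        raise
```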

6. supply-chain

Focus: Dependency provenance and pinning

  • Unpinned versions (latest tags, floating refs)
  • GitHub/Gitea actions not pinned to SHA
  • Missing Nix flake.lock or SRI hashes
  • Unsigned artifacts
  • Untrusted substituters/registries
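
The SHA-pinning bullet can be approximated with a text scan of workflow files. The sketch below is an assumption-laden example: it treats any `uses: owner/repo@ref` whose ref is not a 40-character hex string as unpinned, skips local actions (which have no `@`), and does not understand `docker://` references with digests.

```python
import re

# Matches `uses: owner/repo@ref`, with or without a leading list dash.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `action@ref` strings whose ref is not a full commit SHA."""
    return [
        f"{action}@{ref}"
        for action, ref in USES_RE.findall(workflow_text)
        if not FULL_SHA_RE.match(ref)
    ]
```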

7. observability

Focus: Visibility into system state

  • Silent failures (no logging/alerting)
  • Missing health checks (Docker healthcheck, systemd ExecStartPre)
  • Incomplete metrics coverage
  • Missing structured logging
  • No correlation IDs in multi-step scripts

Architecture (Phase 3)

8. nix-hygiene

Focus: Nix-specific quality (backed by statix/deadnix)

  • Dead code (unused let bindings, imports)
  • Anti-patterns (with lib abuse, IFD without justification)
  • Module boundary violations
  • Overlay/override issues
  • Missing type annotations on options

9. resilience

Focus: Runtime fault tolerance

  • Missing timeouts on network calls
  • No retries with backoff/jitter
  • Missing circuit breakers for API calls
  • No graceful shutdown handling (SIGTERM)
  • Missing resource limits (systemd MemoryMax, Docker mem_limit)
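
For reference, the retry bullet usually refers to something like the following sketch: a bounded number of attempts with exponentially growing, randomized delays. The function name, defaults, and the "full jitter" variant are assumptions chosen for illustration; whether a given call should be retried at all depends on whether it is safe to repeat.

```python
import random
import time


def retry_with_backoff(fn, attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call `fn` until it succeeds or `attempts` is exhausted, sleeping between tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential cap (base, 2x, 4x, ...) bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the cap to avoid synchronized retries.
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the delay can be replaced in tests.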

10. orchestration

Focus: Execution ordering and coupling (formerly dependency-chains)

  • Unclear prerequisites
  • Missing documentation of execution order
  • Circular dependencies
  • Scripts assuming prior state without checking
  • Implicit coupling between components

Crisp Boundaries

To avoid duplicate findings across overlapping lenses:

| Lens | Owns | Does NOT Own |
| --- | --- | --- |
| idempotency | Safe re-run, convergence, atomic writes, create-if-not-exists | Rollback (blast-radius), retries (resilience) |
| resilience | Runtime fault tolerance, timeouts, retries, graceful shutdown | Change safety (blast-radius), re-run safety (idempotency) |
| blast-radius | Change safety, dry-run, rollback, confirmation gates, batching | Runtime behavior (resilience), re-run (idempotency) |
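
One way to enforce these boundaries mechanically is an ownership lookup consulted before a lens reports a finding. In the sketch below only the lens names come from the boundary rules above; the concern labels and the function name `should_report` are hypothetical.

```python
# Hypothetical concern labels mapped to the lens that owns them.
OWNER_BY_CONCERN = {
    "re-run-safety": "idempotency",
    "atomic-write": "idempotency",
    "timeout": "resilience",
    "retry": "resilience",
    "graceful-shutdown": "resilience",
    "rollback": "blast-radius",
    "dry-run": "blast-radius",
    "confirmation-gate": "blast-radius",
    "batching": "blast-radius",
}


def should_report(lens: str, concern: str) -> bool:
    """A lens reports a concern only if it owns it, or if no lens owns it."""
    return OWNER_BY_CONCERN.get(concern, lens) == lens
```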

Skill Structure

skills/ops-review/
├── SKILL.md           # Agent instructions (workflow)
├── README.md          # User documentation
└── lenses/
    ├── README.md      # Lens index
    ├── secrets.md
    ├── shell-safety.md
    ├── blast-radius.md
    ├── privilege.md
    ├── idempotency.md
    ├── supply-chain.md
    ├── observability.md
    ├── nix-hygiene.md
    ├── resilience.md
    └── orchestration.md

Lenses deploy to ~/.config/lenses/ops/ via home-manager.

Workflow

Standard Mode

  1. Target selection - files/directory to review
  2. Pre-pass - Run static tools (shellcheck, statix, etc.)
  3. Reference mapping - Build lightweight call graph (source, imports, ExecStart)
  4. Lens execution - One pass per lens, tool output in context
  5. Synthesis - Dedupe across lenses, rank by severity
  6. Interactive review - User approves findings
  7. Issue filing - bd create for approved items
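
The synthesis step (step 5) can be sketched as below. This assumes each finding is a dict with `file`, `line`, `rule`, and `severity` keys, and that two findings with the same file, line, and rule are duplicates; both the schema and the `rule` identifier are assumptions for this example.

```python
SEVERITY_ORDER = {"HIGH": 0, "MED": 1, "LOW": 2}


def synthesize(findings: list[dict]) -> list[dict]:
    """Drop duplicates across lenses, keep the most severe copy, and rank HIGH first."""
    best: dict[tuple, dict] = {}
    for finding in findings:
        key = (finding["file"], finding["line"], finding["rule"])
        kept = best.get(key)
        # Keep whichever duplicate carries the higher severity.
        if kept is None or SEVERITY_ORDER[finding["severity"]] < SEVERITY_ORDER[kept["severity"]]:
            best[key] = finding
    return sorted(
        best.values(),
        key=lambda f: (SEVERITY_ORDER[f["severity"]], f["file"], f["line"]),
    )
```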

Quick Mode (--quick)

Runs Phase 1 lenses only: secrets, shell-safety, blast-radius, privilege. Ideal for pre-commit or CI gates.
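
Lens selection for the two modes could be as simple as the sketch below. The lens names and phases are taken from this design; the `select_lenses` function and its `quick` parameter are assumptions about how the flag might be wired up.

```python
# Lens names grouped by the implementation phase in this design.
LENSES_BY_PHASE = {
    1: ["secrets", "shell-safety", "blast-radius", "privilege"],
    2: ["idempotency", "supply-chain", "observability"],
    3: ["nix-hygiene", "resilience", "orchestration"],
}


def select_lenses(quick: bool = False) -> list[str]:
    """Return Phase 1 lenses in quick mode, otherwise all lenses in phase order."""
    phases = [1] if quick else sorted(LENSES_BY_PHASE)
    return [lens for phase in phases for lens in LENSES_BY_PHASE[phase]]
```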

Output Format

Per-lens findings:

[LENS-TAG] <severity:HIGH|MED|LOW> <file:line>
Issue: <what's wrong>
Suggest: <how to fix>
Evidence: <why it matters>
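
A small data type makes the format above easy to produce consistently. The sketch assumes the angle-bracket fields are placeholders filled with bare values (for example `HIGH`, not `severity:HIGH`) and that the lens tag is the upper-cased lens name; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    lens: str
    severity: str  # "HIGH", "MED", or "LOW"
    file: str
    line: int
    issue: str
    suggest: str
    evidence: str

    def render(self) -> str:
        """Format the finding as the four-line per-lens block."""
        return (
            f"[{self.lens.upper()}] {self.severity} {self.file}:{self.line}\n"
            f"Issue: {self.issue}\n"
            f"Suggest: {self.suggest}\n"
            f"Evidence: {self.evidence}"
        )
```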

Severity Rubric

| Severity | Criteria |
| --- | --- |
| HIGH | Exploitable vulnerability, data loss risk, or will break on next run |
| MED | Reliability issue, tech debt, or violation of best practice |
| LOW | Polish, maintainability, or defense-in-depth improvement |

Context matters: the same issue may be HIGH in production but LOW in a homelab.

Cross-File Awareness

Build a simple reference map before review:

  • Shell: source, . includes, invoked scripts
  • Nix: imports, flake inputs
  • CI: referenced scripts, env vars, secrets names
  • Compose: service dependencies, volumes, env files
  • systemd: ExecStart targets, dependencies

This enables finding issues in the seams between components.
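
For the shell case, a grep-style reference extractor might look like the sketch below. It only recognises `source path` and `. path` at the start of a line and returns the paths as written, without expanding variables; the function name is an assumption, and invoked scripts would need a separate pattern.

```python
import re

# `source path` or `. path` at the start of a line, optionally indented.
SOURCE_RE = re.compile(r"^\s*(?:source|\.)\s+([^\s;&|]+)", re.MULTILINE)


def shell_includes(script_text: str) -> list[str]:
    """Return the paths a shell script sources, in order, with quotes stripped."""
    return [match.strip("'\"") for match in SOURCE_RE.findall(script_text)]
```

Running this over every script in the target set yields the edges of the reference map, which the lenses can then use to follow a variable or function across files.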

Implementation Phases

Phase 1: Safety Net (High ROI, Low Ambiguity)

  1. secrets - Non-negotiable, prevents catastrophes
  2. shell-safety - Most brittle artifact type, shellcheck-backed
  3. blast-radius - Where LLMs shine (understanding implications)
  4. privilege - Highly actionable, high impact

Phase 2: Reliability Layer

  1. idempotency - Essential for setup/deploy scripts
  2. supply-chain - Critical for reproducibility
  3. observability - Easy to check, high debugging value

Phase 3: Architecture Polish

  1. nix-hygiene - statix/deadnix backed, LLM explains
  2. resilience - Needs nuance to avoid bad advice
  3. orchestration - Most complex, needs full context

Design Decisions

  1. Linter-first, LLM-second: Static tools for syntax, LLM for semantics
  2. Crisp lens boundaries: Each rule has one primary owner
  3. Severity tied to impact: Not all violations are equal
  4. Quick mode: Phase 1 for pre-commit/CI
  5. Cross-file awareness: Grep-based reference mapping
  6. Escape hatches: Intentional patterns can be flagged + suppressed

Success Criteria

  • Can review dotfiles/ and find real issues
  • Can review prox-setup/ and find real issues
  • Findings are actionable, not noise
  • Phase 1 lenses have <10% false positive rate
  • Integrates with existing bd issue tracking
  • Quick mode runs in <30 seconds

Open Questions (Resolved)

| Question | Resolution |
| --- | --- |
| Nix: statix/deadnix or pure LLM? | Hybrid: Tools first, LLM interprets |
| Shell: integrate shellcheck? | Yes: Treat as compiler, LLM groups/prioritizes |
| Multi-file dependencies? | Grep-based reference map pre-pass |
| Quick mode? | Yes: Phase 1 lenses only |
| Prioritize across artifact types? | By risk: secrets/destructive ops first, not file type |
