Multi-lens review skill for operational infrastructure (Nix, shell, Docker, CI/CD).
Modeled on code-review with linter-first hybrid architecture.

Phase 1 lenses (core safety):
- secrets: credential exposure, Nix store, Docker layers, CI masking
- shell-safety: shellcheck-backed, temp files, guard snippets
- blast-radius: targeting/scoping, dry-run, rollback
- privilege: least-privilege, containers, systemd sandboxing

Design reviewed via orch consensus (sonar, flash-or, gemini, gpt).
Lenses deploy to ~/.config/lenses/ops/ via home-manager.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ops-review Skill Design
A multi-lens review skill for operational infrastructure, modeled on code-review.
Problem Statement
Ops artifacts (Nix configs, shell scripts, Python automation, Docker Compose, CI/CD) accumulate technical debt and security issues just like application code. Unlike application code, they rarely get systematic review.
Target Artifacts
Based on actual infrastructure in dotfiles and prox-setup:
| Category | Examples |
|---|---|
| Nix/NixOS | flake.nix, modules/*.nix, home-manager configs |
| Shell Scripts | bin/*.sh, setup_*.sh, fix_*.sh, deploy.sh |
| Python Automation | Proxmox API scripts, multi-stage deployments |
| Container Configs | docker-compose.yml, Dockerfile |
| CI/CD | .gitea/workflows/*.yml, .github/actions/*.yml |
| Service Configs | systemd units, Ory configs, SOPS files |
Architecture: Linter-First Hybrid
Consensus from model review: Use deterministic tools as primary signals, LLM for interpretation and semantic analysis.
Stage 1: Static Tools (fast, deterministic)
├── shellcheck for shell scripts
├── statix + deadnix for Nix
├── hadolint for Dockerfiles
└── yamllint for YAML configs
Stage 2: LLM Analysis (semantic, contextual)
├── Interprets tool output in context
├── Finds logic bugs tools miss
├── Synthesizes cross-file issues
└── Suggests actionable fixes
Why: LLMs hallucinate syntax but excel at understanding intent and impact. Tools catch syntax but miss semantics.
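As a rough illustration of Stage 1, the sketch below maps a file to the static tools named above. The function names (`artifact_kind`, `tools_for`) and the filename heuristics are assumptions made for this example, not part of the skill; actually invoking the tools is left out.

```python
from pathlib import PurePath
from typing import Optional

# Stage 1 tools from the design, keyed by a coarse artifact kind.
# The kind labels themselves are hypothetical.
STATIC_TOOLS = {
    "shell": ["shellcheck"],
    "nix": ["statix", "deadnix"],
    "dockerfile": ["hadolint"],
    "yaml": ["yamllint"],
}


def artifact_kind(path: str) -> Optional[str]:
    """Classify a file by name; None means no Stage 1 tool applies."""
    p = PurePath(path)
    if p.name == "Dockerfile" or p.name.startswith("Dockerfile."):
        return "dockerfile"
    if p.suffix == ".sh":
        return "shell"
    if p.suffix == ".nix":
        return "nix"
    if p.suffix in {".yml", ".yaml"}:
        return "yaml"
    return None


def tools_for(path: str) -> list:
    """Return the static tools to run against a path (possibly none)."""
    kind = artifact_kind(path)
    return list(STATIC_TOOLS[kind]) if kind else []
```

Files with no matching tool (Python automation, systemd units) would go straight to Stage 2 under this scheme.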
Proposed Lenses (10 total)
Core Safety (Phase 1)
1. secrets
Focus: Credential hygiene
- Hardcoded secrets, API keys, tokens
- SOPS config issues
- Secrets in logs or error messages
- Secrets passed via CLI args (visible in process list)
- Missing encryption for sensitive data
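A deterministic pre-pass for the first bullet could look roughly like the following. The three patterns are illustrative assumptions, not the lens's actual rule set, and a scan this simple will miss many secrets and flag some non-secrets.

```python
import re

# Hypothetical starter patterns; a real lens would need a broader, tuned set.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A literal value of 8+ characters assigned to a secret-looking name.
    # Values starting with $, <, or { are skipped as likely references.
    "inline-assignment": re.compile(
        r"(?i)(password|passwd|secret|token|api_key)\s*[:=]\s*['\"]?[^\s'\"$<{]{8,}"
    ),
}


def scan(text: str) -> list:
    """Return (line_number, pattern_name) pairs for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Hits like these would be handed to the LLM stage as evidence rather than reported directly, in line with the linter-first approach.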
2. shell-safety
Focus: Shell script robustness (backed by shellcheck)
- Missing set -euo pipefail
- Unquoted variables (SC2086)
- Unsafe command substitution
- Missing error handling
- Hardcoded paths that should be parameters
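The strict-mode check in the first bullet can be approximated with a plain text scan. This is a naive sketch that assumes `set` appears at the start of a line; it is not a shell parser, and `missing_strict_options` is a hypothetical helper name.

```python
SHORT_TO_LONG = {"e": "errexit", "u": "nounset"}
REQUIRED = {"errexit", "nounset", "pipefail"}


def enabled_options(script: str) -> set:
    """Collect options enabled by `set -...` lines (heuristic only)."""
    enabled = set()
    for line in script.splitlines():
        tokens = line.split("#", 1)[0].split()  # drop trailing comments
        if not tokens or tokens[0] != "set":
            continue
        expect_name = False  # True right after an `o` flag
        for tok in tokens[1:]:
            if expect_name:
                enabled.add(tok)
                expect_name = False
            elif tok.startswith("-") and len(tok) > 1:
                for ch in tok[1:]:
                    if ch == "o":
                        expect_name = True
                    elif ch in SHORT_TO_LONG:
                        enabled.add(SHORT_TO_LONG[ch])
    return enabled


def missing_strict_options(script: str) -> set:
    """Return which of errexit/nounset/pipefail the script never enables."""
    return REQUIRED - enabled_options(script)
```

A non-empty result is only a signal: some scripts disable strict mode deliberately, which is where the LLM stage and the escape-hatch mechanism come in.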
3. blast-radius
Focus: Change safety and risk containment
- Destructive operations without confirmation
- Missing dry-run mode
- No rollback strategy
- Bulk operations without batching
- Missing pre-flight checks
- No canary/progressive approach
4. privilege
Focus: Least privilege violations
- Unnecessary sudo/root usage
- Containers running as root
- Overly permissive file modes (chmod 777)
- Missing capability drops
- Docker socket mounting
- systemd units without sandboxing (ProtectSystem, PrivateTmp)
Reliability (Phase 2)
5. idempotency
Focus: Safe re-execution and convergence
- Scripts that break on re-run
- Missing existence checks (create-if-not-exists)
- Non-atomic operations (partial failure states)
- Check-then-act race conditions
- Missing cleanup on failure
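For reference, here is a sketch of what a passing pattern for several of these bullets might look like in Python automation. The helper names are hypothetical, and the atomicity claim assumes a POSIX filesystem with the temp file in the same directory as the target.

```python
import os
import tempfile


def write_atomic(path: str, content: str) -> None:
    """Write to a temp file beside the target, then rename over it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        # Cleanup on failure: never leave the temp file behind.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def ensure_line(path: str, line: str) -> bool:
    """Append `line` only if absent; returns True when the file changed."""
    try:
        with open(path) as handle:
            existing = handle.read().splitlines()
    except FileNotFoundError:  # open-and-handle instead of exists-then-open
        existing = []
    if line in existing:
        return False
    write_atomic(path, "\n".join(existing + [line]) + "\n")
    return True
```

Running `ensure_line` twice with the same arguments changes the file once and is a no-op the second time. Note that `mkstemp` creates the file with mode 0600, so a real script would restore the intended permissions before the rename.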
6. supply-chain
Focus: Dependency provenance and pinning
- Unpinned versions (latest tags, floating refs)
- GitHub/Gitea actions not pinned to SHA
- Missing Nix flake.lock or SRI hashes
- Unsigned artifacts
- Untrusted substituters/registries
7. observability
Focus: Visibility into system state
- Silent failures (no logging/alerting)
- Missing health checks (Docker healthcheck, systemd ExecStartPre)
- Incomplete metrics coverage
- Missing structured logging
- No correlation IDs in multi-step scripts
Architecture (Phase 3)
8. nix-hygiene
Focus: Nix-specific quality (backed by statix/deadnix)
- Dead code (unused let bindings, imports)
- Anti-patterns (with lib abuse, IFD without justification)
- Module boundary violations
- Overlay/override issues
- Missing type annotations on options
9. resilience
Focus: Runtime fault tolerance
- Missing timeouts on network calls
- No retries with backoff/jitter
- Missing circuit breakers for API calls
- No graceful shutdown handling (SIGTERM)
- Missing resource limits (systemd MemoryMax, Docker mem_limit)
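To make "retries with backoff/jitter" concrete, the following is a minimal sketch using the common "full jitter" scheme. The defaults (`base`, `cap`, `attempts`) and the choice to retry on `OSError` are assumptions for illustration; the caller is still responsible for setting a timeout inside `operation`.

```python
import random
import time


def backoff_delays(count, base=0.5, cap=30.0, rng=random.random):
    """Yield `count` delays, each uniform in [0, min(cap, base * 2**n)]."""
    for n in range(count):
        yield rng() * min(cap, base * (2 ** n))


def retry(operation, attempts=5, base=0.5, cap=30.0,
          sleep=time.sleep, retry_on=(OSError,)):
    """Call `operation` up to `attempts` times, sleeping between failures."""
    delays = list(backoff_delays(attempts - 1, base, cap))
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
```

`ConnectionError` and `TimeoutError` are subclasses of `OSError`, so the default covers typical network failures. Injecting `sleep` and `rng` keeps the helper testable without real waiting.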
10. orchestration
Focus: Execution ordering and coupling (formerly dependency-chains)
- Unclear prerequisites
- Missing documentation of execution order
- Circular dependencies
- Scripts assuming prior state without checking
- Implicit coupling between components
Crisp Boundaries
To avoid duplicate findings across overlapping lenses:
| Lens | Owns | Does NOT Own |
|---|---|---|
| idempotency | Safe re-run, convergence, atomic writes, create-if-not-exists | Rollback (blast-radius), retries (resilience) |
| resilience | Runtime fault tolerance, timeouts, retries, graceful shutdown | Change safety (blast-radius), re-run safety (idempotency) |
| blast-radius | Change safety, dry-run, rollback, confirmation gates, batching | Runtime behavior (resilience), re-run (idempotency) |
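
One way to enforce the table is to route each finding to its owning lens before deduplication. In this sketch the concern labels and the `(lens, concern, location)` tuple shape are hypothetical; only the ownership assignments come from the table.

```python
# Primary owner per concern, taken from the boundary table.
OWNER = {
    "safe-rerun": "idempotency",
    "convergence": "idempotency",
    "atomic-writes": "idempotency",
    "timeouts": "resilience",
    "retries": "resilience",
    "graceful-shutdown": "resilience",
    "dry-run": "blast-radius",
    "rollback": "blast-radius",
    "confirmation-gates": "blast-radius",
    "batching": "blast-radius",
}


def owning_lens(concern: str, reporting_lens: str) -> str:
    """Return the lens that owns a concern; unknown concerns stay put."""
    return OWNER.get(concern, reporting_lens)


def dedupe(findings):
    """Keep one finding per (concern, location), attributed to the owner."""
    kept = {}
    for lens, concern, location in findings:
        key = (concern, location)
        kept.setdefault(key, (owning_lens(concern, lens), concern, location))
    return list(kept.values())
```

With this routing, a rollback gap noticed by the idempotency lens is reported once, under blast-radius.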
Skill Structure
skills/ops-review/
├── SKILL.md # Agent instructions (workflow)
├── README.md # User documentation
└── lenses/
├── README.md # Lens index
├── secrets.md
├── shell-safety.md
├── blast-radius.md
├── privilege.md
├── idempotency.md
├── supply-chain.md
├── observability.md
├── nix-hygiene.md
├── resilience.md
└── orchestration.md
Lenses deploy to ~/.config/lenses/ops/ via home-manager.
Workflow
Standard Mode
- Target selection - files/directory to review
- Pre-pass - Run static tools (shellcheck, statix, etc.)
- Reference mapping - Build lightweight call graph (source, imports, ExecStart)
- Lens execution - One pass per lens, tool output in context
- Synthesis - Dedupe across lenses, rank by severity
- Interactive review - User approves findings
- Issue filing - bd create for approved items
Quick Mode (--quick)
Runs Phase 1 lenses only: secrets, shell-safety, blast-radius, privilege. Ideal for pre-commit or CI gates.
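Lens selection for the two modes can be sketched as below. The phase groupings come from this document; the `ops-review` command-line wrapper is an assumption, since the skill itself is driven by agent instructions in SKILL.md.

```python
import argparse

LENS_PHASES = {
    1: ["secrets", "shell-safety", "blast-radius", "privilege"],
    2: ["idempotency", "supply-chain", "observability"],
    3: ["nix-hygiene", "resilience", "orchestration"],
}


def select_lenses(quick: bool) -> list:
    """Quick mode runs Phase 1 only; standard mode runs every phase in order."""
    phases = [1] if quick else sorted(LENS_PHASES)
    return [lens for phase in phases for lens in LENS_PHASES[phase]]


def parse_args(argv: list) -> argparse.Namespace:
    # Hypothetical CLI surface, shown only to anchor the --quick flag.
    parser = argparse.ArgumentParser(prog="ops-review")
    parser.add_argument("targets", nargs="+", help="files or directories to review")
    parser.add_argument("--quick", action="store_true", help="run Phase 1 lenses only")
    return parser.parse_args(argv)
```

For example, `select_lenses(parse_args(["--quick", "dotfiles/"]).quick)` returns the four Phase 1 lenses.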
Output Format
Per-lens findings:
[LENS-TAG] <severity:HIGH|MED|LOW> <file:line>
Issue: <what's wrong>
Suggest: <how to fix>
Evidence: <why it matters>
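The format above maps naturally onto a small record type. This is a sketch: rendering the tag as the upper-cased lens name is an assumption, since the document does not define the exact tag spelling.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"HIGH": 0, "MED": 1, "LOW": 2}


@dataclass(frozen=True)
class Finding:
    lens: str
    severity: str  # one of HIGH, MED, LOW
    file: str
    line: int
    issue: str
    suggest: str
    evidence: str

    def render(self) -> str:
        """Format the finding in the four-line per-lens layout."""
        return "\n".join([
            f"[{self.lens.upper()}] {self.severity} {self.file}:{self.line}",
            f"Issue: {self.issue}",
            f"Suggest: {self.suggest}",
            f"Evidence: {self.evidence}",
        ])


def rank(findings: list) -> list:
    """Order findings HIGH first; ties keep their original order."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
```

Keeping findings structured until the last step makes the synthesis pass (dedupe, then rank by severity) a matter of plain list operations.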
Severity Rubric
| Severity | Criteria |
|---|---|
| HIGH | Exploitable vulnerability, data loss risk, or will break on next run |
| MED | Reliability issue, tech debt, or violation of best practice |
| LOW | Polish, maintainability, or defense-in-depth improvement |
Context matters: the same issue may be HIGH in production and LOW in a homelab.
Cross-File Awareness
Build a simple reference map before review:
- Shell: source, . includes, invoked scripts
- Nix: imports, flake inputs
- CI: referenced scripts, env vars, secrets names
- Compose: service dependencies, volumes, env files
- systemd: ExecStart targets, dependencies
This enables finding issues in the seams between components.
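For the shell case, a grep-style reference map might be built as follows. The regex is a simplification that assumes `source` or `.` appears at the start of a line with a literal or quoted path; it does not resolve variables or detect invoked scripts, and the function names are hypothetical.

```python
import re

# `source path` or `. path` at the start of a line; comments do not match.
SOURCE_RE = re.compile(r"^[ \t]*(?:source|\.)[ \t]+([^\s;&|]+)", re.MULTILINE)


def shell_references(script: str) -> list:
    """Return the paths a shell script sources, with quotes stripped."""
    return [target.strip("\"'") for target in SOURCE_RE.findall(script)]


def build_reference_map(files: dict) -> dict:
    """Map each .sh filename to the files it sources (text passed in memory)."""
    return {
        name: shell_references(text)
        for name, text in files.items()
        if name.endswith(".sh")
    }
```

Unresolved entries such as `$HOME/.env` are kept verbatim so the LLM stage can reason about them in context instead of silently dropping the edge.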
Implementation Phases
Phase 1: Safety Net (High ROI, Low Ambiguity)
- secrets - Non-negotiable, prevents catastrophes
- shell-safety - Most brittle artifact type, shellcheck-backed
- blast-radius - Where LLMs shine (understanding implications)
- privilege - Highly actionable, high impact
Phase 2: Reliability Layer
- idempotency - Essential for setup/deploy scripts
- supply-chain - Critical for reproducibility
- observability - Easy to check, high debugging value
Phase 3: Architecture Polish
- nix-hygiene - statix/deadnix backed, LLM explains
- resilience - Needs nuance to avoid bad advice
- orchestration - Most complex, needs full context
Design Decisions
- Linter-first, LLM-second: Static tools for syntax, LLM for semantics
- Crisp lens boundaries: Each rule has one primary owner
- Severity tied to impact: Not all violations are equal
- Quick mode: Phase 1 for pre-commit/CI
- Cross-file awareness: Grep-based reference mapping
- Escape hatches: Intentional patterns can be flagged + suppressed
Success Criteria
- Can review dotfiles/ and find real issues
- Can review prox-setup/ and find real issues
- Findings are actionable, not noise
- Phase 1 lenses have <10% false positive rate
- Integrates with existing bd issue tracking
- Quick mode runs in <30 seconds
Open Questions (Resolved)
| Question | Resolution |
|---|---|
| Nix: statix/deadnix or pure LLM? | Hybrid: Tools first, LLM interprets |
| Shell: integrate shellcheck? | Yes: Treat as compiler, LLM groups/prioritizes |
| Multi-file dependencies? | Grep-based reference map pre-pass |
| Quick mode? | Yes: Phase 1 lenses only |
| Prioritize across artifact types? | By risk: secrets/destructive ops first, not file type |
References
- Google SRE Book
- OWASP Infrastructure Security
- Consensus review: sonar, flash-or, gemini, gpt (2025-01-01)