# ops-review Skill Design

A multi-lens review skill for operational infrastructure, modeled on code-review.

## Problem Statement

Ops artifacts (Nix configs, shell scripts, Python automation, Docker Compose, CI/CD) accumulate technical debt and security issues just like application code. Unlike code, they rarely get systematic review.

## Target Artifacts

Based on actual infrastructure in dotfiles and prox-setup:

| Category | Examples |
|----------|----------|
| **Nix/NixOS** | `flake.nix`, `modules/*.nix`, home-manager configs |
| **Shell Scripts** | `bin/*.sh`, `setup_*.sh`, `fix_*.sh`, `deploy.sh` |
| **Python Automation** | Proxmox API scripts, multi-stage deployments |
| **Container Configs** | `docker-compose.yml`, `Dockerfile` |
| **CI/CD** | `.gitea/workflows/*.yml`, `.github/actions/*.yml` |
| **Service Configs** | systemd units, Ory configs, SOPS files |

## Architecture: Linter-First Hybrid

**Consensus from model review**: Use deterministic tools as primary signals, LLM for interpretation and semantic analysis.

```
Stage 1: Static Tools (fast, deterministic)
├── shellcheck for shell scripts
├── statix + deadnix for Nix
├── hadolint for Dockerfiles
└── yamllint for YAML configs

Stage 2: LLM Analysis (semantic, contextual)
├── Interprets tool output in context
├── Finds logic bugs tools miss
├── Synthesizes cross-file issues
└── Suggests actionable fixes
```

**Why**: LLMs hallucinate syntax but excel at understanding intent and impact. Tools catch syntax but miss semantics.

## Proposed Lenses (10 total)

### Core Safety (Phase 1)

#### 1. secrets

**Focus**: Credential hygiene

- Hardcoded secrets, API keys, tokens
- SOPS config issues
- Secrets in logs or error messages
- Secrets passed via CLI args (visible in process list)
- Missing encryption for sensitive data

#### 2. shell-safety

**Focus**: Shell script robustness (backed by shellcheck)

- Missing `set -euo pipefail`
- Unquoted variables (SC2086)
- Unsafe command substitution
- Missing error handling
- Hardcoded paths that should be parameters

#### 3. blast-radius

**Focus**: Change safety and risk containment

- Destructive operations without confirmation
- Missing dry-run mode
- No rollback strategy
- Bulk operations without batching
- Missing pre-flight checks
- No canary/progressive approach

#### 4. privilege

**Focus**: Least privilege violations

- Unnecessary sudo/root usage
- Containers running as root
- Overly permissive file modes (`chmod 777`)
- Missing capability drops
- Docker socket mounting
- systemd units without sandboxing (`ProtectSystem`, `PrivateTmp`)

### Reliability (Phase 2)

#### 5. idempotency

**Focus**: Safe re-execution and convergence

- Scripts that break on re-run
- Missing existence checks (create-if-not-exists)
- Non-atomic operations (partial failure states)
- Check-then-act race conditions
- Missing cleanup on failure

#### 6. supply-chain

**Focus**: Dependency provenance and pinning

- Unpinned versions (`latest` tags, floating refs)
- GitHub/Gitea actions not pinned to SHA
- Missing Nix `flake.lock` or SRI hashes
- Unsigned artifacts
- Untrusted substituters/registries

#### 7. observability

**Focus**: Visibility into system state

- Silent failures (no logging/alerting)
- Missing health checks (Docker healthcheck, systemd ExecStartPre)
- Incomplete metrics coverage
- Missing structured logging
- No correlation IDs in multi-step scripts

### Architecture (Phase 3)

#### 8. nix-hygiene

**Focus**: Nix-specific quality (backed by statix/deadnix)

- Dead code (unused let bindings, imports)
- Anti-patterns (`with lib` abuse, IFD without justification)
- Module boundary violations
- Overlay/override issues
- Missing type annotations on options

#### 9. resilience

**Focus**: Runtime fault tolerance

- Missing timeouts on network calls
- No retries with backoff/jitter
- Missing circuit breakers for API calls
- No graceful shutdown handling (SIGTERM)
- Missing resource limits (systemd `MemoryMax`, Docker `mem_limit`)

#### 10. orchestration

**Focus**: Execution ordering and coupling (formerly dependency-chains)

- Unclear prerequisites
- Missing documentation of execution order
- Circular dependencies
- Scripts assuming prior state without checking
- Implicit coupling between components

## Crisp Boundaries

To avoid duplicate findings across overlapping lenses:

| Lens | Owns | Does NOT Own |
|------|------|--------------|
| **idempotency** | Safe re-run, convergence, atomic writes, create-if-not-exists | Rollback (blast-radius), retries (resilience) |
| **resilience** | Runtime fault tolerance, timeouts, retries, graceful shutdown | Change safety (blast-radius), re-run safety (idempotency) |
| **blast-radius** | Change safety, dry-run, rollback, confirmation gates, batching | Runtime behavior (resilience), re-run (idempotency) |

## Skill Structure

```
skills/ops-review/
├── SKILL.md              # Agent instructions (workflow)
├── README.md             # User documentation
└── lenses/
    ├── README.md         # Lens index
    ├── secrets.md
    ├── shell-safety.md
    ├── blast-radius.md
    ├── privilege.md
    ├── idempotency.md
    ├── supply-chain.md
    ├── observability.md
    ├── nix-hygiene.md
    ├── resilience.md
    └── orchestration.md
```

Lenses deploy to `~/.config/lenses/ops/` via home-manager.

## Workflow

### Standard Mode

1. **Target selection** - files/directory to review
2. **Pre-pass** - Run static tools (shellcheck, statix, etc.)
3. **Reference mapping** - Build lightweight call graph (source, imports, ExecStart)
4. **Lens execution** - One pass per lens, tool output in context
5. **Synthesis** - Dedupe across lenses, rank by severity
6. **Interactive review** - User approves findings
7. **Issue filing** - `bd create` for approved items

### Quick Mode (`--quick`)

Runs Phase 1 lenses only: secrets, shell-safety, blast-radius, privilege. Ideal for pre-commit or CI gates.

## Output Format

Per-lens findings:

```
[LENS-TAG]
Issue:
Suggest:
Evidence:
```

### Severity Rubric

| Severity | Criteria |
|----------|----------|
| **HIGH** | Exploitable vulnerability, data loss risk, or will break on next run |
| **MED** | Reliability issue, tech debt, or violation of best practice |
| **LOW** | Polish, maintainability, or defense-in-depth improvement |

Context matters: the same issue may be HIGH in production and LOW in a homelab.

## Cross-File Awareness

Build a simple reference map before review:

- **Shell**: `source`, `.` includes, invoked scripts
- **Nix**: imports, flake inputs
- **CI**: referenced scripts, env vars, secrets names
- **Compose**: service dependencies, volumes, env files
- **systemd**: ExecStart targets, dependencies

This enables finding issues in the seams between components.

## Implementation Phases

### Phase 1: Safety Net (High ROI, Low Ambiguity)

1. **secrets** - Non-negotiable, prevents catastrophes
2. **shell-safety** - Most brittle artifact type, shellcheck-backed
3. **blast-radius** - Where LLMs shine (understanding implications)
4. **privilege** - Highly actionable, high impact

### Phase 2: Reliability Layer

5. **idempotency** - Essential for setup/deploy scripts
6. **supply-chain** - Critical for reproducibility
7. **observability** - Easy to check, high debugging value

### Phase 3: Architecture Polish

8. **nix-hygiene** - statix/deadnix backed, LLM explains
9. **resilience** - Needs nuance to avoid bad advice
10. **orchestration** - Most complex, needs full context

## Design Decisions

1. **Linter-first, LLM-second**: Static tools for syntax, LLM for semantics
2. **Crisp lens boundaries**: Each rule has one primary owner
3. **Severity tied to impact**: Not all violations are equal
4. **Quick mode**: Phase 1 for pre-commit/CI
5. **Cross-file awareness**: Grep-based reference mapping
6. **Escape hatches**: Intentional patterns can be flagged + suppressed

## Success Criteria

- Can review dotfiles/ and find real issues
- Can review prox-setup/ and find real issues
- Findings are actionable, not noise
- Phase 1 lenses have <10% false positive rate
- Integrates with existing bd issue tracking
- Quick mode runs in <30 seconds

## Open Questions (Resolved)

| Question | Resolution |
|----------|------------|
| Nix: statix/deadnix or pure LLM? | **Hybrid**: Tools first, LLM interprets |
| Shell: integrate shellcheck? | **Yes**: Treat as compiler, LLM groups/prioritizes |
| Multi-file dependencies? | **Grep-based reference map** pre-pass |
| Quick mode? | **Yes**: Phase 1 lenses only |
| Prioritize across artifact types? | **By risk**: secrets/destructive ops first, not file type |

## References

- [Google SRE Book](https://sre.google/sre-book/table-of-contents/)
- [OWASP Infrastructure Security](https://owasp.org/www-project-devsecops-guideline/)
- Consensus review: sonar, flash-or, gemini, gpt (2025-01-01)
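
## Appendix: Pre-pass and Synthesis Sketch

The Stage 1 pre-pass and the synthesis step (dedupe across lenses, rank by severity) from the Workflow section could look roughly like the Python below. This is a minimal sketch under stated assumptions, not the skill's actual implementation. The names `linter_commands`, `Finding`, and `synthesize` are hypothetical, as are the file-matching rules and the choice to treat two findings at the same file and line as duplicates. The tool names, the quick-mode lens set, and the severity labels come from the design above.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Lenses run by --quick (the Phase 1 set named in the design).
QUICK_LENSES = ("secrets", "shell-safety", "blast-radius", "privilege")

# Lower rank sorts first; labels follow the severity rubric.
SEVERITY_RANK = {"HIGH": 0, "MED": 1, "LOW": 2}


def linter_commands(path: str) -> list[list[str]]:
    """Return Stage 1 linter argv lists for one file (hypothetical routing rules)."""
    p = PurePath(path)
    name, suffix = p.name, p.suffix
    if suffix == ".sh":
        return [["shellcheck", path]]
    if suffix == ".nix":
        return [["statix", "check", path], ["deadnix", path]]
    if name == "Dockerfile" or name.startswith("Dockerfile."):
        return [["hadolint", path]]
    if suffix in (".yml", ".yaml"):
        return [["yamllint", path]]
    return []  # no static tool applies; the file goes straight to LLM analysis


@dataclass(frozen=True)
class Finding:
    lens: str
    severity: str  # "HIGH", "MED", or "LOW"
    file: str
    line: int
    issue: str


def synthesize(findings: list[Finding], quick: bool = False) -> list[Finding]:
    """Dedupe findings at the same location, then rank by severity."""
    if quick:
        findings = [f for f in findings if f.lens in QUICK_LENSES]
    best: dict[tuple[str, int], Finding] = {}
    for f in findings:
        key = (f.file, f.line)  # assumption: same file and line means duplicate
        kept = best.get(key)
        # Keep the most severe finding per location; the first one wins ties.
        if kept is None or SEVERITY_RANK[f.severity] < SEVERITY_RANK[kept.severity]:
            best[key] = f
    return sorted(
        best.values(),
        key=lambda f: (SEVERITY_RANK[f.severity], f.file, f.line),
    )


# Usage: three raw findings, two of which point at the same line.
findings = [
    Finding("observability", "LOW", "deploy.sh", 3, "no logging on failure"),
    Finding("shell-safety", "MED", "deploy.sh", 12, "unquoted variable"),
    Finding("blast-radius", "HIGH", "deploy.sh", 12, "rm -rf on unquoted variable"),
]
for f in synthesize(findings):
    print(f"[{f.lens}] {f.severity} {f.file}:{f.line} {f.issue}")
```

In the usage example, the two findings on line 12 collapse into the HIGH blast-radius one, which then sorts ahead of the LOW observability finding. Keying duplicates on location alone is deliberately crude. A real implementation would more likely follow the ownership rules in the Crisp Boundaries table, so that the owning lens keeps the finding rather than whichever lens reported the highest severity.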