skills/specs/ops-review/plan.md
dan fb882a9434 feat: add ops-review skill with Phase 1 lenses
Multi-lens review skill for operational infrastructure (Nix, shell,
Docker, CI/CD). Modeled on code-review with linter-first hybrid
architecture.

Phase 1 lenses (core safety):
- secrets: credential exposure, Nix store, Docker layers, CI masking
- shell-safety: shellcheck-backed, temp files, guard snippets
- blast-radius: targeting/scoping, dry-run, rollback
- privilege: least-privilege, containers, systemd sandboxing

Design reviewed via orch consensus (sonar, flash-or, gemini, gpt).
Lenses deploy to ~/.config/lenses/ops/ via home-manager.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 17:36:24 -08:00


# ops-review Skill Design
A multi-lens review skill for operational infrastructure, modeled on code-review.
## Problem Statement
Ops artifacts (Nix configs, shell scripts, Python automation, Docker Compose, CI/CD) accumulate technical debt and security issues just like application code. Unlike application code, they rarely get systematic review.
## Target Artifacts
Based on actual infrastructure in dotfiles and prox-setup:
| Category | Examples |
|----------|----------|
| **Nix/NixOS** | flake.nix, modules/*.nix, home-manager configs |
| **Shell Scripts** | bin/*.sh, setup_*.sh, fix_*.sh, deploy.sh |
| **Python Automation** | Proxmox API scripts, multi-stage deployments |
| **Container Configs** | docker-compose.yml, Dockerfile |
| **CI/CD** | .gitea/workflows/*.yml, .github/actions/*.yml |
| **Service Configs** | systemd units, Ory configs, SOPS files |
## Architecture: Linter-First Hybrid
**Consensus from model review**: Use deterministic tools as primary signals, LLM for interpretation and semantic analysis.
```
Stage 1: Static Tools (fast, deterministic)
├── shellcheck for shell scripts
├── statix + deadnix for Nix
├── hadolint for Dockerfiles
└── yamllint for YAML configs
Stage 2: LLM Analysis (semantic, contextual)
├── Interprets tool output in context
├── Finds logic bugs tools miss
├── Synthesizes cross-file issues
└── Suggests actionable fixes
```
**Why**: LLMs can hallucinate syntax and rule details but are good at understanding intent and impact. Tools reliably catch syntax problems but miss semantics.
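The Stage 1 pre-pass could look roughly like the sketch below, written in Python since the plan already targets Python automation. It is a minimal sketch under assumptions: the extension-to-tool routing is illustrative, the flags (`shellcheck -f json`, `statix check`) are common invocations rather than something this plan fixes, and `pick_tools`/`run_prepass` are hypothetical names.

```python
# Sketch of the Stage 1 pre-pass: route each file to the static tool(s)
# that understand it, run them, and keep raw output for the Stage 2 prompt.
# Flags are assumptions; adjust to the tool versions actually installed.
import shutil
import subprocess
from pathlib import Path


def pick_tools(path: str) -> list[list[str]]:
    """Return the linter command lines that apply to one file."""
    p = Path(path)
    if p.suffix == ".sh":
        return [["shellcheck", "-f", "json", path]]
    if p.suffix == ".nix":
        return [["statix", "check", path], ["deadnix", path]]
    if p.name == "Dockerfile" or p.suffix == ".Dockerfile":
        return [["hadolint", path]]
    if p.suffix in (".yml", ".yaml"):
        return [["yamllint", path]]
    # Extension-less scripts (e.g. bin/foo) would need shebang sniffing.
    return []


def run_prepass(paths: list[str]) -> dict[str, list[str]]:
    """Run every applicable tool and collect its output per file."""
    results: dict[str, list[str]] = {}
    for path in paths:
        for cmd in pick_tools(path):
            if shutil.which(cmd[0]) is None:
                # Record the gap instead of silently skipping the check.
                results.setdefault(path, []).append(f"{cmd[0]}: not installed")
                continue
            # Linters exit non-zero when they find issues, so no check=True.
            proc = subprocess.run(cmd, capture_output=True, text=True)
            results.setdefault(path, []).append(proc.stdout + proc.stderr)
    return results
```

The collected strings are what Stage 2 would receive as "tool output in context"; reporting a missing tool explicitly keeps the LLM from assuming a file is clean.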
## Proposed Lenses (10 total)
### Core Safety (Phase 1)
#### 1. secrets
**Focus**: Credential hygiene
- Hardcoded secrets, API keys, tokens
- SOPS config issues
- Secrets in logs or error messages
- Secrets passed via CLI args (visible in process list)
- Missing encryption for sensitive data
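A deterministic first pass for this lens might resemble the sketch below: it flags a few well-known token shapes (long-term AWS access key IDs start with `AKIA`, classic GitHub personal access tokens with `ghp_`), PEM private-key headers, and secret-looking CLI flags. The rule names and `scan_secrets` are hypothetical; in practice a dedicated scanner would likely supply these hits, with the LLM judging whether each one is real.

```python
import re

# Hypothetical rule set: a few well-known token shapes plus CLI-flag leaks.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private-key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Values passed as arguments are visible to other users via `ps`.
    "cli-secret-arg": re.compile(r"--(?:password|token|secret|api-key)[= ]\S+", re.I),
}


def scan_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line matching a rule."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits
```

Note that `--password "$PW"` is still flagged: reading the value from an environment variable does not hide it from the process list once it is expanded into an argument.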
#### 2. shell-safety
**Focus**: Shell script robustness (backed by shellcheck)
- Missing `set -euo pipefail`
- Unquoted variables (SC2086)
- Unsafe command substitution
- Missing error handling
- Hardcoded paths that should be parameters
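For the first bullet, a lightweight check like the sketch below could report which strict-mode options a script never enables near its top, leaving SC2086 and similar findings to shellcheck. It assumes strict mode is set with `set` in the first few lines; `strict_mode_gaps` and the `head_lines` cutoff are hypothetical.

```python
import shlex

REQUIRED = {"errexit", "nounset", "pipefail"}


def strict_mode_gaps(script: str, head_lines: int = 10) -> set[str]:
    """Return which of errexit/nounset/pipefail are NOT enabled by a
    `set` command within the first `head_lines` lines of a script."""
    enabled: set[str] = set()
    for line in script.splitlines()[:head_lines]:
        try:
            tokens = shlex.split(line, comments=True)  # drops shebang/comments
        except ValueError:
            continue  # unbalanced quotes etc.; not a plain `set` line
        if not tokens or tokens[0] != "set":
            continue
        args = tokens[1:]
        for i, arg in enumerate(args):
            nxt = args[i + 1] if i + 1 < len(args) else None
            if arg == "-o":
                if nxt:
                    enabled.add(nxt)  # long form: set -o pipefail
            elif arg.startswith("-"):
                letters = arg[1:]
                if "e" in letters:
                    enabled.add("errexit")
                if "u" in letters:
                    enabled.add("nounset")
                if letters.endswith("o") and nxt:
                    enabled.add(nxt)  # bundled form: set -euo pipefail
    return REQUIRED - enabled
```

Both `set -euo pipefail` and `set -o errexit -o nounset -o pipefail` count as fully strict; a `set` buried deep in the script does not, since earlier commands already ran unguarded.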
#### 3. blast-radius
**Focus**: Change safety and risk containment
- Destructive operations without confirmation
- Missing dry-run mode
- No rollback strategy
- Bulk operations without batching
- Missing pre-flight checks
- No canary/progressive approach
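The patterns this lens looks for (dry-run, confirmation, batching, canary) can be combined in Python automation roughly as follows. This is an illustrative sketch, not code from the plan: `apply_in_batches` and its parameters are hypothetical, dry-run is the default, and each batch needs explicit confirmation, so the first batch acts as a canary.

```python
from typing import Callable


def apply_in_batches(
    targets: list[str],
    action: Callable[[str], None],
    *,
    dry_run: bool = True,  # safe default: only report the plan
    batch_size: int = 5,
    confirm: Callable[[list[str]], bool] = lambda batch: False,
) -> list[str]:
    """Run `action` on targets in small batches. Returns the targets acted
    on, or in dry-run mode the targets that would have been acted on."""
    done: list[str] = []
    for start in range(0, len(targets), batch_size):
        batch = targets[start:start + batch_size]
        if dry_run:
            print(f"[dry-run] would act on: {', '.join(batch)}")
            done.extend(batch)
            continue
        if not confirm(batch):  # confirmation gate before every batch
            print(f"aborted before batch: {', '.join(batch)}")
            break
        for target in batch:
            action(target)
            done.append(target)
    return done
```

Because `confirm` defaults to refusing, forgetting to wire up a prompt fails closed rather than running the whole set.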
#### 4. privilege
**Focus**: Least privilege violations
- Unnecessary sudo/root usage
- Containers running as root
- Overly permissive file modes (chmod 777)
- Missing capability drops
- Docker socket mounting
- systemd units without sandboxing (ProtectSystem, PrivateTmp)
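For the systemd bullet, a check on a rendered unit file (on NixOS, the unit generated from `serviceConfig`) might look like the sketch below. The directive list is an assumed, illustrative subset of systemd's sandboxing options, and `missing_hardening` is a hypothetical helper.

```python
# Illustrative subset of hardening directives, not a complete policy.
WANTED = ("User", "NoNewPrivileges", "PrivateTmp", "ProtectSystem")


def missing_hardening(unit_text: str) -> list[str]:
    """List WANTED directives absent from a unit file's [Service] section."""
    section = None
    values: dict[str, str] = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    present = set(values)
    if values.get("User") == "root":
        present.discard("User")  # explicit root is no better than unset
    if values.get("DynamicUser", "").lower() in ("yes", "true", "1", "on"):
        present.add("User")  # DynamicUser=yes runs as a transient non-root user
    return [d for d in WANTED if d not in present]
```

Presence alone is a weak signal (for example `ProtectSystem=no` passes), which is where the LLM pass adds value.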
### Reliability (Phase 2)
#### 5. idempotency
**Focus**: Safe re-execution and convergence
- Scripts that break on re-run
- Missing existence checks (create-if-not-exists)
- Non-atomic operations (partial failure states)
- Check-then-act race conditions
- Missing cleanup on failure
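A common way to satisfy several of these bullets at once is a convergent, atomic file write. The sketch below (hypothetical `write_atomic`) skips the write when content already matches, creates the parent directory if it does not exist, writes to a temp file in the same directory, and swaps it in with `os.replace`, removing the temp file on failure.

```python
import os
import tempfile
from pathlib import Path


def write_atomic(path: str, content: str) -> bool:
    """Write `content` to `path` atomically; return True only if it changed.

    Re-running with the same content is a no-op, and readers never see a
    half-written file: os.replace is atomic on POSIX when source and target
    are on the same filesystem, which the same-directory temp file ensures.
    """
    target = Path(path)
    if target.exists() and target.read_text() == content:
        return False  # already converged: nothing to do
    target.parent.mkdir(parents=True, exist_ok=True)  # create-if-not-exists
    # Note: mkstemp creates the file with mode 0600, so the replaced file
    # ends up 0600; copy the old mode first if that matters.
    fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=f".{target.name}.")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, target)  # atomic swap into place
    except BaseException:
        os.unlink(tmp)  # cleanup on failure: no stray temp files
        raise
    return True
```

The boolean return lets a caller report "changed" versus "already converged", which is the behavior the idempotency lens wants setup scripts to have.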
#### 6. supply-chain
**Focus**: Dependency provenance and pinning
- Unpinned versions (`latest` tags, floating refs)
- GitHub/Gitea actions not pinned to SHA
- Missing Nix flake.lock or SRI hashes
- Unsigned artifacts
- Untrusted substituters/registries
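The "not pinned to SHA" bullet is mechanical enough to check with a regex over workflow files, since GitHub and Gitea workflows share the `uses: owner/repo@ref` syntax. A minimal sketch, assuming one `uses:` per line; `unpinned_actions` is a hypothetical name, and local (`./`) actions and `docker://` images are deliberately skipped here.

```python
import re

USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_yaml: str) -> list[str]:
    """Return `uses:` references that are not pinned to a full commit SHA."""
    bad = []
    for ref in USES_RE.findall(workflow_yaml):
        ref = ref.strip("'\"")
        if ref.startswith(("./", "docker://")):
            continue  # local actions and docker:// images are out of scope here
        _, _, version = ref.partition("@")
        if not FULL_SHA_RE.match(version):
            bad.append(ref)  # tag, branch, or missing @ref
    return bad
```

A tag such as `@v4` can be moved by whoever controls the upstream repository; a 40-character commit SHA cannot.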
#### 7. observability
**Focus**: Visibility into system state
- Silent failures (no logging/alerting)
- Missing health checks (Docker healthcheck, systemd ExecStartPre)
- Incomplete metrics coverage
- Missing structured logging
- No correlation IDs in multi-step scripts
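For Python automation, structured logging with a per-run correlation ID can be done with the standard `logging` module, roughly as below. This is a sketch: the logger name `ops.deploy`, the JSON field names, and the helpers are assumptions. The custom adapter merges per-call extras because the stock `LoggerAdapter` replaces them with its own (before Python 3.13).

```python
import json
import logging
import uuid
from typing import Optional


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "run_id": getattr(record, "run_id", None),
            "step": getattr(record, "step", None),
            "msg": record.getMessage(),
        })


class RunAdapter(logging.LoggerAdapter):
    """Stamp every record with the run's correlation ID while still
    allowing per-call extras such as extra={"step": "..."}."""

    def process(self, msg, kwargs):
        kwargs["extra"] = {**self.extra, **kwargs.get("extra", {})}
        return msg, kwargs


def make_logger(run_id: Optional[str] = None) -> RunAdapter:
    logger = logging.getLogger("ops.deploy")
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return RunAdapter(logger, {"run_id": run_id or uuid.uuid4().hex})
```

Usage: `log = make_logger()` once per run, then `log.info("snapshot taken", extra={"step": "backup"})` in each stage, so every line from one multi-step run can be grepped by `run_id`.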
### Architecture (Phase 3)
#### 8. nix-hygiene
**Focus**: Nix-specific quality (backed by statix/deadnix)
- Dead code (unused let bindings, imports)
- Anti-patterns: overuse of `with lib;`, unjustified import-from-derivation (IFD)
- Module boundary violations
- Overlay/override issues
- Missing type annotations on options
#### 9. resilience
**Focus**: Runtime fault tolerance
- Missing timeouts on network calls
- No retries with backoff/jitter
- Missing circuit breakers for API calls
- No graceful shutdown handling (SIGTERM)
- Missing resource limits (systemd MemoryMax, Docker mem_limit)
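"Retries with backoff/jitter" usually means something like the sketch below: capped exponential backoff with full jitter, so many clients retrying at once do not synchronize. `retry` and its defaults are hypothetical; the injectable `sleep` exists only to make the behavior testable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    *,
    attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient errors with capped exponential backoff
    plus full jitter: before retry n, sleep uniformly in [0, min(cap, base * 2**n)]."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for n in range(attempts):
        try:
            return fn()
        except retry_on:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, min(cap, base * 2 ** n)))
    raise RuntimeError("unreachable")
```

Retries only bound total time if each call also has a timeout, e.g. `retry(lambda: urllib.request.urlopen(url, timeout=10).read(), retry_on=(urllib.error.URLError, TimeoutError))`. Only idempotent operations should be retried this way, which is where this lens meets the idempotency lens.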
#### 10. orchestration
**Focus**: Execution ordering and coupling (formerly dependency-chains)
- Unclear prerequisites
- Missing documentation of execution order
- Circular dependencies
- Scripts assuming prior state without checking
- Implicit coupling between components
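Once prerequisites are written down as a graph, both "missing documentation of execution order" and "circular dependencies" become checkable. Python's standard `graphlib.TopologicalSorter` does this directly; the sketch below wraps it (the script names in the test are hypothetical).

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a valid run order for {script: prerequisites}.

    Raises ValueError naming the cycle if prerequisites are circular.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] is the list of nodes that form the cycle.
        cycle = " -> ".join(err.args[1])
        raise ValueError(f"circular dependency: {cycle}") from err
```

The dependency map itself could come from the reference-mapping pre-pass; implicit coupling shows up as edges the LLM infers but that no script checks for.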
## Crisp Boundaries
To avoid duplicate findings across overlapping lenses:
| Lens | Owns | Does NOT Own |
|------|------|--------------|
| **idempotency** | Safe re-run, convergence, atomic writes, create-if-not-exists | Rollback (blast-radius), retries (resilience) |
| **resilience** | Runtime fault tolerance, timeouts, retries, graceful shutdown | Change safety (blast-radius), re-run safety (idempotency) |
| **blast-radius** | Change safety, dry-run, rollback, confirmation gates, batching | Runtime behavior (resilience), re-run (idempotency) |
## Skill Structure
```
skills/ops-review/
├── SKILL.md # Agent instructions (workflow)
├── README.md # User documentation
└── lenses/
├── README.md # Lens index
├── secrets.md
├── shell-safety.md
├── blast-radius.md
├── privilege.md
├── idempotency.md
├── supply-chain.md
├── observability.md
├── nix-hygiene.md
├── resilience.md
└── orchestration.md
```
Lenses deploy to `~/.config/lenses/ops/` via home-manager.
## Workflow
### Standard Mode
1. **Target selection** - files/directory to review
2. **Pre-pass** - Run static tools (shellcheck, statix, etc.)
3. **Reference mapping** - Build a lightweight call graph (source, imports, ExecStart)
4. **Lens execution** - One pass per lens, tool output in context
5. **Synthesis** - Dedupe across lenses, rank by severity
6. **Interactive review** - User approves findings
7. **Issue filing** - `bd create` for approved items
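The mechanical half of step 5 (dedupe, then rank) might be sketched as below. This is deliberately conservative: it only merges findings at the same file:line with the same normalized issue text and keeps the most severe copy; recognizing differently worded duplicates across lenses is left to the LLM. `Finding` and `synthesize` are hypothetical names.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"HIGH": 0, "MED": 1, "LOW": 2}


@dataclass(frozen=True)
class Finding:
    lens: str
    severity: str  # HIGH | MED | LOW
    file: str
    line: int
    issue: str


def synthesize(findings: list[Finding]) -> list[Finding]:
    """Drop exact duplicates (same file, line, normalized issue), keeping
    the most severe copy, then sort HIGH first, then by file and line."""
    best: dict[tuple[str, int, str], Finding] = {}
    for f in findings:
        key = (f.file, f.line, f.issue.strip().lower())
        cur = best.get(key)
        if cur is None or SEVERITY_RANK[f.severity] < SEVERITY_RANK[cur.severity]:
            best[key] = f
    return sorted(best.values(),
                  key=lambda f: (SEVERITY_RANK[f.severity], f.file, f.line))
```

The sorted list is what step 6 would present for approval.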
### Quick Mode (`--quick`)
Runs Phase 1 lenses only: secrets, shell-safety, blast-radius, privilege.
Ideal for pre-commit or CI gates.
## Output Format
Per-lens findings:
```
[LENS-TAG] <severity:HIGH|MED|LOW> <file:line>
Issue: <what's wrong>
Suggest: <how to fix>
Evidence: <why it matters>
```
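A formatter/parser pair keeps this format stable between the lens output and issue filing. The sketch assumes the lens tag is the lowercase lens name (e.g. `[secrets]`) and that file paths contain no colons or spaces; `format_finding` and `parse_finding` are hypothetical helpers.

```python
import re

FINDING_RE = re.compile(
    r"^\[(?P<lens>[A-Za-z-]+)\] (?P<severity>HIGH|MED|LOW) "
    r"(?P<file>[^\s:]+):(?P<line>\d+)\n"
    r"Issue: (?P<issue>.+)\n"
    r"Suggest: (?P<suggest>.+)\n"
    r"Evidence: (?P<evidence>.+)$"
)
SEVERITIES = ("HIGH", "MED", "LOW")


def format_finding(lens: str, severity: str, file: str, line: int,
                   issue: str, suggest: str, evidence: str) -> str:
    """Render one finding in the four-line per-lens format."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return (f"[{lens}] {severity} {file}:{line}\n"
            f"Issue: {issue}\nSuggest: {suggest}\nEvidence: {evidence}")


def parse_finding(block: str) -> dict:
    """Parse a finding back into fields, e.g. before running `bd create`."""
    m = FINDING_RE.match(block.strip())
    if m is None:
        raise ValueError("not a finding block")
    return m.groupdict()
```

Rejecting unknown severities at format time keeps the rubric below enforceable.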
### Severity Rubric
| Severity | Criteria |
|----------|----------|
| **HIGH** | Exploitable vulnerability, data loss risk, or will break on next run |
| **MED** | Reliability issue, tech debt, or violation of best practice |
| **LOW** | Polish, maintainability, or defense-in-depth improvement |

Context matters: the same issue may be HIGH in production but LOW in a homelab.
## Cross-File Awareness
Build a simple reference map before review:
- **Shell**: `source`, `.` includes, invoked scripts
- **Nix**: imports, flake inputs
- **CI**: referenced scripts, env vars, secret names
- **Compose**: service dependencies, volumes, env files
- **systemd**: ExecStart targets, dependencies

This enables finding issues in the seams between components.
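A grep-style reference map could start as small as the sketch below. The regexes are assumptions covering three of the listed cases (shell `source`/`.` includes, systemd `ExecStart*` targets including their `-`/`@`/`+`/`!`/`:` prefixes, and relative `.nix` paths). They will miss dynamically built paths, which is acceptable for a pre-pass. `reference_map` takes `{name: text}` so it does not touch the disk.

```python
import re

# Hypothetical patterns for a grep-style pre-pass.
REF_PATTERNS = {
    "shell-source": re.compile(r"^\s*(?:source|\.)\s+([^\s;]+)", re.M),
    "systemd-exec": re.compile(r"^\s*ExecStart(?:Pre|Post)?=[-@:+!]*(\S+)", re.M),
    "nix-import": re.compile(r"\.{1,2}/[\w./-]+\.nix"),
}


def reference_map(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each file name to the paths it references."""
    refs: dict[str, set[str]] = {}
    for name, text in files.items():
        found: set[str] = set()
        for pattern in REF_PATTERNS.values():
            for m in pattern.finditer(text):
                # Use the captured path if the pattern has a group.
                found.add(m.group(1) if pattern.groups else m.group(0))
        refs[name] = found
    return refs
```

Feeding the resulting edges to each lens lets it check, for instance, that a script sourced by a systemd-started wrapper also runs with strict mode.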
## Implementation Phases
### Phase 1: Safety Net (High ROI, Low Ambiguity)
1. **secrets** - Non-negotiable, prevents catastrophes
2. **shell-safety** - Most brittle artifact type, shellcheck-backed
3. **blast-radius** - Where LLMs shine (understanding implications)
4. **privilege** - Highly actionable, high impact
### Phase 2: Reliability Layer
5. **idempotency** - Essential for setup/deploy scripts
6. **supply-chain** - Critical for reproducibility
7. **observability** - Easy to check, high debugging value
### Phase 3: Architecture Polish
8. **nix-hygiene** - statix/deadnix backed, LLM explains
9. **resilience** - Needs nuance to avoid bad advice
10. **orchestration** - Most complex, needs full context
## Design Decisions
1. **Linter-first, LLM-second**: Static tools for syntax, LLM for semantics
2. **Crisp lens boundaries**: Each rule has one primary owner
3. **Severity tied to impact**: Not all violations are equal
4. **Quick mode**: Phase 1 for pre-commit/CI
5. **Cross-file awareness**: Grep-based reference mapping
6. **Escape hatches**: Intentional patterns can be marked and suppressed
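The plan does not specify a suppression syntax. The sketch below assumes a hypothetical `# ops-review: ignore[lens,...] reason` comment, which works across shell, Nix, YAML, Dockerfiles, and systemd units because all of them accept `#` comments. Two choices are assumptions rather than part of the plan: a marker covers its own line and the next, and a marker without a reason is ignored so every suppression stays auditable.

```python
import re

# Hypothetical marker: "# ops-review: ignore[secrets,privilege] why it's OK"
SUPPRESS_RE = re.compile(r"#\s*ops-review:\s*ignore\[([a-z,\s-]+)\]\s*(.*)$")


def suppressions(text: str) -> dict[int, set[str]]:
    """Map line number -> lenses suppressed on that line."""
    result: dict[int, set[str]] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = SUPPRESS_RE.search(line)
        if not m or not m.group(2).strip():
            continue  # no marker, or a marker without a justification
        lenses = {name.strip() for name in m.group(1).split(",") if name.strip()}
        # Cover the marker's own line (inline use) and the next line
        # (marker on a comment line above the flagged code).
        for target in (lineno, lineno + 1):
            result.setdefault(target, set()).update(lenses)
    return result


def is_suppressed(marks: dict[int, set[str]], lens: str, line: int) -> bool:
    return lens in marks.get(line, set())
```

Suppressed findings could still be listed in a low-priority section, so reviewers can see what was waived and why.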
## Success Criteria
- Can review dotfiles/ and find real issues
- Can review prox-setup/ and find real issues
- Findings are actionable, not noise
- Phase 1 lenses have a false-positive rate below 10%
- Integrates with existing bd issue tracking
- Quick mode runs in <30 seconds
## Open Questions (Resolved)
| Question | Resolution |
|----------|------------|
| Nix: statix/deadnix or pure LLM? | **Hybrid**: Tools first, LLM interprets |
| Shell: integrate shellcheck? | **Yes**: Treat its output like compiler diagnostics; the LLM groups and prioritizes |
| Multi-file dependencies? | **Grep-based reference map** pre-pass |
| Quick mode? | **Yes**: Phase 1 lenses only |
| Prioritize across artifact types? | **By risk**: secrets/destructive ops first, not file type |
## References
- [Google SRE Book](https://sre.google/sre-book/table-of-contents/)
- [OWASP DevSecOps Guideline](https://owasp.org/www-project-devsecops-guideline/)
- Consensus review: sonar, flash-or, gemini, gpt (2025-01-01)