skills/specs/ops-review/plan.md

# ops-review Skill Design

A multi-lens review skill for operational infrastructure, modeled on code-review.

## Problem Statement

Ops artifacts (Nix configs, shell scripts, Python automation, Docker Compose, CI/CD) accumulate technical debt and security issues just like application code. Unlike application code, they rarely get systematic review.

## Target Artifacts

Based on actual infrastructure in dotfiles and prox-setup:

| Category | Examples |
|----------|----------|
| **Nix/NixOS** | flake.nix, modules/*.nix, home-manager configs |
| **Shell Scripts** | bin/*.sh, setup_*.sh, fix_*.sh, deploy.sh |
| **Python Automation** | Proxmox API scripts, multi-stage deployments |
| **Container Configs** | docker-compose.yml, Dockerfile |
| **CI/CD** | .gitea/workflows/*.yml, .github/workflows/*.yml |
| **Service Configs** | systemd units, Ory configs, SOPS files |

## Architecture: Linter-First Hybrid

**Consensus from model review**: Use deterministic tools as the primary signal, and the LLM for interpretation and semantic analysis.

```
Stage 1: Static Tools (fast, deterministic)
├── shellcheck for shell scripts
├── statix + deadnix for Nix
├── hadolint for Dockerfiles
└── yamllint for YAML configs

Stage 2: LLM Analysis (semantic, contextual)
├── Interprets tool output in context
├── Finds logic bugs tools miss
├── Synthesizes cross-file issues
└── Suggests actionable fixes
```

**Why**: LLMs can hallucinate syntax problems but are good at reasoning about intent and impact. Static tools catch syntax and known anti-patterns reliably but miss semantics.
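A minimal sketch of how Stage 1 might pick tools per file. The `classify` and `stage1_commands` helpers and the file-name rules are assumptions for illustration, not part of the skill; the flags shown (`shellcheck -f json`, `hadolint -f json`, `yamllint -f parsable`) exist in those tools, but the exact invocation should be checked against the installed versions.

```python
from __future__ import annotations

from pathlib import Path

# Stage 1 commands per artifact type (hypothetical table; tune flags locally).
LINTERS = {
    "shell": [["shellcheck", "-f", "json"]],
    "nix": [["statix", "check"], ["deadnix"]],
    "dockerfile": [["hadolint", "-f", "json"]],
    "yaml": [["yamllint", "-f", "parsable"]],
}


def classify(path: Path) -> str | None:
    """Map a file to an artifact type by name; None means no Stage 1 tool applies."""
    name = path.name
    if name == "Dockerfile" or name.startswith("Dockerfile."):
        return "dockerfile"
    if path.suffix in {".sh", ".bash"}:
        return "shell"
    if path.suffix == ".nix":
        return "nix"
    if path.suffix in {".yml", ".yaml"}:
        return "yaml"
    return None


def stage1_commands(path: Path) -> list[list[str]]:
    """Return the full command lines to run against one file."""
    kind = classify(path)
    return [cmd + [str(path)] for cmd in LINTERS.get(kind, [])]
```

The resulting command lists could be passed to `subprocess.run`, with the captured output handed to Stage 2 as context.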

## Proposed Lenses (10 total)

### Core Safety (Phase 1)

#### 1. secrets
**Focus**: Credential hygiene
- Hardcoded secrets, API keys, tokens
- SOPS config issues
- Secrets in logs or error messages
- Secrets passed via CLI args (visible in process list)
- Missing encryption for sensitive data
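As a rough illustration of the first bullet, a regex pre-filter could surface candidate lines for the LLM to judge. The pattern set and the `scan_for_secrets` name are assumptions; these heuristics are deliberately small and would miss many real secrets.

```python
import re

# Illustrative patterns only (hypothetical set, not a complete scanner).
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # A quoted literal of 8+ characters assigned to a suspicious name.
    # Values containing `$` are skipped because they are likely variable references.
    "assigned-literal": re.compile(
        r"(?i)(password|passwd|secret|token|api_key)\w*\s*[:=]\s*['\"][^'\"$]{8,}['\"]"
    ),
}


def scan_for_secrets(text):
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Hits are candidates, not verdicts: the lens would still need to decide whether a match is a real credential or a placeholder.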

#### 2. shell-safety
**Focus**: Shell script robustness (backed by shellcheck)
- Missing `set -euo pipefail`
- Unquoted variables (SC2086)
- Unsafe command substitution
- Missing error handling
- Hardcoded paths that should be parameters
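The strict-mode check is simple enough to do without an LLM. The sketch below assumes a hypothetical `missing_strict_options` helper that only looks at `set` lines; it ignores `set +e`, shebang flags, and options enabled in sourced files.

```python
def missing_strict_options(script):
    """Return which of errexit/nounset/pipefail are never enabled via `set`."""
    missing = {"errexit", "nounset", "pipefail"}
    short_flags = {"e": "errexit", "u": "nounset"}
    for line in script.splitlines():
        # Crude comment stripping; good enough for a heuristic pre-check.
        tokens = line.split("#", 1)[0].split()
        if not tokens or tokens[0] != "set":
            continue
        for index, token in enumerate(tokens):
            if index == 0 or not token.startswith("-"):
                continue
            for flag in token[1:]:
                if flag in short_flags:
                    missing.discard(short_flags[flag])
                elif flag == "o" and index + 1 < len(tokens):
                    # `-o NAME` enables the long option named by the next token.
                    missing.discard(tokens[index + 1])
    return missing
```

A non-empty result would become a shell-safety finding; note that `pipefail` is a bash option and is not available in every POSIX `sh`.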

#### 3. blast-radius
**Focus**: Change safety and risk containment
- Destructive operations without confirmation
- Missing dry-run mode
- No rollback strategy
- Bulk operations without batching
- Missing pre-flight checks
- No canary/progressive approach
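For reference, this is roughly the shape the lens would want to see in Python automation: dry-run by default, an explicit confirmation gate, and batches with a check in between. `delete_all`, `confirm`, and `after_batch` are hypothetical names used only for this sketch.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def delete_all(targets, delete, *, dry_run=True, confirm=lambda prompt: False,
               batch_size=5, after_batch=lambda batch: True):
    """Delete targets in batches; returns a log of what happened."""
    targets = list(targets)
    if dry_run:
        # Default path: report the plan without touching anything.
        return [f"would delete {t}" for t in targets]
    if not confirm(f"Delete {len(targets)} item(s)?"):
        return []
    log = []
    for batch in batched(targets, batch_size):
        for target in batch:
            delete(target)
            log.append(f"deleted {target}")
        # Progressive rollout: stop if the post-batch check fails.
        if not after_batch(batch):
            log.append("stopped: post-batch check failed")
            break
    return log
```

A script lacking all three of these guards around a destructive call would be a typical blast-radius finding.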

#### 4. privilege
**Focus**: Least privilege violations
- Unnecessary sudo/root usage
- Containers running as root
- Overly permissive file modes (chmod 777)
- Missing capability drops
- Docker socket mounting
- systemd units without sandboxing (ProtectSystem, PrivateTmp)

### Reliability (Phase 2)

#### 5. idempotency
**Focus**: Safe re-execution and convergence
- Scripts that break on re-run
- Missing existence checks (create-if-not-exists)
- Non-atomic operations (partial failure states)
- Check-then-act race conditions
- Missing cleanup on failure
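The patterns this lens looks for can be illustrated with a small convergent file writer. This is a sketch under the assumption that the target and its temp file share a filesystem (so `os.replace` is atomic); `write_atomic` and `ensure_file` are illustrative names.

```python
import os
import tempfile


def write_atomic(path, content):
    """Write content so readers see the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)  # create-if-not-exists, safe on re-run
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.replace(tmp, path)  # atomic rename within one filesystem
    except BaseException:
        # Cleanup on failure: do not leave the temp file behind.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def ensure_file(path, content):
    """Converge path to content; return True only if something changed."""
    try:
        with open(path) as handle:
            if handle.read() == content:
                return False
    except FileNotFoundError:
        pass
    write_atomic(path, content)
    return True
```

Running `ensure_file` twice with the same arguments changes nothing the second time, which is the re-run property the lens checks for.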

#### 6. supply-chain
**Focus**: Dependency provenance and pinning
- Unpinned versions (`latest` tags, floating refs)
- GitHub/Gitea actions not pinned to SHA
- Missing Nix flake.lock or SRI hashes
- Unsigned artifacts
- Untrusted substituters/registries
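The first two bullets lend themselves to a line-based pre-check. The sketch below is an assumption about how that could look: it scans workflow or Compose YAML as plain text (no YAML parser), treats only a full 40-character commit SHA as pinned for `uses:`, and flags images with no tag or a `latest` tag.

```python
import re

USES = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
IMAGE = re.compile(r"^\s*-?\s*image:\s*([^\s#]+)")
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")


def unpinned_refs(yaml_text):
    """Return (line_number, reference) pairs for floating action or image refs."""
    findings = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        if match := USES.match(line):
            ref = match.group(1).strip("'\"")
            # Local actions (./path) are versioned with the repository itself.
            if not ref.startswith("./") and not FULL_SHA.search(ref):
                findings.append((lineno, ref))
        elif match := IMAGE.match(line):
            ref = match.group(1).strip("'\"")
            # Take the tag from the last path segment so registry ports are ignored.
            tag = ref.rsplit("/", 1)[-1].partition(":")[2]
            if "@sha256:" not in ref and tag in ("", "latest"):
                findings.append((lineno, ref))
    return findings
```

Whether a version tag such as `postgres:16.1` counts as pinned enough is a policy choice; this sketch accepts it.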

#### 7. observability
**Focus**: Visibility into system state
- Silent failures (no logging/alerting)
- Missing health checks (Docker `HEALTHCHECK`, systemd `WatchdogSec`; `ExecStartPre` only validates before start)
- Incomplete metrics coverage
- Missing structured logging
- No correlation IDs in multi-step scripts

### Architecture (Phase 3)

#### 8. nix-hygiene
**Focus**: Nix-specific quality (backed by statix/deadnix)
- Dead code (unused let bindings, imports)
- Anti-patterns (`with lib;` abuse, import-from-derivation (IFD) without justification)
- Module boundary violations
- Overlay/override issues
- Missing `type` declarations on module options

#### 9. resilience
**Focus**: Runtime fault tolerance
- Missing timeouts on network calls
- No retries with backoff/jitter
- Missing circuit breakers for API calls
- No graceful shutdown handling (SIGTERM)
- Missing resource limits (systemd MemoryMax, Docker mem_limit)
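For the retry bullet, one common shape is exponential backoff with full jitter. The sketch below is one possible helper, not a prescribed implementation; the `retry` name, its defaults, and the choice of retryable exceptions are assumptions.

```python
import random
import time


def retry(operation, *, attempts=5, base_delay=0.5, max_delay=30.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random amount up to the exponential cap.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists so the helper can be tested without waiting. The operation itself should still carry its own timeout, since retrying a call that never returns does not help.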

#### 10. orchestration
**Focus**: Execution ordering and coupling (formerly dependency-chains)
- Unclear prerequisites
- Missing documentation of execution order
- Circular dependencies
- Scripts assuming prior state without checking
- Implicit coupling between components

## Crisp Boundaries

To avoid duplicate findings across overlapping lenses:

| Lens | Owns | Does NOT Own |
|------|------|--------------|
| **idempotency** | Safe re-run, convergence, atomic writes, create-if-not-exists | Rollback (blast-radius), retries (resilience) |
| **resilience** | Runtime fault tolerance, timeouts, retries, graceful shutdown | Change safety (blast-radius), re-run safety (idempotency) |
| **blast-radius** | Change safety, dry-run, rollback, confirmation gates, batching | Runtime behavior (resilience), re-run (idempotency) |

## Skill Structure

```
skills/ops-review/
├── SKILL.md           # Agent instructions (workflow)
├── README.md          # User documentation
└── lenses/
    ├── README.md      # Lens index
    ├── secrets.md
    ├── shell-safety.md
    ├── blast-radius.md
    ├── privilege.md
    ├── idempotency.md
    ├── supply-chain.md
    ├── observability.md
    ├── nix-hygiene.md
    ├── resilience.md
    └── orchestration.md
```

Lenses deploy to `~/.config/lenses/ops/` via home-manager.

## Workflow

### Standard Mode
1. **Target selection** - files/directory to review
2. **Pre-pass** - Run static tools (shellcheck, statix, etc.)
3. **Reference mapping** - Build a lightweight reference map (`source`, imports, `ExecStart`)
4. **Lens execution** - One pass per lens, tool output in context
5. **Synthesis** - Dedupe across lenses, rank by severity
6. **Interactive review** - User approves findings
7. **Issue filing** - `bd create` for approved items
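The synthesis step (step 5) could be sketched as follows. The dict shape and the `rule` key used for deduplication are assumptions made for this example; the plan does not define a finding schema beyond the output format.

```python
SEVERITY_RANK = {"HIGH": 0, "MED": 1, "LOW": 2}


def synthesize(findings):
    """Merge duplicates reported by several lenses, then sort by severity."""
    merged = {}
    for finding in findings:
        # Hypothetical dedupe key: same rule at the same location.
        key = (finding["file"], finding["line"], finding["rule"])
        current = merged.get(key)
        if current is None:
            merged[key] = {**finding, "lenses": [finding["lens"]]}
            continue
        if finding["lens"] not in current["lenses"]:
            current["lenses"].append(finding["lens"])
        # Keep the most severe rating any lens assigned.
        if SEVERITY_RANK[finding["severity"]] < SEVERITY_RANK[current["severity"]]:
            current["severity"] = finding["severity"]
    return sorted(
        merged.values(),
        key=lambda f: (SEVERITY_RANK[f["severity"]], f["file"], f["line"]),
    )
```

The merged list is what the interactive review step would present, with `lenses` showing every lens that reported the item.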

### Quick Mode (`--quick`)
Runs Phase 1 lenses only: secrets, shell-safety, blast-radius, privilege.
Ideal for pre-commit or CI gates.
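Lens selection for the two modes reduces to a lookup by phase. In this sketch `LENS_PHASES` and `select_lenses` are illustrative names; the lens-to-phase assignment follows the lens list in this plan.

```python
# Lens names grouped by implementation phase, as listed in this plan.
LENS_PHASES = {
    1: ["secrets", "shell-safety", "blast-radius", "privilege"],
    2: ["idempotency", "supply-chain", "observability"],
    3: ["nix-hygiene", "resilience", "orchestration"],
}


def select_lenses(quick=False):
    """Return lens names in execution order; quick mode keeps Phase 1 only."""
    phases = [1] if quick else sorted(LENS_PHASES)
    return [lens for phase in phases for lens in LENS_PHASES[phase]]
```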

## Output Format

Per-lens findings:
```
[LENS-TAG] <severity:HIGH|MED|LOW> <file:line>
Issue: <what's wrong>
Suggest: <how to fix>
Evidence: <why it matters>
```
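One way to keep findings in this shape is a small record type with a formatter. The `Finding` class and the uppercase rendering of the lens tag are assumptions; the template above does not say how `LENS-TAG` is derived from the lens name.

```python
from dataclasses import dataclass

SEVERITIES = ("HIGH", "MED", "LOW")


@dataclass(frozen=True)
class Finding:
    lens: str
    severity: str
    file: str
    line: int
    issue: str
    suggest: str
    evidence: str

    def __post_init__(self):
        # Reject severities outside the rubric early.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


def format_finding(finding):
    """Render a finding in the four-line per-lens format."""
    return "\n".join([
        f"[{finding.lens.upper()}] {finding.severity} {finding.file}:{finding.line}",
        f"Issue: {finding.issue}",
        f"Suggest: {finding.suggest}",
        f"Evidence: {finding.evidence}",
    ])
```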

### Severity Rubric

| Severity | Criteria |
|----------|----------|
| **HIGH** | Exploitable vulnerability, data loss risk, or will break on next run |
| **MED** | Reliability issue, tech debt, or violation of best practice |
| **LOW** | Polish, maintainability, or defense-in-depth improvement |

Context matters: the same issue may be HIGH in production and LOW in a homelab.

## Cross-File Awareness

Build a simple reference map before review:
- **Shell**: `source`, `.` includes, invoked scripts
- **Nix**: imports, flake inputs
- **CI**: referenced scripts, env vars, secrets names
- **Compose**: service dependencies, volumes, env files
- **systemd**: ExecStart targets, dependencies

This enables finding issues in the seams between components.
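For the shell case, the reference map can be a plain regex pass. This sketch covers only `source` and `.` includes; invoked scripts and the other artifact types would need their own patterns. `shell_includes` and `build_reference_map` are hypothetical helpers.

```python
import re

# Matches `source FILE` or `. FILE` at the start of a line.
SOURCE_LINE = re.compile(r"^\s*(?:source|\.)\s+([^\s;&|]+)")


def shell_includes(script_text):
    """Return the files a shell script includes, in order of appearance."""
    refs = []
    for line in script_text.splitlines():
        match = SOURCE_LINE.match(line)
        if match:
            refs.append(match.group(1).strip("'\""))
    return refs


def build_reference_map(scripts):
    """Map each script path to its includes; `scripts` maps path to file content."""
    return {path: shell_includes(text) for path, text in scripts.items()}
```

Paths containing variables (for example `$HOME/.env`) are recorded as written; resolving them is left to the LLM pass or a later step.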

## Implementation Phases

### Phase 1: Safety Net (High ROI, Low Ambiguity)
1. **secrets** - Non-negotiable, prevents catastrophes
2. **shell-safety** - Most brittle artifact type, shellcheck-backed
3. **blast-radius** - Where LLMs shine (understanding implications)
4. **privilege** - Highly actionable, high impact

### Phase 2: Reliability Layer
5. **idempotency** - Essential for setup/deploy scripts
6. **supply-chain** - Critical for reproducibility
7. **observability** - Easy to check, high debugging value

### Phase 3: Architecture Polish
8. **nix-hygiene** - statix/deadnix backed, LLM explains
9. **resilience** - Needs nuance to avoid bad advice
10. **orchestration** - Most complex, needs full context

## Design Decisions

1. **Linter-first, LLM-second**: Static tools for syntax, LLM for semantics
2. **Crisp lens boundaries**: Each rule has one primary owner
3. **Severity tied to impact**: Not all violations are equal
4. **Quick mode**: Phase 1 for pre-commit/CI
5. **Cross-file awareness**: Grep-based reference mapping
6. **Escape hatches**: Intentional patterns can be annotated and suppressed

## Success Criteria

- Can review dotfiles/ and find real issues
- Can review prox-setup/ and find real issues
- Findings are actionable, not noise
- Phase 1 lenses have <10% false positive rate
- Integrates with existing bd issue tracking
- Quick mode runs in <30 seconds

## Open Questions (Resolved)

| Question | Resolution |
|----------|------------|
| Nix: statix/deadnix or pure LLM? | **Hybrid**: Tools first, LLM interprets |
| Shell: integrate shellcheck? | **Yes**: Treat its output like compiler diagnostics; the LLM groups and prioritizes |
| Multi-file dependencies? | **Grep-based reference map** pre-pass |
| Quick mode? | **Yes**: Phase 1 lenses only |
| Prioritize across artifact types? | **By risk**: secrets/destructive ops first, not file type |

## References

- [Google SRE Book](https://sre.google/sre-book/table-of-contents/)
- [OWASP DevSecOps Guideline](https://owasp.org/www-project-devsecops-guideline/)
- Consensus review: sonar, flash-or, gemini, gpt (2025-01-01)