dan/skills

dan fb882a9434 feat: add ops-review skill with Phase 1 lenses

Multi-lens review skill for operational infrastructure (Nix, shell,
Docker, CI/CD). Modeled on code-review with linter-first hybrid
architecture.

Phase 1 lenses (core safety):
- secrets: credential exposure, Nix store, Docker layers, CI masking
- shell-safety: shellcheck-backed, temp files, guard snippets
- blast-radius: targeting/scoping, dry-run, rollback
- privilege: least-privilege, containers, systemd sandboxing

Design reviewed via orch consensus (sonar, flash-or, gemini, gpt).
Lenses deploy to ~/.config/lenses/ops/ via home-manager.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-01 17:36:24 -08:00


ops-review Skill Design

A multi-lens review skill for operational infrastructure, modeled on code-review.

Problem Statement

Ops artifacts (Nix configs, shell scripts, Python automation, Docker Compose, CI/CD) accumulate technical debt and security issues just like application code. Unlike code, they rarely get systematic review.

Target Artifacts

Based on actual infrastructure in dotfiles and prox-setup:

Category	Examples
Nix/NixOS	flake.nix, modules/*.nix, home-manager configs
Shell Scripts	bin/*.sh, setup_*.sh, fix_*.sh, deploy.sh
Python Automation	Proxmox API scripts, multi-stage deployments
Container Configs	docker-compose.yml, Dockerfile
CI/CD	.gitea/workflows/*.yml, .github/actions/*.yml
Service Configs	systemd units, Ory configs, SOPS files

Architecture: Linter-First Hybrid

Consensus from model review: Use deterministic tools as primary signals, LLM for interpretation and semantic analysis.

Stage 1: Static Tools (fast, deterministic)
├── shellcheck for shell scripts
├── statix + deadnix for Nix
├── hadolint for Dockerfiles
└── yamllint for YAML configs

Stage 2: LLM Analysis (semantic, contextual)
├── Interprets tool output in context
├── Finds logic bugs tools miss
├── Synthesizes cross-file issues
└── Suggests actionable fixes

Why: LLMs hallucinate syntax but excel at understanding intent and impact. Tools catch syntax but miss semantics.
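
The Stage 1 routing above can be sketched as a lookup from file name to linter. This is a minimal illustration, not the skill's actual dispatcher: the `LINTERS` table, `classify`, and `tools_for` are hypothetical names, and classification here uses file names only, with no content sniffing.

```python
from pathlib import PurePath
from typing import List, Optional

# Hypothetical mapping from artifact type to the Stage 1 tools listed above.
LINTERS = {
    "shell": ["shellcheck"],
    "nix": ["statix", "deadnix"],
    "dockerfile": ["hadolint"],
    "yaml": ["yamllint"],
}


def classify(path: str) -> Optional[str]:
    """Guess the artifact type from the file name alone."""
    p = PurePath(path)
    if p.name == "Dockerfile" or p.name.startswith("Dockerfile."):
        return "dockerfile"
    suffix = p.suffix.lower()
    if suffix in {".sh", ".bash"}:
        return "shell"
    if suffix == ".nix":
        return "nix"
    if suffix in {".yml", ".yaml"}:
        return "yaml"
    return None


def tools_for(path: str) -> List[str]:
    """Return the static tools to run on a file, or [] if none apply."""
    kind = classify(path)
    return list(LINTERS[kind]) if kind else []


print(tools_for("flake.nix"))
print(tools_for("bin/deploy.sh"))
```

Files with no matching tool would still reach Stage 2; they simply arrive without linter output attached.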

Proposed Lenses (10 total)

Core Safety (Phase 1)

1. secrets

Focus: Credential hygiene

Hardcoded secrets, API keys, tokens
SOPS config issues
Secrets in logs or error messages
Secrets passed via CLI args (visible in process list)
Missing encryption for sensitive data

2. shell-safety

Focus: Shell script robustness (backed by shellcheck)

Missing set -euo pipefail
Unquoted variables (SC2086)
Unsafe command substitution
Missing error handling
Hardcoded paths that should be parameters
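
As an illustration of the first check in this list, the sketch below reports which strict-mode options a script never enables. It is a line-based heuristic under stated assumptions: `missing_strict_flags` is a hypothetical helper, it only reads `set` lines, and it ignores `set +e` and options enabled through the shebang.

```python
import re
from typing import Set

# Matches lines such as `set -euo pipefail` or `set -eu -o pipefail`.
_SET_LINE = re.compile(r"^\s*set\s+(.*)$")
_SHORT = {"e": "errexit", "u": "nounset"}
_REQUIRED = {"errexit", "nounset", "pipefail"}


def missing_strict_flags(script: str) -> Set[str]:
    """Return the strict-mode options that no `set` line turns on."""
    seen: Set[str] = set()
    for line in script.splitlines():
        m = _SET_LINE.match(line)
        if not m:
            continue
        tokens = m.group(1).split("#")[0].split()  # drop trailing comments
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok.startswith("-") and not tok.startswith("--"):
                flags = tok[1:]
                for ch in flags:
                    if ch in _SHORT:
                        seen.add(_SHORT[ch])
                # A trailing `o` takes the next token as a long option name.
                if flags.endswith("o") and i + 1 < len(tokens):
                    seen.add(tokens[i + 1])
                    i += 1
            i += 1
    return _REQUIRED - seen


print(sorted(missing_strict_flags("#!/bin/bash\nset -e\n")))
```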

3. blast-radius

Focus: Change safety and risk containment

Destructive operations without confirmation
Missing dry-run mode
No rollback strategy
Bulk operations without batching
Missing pre-flight checks
No canary/progressive approach

4. privilege

Focus: Least privilege violations

Unnecessary sudo/root usage
Containers running as root
Overly permissive file modes (chmod 777)
Missing capability drops
Docker socket mounting
systemd units without sandboxing (ProtectSystem, PrivateTmp)

Reliability (Phase 2)

5. idempotency

Focus: Safe re-execution and convergence

Scripts that break on re-run
Missing existence checks (create-if-not-exists)
Non-atomic operations (partial failure states)
Check-then-act race conditions
Missing cleanup on failure
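
For the Python automation in scope, one way to satisfy several of these points at once is to write through a temporary file and rename it into place. This is a sketch under assumptions: `write_atomic` and `ensure_file` are hypothetical helpers, and the new file takes the temporary file's restrictive permissions rather than those of any file it replaces.

```python
import os
import tempfile


def write_atomic(path: str, data: str) -> None:
    """Write data to path so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Cleanup on failure: never leave a partial temp file behind.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def ensure_file(path: str, data: str) -> bool:
    """Converge path to data; return True only if something changed."""
    try:
        with open(path) as fh:
            if fh.read() == data:
                return False  # already in the desired state, safe to re-run
    except FileNotFoundError:
        pass
    write_atomic(path, data)
    return True
```

Calling `ensure_file` twice with the same arguments changes the file once and reports `False` the second time, which is the re-run behavior this lens looks for.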

6. supply-chain

Focus: Dependency provenance and pinning

Unpinned versions (latest tags, floating refs)
GitHub/Gitea actions not pinned to SHA
Missing Nix flake.lock or SRI hashes
Unsigned artifacts
Untrusted substituters/registries
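
The SHA-pinning check lends itself to a simple line scan. The sketch below is a heuristic rather than a YAML parser: `unpinned_actions` is a hypothetical helper, it treats a 40-character lowercase hex ref as pinned, and it skips local (`./`) and `docker://` references entirely.

```python
import re
from typing import List, Tuple

USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> List[Tuple[int, str]]:
    """Return (line number, reference) for each `uses:` not pinned to a full SHA."""
    hits = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        m = USES_RE.match(line)
        if not m:
            continue
        ref = m.group(1)
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # out of scope for this sketch
        _, _, version = ref.partition("@")
        if not SHA_RE.match(version):
            hits.append((number, ref))
    return hits


example = "steps:\n  - uses: actions/checkout@v4\n  - uses: ./local-action\n"
print(unpinned_actions(example))
```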

7. observability

Focus: Visibility into system state

Silent failures (no logging/alerting)
Missing health checks (Docker healthcheck, systemd ExecStartPre)
Incomplete metrics coverage
Missing structured logging
No correlation IDs in multi-step scripts

Architecture (Phase 3)

8. nix-hygiene

Focus: Nix-specific quality (backed by statix/deadnix)

Dead code (unused let bindings, imports)
Anti-patterns (with lib abuse, IFD without justification)
Module boundary violations
Overlay/override issues
Missing type annotations on options

9. resilience

Focus: Runtime fault tolerance

Missing timeouts on network calls
No retries with backoff/jitter
Missing circuit breakers for API calls
No graceful shutdown handling (SIGTERM)
Missing resource limits (systemd MemoryMax, Docker mem_limit)
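
For reference, this is roughly what "retries with backoff/jitter" looks like in a Python automation script. It is a sketch with illustrative defaults: `retry_with_backoff` is a hypothetical helper using exponential backoff with full jitter, and the `sleep` and `rng` parameters exist only so the behavior can be tested without waiting.

```python
import random
import time


def retry_with_backoff(fn, attempts=4, base=0.5, cap=8.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn until it succeeds, sleeping a jittered, growing delay between tries."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: a random delay up to the capped exponential bound.
            delay = rng() * min(cap, base * 2 ** attempt)
            sleep(delay)
```

A real script would normally catch only the exceptions it knows to be transient, and pair the retry with a per-call timeout so a hung connection cannot block all attempts.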

10. orchestration

Focus: Execution ordering and coupling (formerly dependency-chains)

Unclear prerequisites
Missing documentation of execution order
Circular dependencies
Scripts assuming prior state without checking
Implicit coupling between components

Crisp Boundaries

To avoid duplicate findings across overlapping lenses:

Lens	Owns	Does NOT Own
idempotency	Safe re-run, convergence, atomic writes, create-if-not-exists	Rollback (blast-radius), retries (resilience)
resilience	Runtime fault tolerance, timeouts, retries, graceful shutdown	Change safety (blast-radius), re-run safety (idempotency)
blast-radius	Change safety, dry-run, rollback, confirmation gates, batching	Runtime behavior (resilience), re-run (idempotency)
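
The ownership table above could be enforced mechanically with a topic-to-owner lookup. This is a sketch only: the topic labels in `OWNER` are hypothetical tags derived from the table, and `keep_finding` is an assumed helper name.

```python
# Hypothetical topic labels, each mapped to the single lens that owns it.
OWNER = {
    "rollback": "blast-radius",
    "dry-run": "blast-radius",
    "confirmation-gate": "blast-radius",
    "batching": "blast-radius",
    "retries": "resilience",
    "timeouts": "resilience",
    "graceful-shutdown": "resilience",
    "re-run-safety": "idempotency",
    "atomic-writes": "idempotency",
}


def keep_finding(lens: str, topic: str) -> bool:
    """Keep a finding if its topic is unowned or owned by the reporting lens."""
    owner = OWNER.get(topic)
    return owner is None or owner == lens


print(keep_finding("idempotency", "rollback"))   # False: blast-radius owns rollback
print(keep_finding("blast-radius", "rollback"))  # True
```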

Skill Structure

skills/ops-review/
├── SKILL.md           # Agent instructions (workflow)
├── README.md          # User documentation
└── lenses/
    ├── README.md      # Lens index
    ├── secrets.md
    ├── shell-safety.md
    ├── blast-radius.md
    ├── privilege.md
    ├── idempotency.md
    ├── supply-chain.md
    ├── observability.md
    ├── nix-hygiene.md
    ├── resilience.md
    └── orchestration.md

Lenses deploy to ~/.config/lenses/ops/ via home-manager.

Workflow

Standard Mode

1. Target selection - files/directory to review
2. Pre-pass - Run static tools (shellcheck, statix, etc.)
3. Reference mapping - Build lightweight call graph (source, imports, ExecStart)
4. Lens execution - One pass per lens, tool output in context
5. Synthesis - Dedupe across lenses, rank by severity
6. Interactive review - User approves findings
7. Issue filing - bd create for approved items
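
The synthesis step can be sketched as a dedupe followed by a sort. The `Finding` shape and the dedupe key below are assumptions, not the skill's defined data model: two findings count as duplicates only if file, line, and issue text match exactly, which is deliberately naive.

```python
from dataclasses import dataclass
from typing import Iterable, List

SEVERITY_ORDER = {"HIGH": 0, "MED": 1, "LOW": 2}


@dataclass(frozen=True)
class Finding:
    lens: str
    severity: str  # one of HIGH, MED, LOW
    file: str
    line: int
    issue: str


def synthesize(findings: Iterable[Finding]) -> List[Finding]:
    """Dedupe findings across lenses, then rank by severity, file, and line."""
    best = {}
    for f in findings:
        key = (f.file, f.line, f.issue)
        current = best.get(key)
        # On a duplicate, keep whichever report carries the higher severity.
        if current is None or SEVERITY_ORDER[f.severity] < SEVERITY_ORDER[current.severity]:
            best[key] = f
    return sorted(best.values(),
                  key=lambda f: (SEVERITY_ORDER[f.severity], f.file, f.line))
```

In practice two lenses rarely phrase an issue identically, so a real implementation would likely need a fuzzier key or an LLM-assisted merge.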

Quick Mode (`--quick`)

Runs Phase 1 lenses only: secrets, shell-safety, blast-radius, privilege. Ideal for pre-commit or CI gates.
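
Lens selection for the two modes reduces to a phase filter. A minimal sketch, assuming the phase grouping from this document; `PHASES` and `select_lenses` are hypothetical names.

```python
from typing import List

# Lens names grouped by implementation phase, as listed in this design.
PHASES = {
    1: ["secrets", "shell-safety", "blast-radius", "privilege"],
    2: ["idempotency", "supply-chain", "observability"],
    3: ["nix-hygiene", "resilience", "orchestration"],
}


def select_lenses(quick: bool = False) -> List[str]:
    """Return Phase 1 lenses in quick mode, otherwise all lenses in phase order."""
    phases = [1] if quick else sorted(PHASES)
    return [lens for phase in phases for lens in PHASES[phase]]


print(select_lenses(quick=True))
```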

Output Format

Per-lens findings:

[LENS-TAG] <severity:HIGH|MED|LOW> <file:line>
Issue: <what's wrong>
Suggest: <how to fix>
Evidence: <why it matters>
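
One possible concrete reading of this template is sketched below, so that findings can be rendered and parsed back for synthesis. The exact header layout (`[LENS] SEVERITY file:line`) is an interpretation of the placeholders, and `render` and `parse_header` are hypothetical helpers.

```python
import re

HEADER = re.compile(
    r"^\[(?P<lens>[A-Z-]+)\] (?P<severity>HIGH|MED|LOW) (?P<file>.+):(?P<line>\d+)$"
)


def render(lens, severity, file, line, issue, suggest, evidence):
    """Format one finding as the four-line block described above."""
    return "\n".join([
        f"[{lens.upper()}] {severity} {file}:{line}",
        f"Issue: {issue}",
        f"Suggest: {suggest}",
        f"Evidence: {evidence}",
    ])


def parse_header(text):
    """Parse the first line of a rendered finding back into its fields."""
    m = HEADER.match(text.splitlines()[0])
    if m is None:
        raise ValueError("not a finding header")
    fields = m.groupdict()
    fields["line"] = int(fields["line"])
    return fields


block = render("secrets", "HIGH", "bin/deploy.sh", 12,
               "API token passed as a CLI argument",
               "Read the token from a file or environment variable",
               "Arguments are visible in the process list")
print(block)
```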

Severity Rubric

Severity	Criteria
HIGH	Exploitable vulnerability, data loss risk, or will break on next run
MED	Reliability issue, tech debt, or violation of best practice
LOW	Polish, maintainability, or defense-in-depth improvement

Context matters: same issue may be HIGH in production, LOW in homelab.

Cross-File Awareness

Build a simple reference map before review:

Shell: source, . includes, invoked scripts
Nix: imports, flake inputs
CI: referenced scripts, env vars, secrets names
Compose: service dependencies, volumes, env files
systemd: ExecStart targets, dependencies

This enables finding issues in the seams between components.
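
For the shell row of this map, a grep-style pre-pass might look like the following. It is a sketch under assumptions: `shell_includes` and `reference_map` are hypothetical helpers, the regex only sees `source` and `.` at the start of a line, and paths are recorded unexpanded (so `$HOME/.env` stays as written).

```python
import re
from typing import Dict, List

# `source file` or `. file` at the start of a line; stops at whitespace, `;` or `#`.
SOURCE_RE = re.compile(r"^\s*(?:source|\.)\s+([^\s;#]+)")


def shell_includes(script_text: str) -> List[str]:
    """Return the paths a shell script sources, in order of appearance."""
    refs = []
    for line in script_text.splitlines():
        m = SOURCE_RE.match(line)
        if m:
            refs.append(m.group(1).strip("\"'"))
    return refs


def reference_map(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each script path to the files it includes."""
    return {path: shell_includes(text) for path, text in files.items()}


script = 'source ./lib/common.sh\n. "$HOME/.env"\n./run.sh\n'
print(reference_map({"bin/deploy.sh": script}))
```

The same shape extends to the other rows (Nix imports, Compose `env_file`, systemd `ExecStart`) with one pattern per artifact type.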

Implementation Phases

Phase 1: Safety Net (High ROI, Low Ambiguity)

secrets - Non-negotiable, prevents catastrophes
shell-safety - Most brittle artifact type, shellcheck-backed
blast-radius - Where LLMs shine (understanding implications)
privilege - Highly actionable, high impact

Phase 2: Reliability Layer

idempotency - Essential for setup/deploy scripts
supply-chain - Critical for reproducibility
observability - Easy to check, high debugging value

Phase 3: Architecture Polish

nix-hygiene - statix/deadnix backed, LLM explains
resilience - Needs nuance to avoid bad advice
orchestration - Most complex, needs full context

Design Decisions

Linter-first, LLM-second: Static tools for syntax, LLM for semantics
Crisp lens boundaries: Each rule has one primary owner
Severity tied to impact: Not all violations are equal
Quick mode: Phase 1 for pre-commit/CI
Cross-file awareness: Grep-based reference mapping
Escape hatches: Intentional patterns can be flagged + suppressed

Success Criteria

Can review dotfiles/ and find real issues
Can review prox-setup/ and find real issues
Findings are actionable, not noise
Phase 1 lenses have <10% false positive rate
Integrates with existing bd issue tracking
Quick mode runs in <30 seconds

Open Questions (Resolved)

Question	Resolution
Nix: statix/deadnix or pure LLM?	Hybrid: Tools first, LLM interprets
Shell: integrate shellcheck?	Yes: Treat as compiler, LLM groups/prioritizes
Multi-file dependencies?	Grep-based reference map pre-pass
Quick mode?	Yes: Phase 1 lenses only
Prioritize across artifact types?	By risk: secrets/destructive ops first, not file type

References

Google SRE Book
OWASP Infrastructure Security
Consensus review: sonar, flash-or, gemini, gpt (2026-01-01)
