feat: add ops-review skill with Phase 1 lenses

Multi-lens review skill for operational infrastructure (Nix, shell, Docker, CI/CD). Modeled on code-review with linter-first hybrid architecture.

Phase 1 lenses (core safety):
- secrets: credential exposure, Nix store, Docker layers, CI masking
- shell-safety: shellcheck-backed, temp files, guard snippets
- blast-radius: targeting/scoping, dry-run, rollback
- privilege: least-privilege, containers, systemd sandboxing

Design reviewed via orch consensus (sonar, flash-or, gemini, gpt).
Lenses deploy to ~/.config/lenses/ops/ via home-manager.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 17:36:24 -08:00 · 2026-01-01 17:36:24 -08:00 · fb882a9434
parent 503053638a
commit fb882a9434
12 changed files with 1175 additions and 0 deletions
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@@ -34,6 +34,21 @@
 {"id":"skills-8y6","title":"Define skill versioning strategy","description":"Git SHA alone is insufficient. Need tuple approach:\n\n- skill_source_rev: git SHA (if available)\n- skill_content_hash: hash of SKILL.md + scripts\n- runtime_ref: flake.lock hash or Nix store path\n\nQuestions to resolve:\n- Do Protos pin to versions (stable but maintenance) or float on latest (risky)?\n- How to handle breaking changes in skills?\n- Record in wisp trace vs proto definition?\n\nFrom consensus: both models flagged versioning instability as high severity.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T19:49:30.839064445-05:00","updated_at":"2025-12-23T20:55:04.439779336-05:00","closed_at":"2025-12-23T20:55:04.439779336-05:00","close_reason":"ADRs revised with orch consensus feedback"}
 {"id":"skills-9af","title":"spec-review: Add spike/research task handling","description":"Tasks like 'Investigate X' can linger without clear outcomes.\n\nAdd to REVIEW_TASKS:\n- Flag research/spike tasks\n- Require timebox and concrete outputs (decision record, prototype, risks)\n- Pattern for handling unknowns","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:26.887719136-08:00","updated_at":"2025-12-15T14:08:13.441095034-08:00","closed_at":"2025-12-15T14:08:13.441095034-08:00"}
 {"id":"skills-9bc","title":"Investigate pre-compression hook for worklogs","description":"## Revised Understanding\n\nClaude Code already persists full conversation history in `~/.claude/projects/\u003cproject\u003e/\u003csession-id\u003e.jsonl`. Pre-compact hooks aren't needed for data capture.\n\n## Question\nWhat's the ideal workflow for generating worklogs from session data?\n\n## Options\n\n### 1. Post-session script\n- Run after exiting Claude Code\n- Reads most recent session JSONL\n- Generates worklog from conversation content\n- Pro: Async, doesn't interrupt flow\n- Con: May forget to run it\n\n### 2. On-demand slash command\n- `/worklog-from-session` or similar\n- Reads current session's JSONL file\n- Generates worklog with full context\n- Pro: Explicit control\n- Con: Still need to remember\n\n### 3. Pre-compact reminder\n- Hook prints reminder: \"Consider running /worklog\"\n- Doesn't automate, just nudges\n- Pro: Simple, non-intrusive\n- Con: Easy to dismiss\n\n### 4. Async batch processing\n- Process old sessions whenever\n- All data persists in JSONL files\n- Pro: No urgency, can do later\n- Con: Context may be stale\n\n## Data Format\nSession files contain:\n- User messages with timestamp\n- Assistant responses with model info\n- Tool calls and results\n- Git branch, cwd, version info\n\n## Next Steps\n- Decide preferred workflow\n- Build script to parse session JSONL → worklog format","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-17T14:32:32.568430817-08:00","updated_at":"2025-12-17T15:56:38.864916015-08:00","closed_at":"2025-12-17T15:56:38.864916015-08:00","close_reason":"Pivoted: worklogs may be redundant given full conversation persistence. New approach: make conversations searchable directly."}
+{"id":"skills-9cu","title":"ops-review skill","description":"Multi-lens review skill for operational infrastructure (Nix, shell, Docker, CI/CD).\n\nBased on code-review pattern with linter-first hybrid architecture.\n\n## Phases\n- Phase 1: Skeleton + Core Safety (secrets, shell-safety, blast-radius, privilege)\n- Phase 2: Reliability (idempotency, supply-chain, observability)\n- Phase 3: Architecture (nix-hygiene, resilience, orchestration)\n\n## Design\nSee specs/ops-review/plan.md\n\n## Success Criteria\n- Review dotfiles/ and find real issues\n- Review prox-setup/ and find real issues\n- \u003c10% false positive rate on Phase 1\n- Quick mode \u003c30s","status":"open","priority":1,"issue_type":"epic","created_at":"2026-01-01T16:55:15.772440374-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:15.772440374-05:00"}
+{"id":"skills-9cu.1","title":"Create skill skeleton","description":"Create directory structure and base files:\n- skills/ops-review/SKILL.md (workflow, modeled on code-review)\n- skills/ops-review/README.md (user docs)\n- skills/ops-review/lenses/README.md (lens index)\n\nBlocks all lens work.","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:22.084083175-05:00","created_by":"dan","updated_at":"2026-01-01T17:08:20.384800582-05:00","closed_at":"2026-01-01T17:08:20.384800582-05:00","close_reason":"Created skeleton: SKILL.md, README.md, lenses/README.md","dependencies":[{"issue_id":"skills-9cu.1","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:22.095950548-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.10","title":"Lens: resilience","description":"Create resilience.md lens for fault tolerance:\n- Missing timeouts on network calls\n- No retries with backoff\n- Missing circuit breakers\n- No graceful shutdown (SIGTERM)\n- Missing resource limits\n\nBoundary: Owns runtime tolerance, NOT change safety","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.876125632-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:00.876125632-05:00","dependencies":[{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.878008563-05:00","created_by":"dan"},{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.881250755-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.11","title":"Lens: orchestration","description":"Create orchestration.md lens for execution ordering:\n- Unclear prerequisites\n- Missing order documentation\n- Circular dependencies\n- Assumed prior state\n- Implicit coupling\n\nMost complex - needs cross-file context","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:01.098528225-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:01.098528225-05:00","dependencies":[{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:01.100559128-05:00","created_by":"dan"},{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:01.104046552-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.12","title":"Integration: flake.nix + ai-skills.nix","description":"Add ops-review to deployment:\n- Add to flake.nix availableSkills\n- Update modules/ai-skills.nix for ops lens deployment\n- Deploy to ~/.config/lenses/ops/","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.324752872-05:00","created_by":"dan","updated_at":"2026-01-01T18:34:37.960786687-05:00","closed_at":"2026-01-01T18:34:37.960786687-05:00","close_reason":"Added ops-review to flake.nix availableSkills, updated ai-skills.nix with description and lens deployment to ~/.config/lenses/ops/","dependencies":[{"issue_id":"skills-9cu.12","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.339878541-05:00","created_by":"dan"},{"issue_id":"skills-9cu.12","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:13.34278836-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.13","title":"Validation: test on dotfiles","description":"Run Phase 1 lenses on ~/proj/dotfiles:\n- Verify findings are real issues\n- Check false positive rate \u003c10%\n- Document any needed lens refinements","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.489473975-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:13.489473975-05:00","dependencies":[{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.490574316-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.2","type":"blocks","created_at":"2026-01-01T16:56:13.492551051-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.3","type":"blocks","created_at":"2026-01-01T16:56:13.494453305-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.4","type":"blocks","created_at":"2026-01-01T16:56:13.496395361-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.5","type":"blocks","created_at":"2026-01-01T16:56:13.49824655-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.14","title":"Validation: test on prox-setup","description":"Run Phase 1 lenses on ~/proj/prox-setup:\n- Verify findings are real issues\n- Check false positive rate \u003c10%\n- Document any needed lens refinements","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.676548941-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:13.676548941-05:00","dependencies":[{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.677846482-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.2","type":"blocks","created_at":"2026-01-01T16:56:13.680528791-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.3","type":"blocks","created_at":"2026-01-01T16:56:13.683748368-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.4","type":"blocks","created_at":"2026-01-01T16:56:13.68689222-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.5","type":"blocks","created_at":"2026-01-01T16:56:13.689241654-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.2","title":"Lens: secrets","description":"Create secrets.md lens for credential hygiene:\n- Hardcoded secrets, API keys, tokens\n- SOPS config issues\n- Secrets in logs/error messages\n- Secrets via CLI args\n- Missing encryption\n\nLinter integration: gitleaks patterns","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.394704404-05:00","created_by":"dan","updated_at":"2026-01-01T17:12:01.063844363-05:00","closed_at":"2026-01-01T17:12:01.063844363-05:00","close_reason":"Created secrets.md lens with Nix store, Docker layer, CI masking checks. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.2","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.400663129-05:00","created_by":"dan"},{"issue_id":"skills-9cu.2","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.404368195-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.3","title":"Lens: shell-safety","description":"Create shell-safety.md lens (shellcheck-backed):\n- Missing set -euo pipefail\n- Unquoted variables (SC2086)\n- Unsafe command substitution\n- Missing error handling\n- Hardcoded paths\n\nLinter integration: shellcheck JSON output","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.596966874-05:00","created_by":"dan","updated_at":"2026-01-01T17:16:27.274701375-05:00","closed_at":"2026-01-01T17:16:27.274701375-05:00","close_reason":"Created shell-safety.md lens with temp file safety, input validation, set -e nuance, guard snippets. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.3","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.598340159-05:00","created_by":"dan"},{"issue_id":"skills-9cu.3","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.600733142-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.4","title":"Lens: blast-radius","description":"Create blast-radius.md lens for change safety:\n- Destructive ops without confirmation\n- Missing dry-run mode\n- No rollback strategy\n- Bulk ops without batching\n- Missing pre-flight checks\n\nLLM-primary: understanding implications","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.792059661-05:00","created_by":"dan","updated_at":"2026-01-01T17:24:07.972638831-05:00","closed_at":"2026-01-01T17:24:07.972638831-05:00","close_reason":"Created blast-radius.md with targeting/scoping, empty var expansion, env gates, scope in output, mitigation downgrades. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.4","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.793564277-05:00","created_by":"dan"},{"issue_id":"skills-9cu.4","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.796234701-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.5","title":"Lens: privilege","description":"Create privilege.md lens for least-privilege:\n- Unnecessary sudo/root\n- Containers as root\n- chmod 777 patterns\n- Missing capability drops\n- Docker socket mounting\n- systemd without sandboxing","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.996280533-05:00","created_by":"dan","updated_at":"2026-01-01T18:30:25.980656507-05:00","closed_at":"2026-01-01T18:30:25.980656507-05:00","close_reason":"Created privilege.md with network binding, setuid/setgid, K8s specifics, compensating controls, curl|sudo bash. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.5","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.999435334-05:00","created_by":"dan"},{"issue_id":"skills-9cu.5","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:36.004010491-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.6","title":"Lens: idempotency","description":"Create idempotency.md lens for safe re-execution:\n- Scripts that break on re-run\n- Missing existence checks\n- Non-atomic operations\n- Check-then-act race conditions\n- Missing cleanup on failure\n\nBoundary: Owns convergence, NOT rollback or retries","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.04397031-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.04397031-05:00","dependencies":[{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.061027066-05:00","created_by":"dan"},{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.065409149-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.7","title":"Lens: supply-chain","description":"Create supply-chain.md lens for provenance:\n- Unpinned versions (latest tags)\n- Actions not pinned to SHA\n- Missing flake.lock/SRI hashes\n- Unsigned artifacts\n- Untrusted registries","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.317966318-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.317966318-05:00","dependencies":[{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.319754113-05:00","created_by":"dan"},{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.322943568-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.8","title":"Lens: observability","description":"Create observability.md lens for visibility:\n- Silent failures\n- Missing health checks\n- Incomplete metrics\n- Missing structured logging\n- No correlation IDs","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.562009474-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.562009474-05:00","dependencies":[{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.564394694-05:00","created_by":"dan"},{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.571005731-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.9","title":"Lens: nix-hygiene","description":"Create nix-hygiene.md lens (statix/deadnix-backed):\n- Dead code (unused bindings)\n- Anti-patterns (with lib abuse, IFD)\n- Module boundary violations\n- Overlay issues\n- Missing option types\n\nLinter integration: statix + deadnix JSON","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.623672452-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:00.623672452-05:00","dependencies":[{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.638729349-05:00","created_by":"dan"},{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.643063075-05:00","created_by":"dan"}]}
 {"id":"skills-a0x","title":"spec-review: Add traceability requirements across artifacts","description":"Prompts don't enforce spec → plan → tasks linkage. Drift can occur without detection.\n\nAdd:\n- Require trace matrix or linkage in reviews\n- Each plan item should reference spec requirement\n- Each task should reference plan item\n- Flag unmapped items and extra scope","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:25.270581198-08:00","updated_at":"2025-12-15T14:05:48.196356786-08:00","closed_at":"2025-12-15T14:05:48.196356786-08:00"}
 {"id":"skills-a23","title":"Update main README to list all 9 skills","description":"Main README.md 'Skills Included' section only lists worklog and update-spec-kit. Repo actually has 9 skills: template, worklog, update-spec-kit, screenshot-latest, niri-window-capture, tufte-press, update-opencode, web-research, web-search.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-11-30T11:58:14.042397754-08:00","updated_at":"2025-12-28T22:08:02.074758486-05:00","closed_at":"2025-12-28T22:08:02.074758486-05:00","close_reason":"Updated README with table listing all 14 skills (5 deployed, 8 available, 1 development template)","dependencies":[{"issue_id":"skills-a23","depends_on_id":"skills-4yn","type":"blocks","created_at":"2025-11-30T12:01:30.306742184-08:00","created_by":"daemon","metadata":"{}"}]}
 {"id":"skills-al5","title":"Consider repo-setup-verification skill","description":"The dotfiles repo has a repo-setup-prompt.md verification checklist that could become a skill.\n\n**Source**: ~/proj/dotfiles/docs/repo-setup-prompt.md\n\n**What it does**:\n- Verifies .envrc has use_api_keys and skills loading\n- Checks .skills manifest exists with appropriate skills\n- Optionally checks beads setup\n- Verifies API keys are loaded\n\n**As a skill it could**:\n- Be invoked to audit any repo's agent setup\n- Offer to fix missing pieces\n- Provide consistent onboarding for new repos\n\n**Questions**:\n- Is this better as a skill vs a slash command?\n- Should it auto-fix or just report?\n- Does it belong in skills repo or dotfiles?","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-06T12:38:32.561337354-08:00","updated_at":"2025-12-28T22:22:57.639520516-05:00","closed_at":"2025-12-28T22:22:57.639520516-05:00","close_reason":"Decided: keep as prompt doc in dotfiles, not a skill. Claude can read it when asked. No wrapper benefit, and it's dotfiles-specific setup (not general skill). ai-tools-doctor handles version checking separately."}
--- a/docs/worklogs/2026-01-01-ops-review-skill-design-and-skeleton.org
+++ b/docs/worklogs/2026-01-01-ops-review-skill-design-and-skeleton.org
@@ -0,0 +1,160 @@
+#+TITLE: ops-review Skill Design, Orch Consensus Planning, and Skeleton Implementation
+#+DATE: 2026-01-01
+#+KEYWORDS: ops-review, skill-design, orch-consensus, lenses, infrastructure-review, nix, shell-safety, secrets
+#+COMMITS: 0 (uncommitted work in progress)
+#+COMPRESSION_STATUS: uncompressed
+
+* Session Summary
+** Date: 2026-01-01
+** Focus Area: Designing and implementing the ops-review skill for infrastructure code analysis
+
+* Accomplishments
+- [X] Explored dotfiles and prox-setup repos to understand actual ops artifact landscape
+- [X] Designed ops-review skill with 10 lenses across 3 phases
+- [X] Ran orch consensus (sonar, flash-or, gemini, gpt) on initial plan
+- [X] Incorporated consensus feedback: linter-first hybrid architecture, crisp lens boundaries
+- [X] Created comprehensive plan.md in specs/ops-review/
+- [X] Created bd epic (skills-9cu) with 14 child tasks and a dependency graph
+- [X] Built skill skeleton: SKILL.md, README.md, lenses/README.md
+- [X] Drafted secrets.md lens with orch consensus review
+- [X] Incorporated Nix store exposure, Docker layer persistence, CI masking feedback
+- [X] Filed follow-up issue in dotfiles for gitleaks availability (dotfiles-x2m)
+- [ ] Remaining Phase 1 lenses: shell-safety, blast-radius, privilege
+
+* Key Decisions
+** Decision 1: Linter-first hybrid architecture
+- Context: How should ops-review analyze infrastructure code?
+- Options considered:
+  1. Pure LLM analysis - flexible but prone to syntax hallucinations
+  2. Pure linter - deterministic but misses semantic issues
+  3. Hybrid: linters first, LLM interprets - best of both
+- Rationale: All 4 consensus models agreed that LLMs hallucinate syntax but excel at understanding intent, so static tools catch syntax issues and the LLM looks for logic bugs.
+- Impact: Linter-backed lenses integrate with specific tools (shellcheck, statix, gitleaks); LLM-primary lenses such as blast-radius rely on model analysis
+
+** Decision 2: 10 lenses across 3 phases
+- Context: How many lenses and how to prioritize?
+- Initial proposal: 8 lenses
+- Consensus feedback: Add privilege (least-privilege) and supply-chain (pinning)
+- Phase 1 (quick mode): secrets, shell-safety, blast-radius, privilege
+- Phase 2: idempotency, supply-chain, observability
+- Phase 3: nix-hygiene, resilience, orchestration
+
+** Decision 3: Crisp lens boundaries to avoid duplicate findings
+- Problem: resilience/blast-radius/idempotency overlap
+- Solution: Define ownership table
+  - idempotency: safe re-run, convergence, atomic writes
+  - resilience: runtime fault tolerance, timeouts, retries
+  - blast-radius: change safety, dry-run, rollback
+
+** Decision 4: Nix-specific checks as first-class concerns
+- Context: Nix has unique security model (world-readable store)
+- Insight from consensus: Secrets interpolated into .nix strings are copied into /nix/store, where any local user can read them
+- Added to secrets lens: explicit Nix store exposure check
+- Remediation: sops-nix/agenix with runtime paths, not embedded strings
+
+* Problems & Solutions
+| Problem | Solution | Learning |
+|---------+----------+----------|
+| Initial lens drafts too long (60+ lines) | Reference existing code-review lenses (~45 lines) | Consistent format matters for usability |
+| Overlapping lens scopes | Created "Crisp Boundaries" table in plan | Define ownership explicitly upfront |
+| What lenses are actually needed? | Explored real repos (dotfiles, prox-setup) | Ground design in actual artifacts |
+| False positive risk in secrets lens | Added explicit exemptions (Nix hashes, public keys) | Two-signal rule for generic matches |
+
+* Technical Details
+
+** Code Changes
+- Total files created: 5
+- Key files:
+  - `specs/ops-review/plan.md` (261 lines) - Comprehensive design document
+  - `skills/ops-review/SKILL.md` (188 lines) - Agent workflow instructions
+  - `skills/ops-review/README.md` (96 lines) - User documentation
+  - `skills/ops-review/lenses/README.md` (85 lines) - Lens index
+  - `skills/ops-review/lenses/secrets.md` (53 lines) - First lens
+
+** Commands Used
+#+begin_src bash
+# Explored actual infrastructure repos
+# (via Task tool with Explore subagent)
+
+# Ran multi-model consensus for plan review
+uv run orch consensus "Review this ops-review skill design..." sonar flash-or gemini gpt
+
+# Created bd epic with hierarchical children
+bd create "ops-review skill" --type=epic -p 1 --description "..."
+bd create "Lens: secrets" --parent skills-9cu -p 1 --deps skills-9cu.1
+
+# Visualized dependency graph
+bd graph skills-9cu
+
+# Checked available work
+bd ready
+#+end_src
+
+** Architecture Notes
+- Skill follows code-review pattern: lenses as focused prompts
+- Lenses deploy to ~/.config/lenses/ops/ via home-manager
+- Quick mode (--quick) runs Phase 1 only for CI/pre-commit
+- Cross-file awareness via grep-based reference mapping (source, imports)
+
+* Process and Workflow
+
+** What Worked Well
+- Exploring real repos first grounded the design in actual needs
+- orch consensus with 4 models surfaced gaps (Nix store, Docker layers)
+- bd epic with --parent creates clean hierarchical structure
+- Dependency graph visualization helped verify task ordering
+
+** What Was Challenging
+- Balancing lens completeness with ~45 line target format
+- Deciding which checks are linter-backed vs LLM-primary
+- Managing context across long design session
+
+* Learning and Insights
+
+** Technical Insights
+- Nix store world-readability is a critical security consideration
+- Docker ENV/ARG persist in image layers even if later deleted
+- CI masking (::add-mask::) is often overlooked
+- shellcheck, statix, gitleaks provide structured JSON output for integration
+
+** Process Insights
+- orch consensus is valuable for pressure-testing designs
+- High temperature for brainstorming, low temperature for analytical decisions
+- bd hierarchical children (.1, .2, etc.) work well for epic breakdown
+
+** Architectural Insights
+- Linter-first hybrid is an emerging pattern (doc-review also uses it)
+- Lens boundaries must be explicit to avoid duplicate findings
+- Platform-specific remediation matters (sops-nix vs BuildKit secrets)
+
+* Context for Future Work
+
+** Open Questions
+- Should ops-review have its own lens directory or share with code-review?
+- How to handle cross-repo awareness (dotfiles uses sops, prox-setup uses passage)?
+- Should we run linters in parallel before LLM pass?
+
+** Next Steps
+- Complete Phase 1 lenses: shell-safety, blast-radius, privilege
+- Integration: add to flake.nix, update ai-skills.nix
+- Validation: test on dotfiles and prox-setup repos
+- Ensure gitleaks available (dotfiles-x2m)
+
+** Related Work
+- [[file:2025-12-28-code-review-skill-creation-worklog-cleanup.org][Code Review Skill Creation]] - Original lens pattern
+- [[file:2025-12-04-doc-review-skill-design.org][Doc-Review Skill Design]] - Hybrid architecture precedent
+- [[file:2025-12-26-multi-lens-code-review-workflow-testing.org][Multi-Lens Code Review Testing]] - LLM-in-the-loop pattern
+
+* Raw Notes
+- Dotfiles repo: 100+ Nix modules, 90+ shell scripts, SOPS secrets, Gitea Actions
+- Prox-setup repo: 88 Python scripts (Proxmox API), 41 shell scripts, Docker Compose
+- Models consulted: sonar, flash-or, gemini, gpt (all 4 supported the design)
+- Key insight from GPT: "Require two signals for MED/HIGH when not using known token format"
+- All models emphasized: don't flag Nix hashes (sha256-, narHash, vendorHash)
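+
+A rough sketch of how the two-signal rule and the Nix hash exemption could combine during secrets triage. Everything here is an assumption rather than the lens's actual logic: `classify_secret` is a hypothetical helper, "two signals" is read as a secret-like variable name plus a value of 20 or more characters, and the token prefixes are only a small illustrative sample.
+#+begin_src bash
+# Hypothetical triage helper: keyword list and length threshold are assumptions.
+classify_secret() {
+  local name="$1" value="$2" lname signals=0
+  lname=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
+
+  # Nix content hashes are integrity data, not credentials: never flag them
+  case "$name" in narHash|vendorHash|outputHash|sha256) echo "IGNORE"; return 0 ;; esac
+  case "$value" in sha256-*|sha512-*) echo "IGNORE"; return 0 ;; esac
+
+  # A known token format is a strong enough signal on its own
+  case "$value" in ghp_*|github_pat_*|AKIA*|xoxb-*|xoxp-*) echo "HIGH"; return 0 ;; esac
+
+  # Generic match: count independent signals and require two for MED
+  case "$lname" in *token*|*secret*|*password*|*api_key*|*apikey*) signals=$((signals + 1)) ;; esac
+  if [ "${#value}" -ge 20 ]; then signals=$((signals + 1)); fi
+
+  if [ "$signals" -ge 2 ]; then echo "MED"; else echo "LOW"; fi
+}
+#+end_src
+Known token prefixes short-circuit to HIGH; a generic match only reaches MED once both signals are present.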
+
+* Session Metrics
+- Commits made: 0 (work in progress)
+- Files created: 5
+- Lines added: ~683 (plan.md + skill files + lens)
+- bd issues created: 16 (1 epic + 14 children + 1 in dotfiles)
+- orch consensus runs: 2
--- a/flake.nix
+++ b/flake.nix
@@ -15,7 +15,9 @@
      availableSkills = [
        "bd-issue-tracking"
        "code-review"
+        "doc-review"
        "niri-window-capture"
+        "ops-review"
        "orch"
        "screenshot-latest"
        "spec-review"
--- a/modules/ai-skills.nix
+++ b/modules/ai-skills.nix
@@ -12,6 +12,7 @@ let
    Available skills:
    - code-review: Multi-lens code review with issue filing
    - niri-window-capture: Invisibly capture window screenshots
+    - ops-review: Multi-lens ops/infrastructure review
    - screenshot-latest: Find latest screenshots
    - tufte-press: Generate study card JSON
    - worklog: Create org-mode worklogs
@@ -94,6 +95,11 @@ in {
          source = "${cfg.skillsPath}/code-review/lenses";
          recursive = true;
        };
+        # Ops lenses in separate subdirectory
+        ".config/lenses/ops" = {
+          source = "${cfg.skillsPath}/ops-review/lenses";
+          recursive = true;
+        };
      })

      # Workflows (beads protos)
--- a/skills/ops-review/README.md
+++ b/skills/ops-review/README.md
@@ -0,0 +1,121 @@
+# ops-review
+
+Multi-lens review for operational infrastructure. Finds security issues, shell script bugs, and reliability problems in your Nix configs, shell scripts, Docker files, and CI/CD pipelines.
+
+## Quick Start
+
+**Claude Code / OpenCode:**
+```
+/ops-review bin/deploy.sh
+```
+
+The agent reviews your ops files and presents findings for approval before filing any issues.
+
+## What It Reviews
+
+| Artifact | Examples |
+|----------|----------|
+| Nix/NixOS | flake.nix, modules/*.nix, home-manager configs |
+| Shell Scripts | bin/*.sh, setup_*.sh, deploy.sh |
+| Containers | Dockerfile, docker-compose.yml |
+| CI/CD | .github/workflows/*.yml, .gitea/workflows/*.yml |
+| Services | systemd units, cron jobs |
+
+## How It Works
+
+**Linter-first hybrid**: Static tools catch syntax issues, LLM finds semantic problems.
+
+```
+shellcheck ──┐
+statix    ───┼──► LLM interprets + finds logic bugs ──► Findings
+hadolint  ───┘
+```
+
+## Available Lenses
+
+### Phase 1: Core Safety (quick mode)
+- **secrets** - Hardcoded credentials, SOPS issues
+- **shell-safety** - set -euo pipefail, quoting, error handling
+- **blast-radius** - Destructive ops, missing dry-run
+- **privilege** - Unnecessary sudo, root containers
+
+### Phase 2: Reliability
+- **idempotency** - Safe re-run, atomic operations
+- **supply-chain** - Unpinned versions, missing hashes
+- **observability** - Silent failures, missing health checks
+
+### Phase 3: Architecture
+- **nix-hygiene** - Dead code, anti-patterns
+- **resilience** - Timeouts, retries, resource limits
+- **orchestration** - Execution order, prerequisites
+
+## Usage Examples
+
+```
+# Review a single script
+/ops-review deploy.sh
+
+# Review a directory
+/ops-review bin/
+
+# Quick mode (Phase 1 only, fast)
+/ops-review --quick bin/
+
+# Review recently changed ops files (git diff)
+/ops-review
+```
+
+## Example Output
+
+```
+## Review Summary: bin/deploy.sh
+
+| Severity | Count |
+|----------|-------|
+| HIGH     | 2     |
+| MED      | 3     |
+
+### Top Issues
+
+1. [SECRETS] HIGH bin/deploy.sh:45
+   Issue: API token passed as CLI argument
+   Suggest: Use environment variable instead
+
+2. [BLAST-RADIUS] HIGH bin/deploy.sh:78
+   Issue: rm -rf with variable that could be empty
+   Suggest: Add guard: [ -n "$DIR" ] || exit 1
+
+Would you like me to file any of these as beads issues?
+```
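+
+In script form, the guard from finding 2 might look like the sketch below. `clean_dir` is a hypothetical helper, not output the skill generates.
+
+```bash
+# Hypothetical helper showing the empty-variable guard from finding 2.
+clean_dir() {
+  local dir="${1:-}"
+  # Without this check, an empty $dir turns `rm -rf "$dir"/*` into `rm -rf /*`
+  [ -n "$dir" ] || { echo "clean_dir: refusing empty path" >&2; return 1; }
+  [ "$dir" != "/" ] || { echo "clean_dir: refusing /" >&2; return 1; }
+  # ${dir:?} is a second check: the shell reports an error and aborts if dir is empty
+  rm -rf -- "${dir:?}"
+}
+```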
+
+## Prerequisites
+
+For full functionality, install these linters:
+
+```nix
+# home-manager (on NixOS, use environment.systemPackages)
+home.packages = with pkgs; [
+  shellcheck statix deadnix
+  hadolint yamllint gitleaks
+];
+```
+
+The skill works without them but provides richer analysis with linter output.
+
+## Configuration
+
+No configuration required. The skill auto-detects file types and applies appropriate lenses.
+
+## Integration
+
+- **Issue Tracking**: Files findings as beads issues (`bd create`)
+- **CI/CD**: Use `--quick` mode for pre-commit/pipeline gates
+
+## See Also
+
+- [code-review](../code-review/README.md) - Application code review
+- [doc-review](../doc-review/README.md) - Documentation quality
+
+## License
+
+MIT
--- a/skills/ops-review/SKILL.md
+++ b/skills/ops-review/SKILL.md
@@ -0,0 +1,246 @@
+---
+name: ops-review
+description: Run multi-lens ops review on infrastructure files. Analyzes Nix, shell scripts, Docker, CI/CD for secrets, shell-safety, blast-radius, privilege, idempotency, supply-chain, observability, nix-hygiene, resilience, and orchestration. Interactive - asks before filing issues.
+---
+
+# Ops Review Skill
+
+Run focused infrastructure analysis using multiple review lenses. Uses a linter-first hybrid approach: static tools for syntax, LLM for semantics. Findings are synthesized and presented for your approval before any issues are filed.
+
+## When to Use
+
+Invoke this skill when the user asks something like:
+- "Review my infrastructure"
+- "Run ops review on bin/"
+- "Check this script for issues"
+- "Analyze my Nix configs"
+- `/ops-review`
+
+## Arguments
+
+The skill accepts an optional target:
+- `/ops-review` - Reviews recently changed ops files (git diff)
+- `/ops-review bin/` - Reviews specific directory
+- `/ops-review deploy.sh` - Reviews specific file
+- `/ops-review --quick` - Phase 1 lenses only (fast, <30s)
+
+## Target Artifacts
+
+| Category | File Patterns |
+|----------|---------------|
+| Nix/NixOS | `*.nix`, `flake.nix`, `flake.lock` |
+| Shell Scripts | `*.sh`, files with a shell shebang (`#!/bin/bash`, `#!/usr/bin/env bash`, `#!/bin/sh`) |
+| Python Automation | `*.py` in ops contexts (scripts/, setup/, deploy/) |
+| Container Configs | `Dockerfile`, `docker-compose.yml`, `*.dockerfile` |
+| CI/CD | `.github/workflows/*.yml`, `.gitea/workflows/*.yml` |
+| Service Configs | `*.service`, `*.timer`, systemd units |
+| Secrets | `.sops.yaml`, `secrets.yaml`, SOPS-encrypted files |
+
+## Architecture: Linter-First Hybrid
+
+```
+Stage 1: Static Tools (fast, deterministic)
+├── shellcheck for shell scripts
+├── statix + deadnix for Nix
+├── hadolint for Dockerfiles
+└── yamllint for YAML configs
+
+Stage 2: LLM Analysis (semantic, contextual)
+├── Interprets tool output in context
+├── Finds logic bugs tools miss
+├── Synthesizes cross-file issues
+└── Suggests actionable fixes
+```
+
+## Available Lenses
+
+Lenses are focused review prompts located in `~/.config/lenses/ops/`:
+
+### Phase 1: Core Safety (--quick mode)
+
+| Lens | Focus |
+|------|-------|
+| `secrets.md` | Hardcoded credentials, SOPS issues, secrets in logs |
+| `shell-safety.md` | set -euo pipefail, quoting, error handling (shellcheck-backed) |
+| `blast-radius.md` | Destructive ops, missing dry-run, no rollback |
+| `privilege.md` | Unnecessary sudo, root containers, chmod 777 |
+
+### Phase 2: Reliability
+
+| Lens | Focus |
+|------|-------|
+| `idempotency.md` | Safe re-run, existence checks, atomic operations |
+| `supply-chain.md` | Unpinned versions, missing SRI hashes, action SHAs |
+| `observability.md` | Silent failures, missing health checks, no logging |
+
+### Phase 3: Architecture
+
+| Lens | Focus |
+|------|-------|
+| `nix-hygiene.md` | Dead code, anti-patterns, module boundaries (statix-backed) |
+| `resilience.md` | Timeouts, retries, graceful shutdown, resource limits |
+| `orchestration.md` | Execution order, prerequisites, implicit coupling |
+
+## Workflow
+
+### Phase 1: Target Selection
+1. Parse the target argument (default: git diff of uncommitted ops files)
+2. Identify files by category (Nix, shell, Docker, etc.)
+3. Show file list to user for confirmation
+
+### Phase 2: Pre-Pass (Static Tools)
+Run appropriate linters based on file type:
+```bash
+# Shell scripts
+shellcheck --format=json script.sh
+
+# Nix files
+statix check --format=json file.nix
+deadnix --output-format=json file.nix
+
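+# Dockerfiles
+hadolint --format json Dockerfile
+
+# YAML (Compose files, CI workflows)
+yamllint --format parsable docker-compose.yml
+
+# Secrets (whole repository)
+gitleaks detect --source . --report-format json --report-path gitleaks-report.json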
+```
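+
+A minimal sketch of how the pre-pass could dispatch files to tools. `lint_file` and `run_if_present` are hypothetical helper names, not part of the skill; the case patterns are a simplified reading of the Target Artifacts table, and a missing linter is skipped rather than treated as an error.
+
+```bash
+# Hypothetical pre-pass helpers (names are illustrative, not part of the skill).
+
+# Run a linter only if it is installed; otherwise note the skip on stderr.
+run_if_present() {
+  local tool=$1
+  shift
+  if command -v "$tool" >/dev/null 2>&1; then
+    "$tool" "$@"
+  else
+    printf 'skip: %s not installed\n' "$tool" >&2
+  fi
+}
+
+# Pick the static tool(s) for one file based on its name or shebang.
+lint_file() {
+  local f=$1
+  case "$f" in
+    *.nix)
+      run_if_present statix check --format=json "$f"
+      run_if_present deadnix --output-format=json "$f"
+      ;;
+    *.sh)
+      run_if_present shellcheck --format=json "$f"
+      ;;
+    Dockerfile | */Dockerfile | *.dockerfile)
+      run_if_present hadolint --format json "$f"
+      ;;
+    *.yml | *.yaml)
+      run_if_present yamllint --format parsable "$f"
+      ;;
+    *)
+      # Extensionless scripts: fall back to the shebang line.
+      if head -n 1 -- "$f" 2>/dev/null | grep -Eq '^#!.*[/ ](ba|da|k)?sh([[:space:]]|$)'; then
+        run_if_present shellcheck --format=json "$f"
+      fi
+      ;;
+  esac
+}
+```
+
+For example, `git diff --name-only | while IFS= read -r f; do lint_file "$f"; done` lints every changed file; each tool's JSON goes to stdout, one document per tool invocation.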
+
+### Phase 3: Lens Execution
+For each lens, analyze the target files with tool output in context:
+
+1. Read the lens prompt from `~/.config/lenses/ops/{lens}.md`
+2. Include relevant linter output as evidence
+3. Apply the lens to find semantic issues tools miss
+4. Collect findings in structured format
+
+**Finding Format:**
+```
+[TAG] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what's wrong>
+Suggest: <how to fix>
+Evidence: <why it matters>
+```
+
+### Phase 4: Synthesis
+After all lenses complete:
+1. Deduplicate overlapping findings (same issue from multiple lenses)
+2. Group related issues
+3. Rank by severity and confidence
+4. Generate summary report
+
+### Phase 5: Interactive Review
+Present findings to user:
+1. Show executive summary (counts by severity)
+2. List top issues with details
+3. Ask: "Which findings should I file as issues?"
+
+**User can respond:**
+- "File all" - creates beads issues for everything
+- "File HIGH only" - filters by severity
+- "File 1, 3, 5" - specific findings
+- "None" - just keep the report
+- "Let me review first" - show full details
+
+### Phase 6: Issue Filing (if requested)
+For approved findings:
+1. Create beads issues with `bd create`
+2. Include lens tag, severity, file location
+3. Link related issues if applicable
+
+## Output
+
+The skill produces:
+1. **Console summary** - immediate feedback
+2. **Beads issues** - if user approves filing
+
+## Severity Rubric
+
+| Severity | Criteria |
+|----------|----------|
+| **HIGH** | Exploitable vulnerability, data loss risk, will break on next run |
+| **MED** | Reliability issue, tech debt, violation of best practice |
+| **LOW** | Polish, maintainability, defense-in-depth improvement |
+
+Context matters: same issue may be HIGH in production, LOW in homelab.
+
+## Example Session
+
+```
+User: /ops-review bin/deploy.sh
+
+Agent: I'll review bin/deploy.sh with ops lenses.
+
+[Running shellcheck...]
+[Running secrets lens...]
+[Running shell-safety lens...]
+[Running blast-radius lens...]
+[Running privilege lens...]
+
+## Review Summary: bin/deploy.sh
+
+| Severity | Count |
+|----------|-------|
+| HIGH     | 2     |
+| MED      | 3     |
+| LOW      | 1     |
+
+### Top Issues
+
+1. [SECRETS] HIGH bin/deploy.sh:45
+   Issue: API token passed as command-line argument (visible in process list)
+   Suggest: Use environment variable or file with restricted permissions
+
+2. [BLAST] HIGH bin/deploy.sh:78
+   Issue: rm -rf with variable that could be empty
+   Suggest: Add guard: [ -n "$DIR" ] || exit 1
+
+3. [SHELL] MED bin/deploy.sh:12
+   Issue: Missing 'set -euo pipefail'
+   Suggest: Add at top of script for fail-fast behavior
+
+Would you like me to file any of these as beads issues?
+Options: all, HIGH only, specific numbers (1,2,3), or none
+```
+
+## Quick Mode
+
+Use `--quick` for fast pre-commit checks:
+- Runs only Phase 1 lenses (secrets, shell-safety, blast-radius, privilege)
+- Target: <30 seconds
+- Ideal for CI gates
+
+## Cross-File Awareness
+
+Before review, build a reference map:
+- **Shell**: `source`, `.` includes, invoked scripts
+- **Nix**: imports, flake inputs
+- **CI**: referenced scripts, env vars, secrets names
+- **Compose**: service dependencies, volumes, env files
+- **systemd**: ExecStart targets, dependencies
+
+This enables finding issues in the seams between components.
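+
+A rough sketch of a grep-based reference map, assuming GNU or BSD `grep` with `-r` and `--include`. `build_ref_map` is a hypothetical name; it only covers the Shell and Nix rows above and deliberately does not resolve variables or follow references recursively.
+
+```bash
+# Hypothetical grep-based reference map: prints "file -> reference" edges.
+build_ref_map() {
+  local root=${1:-.}
+
+  # Shell: lines that start with `source X` or `. X` (indentation allowed).
+  # The reference is printed raw, e.g. "$DIR/lib.sh" stays unexpanded.
+  grep -rEn --include='*.sh' '^[[:space:]]*(source|\.)[[:space:]]+[^[:space:]]+' -- "$root" |
+    sed -E 's/^([^:]+):[0-9]+:[[:space:]]*(source|\.)[[:space:]]+([^[:space:];]+).*/\1 -> \3/'
+
+  # Nix: relative path literals (./x or ../x). This covers `imports = [ ... ]`
+  # but also any other relative path, such as callPackage targets.
+  grep -rEno --include='*.nix' '\.{1,2}/[A-Za-z0-9._/-]+' -- "$root" |
+    sed -E 's/^([^:]+):[0-9]+:/\1 -> /'
+}
+```
+
+Running `build_ref_map .` at the repository root yields edges such as `./bin/deploy.sh -> ./lib/common.sh`, which a lens can then use to read both sides of a seam.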
+
+## Guidelines
+
+1. **Linter-First** - Always run static tools before LLM analysis
+2. **Evidence Over Opinion** - Cite linter output and specific lines
+3. **Actionable Suggestions** - Every finding needs a clear fix
+4. **Respect User Time** - Summarize first, details on request
+5. **No Spam** - Don't file issues without explicit approval
+6. **Context Matters** - Homelab ≠ production severity
+
+## Process Checklist
+
+1. [ ] Parse target (files/directory/diff)
+2. [ ] Confirm scope with user if large (>10 files)
+3. [ ] Run static tools (shellcheck, statix, etc.)
+4. [ ] Build reference map for cross-file awareness
+5. [ ] Run each lens, collecting findings
+6. [ ] Deduplicate and rank findings
+7. [ ] Present summary to user
+8. [ ] Ask which findings to file
+9. [ ] Create beads issues for approved findings
+10. [ ] Report issue IDs created
+
+## Integration
+
+- **Lenses**: Read from `~/.config/lenses/ops/*.md`
+- **Issue Tracking**: Uses `bd create` for beads issues
+- **Static Tools**: shellcheck, statix, deadnix, hadolint, yamllint, gitleaks
--- a/skills/ops-review/lenses/README.md
+++ b/skills/ops-review/lenses/README.md
@ -0,0 +1,76 @@
+# ops-review Lenses
+
+Focused review prompts for operational infrastructure analysis.
+
+## Architecture
+
+**Linter-first hybrid**: Each lens works with static tool output when available.
+
+```
+Static Tools (syntax)     LLM Lens (semantics)
+─────────────────────     ───────────────────
+shellcheck ──────────────► shell-safety.md
+statix + deadnix ────────► nix-hygiene.md
+hadolint ────────────────► (container checks)
+gitleaks patterns ───────► secrets.md
+```
+
+## Available Lenses
+
+### Phase 1: Core Safety
+
+| Lens | Focus | Linter |
+|------|-------|--------|
+| [secrets.md](secrets.md) | Credentials, SOPS, secrets in logs | gitleaks |
+| [shell-safety.md](shell-safety.md) | Error handling, quoting, pipefail | shellcheck |
+| [blast-radius.md](blast-radius.md) | Destructive ops, rollback, dry-run | LLM-primary |
+| [privilege.md](privilege.md) | Least privilege, sudo, capabilities | LLM-primary |
+
+### Phase 2: Reliability
+
+| Lens | Focus | Linter |
+|------|-------|--------|
+| [idempotency.md](idempotency.md) | Safe re-run, atomic ops | LLM-primary |
+| [supply-chain.md](supply-chain.md) | Pinning, SRI hashes, provenance | LLM-primary |
+| [observability.md](observability.md) | Logging, health checks, metrics | LLM-primary |
+
+### Phase 3: Architecture
+
+| Lens | Focus | Linter |
+|------|-------|--------|
+| [nix-hygiene.md](nix-hygiene.md) | Dead code, anti-patterns, modules | statix, deadnix |
+| [resilience.md](resilience.md) | Timeouts, retries, limits | LLM-primary |
+| [orchestration.md](orchestration.md) | Ordering, prerequisites, coupling | LLM-primary |
+
+## Lens Boundaries
+
+To avoid duplicate findings:
+
+| Lens | Owns | Does NOT Own |
+|------|------|--------------|
+| **idempotency** | Safe re-run, convergence, atomic writes | Rollback, retries |
+| **resilience** | Runtime fault tolerance, timeouts, retries | Change safety, re-run |
+| **blast-radius** | Change safety, dry-run, rollback, batching | Runtime behavior |
+
+## Output Format
+
+All lenses use consistent output:
+
+```
+[TAG] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what's wrong>
+Suggest: <how to fix>
+Evidence: <why it matters>
+```
+
+## Severity Guidelines
+
+| Severity | Criteria |
+|----------|----------|
+| **HIGH** | Exploitable vulnerability, data loss, will break on re-run |
+| **MED** | Reliability issue, tech debt, best practice violation |
+| **LOW** | Polish, maintainability, defense-in-depth |
+
+## Deployment
+
+Lenses are deployed to `~/.config/lenses/ops/` via home-manager.
--- a/skills/ops-review/lenses/blast-radius.md
+++ b/skills/ops-review/lenses/blast-radius.md
@ -0,0 +1,67 @@
+# Blast Radius Review Lens
+
+Review operational scripts for **change safety, risk containment, and reversibility**.
+
+## What to Look For
+
+### Targeting & Scoping
+- Wrong or ambient context: relying on current kubectl context, AWS profile, gcloud project
+- Missing explicit flags: `--namespace`, `--context`, `--region`, `--project`
+- No environment gates: prod operations without `CONFIRM_PROD=1` or `--env prod`
+- Hardcoded production targets without verification
+
+### Destructive Operations
+- `rm -rf`, `DROP TABLE`, `docker system prune` without confirmation
+- Empty variable expansion: `rm -rf $DIR/` when DIR could be empty (use `${DIR:?}`)
+- Bulk deletes without limits or batching
+- Operations that cannot be undone without backup/snapshot first
+
+### Missing Dry-Run Mode
+- Scripts that modify state without `--dry-run` or `--check` flag
+- No preview before execution (`kubectl diff`, `terraform plan`)
+- Destructive defaults (should require explicit `--apply` or `--force`)
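+
+A minimal sketch of the "preview by default, explicit apply, gated prod" pattern a reviewer could suggest. `parse_args`, `run`, `APPLY`, and `ENV_NAME` are hypothetical names; `CONFIRM_PROD=1` and `--env prod` mirror the gates mentioned above.
+
+```bash
+# Hypothetical wrapper: preview by default, mutate only with --apply, and
+# refuse production applies unless CONFIRM_PROD=1 is set explicitly.
+APPLY=0
+ENV_NAME=""
+
+parse_args() {
+  while [ $# -gt 0 ]; do
+    case "$1" in
+      --apply) APPLY=1 ;;
+      --env)
+        if [ $# -lt 2 ] || [ -z "$2" ]; then
+          echo "--env needs a value" >&2
+          return 2
+        fi
+        ENV_NAME=$2
+        shift
+        ;;
+      *) echo "unknown argument: $1" >&2; return 2 ;;
+    esac
+    shift
+  done
+  # No ambient default: the target environment must be named explicitly.
+  if [ -z "$ENV_NAME" ]; then
+    echo "--env is required" >&2
+    return 2
+  fi
+  if [ "$ENV_NAME" = prod ] && [ "$APPLY" -eq 1 ] && [ "${CONFIRM_PROD:-}" != 1 ]; then
+    echo "refusing to apply to prod without CONFIRM_PROD=1" >&2
+    return 1
+  fi
+}
+
+# Execute the command with --apply; otherwise print it, shell-quoted (bash %q).
+run() {
+  if [ "$APPLY" -eq 1 ]; then
+    "$@"
+  else
+    printf 'DRY-RUN:'
+    printf ' %q' "$@"
+    printf '\n'
+  fi
+}
+```
+
+A script would call `parse_args "$@" || exit` once, then wrap each state-changing command, e.g. `run systemctl restart myapp` (the unit name is a placeholder).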
+
+### Rollback & Recovery
+- No backup/snapshot before risky changes
+- Missing rollback instructions or automation
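+- Note: NixOS keeps previous system generations (`nixos-rebuild switch --rollback`); verify scripts change system state through `nixos-rebuild` rather than mutating it outside Nix, so the rollback path actually covers the change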
+- Database migrations without down/rollback path
+
+### Pre-flight Checks
+- Missing connectivity/auth verification before bulk operations
+- No target verification (`kubectl config current-context`, `aws sts get-caller-identity`)
+- Missing dependency checks (required tools, permissions, disk space)
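+
+A sketch of a pre-flight check built on the two verification commands named above. `preflight` is a hypothetical helper, and the expected context and account are supplied by the caller; it assumes the AWS CLI's `--query`/`--output text` options for extracting the account ID.
+
+```bash
+# Hypothetical pre-flight check: fail fast unless required tools exist and the
+# ambient kubectl context and AWS account match what the caller expects.
+preflight() {
+  local want_ctx=$1 want_account=$2 tool ctx account
+
+  for tool in kubectl aws; do
+    command -v "$tool" >/dev/null 2>&1 || { echo "missing tool: $tool" >&2; return 1; }
+  done
+
+  # Assign separately from `local` so a failing command is not masked (SC2155).
+  ctx=$(kubectl config current-context) || return 1
+  if [ "$ctx" != "$want_ctx" ]; then
+    echo "kubectl context is '$ctx', expected '$want_ctx'" >&2
+    return 1
+  fi
+
+  account=$(aws sts get-caller-identity --query Account --output text) || return 1
+  if [ "$account" != "$want_account" ]; then
+    echo "AWS account is '$account', expected '$want_account'" >&2
+    return 1
+  fi
+}
+```
+
+Usage: `preflight staging-cluster 123456789012 || exit 1` before any bulk operation (both values are placeholders).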
+
+### Bulk Operations
+- All-at-once without batching or progressive rollout
+- No pause/resume capability for long-running operations
+- Missing locking to prevent concurrent runs (`flock`)
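+
+A sketch of locking plus batching, assuming `flock(1)` from util-linux (so Linux-specific). `with_lock` and `in_batches` are hypothetical names; targets arrive one per line on stdin.
+
+```bash
+# Run "$@" while holding an exclusive, non-blocking lock on the given file.
+with_lock() {
+  local lock=$1
+  shift
+  (
+    flock -n 9 || { echo "another run holds $lock" >&2; exit 1; }
+    "$@"
+  ) 9>"$lock"
+}
+
+# Read one target per line; call "$@" with up to $size targets at a time,
+# pausing $pause seconds after each full batch.
+in_batches() {
+  local size=$1 pause=$2 item
+  local -a batch=()
+  shift 2
+  while IFS= read -r item || [ -n "$item" ]; do
+    batch+=("$item")
+    if [ "${#batch[@]}" -ge "$size" ]; then
+      "$@" "${batch[@]}" || return 1   # stop the rollout on the first failed batch
+      batch=()
+      sleep "$pause"
+    fi
+  done
+  if [ "${#batch[@]}" -gt 0 ]; then
+    "$@" "${batch[@]}" || return 1
+  fi
+}
+```
+
+Combined, `with_lock /tmp/deploy.lock in_batches 5 30 restart_hosts < hosts.txt` would refuse a concurrent run and restart five hosts at a time (`restart_hosts` and `hosts.txt` are placeholders).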
+
+## Output Format
+
+```
+[BLAST] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what could go wrong>
+Scope: <single file | service | host | fleet | cluster>
+Suggest: <add dry-run, confirmation, backup, scoping, etc.>
+Evidence: <destructive command or pattern identified>
+```
+
+## Mitigations That Reduce Severity
+
+If these are present, consider downgrading:
+- Explicit backup/snapshot step immediately prior
+- Dry-run/plan output with explicit apply gate
+- Narrow scope (specific namespace, labeled resources)
+- Confirmation prompt for interactive use
+- Running on ephemeral/test resources
+
+## Guidelines
+
+- **HIGH** = data loss or outage, broad scope, no recovery path
+- **MED** = risky operation without safety nets, narrow scope
+- **LOW** = missing best practice, ephemeral/test targets
+- Focus on *implications*: what's the worst case? Can we recover?
+- Context matters: `rm -rf /tmp/cache` is LOW, `rm -rf /data/$VAR` is HIGH
+- Consider: unattended (cron/CI) operations need stricter gates
+- Nix/NixOS: acknowledge generation rollback when applicable
--- a/skills/ops-review/lenses/privilege.md
+++ b/skills/ops-review/lenses/privilege.md
@ -0,0 +1,93 @@
+# Privilege Review Lens
+
+Review operational infrastructure for **least-privilege violations and excessive permissions**.
+
+## What to Look For
+
+### Root & Sudo Usage
+- Scripts running as root when not necessary
+- `sudo` for operations that don't require it
+- `curl ... | sudo bash` - dangerous remote execution pattern
+- `NOPASSWD` sudo rules with broad commands or wildcards
+- Missing privilege drop after initial setup
+
+### Container Privileges
+- Containers running as root (`USER` not set)
+- `privileged: true` in Docker/Compose/Kubernetes
+- Docker socket mounting (`/var/run/docker.sock`)
+- Missing capability drops (`--cap-drop=ALL`)
+- Host namespace usage: `--pid=host`, `--network=host`, `--ipc=host`
+- K8s: `allowPrivilegeEscalation: true`, `hostPath` mounts, missing `runAsNonRoot`
+
+### File & Binary Permissions
+- `chmod 777` or `chmod 666` (world-writable)
+- Secrets/keys with permissions broader than `0600`
+- setuid/setgid bits on custom binaries (`chmod u+s`, `chmod g+s`)
+- Writable paths in root's `$PATH` or systemd unit locations
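+
+A quick way to gather evidence for the first and third bullets is a `find` sweep. `audit_perms` is a hypothetical helper; `-perm -MODE` (all listed bits set) and `-xdev` are available in both GNU and BSD `find`.
+
+```bash
+# Hypothetical helper: list world-writable files and setuid/setgid files.
+audit_perms() {
+  local root=${1:-.}
+  # Regular files writable by "other", e.g. the result of chmod 666/777.
+  find "$root" -xdev -type f -perm -0002 -print
+  # Regular files carrying the setuid (4000) or setgid (2000) bit.
+  find "$root" -xdev -type f \( -perm -4000 -o -perm -2000 \) -print
+}
+```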
+
+### Network Binding
+- Services binding `0.0.0.0` when `127.0.0.1` suffices
+- Database/admin ports exposed globally in Docker Compose
+- Binding low ports (<1024) as root instead of using capabilities
+
+### systemd Sandboxing
+- Missing `ProtectSystem=strict` (or `full` if strict breaks app)
+- Missing `ProtectHome=yes`, `PrivateTmp=yes`, `NoNewPrivileges=yes`
+- `User=root` when service could run unprivileged
+- Missing `CapabilityBoundingSet=` restrictions
+- For low ports: use `AmbientCapabilities=CAP_NET_BIND_SERVICE` instead of root
+
+### Nix/NixOS Specific
+- Secrets in Nix store (world-readable!) - use sops-nix/agenix instead
+- Services without `DynamicUser=yes` when applicable
+- Missing `StateDirectory=`, `CacheDirectory=` (systemd creates these per service with correct ownership, which is also how state persists under `DynamicUser=yes`)
+- Overly permissive `security.sudo.extraRules`
+
+## Output Format
+
+```
+[PRIVILEGE] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what excessive permission exists>
+Suggest: <specific least-privilege alternative>
+Evidence: <permission pattern or config found>
+```
+
+## Compensating Controls
+
+Downgrade severity if these are present:
+- Container: `cap_drop=ALL` + specific `cap_add`, `read_only=true`, `no-new-privileges`
+- systemd: `ProtectSystem`, `PrivateTmp`, capability restrictions
+- Explicit justification comment for necessary privileges
+
+## Common Fixes
+
+```yaml
+# Docker Compose: Least privilege
+user: "1000:1000"
+read_only: true
+security_opt:
+  - no-new-privileges:true
+cap_drop: [ALL]
+cap_add: [NET_BIND_SERVICE]  # only what's needed
+```
+
+```ini
+# systemd: Hardened service
+[Service]
+User=myservice
+ProtectSystem=strict
+ProtectHome=yes
+PrivateTmp=yes
+NoNewPrivileges=yes
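+# Empty value drops every capability; list only what the service needs
+CapabilityBoundingSet=CAP_NET_BIND_SERVICE
+# Only for binding ports below 1024 (must also be in the bounding set)
+AmbientCapabilities=CAP_NET_BIND_SERVICE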
+```
+
+## Guidelines
+
+- **HIGH** = root/privileged without justification, docker.sock mount, world-writable sensitive files
+- **MED** = missing sandboxing, broad sudo, root with some restrictions
+- **LOW** = could be tighter but has compensating controls
+- Ask: "What's the minimum permission needed?"
+- Consider compensating controls before flagging HIGH
+- Nix store is world-readable - secrets there are HIGH severity
--- a/skills/ops-review/lenses/secrets.md
+++ b/skills/ops-review/lenses/secrets.md
@ -0,0 +1,53 @@
+# Secrets Review Lens
+
+Review operational infrastructure for **credential exposure and secrets hygiene**.
+
+## What to Look For
+
+### Hardcoded Credentials & Store Leaks
+- API keys, tokens, passwords in source files
+- Private keys (`BEGIN OPENSSH PRIVATE KEY`, `BEGIN RSA PRIVATE KEY`, `BEGIN PRIVATE KEY`)
+- **Nix**: Secrets in `.nix` strings, `writeText`, `environment.etc.*.text` (world-readable in /nix/store)
+- **Docker**: Secrets in `ENV` or `ARG` instructions (persist in image layers/history)
+
+### Secrets in Unsafe Channels
+- Credentials passed as CLI arguments (visible in `ps`)
+- Hardcoded secret values in `export` statements (exported variables are inherited by every child process)
+- Tokens in URLs, query parameters, or connection strings
+- Docker `build-arg` for sensitive values
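+
+A sketch of a remediation for the CLI-argument case: hand the token to `curl` through a header file instead of argv. `call_api` is a hypothetical helper; it assumes curl 7.55.0 or later for the `-H @file` form and a token stored in a file the caller controls.
+
+```bash
+# Hypothetical helper: keep a bearer token out of argv (and so out of `ps`).
+call_api() {
+  local url=$1 token_file=$2 hdr rc
+  # mktemp creates the file readable and writable only by the current user.
+  hdr=$(mktemp) || return 1
+  # printf is a bash builtin, so the token never appears in a process list.
+  printf 'Authorization: Bearer %s\n' "$(cat -- "$token_file")" >"$hdr"
+  # -H @file makes curl read the header line(s) from the file.
+  curl --fail --silent --show-error -H @"$hdr" "$url"
+  rc=$?
+  rm -f -- "$hdr"
+  return "$rc"
+}
+```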
+
+### Logging & CI Exposure
+- `set -x` in scripts that handle credentials
+- Secrets echoed to stdout/stderr or logs
+- Missing CI secret masking (GitHub `::add-mask::`, GitLab masked vars)
+- Debug flags that leak secrets (`curl -v`, `--debug`)
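+
+A sketch of the fixes a reviewer might suggest for the first and third bullets. `mask_in_ci` and `with_xtrace_off` are hypothetical names; the masking relies on GitHub Actions setting `GITHUB_ACTIONS=true` and honoring the `::add-mask::` workflow command.
+
+```bash
+# Redact a value from later GitHub Actions log output; no-op elsewhere.
+mask_in_ci() {
+  if [ "${GITHUB_ACTIONS:-}" = true ]; then
+    printf '::add-mask::%s\n' "$1"
+  fi
+}
+
+# Run "$@" with `set -x` disabled, then restore the caller's xtrace setting.
+with_xtrace_off() {
+  local had_x=0 rc
+  case $- in *x*) had_x=1 ;; esac
+  { set +x; } 2>/dev/null   # silence the trace of this `set +x` itself
+  "$@"
+  rc=$?
+  if [ "$had_x" -eq 1 ]; then set -x; fi
+  return "$rc"
+}
+```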
+
+### SOPS & Encryption Issues
+- Plaintext files that should use SOPS (secrets.yaml, credentials.json)
+- Missing `.sops.yaml` when encrypted files present
+- Overly broad SOPS `creation_rules` access
+
+## Linter Integration
+
+```bash
+gitleaks detect --source . --report-format json --report-path gitleaks-report.json
+```
+
+## Output Format
+
+```
+[SECRETS] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what credential is exposed and via what channel>
+Suggest: <sops-nix, Docker BuildKit secrets, env file with 0600, etc.>
+Evidence: <pattern match or context>
+```
+
+## Guidelines
+
+- **HIGH** = credential in repo, Nix store, Docker layer, or logs
+- **MED** = credential in risky channel (CLI arg, build arg, unmasked CI)
+- **LOW** = missing encryption best practice
+- **Keywords**: `*_KEY`, `*_TOKEN`, `*_SECRET`, `*_PASSWORD`, `*_CREDENTIAL`
+- **Ignore**: Nix hashes (`sha256-`, `narHash`, `vendorHash`), public keys, checksums, UUIDs, placeholders (`REPLACE_ME`, `changeme`, `example`)
+- **Nix remediation**: Use `sops-nix` or `agenix`, reference via runtime paths not embedded strings
+- **Docker remediation**: Use BuildKit `--mount=type=secret`, avoid `ENV` for secrets
--- a/skills/ops-review/lenses/shell-safety.md
+++ b/skills/ops-review/lenses/shell-safety.md
@ -0,0 +1,76 @@
+# Shell Safety Review Lens
+
+Review shell scripts for **robustness, error handling, and defensive patterns**.
+
+## What to Look For
+
+### Error Handling
+- Missing error strategy: `set -euo pipefail` OR explicit checks per command
+- Note: `set -e` has edge cases (conditionals, `||`, subshells) - explicit checks often safer
+- Unchecked return codes from critical operations (file ops, network, root commands)
+- Missing `trap` for cleanup on exit/error
+- Pipes hiding exit codes without `pipefail` or `PIPESTATUS` checks
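+
+A sketch of the `PIPESTATUS` approach for scripts that do not use `pipefail` or need to know which stage failed. `archive_dir` is a hypothetical example; `PIPESTATUS` is bash-specific and is overwritten by the next command, so it is copied immediately.
+
+```bash
+# Hypothetical example: without a check, a failed tar hides behind gzip's exit 0.
+archive_dir() {
+  local src=$1 dest=$2
+  local -a rc
+  tar -cf - -C "$src" . | gzip >"$dest"
+  rc=("${PIPESTATUS[@]}")   # copy at once: the next command resets PIPESTATUS
+  if [ "${rc[0]}" -ne 0 ] || [ "${rc[1]}" -ne 0 ]; then
+    echo "archive failed: tar=${rc[0]} gzip=${rc[1]}" >&2
+    rm -f -- "$dest"        # don't leave a truncated archive behind
+    return 1
+  fi
+}
+```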
+
+### Variable & Input Safety
+- Unquoted variables in commands (SC2086: word splitting)
+- Variables used before assignment or without defaults (`${VAR:-default}`)
+- Missing input validation: required args, file existence, numeric checks
+- `read` without `IFS= read -r` (SC2162: backslash/whitespace bugs)
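+
+A sketch of up-front input validation covering required arguments, file existence, and numeric checks. `validate_args` and its `<config-file> <count>` interface are illustrative only.
+
+```bash
+# Hypothetical validator for a script taking <config-file> <count>.
+validate_args() {
+  if [ $# -ne 2 ]; then
+    echo "usage: ${0##*/} <config-file> <count>" >&2
+    return 2
+  fi
+  local config=$1 count=$2
+  if [ ! -f "$config" ] || [ ! -r "$config" ]; then
+    echo "config file not found or unreadable: $config" >&2
+    return 2
+  fi
+  # Reject empty strings and anything containing a non-digit.
+  case $count in
+    '' | *[!0-9]*)
+      echo "count must be a non-negative integer, got: '$count'" >&2
+      return 2
+      ;;
+  esac
+}
+```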
+
+### Command Safety
+- Unsafe `cd` without checking: use `cd dir || exit 1`
+- `rm -rf` with unguarded variables: use `${VAR:?}` or explicit checks
+- Dangerous primitives: `eval`, `source` of non-constant paths, `curl | sh`
+- Missing `--` to separate options from arguments
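+
+A sketch of two of these fixes together: build the command as a bash array instead of a string handed to `eval`, and put `--` before operands. `remove_paths` is a hypothetical name.
+
+```bash
+# Unsafe: eval "rm -f $paths" re-parses spaces, globs, and ; in filenames.
+# Safer: each path stays exactly one argument, and -- ends option parsing,
+# so a file literally named "-rf" is treated as a file, not as flags.
+remove_paths() {
+  local -a cmd=(rm -f --)
+  cmd+=("$@")
+  "${cmd[@]}"
+}
+```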
+
+### Temp Files & Atomicity
+- Hardcoded temp paths (`/tmp/foo`) instead of `mktemp`
+- Predictable temp names (`/tmp/script.$$`) - use `mktemp -d`
+- Missing cleanup of temp files on exit
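+
+A sketch combining `mktemp -d` with an `EXIT` trap so the scratch directory disappears however the work ends. `with_scratch_dir` is a hypothetical helper; running in a subshell keeps the trap from clobbering any `EXIT` trap the caller already has.
+
+```bash
+# Hypothetical helper: run "$@" with an unpredictable scratch directory as its
+# last argument; the directory is removed when the subshell exits.
+with_scratch_dir() {
+  (
+    scratch=$(mktemp -d) || exit 1
+    trap 'rm -rf -- "$scratch"' EXIT
+    "$@" "$scratch"
+  )
+}
+```
+
+For example, `with_scratch_dir build_bundle` passes the directory to a (placeholder) `build_bundle` function; the exit status of `build_bundle` is preserved.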
+
+## Linter Integration
+
+```bash
+shellcheck -x --format=json --severity=style script.sh
+```
+
+Key codes for ops safety:
+- SC2086: Double quote to prevent splitting
+- SC2164: Use `cd ... || exit`
+- SC2015: `A && B || C` logic error (C runs if B fails too)
+- SC2162: `read` without `-r`
+- SC2155: Declare and assign separately (masked return values)
+
+## Output Format
+
+```
+[SHELL] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what's unsafe and why>
+Suggest: <specific fix with code example>
+Evidence: <shellcheck code or pattern matched>
+```
+
+## Common Fixes
+
+```bash
+# Safe rm pattern
+: "${TARGET:?TARGET must be set}"
+rm -rf -- "$TARGET"
+
+# Safe cd pattern
+cd -- "$dir" || { echo "cd failed: $dir" >&2; exit 1; }
+
+# Safe read loop
+while IFS= read -r line; do
+  printf '%s\n' "$line"   # process "$line" here
+done < "$file"
+```
+
+## Guidelines
+
+- **HIGH** = data loss risk, silent failure, or injection vector
+- **MED** = defensive pattern missing, potential edge-case bugs
+- **LOW** = style, portability, maintainability
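+- Respect shell dialect: `[[ ]]`, arrays, and `local` are not POSIX, and `pipefail` is missing from older `/bin/sh` implementations; don't suggest them for `#!/bin/sh` scripts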
+- Prioritize scripts running as root or handling sensitive operations
+- Consider: will this break if run twice? With empty input? As cron job?
--- a/specs/ops-review/plan.md
+++ b/specs/ops-review/plan.md
@ -0,0 +1,260 @@
+# ops-review Skill Design
+
+A multi-lens review skill for operational infrastructure, modeled on code-review.
+
+## Problem Statement
+
+Ops artifacts (Nix configs, shell scripts, Python automation, Docker Compose, CI/CD) accumulate technical debt and security issues just like application code. Unlike application code, they rarely get systematic review.
+
+## Target Artifacts
+
+Based on actual infrastructure in dotfiles and prox-setup:
+
+| Category | Examples |
+|----------|----------|
+| **Nix/NixOS** | flake.nix, modules/*.nix, home-manager configs |
+| **Shell Scripts** | bin/*.sh, setup_*.sh, fix_*.sh, deploy.sh |
+| **Python Automation** | Proxmox API scripts, multi-stage deployments |
+| **Container Configs** | docker-compose.yml, Dockerfile |
+| **CI/CD** | .gitea/workflows/*.yml, .github/actions/*.yml |
+| **Service Configs** | systemd units, Ory configs, SOPS files |
+
+## Architecture: Linter-First Hybrid
+
+**Consensus from model review**: Use deterministic tools as primary signals, LLM for interpretation and semantic analysis.
+
+```
+Stage 1: Static Tools (fast, deterministic)
+├── shellcheck for shell scripts
+├── statix + deadnix for Nix
+├── hadolint for Dockerfiles
+└── yamllint for YAML configs
+
+Stage 2: LLM Analysis (semantic, contextual)
+├── Interprets tool output in context
+├── Finds logic bugs tools miss
+├── Synthesizes cross-file issues
+└── Suggests actionable fixes
+```
+
+**Why**: LLMs are unreliable at exhaustive syntax checking but excel at understanding intent and impact. Tools catch syntax but miss semantics.
+
+## Proposed Lenses (10 total)
+
+### Core Safety (Phase 1)
+
+#### 1. secrets
+**Focus**: Credential hygiene
+- Hardcoded secrets, API keys, tokens
+- SOPS config issues
+- Secrets in logs or error messages
+- Secrets passed via CLI args (visible in process list)
+- Missing encryption for sensitive data
+
+#### 2. shell-safety
+**Focus**: Shell script robustness (backed by shellcheck)
+- Missing `set -euo pipefail`
+- Unquoted variables (SC2086)
+- Unsafe command substitution
+- Missing error handling
+- Hardcoded paths that should be parameters
+
+#### 3. blast-radius
+**Focus**: Change safety and risk containment
+- Destructive operations without confirmation
+- Missing dry-run mode
+- No rollback strategy
+- Bulk operations without batching
+- Missing pre-flight checks
+- No canary/progressive approach
+
+#### 4. privilege
+**Focus**: Least privilege violations
+- Unnecessary sudo/root usage
+- Containers running as root
+- Overly permissive file modes (chmod 777)
+- Missing capability drops
+- Docker socket mounting
+- systemd units without sandboxing (ProtectSystem, PrivateTmp)
+
+### Reliability (Phase 2)
+
+#### 5. idempotency
+**Focus**: Safe re-execution and convergence
+- Scripts that break on re-run
+- Missing existence checks (create-if-not-exists)
+- Non-atomic operations (partial failure states)
+- Check-then-act race conditions
+- Missing cleanup on failure
+
+#### 6. supply-chain
+**Focus**: Dependency provenance and pinning
+- Unpinned versions (`latest` tags, floating refs)
+- GitHub/Gitea actions not pinned to SHA
+- Missing Nix flake.lock or SRI hashes
+- Unsigned artifacts
+- Untrusted substituters/registries
+
+#### 7. observability
+**Focus**: Visibility into system state
+- Silent failures (no logging/alerting)
+- Missing health checks (Docker `HEALTHCHECK`, systemd `WatchdogSec=` or `ExecStartPre=` checks)
+- Incomplete metrics coverage
+- Missing structured logging
+- No correlation IDs in multi-step scripts
+
+### Architecture (Phase 3)
+
+#### 8. nix-hygiene
+**Focus**: Nix-specific quality (backed by statix/deadnix)
+- Dead code (unused let bindings, imports)
+- Anti-patterns (`with lib;` abuse, import-from-derivation (IFD) without justification)
+- Module boundary violations
+- Overlay/override issues
+- Missing type annotations on options
+
+#### 9. resilience
+**Focus**: Runtime fault tolerance
+- Missing timeouts on network calls
+- No retries with backoff/jitter
+- Missing circuit breakers for API calls
+- No graceful shutdown handling (SIGTERM)
+- Missing resource limits (systemd MemoryMax, Docker mem_limit)
+
+#### 10. orchestration
+**Focus**: Execution ordering and coupling (formerly dependency-chains)
+- Unclear prerequisites
+- Missing documentation of execution order
+- Circular dependencies
+- Scripts assuming prior state without checking
+- Implicit coupling between components
+
+## Crisp Boundaries
+
+To avoid duplicate findings across overlapping lenses:
+
+| Lens | Owns | Does NOT Own |
+|------|------|--------------|
+| **idempotency** | Safe re-run, convergence, atomic writes, create-if-not-exists | Rollback (blast-radius), retries (resilience) |
+| **resilience** | Runtime fault tolerance, timeouts, retries, graceful shutdown | Change safety (blast-radius), re-run safety (idempotency) |
+| **blast-radius** | Change safety, dry-run, rollback, confirmation gates, batching | Runtime behavior (resilience), re-run (idempotency) |
+
+## Skill Structure
+
+```
+skills/ops-review/
+├── SKILL.md           # Agent instructions (workflow)
+├── README.md          # User documentation
+└── lenses/
+    ├── README.md      # Lens index
+    ├── secrets.md
+    ├── shell-safety.md
+    ├── blast-radius.md
+    ├── privilege.md
+    ├── idempotency.md
+    ├── supply-chain.md
+    ├── observability.md
+    ├── nix-hygiene.md
+    ├── resilience.md
+    └── orchestration.md
+```
+
+Lenses deploy to `~/.config/lenses/ops/` via home-manager.
+
+## Workflow
+
+### Standard Mode
+1. **Target selection** - files/directory to review
+2. **Pre-pass** - Run static tools (shellcheck, statix, etc.)
+3. **Reference mapping** - Build lightweight call graph (source, imports, ExecStart)
+4. **Lens execution** - One pass per lens, tool output in context
+5. **Synthesis** - Dedupe across lenses, rank by severity
+6. **Interactive review** - User approves findings
+7. **Issue filing** - `bd create` for approved items
+
+### Quick Mode (`--quick`)
+Runs Phase 1 lenses only: secrets, shell-safety, blast-radius, privilege.
+Ideal for pre-commit or CI gates.
+
+## Output Format
+
+Per-lens findings:
+```
+[LENS-TAG] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what's wrong>
+Suggest: <how to fix>
+Evidence: <why it matters>
+```
+
+### Severity Rubric
+
+| Severity | Criteria |
+|----------|----------|
+| **HIGH** | Exploitable vulnerability, data loss risk, or will break on next run |
+| **MED** | Reliability issue, tech debt, or violation of best practice |
+| **LOW** | Polish, maintainability, or defense-in-depth improvement |
+
+Context matters: same issue may be HIGH in production, LOW in homelab.
+
+## Cross-File Awareness
+
+Build a simple reference map before review:
+- **Shell**: `source`, `.` includes, invoked scripts
+- **Nix**: imports, flake inputs
+- **CI**: referenced scripts, env vars, secrets names
+- **Compose**: service dependencies, volumes, env files
+- **systemd**: ExecStart targets, dependencies
+
+This enables finding issues in the seams between components.
+
+## Implementation Phases
+
+### Phase 1: Safety Net (High ROI, Low Ambiguity)
+1. **secrets** - Non-negotiable, prevents catastrophes
+2. **shell-safety** - Most brittle artifact type, shellcheck-backed
+3. **blast-radius** - Where LLMs shine (understanding implications)
+4. **privilege** - Highly actionable, high impact
+
+### Phase 2: Reliability Layer
+5. **idempotency** - Essential for setup/deploy scripts
+6. **supply-chain** - Critical for reproducibility
+7. **observability** - Easy to check, high debugging value
+
+### Phase 3: Architecture Polish
+8. **nix-hygiene** - statix/deadnix backed, LLM explains
+9. **resilience** - Needs nuance to avoid bad advice
+10. **orchestration** - Most complex, needs full context
+
+## Design Decisions
+
+1. **Linter-first, LLM-second**: Static tools for syntax, LLM for semantics
+2. **Crisp lens boundaries**: Each rule has one primary owner
+3. **Severity tied to impact**: Not all violations are equal
+4. **Quick mode**: Phase 1 for pre-commit/CI
+5. **Cross-file awareness**: Grep-based reference mapping
+6. **Escape hatches**: Intentional patterns can be flagged + suppressed
+
+## Success Criteria
+
+- Can review dotfiles/ and find real issues
+- Can review prox-setup/ and find real issues
+- Findings are actionable, not noise
+- Phase 1 lenses have <10% false positive rate
+- Integrates with existing bd issue tracking
+- Quick mode runs in <30 seconds
+
+## Open Questions (Resolved)
+
+| Question | Resolution |
+|----------|------------|
+| Nix: statix/deadnix or pure LLM? | **Hybrid**: Tools first, LLM interprets |
+| Shell: integrate shellcheck? | **Yes**: Treat as compiler, LLM groups/prioritizes |
+| Multi-file dependencies? | **Grep-based reference map** pre-pass |
+| Quick mode? | **Yes**: Phase 1 lenses only |
+| Prioritize across artifact types? | **By risk**: secrets/destructive ops first, not file type |
+
+## References
+
+- [Google SRE Book](https://sre.google/sre-book/table-of-contents/)
+- [OWASP DevSecOps Guideline](https://owasp.org/www-project-devsecops-guideline/)
+- Consensus review: sonar, flash-or, gemini, gpt (2025-01-01)