feat: complete ops-review skill with all 10 lenses

Phase 2 lenses (reliability): - idempotency: safe re-run, atomic ops, convergence - supply-chain: pinning, provenance, build-time network - observability: health checks, logging, metrics Phase 3 lenses (architecture): - nix-hygiene: statix/deadnix patterns, module design - resilience: timeouts, retries, resource limits - orchestration: ordering, dependencies, coupling All lenses validated via orch consensus (gemini, gpt, flash-or). Testing delegated to target repos: dotfiles-je5, prox-setup-kqg. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 21:02:39 -08:00 · 2026-01-01 21:02:39 -08:00 · fa97fca041
parent a8ab3c1b1b
commit fa97fca041
8 changed files with 821 additions and 7 deletions
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@ -34,10 +34,10 @@
 {"id":"skills-8y6","title":"Define skill versioning strategy","description":"Git SHA alone is insufficient. Need tuple approach:\n\n- skill_source_rev: git SHA (if available)\n- skill_content_hash: hash of SKILL.md + scripts\n- runtime_ref: flake.lock hash or Nix store path\n\nQuestions to resolve:\n- Do Protos pin to versions (stable but maintenance) or float on latest (risky)?\n- How to handle breaking changes in skills?\n- Record in wisp trace vs proto definition?\n\nFrom consensus: both models flagged versioning instability as high severity.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T19:49:30.839064445-05:00","updated_at":"2025-12-23T20:55:04.439779336-05:00","closed_at":"2025-12-23T20:55:04.439779336-05:00","close_reason":"ADRs revised with orch consensus feedback"}
 {"id":"skills-9af","title":"spec-review: Add spike/research task handling","description":"Tasks like 'Investigate X' can linger without clear outcomes.\n\nAdd to REVIEW_TASKS:\n- Flag research/spike tasks\n- Require timebox and concrete outputs (decision record, prototype, risks)\n- Pattern for handling unknowns","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:26.887719136-08:00","updated_at":"2025-12-15T14:08:13.441095034-08:00","closed_at":"2025-12-15T14:08:13.441095034-08:00"}
 {"id":"skills-9bc","title":"Investigate pre-compression hook for worklogs","description":"## Revised Understanding\n\nClaude Code already persists full conversation history in `~/.claude/projects/\u003cproject\u003e/\u003csession-id\u003e.jsonl`. Pre-compact hooks aren't needed for data capture.\n\n## Question\nWhat's the ideal workflow for generating worklogs from session data?\n\n## Options\n\n### 1. Post-session script\n- Run after exiting Claude Code\n- Reads most recent session JSONL\n- Generates worklog from conversation content\n- Pro: Async, doesn't interrupt flow\n- Con: May forget to run it\n\n### 2. On-demand slash command\n- `/worklog-from-session` or similar\n- Reads current session's JSONL file\n- Generates worklog with full context\n- Pro: Explicit control\n- Con: Still need to remember\n\n### 3. Pre-compact reminder\n- Hook prints reminder: \"Consider running /worklog\"\n- Doesn't automate, just nudges\n- Pro: Simple, non-intrusive\n- Con: Easy to dismiss\n\n### 4. Async batch processing\n- Process old sessions whenever\n- All data persists in JSONL files\n- Pro: No urgency, can do later\n- Con: Context may be stale\n\n## Data Format\nSession files contain:\n- User messages with timestamp\n- Assistant responses with model info\n- Tool calls and results\n- Git branch, cwd, version info\n\n## Next Steps\n- Decide preferred workflow\n- Build script to parse session JSONL → worklog format","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-17T14:32:32.568430817-08:00","updated_at":"2025-12-17T15:56:38.864916015-08:00","closed_at":"2025-12-17T15:56:38.864916015-08:00","close_reason":"Pivoted: worklogs may be redundant given full conversation persistence. New approach: make conversations searchable directly."}
-{"id":"skills-9cu","title":"ops-review skill","description":"Multi-lens review skill for operational infrastructure (Nix, shell, Docker, CI/CD).\n\nBased on code-review pattern with linter-first hybrid architecture.\n\n## Phases\n- Phase 1: Skeleton + Core Safety (secrets, shell-safety, blast-radius, privilege)\n- Phase 2: Reliability (idempotency, supply-chain, observability)\n- Phase 3: Architecture (nix-hygiene, resilience, orchestration)\n\n## Design\nSee specs/ops-review/plan.md\n\n## Success Criteria\n- Review dotfiles/ and find real issues\n- Review prox-setup/ and find real issues\n- \u003c10% false positive rate on Phase 1\n- Quick mode \u003c30s","status":"open","priority":1,"issue_type":"epic","created_at":"2026-01-01T16:55:15.772440374-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:15.772440374-05:00"}
+{"id":"skills-9cu","title":"ops-review skill","description":"Multi-lens review skill for operational infrastructure (Nix, shell, Docker, CI/CD).\n\nBased on code-review pattern with linter-first hybrid architecture.\n\n## Phases\n- Phase 1: Skeleton + Core Safety (secrets, shell-safety, blast-radius, privilege)\n- Phase 2: Reliability (idempotency, supply-chain, observability)\n- Phase 3: Architecture (nix-hygiene, resilience, orchestration)\n\n## Design\nSee specs/ops-review/plan.md\n\n## Success Criteria\n- Review dotfiles/ and find real issues\n- Review prox-setup/ and find real issues\n- \u003c10% false positive rate on Phase 1\n- Quick mode \u003c30s","status":"closed","priority":1,"issue_type":"epic","created_at":"2026-01-01T16:55:15.772440374-05:00","created_by":"dan","updated_at":"2026-01-02T00:02:23.095920957-05:00","closed_at":"2026-01-02T00:02:23.095920957-05:00","close_reason":"All 10 lenses implemented with orch consensus. Testing delegated to target repos (dotfiles-je5, prox-setup-kqg)."}
 {"id":"skills-9cu.1","title":"Create skill skeleton","description":"Create directory structure and base files:\n- skills/ops-review/SKILL.md (workflow, modeled on code-review)\n- skills/ops-review/README.md (user docs)\n- skills/ops-review/lenses/README.md (lens index)\n\nBlocks all lens work.","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:22.084083175-05:00","created_by":"dan","updated_at":"2026-01-01T17:08:20.384800582-05:00","closed_at":"2026-01-01T17:08:20.384800582-05:00","close_reason":"Created skeleton: SKILL.md, README.md, lenses/README.md","dependencies":[{"issue_id":"skills-9cu.1","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:22.095950548-05:00","created_by":"dan"}]}
-{"id":"skills-9cu.10","title":"Lens: resilience","description":"Create resilience.md lens for fault tolerance:\n- Missing timeouts on network calls\n- No retries with backoff\n- Missing circuit breakers\n- No graceful shutdown (SIGTERM)\n- Missing resource limits\n\nBoundary: Owns runtime tolerance, NOT change safety","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.876125632-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:00.876125632-05:00","dependencies":[{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.878008563-05:00","created_by":"dan"},{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.881250755-05:00","created_by":"dan"}]}
-{"id":"skills-9cu.11","title":"Lens: orchestration","description":"Create orchestration.md lens for execution ordering:\n- Unclear prerequisites\n- Missing order documentation\n- Circular dependencies\n- Assumed prior state\n- Implicit coupling\n\nMost complex - needs cross-file context","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:01.098528225-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:01.098528225-05:00","dependencies":[{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:01.100559128-05:00","created_by":"dan"},{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:01.104046552-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.10","title":"Lens: resilience","description":"Create resilience.md lens for fault tolerance:\n- Missing timeouts on network calls\n- No retries with backoff\n- Missing circuit breakers\n- No graceful shutdown (SIGTERM)\n- Missing resource limits\n\nBoundary: Owns runtime tolerance, NOT change safety","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.876125632-05:00","created_by":"dan","updated_at":"2026-01-02T00:00:31.02324893-05:00","closed_at":"2026-01-02T00:00:31.02324893-05:00","close_reason":"Lens created with orch consensus: added health checks/liveness, DNS caching, storage/logging, retry safety warning","dependencies":[{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.878008563-05:00","created_by":"dan"},{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.881250755-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.11","title":"Lens: orchestration","description":"Create orchestration.md lens for execution ordering:\n- Unclear prerequisites\n- Missing order documentation\n- Circular dependencies\n- Assumed prior state\n- Implicit coupling\n\nMost complex - needs cross-file context","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:01.098528225-05:00","created_by":"dan","updated_at":"2026-01-02T00:02:09.377316231-05:00","closed_at":"2026-01-02T00:02:09.377316231-05:00","close_reason":"Lens created with orch consensus: added shutdown ordering, CI/CD pipelines, job concurrency, thundering herd","dependencies":[{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:01.100559128-05:00","created_by":"dan"},{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:01.104046552-05:00","created_by":"dan"}]}
 {"id":"skills-9cu.12","title":"Integration: flake.nix + ai-skills.nix","description":"Add ops-review to deployment:\n- Add to flake.nix availableSkills\n- Update modules/ai-skills.nix for ops lens deployment\n- Deploy to ~/.config/lenses/ops/","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.324752872-05:00","created_by":"dan","updated_at":"2026-01-01T18:34:37.960786687-05:00","closed_at":"2026-01-01T18:34:37.960786687-05:00","close_reason":"Added ops-review to flake.nix availableSkills, updated ai-skills.nix with description and lens deployment to ~/.config/lenses/ops/","dependencies":[{"issue_id":"skills-9cu.12","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.339878541-05:00","created_by":"dan"},{"issue_id":"skills-9cu.12","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:13.34278836-05:00","created_by":"dan"}]}
 {"id":"skills-9cu.13","title":"Validation: test on dotfiles","description":"Run Phase 1 lenses on ~/proj/dotfiles:\n- Verify findings are real issues\n- Check false positive rate \u003c10%\n- Document any needed lens refinements","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.489473975-05:00","created_by":"dan","updated_at":"2026-01-01T20:45:55.525956162-05:00","closed_at":"2026-01-01T20:45:55.525956162-05:00","close_reason":"Tested on dotfiles - found 7 shell-safety issues (SC2155), 1 blast-radius issue (prune without dry-run). Lenses working correctly.","dependencies":[{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.490574316-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.2","type":"blocks","created_at":"2026-01-01T16:56:13.492551051-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.3","type":"blocks","created_at":"2026-01-01T16:56:13.494453305-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.4","type":"blocks","created_at":"2026-01-01T16:56:13.496395361-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.5","type":"blocks","created_at":"2026-01-01T16:56:13.49824655-05:00","created_by":"dan"}]}
 {"id":"skills-9cu.14","title":"Validation: test on prox-setup","description":"Run Phase 1 lenses on ~/proj/prox-setup:\n- Verify findings are real issues\n- Check false positive rate \u003c10%\n- Document any needed lens refinements","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.676548941-05:00","created_by":"dan","updated_at":"2026-01-01T21:46:34.25998-05:00","closed_at":"2026-01-01T21:46:34.25998-05:00","close_reason":"Reassigned to prox-setup repo - repo teams own their own testing","dependencies":[{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.677846482-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.2","type":"blocks","created_at":"2026-01-01T16:56:13.680528791-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.3","type":"blocks","created_at":"2026-01-01T16:56:13.683748368-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.4","type":"blocks","created_at":"2026-01-01T16:56:13.68689222-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.5","type":"blocks","created_at":"2026-01-01T16:56:13.689241654-05:00","created_by":"dan"}]}
@ -45,10 +45,10 @@
 {"id":"skills-9cu.3","title":"Lens: shell-safety","description":"Create shell-safety.md lens (shellcheck-backed):\n- Missing set -euo pipefail\n- Unquoted variables (SC2086)\n- Unsafe command substitution\n- Missing error handling\n- Hardcoded paths\n\nLinter integration: shellcheck JSON output","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.596966874-05:00","created_by":"dan","updated_at":"2026-01-01T17:16:27.274701375-05:00","closed_at":"2026-01-01T17:16:27.274701375-05:00","close_reason":"Created shell-safety.md lens with temp file safety, input validation, set -e nuance, guard snippets. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.3","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.598340159-05:00","created_by":"dan"},{"issue_id":"skills-9cu.3","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.600733142-05:00","created_by":"dan"}]}
 {"id":"skills-9cu.4","title":"Lens: blast-radius","description":"Create blast-radius.md lens for change safety:\n- Destructive ops without confirmation\n- Missing dry-run mode\n- No rollback strategy\n- Bulk ops without batching\n- Missing pre-flight checks\n\nLLM-primary: understanding implications","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.792059661-05:00","created_by":"dan","updated_at":"2026-01-01T17:24:07.972638831-05:00","closed_at":"2026-01-01T17:24:07.972638831-05:00","close_reason":"Created blast-radius.md with targeting/scoping, empty var expansion, env gates, scope in output, mitigation downgrades. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.4","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.793564277-05:00","created_by":"dan"},{"issue_id":"skills-9cu.4","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.796234701-05:00","created_by":"dan"}]}
 {"id":"skills-9cu.5","title":"Lens: privilege","description":"Create privilege.md lens for least-privilege:\n- Unnecessary sudo/root\n- Containers as root\n- chmod 777 patterns\n- Missing capability drops\n- Docker socket mounting\n- systemd without sandboxing","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.996280533-05:00","created_by":"dan","updated_at":"2026-01-01T18:30:25.980656507-05:00","closed_at":"2026-01-01T18:30:25.980656507-05:00","close_reason":"Created privilege.md with network binding, setuid/setgid, K8s specifics, compensating controls, curl|sudo bash. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.5","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.999435334-05:00","created_by":"dan"},{"issue_id":"skills-9cu.5","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:36.004010491-05:00","created_by":"dan"}]}
-{"id":"skills-9cu.6","title":"Lens: idempotency","description":"Create idempotency.md lens for safe re-execution:\n- Scripts that break on re-run\n- Missing existence checks\n- Non-atomic operations\n- Check-then-act race conditions\n- Missing cleanup on failure\n\nBoundary: Owns convergence, NOT rollback or retries","status":"in_progress","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.04397031-05:00","created_by":"dan","updated_at":"2026-01-01T21:45:56.719192669-05:00","dependencies":[{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.061027066-05:00","created_by":"dan"},{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.065409149-05:00","created_by":"dan"}]}
-{"id":"skills-9cu.7","title":"Lens: supply-chain","description":"Create supply-chain.md lens for provenance:\n- Unpinned versions (latest tags)\n- Actions not pinned to SHA\n- Missing flake.lock/SRI hashes\n- Unsigned artifacts\n- Untrusted registries","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.317966318-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.317966318-05:00","dependencies":[{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.319754113-05:00","created_by":"dan"},{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.322943568-05:00","created_by":"dan"}]}
-{"id":"skills-9cu.8","title":"Lens: observability","description":"Create observability.md lens for visibility:\n- Silent failures\n- Missing health checks\n- Incomplete metrics\n- Missing structured logging\n- No correlation IDs","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.562009474-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.562009474-05:00","dependencies":[{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.564394694-05:00","created_by":"dan"},{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.571005731-05:00","created_by":"dan"}]}
-{"id":"skills-9cu.9","title":"Lens: nix-hygiene","description":"Create nix-hygiene.md lens (statix/deadnix-backed):\n- Dead code (unused bindings)\n- Anti-patterns (with lib abuse, IFD)\n- Module boundary violations\n- Overlay issues\n- Missing option types\n\nLinter integration: statix + deadnix JSON","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.623672452-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:00.623672452-05:00","dependencies":[{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.638729349-05:00","created_by":"dan"},{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.643063075-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.6","title":"Lens: idempotency","description":"Create idempotency.md lens for safe re-execution:\n- Scripts that break on re-run\n- Missing existence checks\n- Non-atomic operations\n- Check-then-act race conditions\n- Missing cleanup on failure\n\nBoundary: Owns convergence, NOT rollback or retries","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.04397031-05:00","created_by":"dan","updated_at":"2026-01-01T22:01:48.652398594-05:00","closed_at":"2026-01-01T22:01:48.652398594-05:00","close_reason":"Lens created with orch consensus feedback: added optimistic locking, non-deterministic naming, delete idempotency, false positive risks","dependencies":[{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.061027066-05:00","created_by":"dan"},{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.065409149-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.7","title":"Lens: supply-chain","description":"Create supply-chain.md lens for provenance:\n- Unpinned versions (latest tags)\n- Actions not pinned to SHA\n- Missing flake.lock/SRI hashes\n- Unsigned artifacts\n- Untrusted registries","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.317966318-05:00","created_by":"dan","updated_at":"2026-01-01T22:03:26.655269107-05:00","closed_at":"2026-01-01T22:03:26.655269107-05:00","close_reason":"Lens created with orch consensus: added Terraform/Tofu, build-time network access, GH Actions permissions, builtins.fetchTarball","dependencies":[{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.319754113-05:00","created_by":"dan"},{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.322943568-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.8","title":"Lens: observability","description":"Create observability.md lens for visibility:\n- Silent failures\n- Missing health checks\n- Incomplete metrics\n- Missing structured logging\n- No correlation IDs","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.562009474-05:00","created_by":"dan","updated_at":"2026-01-01T22:05:03.351508622-05:00","closed_at":"2026-01-01T22:05:03.351508622-05:00","close_reason":"Lens created with orch consensus: added resource visibility, heartbeats, version/build metadata, log rotation","dependencies":[{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.564394694-05:00","created_by":"dan"},{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.571005731-05:00","created_by":"dan"}]}
+{"id":"skills-9cu.9","title":"Lens: nix-hygiene","description":"Create nix-hygiene.md lens (statix/deadnix-backed):\n- Dead code (unused bindings)\n- Anti-patterns (with lib abuse, IFD)\n- Module boundary violations\n- Overlay issues\n- Missing option types\n\nLinter integration: statix + deadnix JSON","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.623672452-05:00","created_by":"dan","updated_at":"2026-01-01T23:58:43.868830539-05:00","closed_at":"2026-01-01T23:58:43.868830539-05:00","close_reason":"Lens created with orch consensus: added lib.mkIf guards, mkDefault/mkForce, reproducibility/purity, build efficiency, expanded false positives","dependencies":[{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.638729349-05:00","created_by":"dan"},{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.643063075-05:00","created_by":"dan"}]}
 {"id":"skills-a0x","title":"spec-review: Add traceability requirements across artifacts","description":"Prompts don't enforce spec → plan → tasks linkage. Drift can occur without detection.\n\nAdd:\n- Require trace matrix or linkage in reviews\n- Each plan item should reference spec requirement\n- Each task should reference plan item\n- Flag unmapped items and extra scope","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:25.270581198-08:00","updated_at":"2025-12-15T14:05:48.196356786-08:00","closed_at":"2025-12-15T14:05:48.196356786-08:00"}
 {"id":"skills-a23","title":"Update main README to list all 9 skills","description":"Main README.md 'Skills Included' section only lists worklog and update-spec-kit. Repo actually has 9 skills: template, worklog, update-spec-kit, screenshot-latest, niri-window-capture, tufte-press, update-opencode, web-research, web-search.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-11-30T11:58:14.042397754-08:00","updated_at":"2025-12-28T22:08:02.074758486-05:00","closed_at":"2025-12-28T22:08:02.074758486-05:00","close_reason":"Updated README with table listing all 14 skills (5 deployed, 8 available, 1 development template)","dependencies":[{"issue_id":"skills-a23","depends_on_id":"skills-4yn","type":"blocks","created_at":"2025-11-30T12:01:30.306742184-08:00","created_by":"daemon","metadata":"{}"}]}
 {"id":"skills-al5","title":"Consider repo-setup-verification skill","description":"The dotfiles repo has a repo-setup-prompt.md verification checklist that could become a skill.\n\n**Source**: ~/proj/dotfiles/docs/repo-setup-prompt.md\n\n**What it does**:\n- Verifies .envrc has use_api_keys and skills loading\n- Checks .skills manifest exists with appropriate skills\n- Optionally checks beads setup\n- Verifies API keys are loaded\n\n**As a skill it could**:\n- Be invoked to audit any repo's agent setup\n- Offer to fix missing pieces\n- Provide consistent onboarding for new repos\n\n**Questions**:\n- Is this better as a skill vs a slash command?\n- Should it auto-fix or just report?\n- Does it belong in skills repo or dotfiles?","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-06T12:38:32.561337354-08:00","updated_at":"2025-12-28T22:22:57.639520516-05:00","closed_at":"2025-12-28T22:22:57.639520516-05:00","close_reason":"Decided: keep as prompt doc in dotfiles, not a skill. Claude can read it when asked. No wrapper benefit, and it's dotfiles-specific setup (not general skill). ai-tools-doctor handles version checking separately."}
@ -122,4 +122,5 @@
 {"id":"skills-wm9","title":"Research Steve Yegge's orchestration work","description":"Steve Yegge is working on something new related to AI orchestration. Research what it is and how it might inform our skills+molecules integration design.\n\nBlocks: skills-hin (ADR finalization)","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-24T02:41:47.848905848-05:00","updated_at":"2025-12-24T02:42:24.40239935-05:00","closed_at":"2025-12-24T02:42:24.40239935-05:00","close_reason":"Not needed - just parking the ADR work"}
 {"id":"skills-x2l","title":"Investigate hooks for parallel orch queries","description":"When using orch skill, it would be useful to spin off multiple model queries in parallel automatically (e.g., gemini + gpt simultaneously). Explore if Claude Code hooks can trigger parallel background processes when the orch skill is invoked.","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-06T19:29:00.165752425-08:00","updated_at":"2025-12-29T15:49:43.831970326-05:00","closed_at":"2025-12-29T15:49:43.831970326-05:00","close_reason":"Investigated. Hooks are synchronous with 60s timeout - unsuitable for background orch queries. Alternatives: (1) SessionStart hook for initial consensus, (2) Explicit skill invocation, (3) PostToolUse for validation. orch consensus already runs models in parallel internally."}
 {"id":"skills-x33","title":"Add tests for branch name generation","description":"File: .specify/scripts/bash/create-new-feature.sh (lines 137-181)\n\nCritical logic with NO test coverage:\n- Word filtering with stop-words\n- Acronym detection\n- Unicode/special character handling\n- Max length boundary (244 bytes)\n- Empty/single-word descriptions\n\nRisk: HIGH - affects all branch creation\n\nFix:\n- Create test suite with edge cases\n- Test stop-word filtering accuracy\n- Test boundary conditions\n\nSeverity: HIGH","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-24T02:51:00.311664646-05:00","updated_at":"2025-12-24T02:51:00.311664646-05:00"}
+{"id":"skills-ybq","title":"Reorganize lens directory structure","description":"Current structure puts ops lenses as subdirectory of code-review lenses:\n\n```\n~/.config/lenses/           \u003c- code-review lenses\n~/.config/lenses/ops/       \u003c- ops-review lenses\n```\n\nThis is asymmetric. Consider:\n\nOption A: Separate top-level directories\n```\n~/.config/lenses/code-review/\n~/.config/lenses/ops-review/\n```\n\nOption B: Keep flat but with prefixes\n```\n~/.config/lenses/code-*.md\n~/.config/lenses/ops-*.md\n```\n\nOption C: Per-skill lens directories\n```\n~/.claude/skills/code-review/lenses/\n~/.claude/skills/ops-review/lenses/\n```\n\nRequires updating:\n- modules/ai-skills.nix (deployment paths)\n- skills/code-review/SKILL.md (expected paths)\n- skills/ops-review/SKILL.md (expected paths)","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T21:57:06.726997606-05:00","created_by":"dan","updated_at":"2026-01-01T21:57:06.726997606-05:00"}
 {"id":"skills-yxv","title":"worklog: extract hardcoded path to variable","description":"SKILL.md repeats ~/.claude/skills/worklog/ path 4-5 times. Define SKILL_ROOT once, reference throughout. Found by bloat+smells lens review.","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-25T02:03:15.831699081-05:00","updated_at":"2025-12-27T10:05:51.532722628-05:00","closed_at":"2025-12-27T10:05:51.532722628-05:00","close_reason":"Closed"}
--- a/docs/worklogs/2026-01-01-ops-review-phase-2-lenses.org
+++ b/docs/worklogs/2026-01-01-ops-review-phase-2-lenses.org
@ -0,0 +1,148 @@
+#+TITLE: ops-review Phase 2 Lenses: Idempotency, Supply-Chain, Observability
+#+DATE: 2026-01-01
+#+KEYWORDS: ops-review, lenses, idempotency, supply-chain, observability, orch-consensus, phase-2
+#+COMMITS: 3
+#+COMPRESSION_STATUS: uncompressed
+
+* Session Summary
+** Date: 2026-01-01 (Day 2 of ops-review skill)
+** Focus Area: Phase 2 lens implementation with orch consensus validation
+
+* Accomplishments
+- [X] Created testing bead in dotfiles (dotfiles-je5) with expected findings from smoke test
+- [X] Created testing bead in prox-setup (prox-setup-kqg) for repo team validation
+- [X] Reassigned skills-9cu.14 (prox-setup testing) to prox-setup repo - teams own their own testing
+- [X] Implemented idempotency lens with orch consensus review
+- [X] Implemented supply-chain lens with orch consensus review
+- [X] Implemented observability lens with orch consensus review
+- [X] All three lenses enriched with feedback from gemini, gpt, flash-or
+- [ ] Phase 3 lenses remaining: nix-hygiene, resilience, orchestration
+
+* Key Decisions
+** Decision 1: Repo teams own testing beads
+- Context: Originally had skills-9cu.13 and skills-9cu.14 for testing on dotfiles/prox-setup
+- Options considered:
+  1. Test in skills repo, document findings
+  2. Create beads in target repos, let teams run and validate
+- Rationale: Teams know their repos best, creates accountability, avoids duplicate work
+- Impact: Filed dotfiles-je5 and prox-setup-kqg with expected findings for comparison
+
+** Decision 2: Orch consensus for each lens
+- Context: Phase 1 established pattern of using orch consensus for lens validation
+- Rationale: Multiple models catch different edge cases and false positive risks
+- Impact: Each lens enriched with 3-5 additional patterns from consensus
+
+* Problems & Solutions
+| Problem | Solution | Learning |
+|---------|----------|----------|
+| gemini/gpt didn't receive file in first orch call | Used pipe instead of command substitution: ~cat file \| uv run orch consensus "prompt"~ | Pipe is more reliable than ~$(cat file)~ for large content |
+| dotfiles dev branch had no upstream | ~git push --set-upstream origin dev~ | bd sync assumes upstream exists |
+
+* Technical Details
+
+** Lens Additions from Orch Consensus
+
+*** idempotency.md
+From consensus feedback:
+- Optimistic locking (ETags, resourceVersion) for read-modify-write races
+- Non-deterministic naming (random suffixes creating duplicates)
+- Delete idempotency (ensure absent pattern)
+- No-op illusion warning (mkdir -p returns 0 even if path is a file)
+- False positive risks section
+
+*** supply-chain.md
+From consensus feedback:
+- Terraform/Tofu provider and module pinning
+- Build-time network access (__noChroot, RUN curl during build)
+- GitHub Actions permissions block (GITHUB_TOKEN defaults)
+- builtins.fetchTarball without sha256
+
+*** observability.md
+From consensus feedback:
+- Resource visibility (disk, inodes, file descriptors, OOM)
+- Heartbeats/dead-man's-switch for scheduled jobs
+- Version/commit hash in startup logs
+- Config dump on startup (redacted)
+- Log rotation policies
+- K8s ignores Dockerfile HEALTHCHECK note
+
+** Files Created
+- ~skills/ops-review/lenses/idempotency.md~ - Safe re-execution, convergence
+- ~skills/ops-review/lenses/supply-chain.md~ - Dependency provenance, pinning
+- ~skills/ops-review/lenses/observability.md~ - Visibility, monitoring, debuggability
+
+** Commands Used
+```bash
+# Orch consensus pattern (pipe is more reliable)
+cat skills/ops-review/lenses/supply-chain.md | uv run orch consensus "Review this..." gemini gpt flash-or
+
+# Beads in other repos
+cd ~/proj/dotfiles && bd create --title="..." --type=task --body="..."
+bd dep add dotfiles-je5 dotfiles-x2m
+
+# Close with reason
+bd close skills-9cu.6 --reason="Lens created with orch consensus feedback: ..."
+```
+
+* Process and Workflow
+
+** What Worked Well
+- Orch consensus continues to add value - each model catches different issues
+- flash-or consistently fastest with good practical feedback
+- gpt provides most comprehensive lists (sometimes too comprehensive)
+- gemini good at Terraform/IaC patterns
+- Filing testing beads in target repos with expected results creates clear validation criteria
+
+** What Was Challenging
+- Command substitution ~$(cat file)~ didn't work reliably for passing file content to orch
+- Models sometimes provide overlapping suggestions - need to filter to most impactful
+
+* Learning and Insights
+
+** Technical Insights
+- K8s ignores Dockerfile HEALTHCHECK - probes are the real control plane
+- GITHUB_TOKEN has write access by default - need explicit permissions block
+- builtins.fetchTarball is common way to fetch nixpkgs without hash (security gap)
+- Dead-man's-switch pattern essential for cron - detects "job didn't run at all"
+- mkdir -p returning 0 when path is a file is a "no-op illusion"
+
+** Process Insights
+- Piping file content to orch more reliable than command substitution
+- Three models is the sweet spot - more adds diminishing returns
+- gemini + gpt + flash-or covers: IaC, completeness, practical ops
+
+** Lens Design Insights
+- "False Positive Risks" section essential - prevents over-flagging
+- Each lens benefits from "Common Fixes" section with copy-paste solutions
+- Crisp boundaries between lenses reduce duplicate findings
+
+* Context for Future Work
+
+** Open Questions
+- Should Phase 3 lenses also go through orch consensus?
+- How to handle lens overlap when reviewing (priority order?)
+- Metrics for false positive rate validation
+
+** Next Steps
+- Phase 3 lenses: nix-hygiene, resilience, orchestration
+- nix-hygiene backed by statix/deadnix (linter-first)
+- Update lenses/README.md with new lenses
+- Consider closing epic when Phase 3 complete
+
+** Related Work
+- [[file:2026-01-01-ops-review-skill-design-and-skeleton.org][ops-review Skill Design and Skeleton]] - Phase 1 work
+- [[file:2025-12-28-code-review-skill-creation-worklog-cleanup.org][Code Review Skill Creation]] - Original code-review skill
+- specs/ops-review/plan.md - Design document with lens specifications
+
+* Raw Notes
+- Skill deployment is just adding to claudeCodeSkills list in home/claude.nix
+- Lenses auto-deploy via enableLenses = true (default)
+- direnv use_api_keys provides API keys for orch in any repo with .envrc
+- ops-review epic: 11/14 tasks closed, Phase 3 remaining
+
+* Session Metrics
+- Commits made: 3
+- Files touched: 32
+- Lines added/removed: +2014/-263
+- Tests added: 0
+- Lenses created: 3 (idempotency, supply-chain, observability)
--- a/skills/ops-review/lenses/idempotency.md
+++ b/skills/ops-review/lenses/idempotency.md
@ -20,12 +20,23 @@ Review operational scripts for **safe re-execution and convergent behavior**.
 - Time-of-check vs time-of-use (TOCTOU): `if [ ! -f ]; then touch`
 - mkdir/create without atomic flags (-p, IF NOT EXISTS)
 - PID file races without proper locking
+- Missing optimistic locking (ETags, resourceVersion) for read-modify-write
+
+### Non-Deterministic Naming
+- Random suffixes that create duplicates: `resource-$RANDOM`, `$(uuidgen)`
+- Timestamp-based names that vary each run
+- Should use content-based or stable identifiers

 ### State Convergence
 - Scripts that assume clean-slate (fail if state already exists)
 - Missing "desired state" logic (should converge, not just create)
 - Hardcoded values that conflict with existing config

+### Delete Idempotency
+- Delete operations that fail if resource already gone
+- Missing "ensure absent" pattern (should succeed if already deleted)
+- Cleanup scripts that error on missing files: use `rm -f` not `rm`
+
 ### Nix/NixOS Specific
 - Nix is inherently idempotent - `nixos-rebuild` converges to declared state
 - Watch for imperative escape hatches: `system.activationScripts`, `systemd.services.*.preStart`
@ -66,11 +77,18 @@ CREATE TABLE IF NOT EXISTS ...
 INSERT ... ON CONFLICT DO NOTHING
 ```

+## False Positive Risks
+
+- Commands that intentionally run every time (heartbeats, lease renewals)
+- Level-triggered reconciliation that handles "already exists" gracefully
+- Declarative tools (Nix, Terraform) that converge by design
+
 ## Guidelines

 - **HIGH** = breaks on re-run, leaves partial state, data corruption risk
 - **MED** = non-atomic writes, missing locks, TOCTOU races
 - **LOW** = could be more defensive, minor convergence issues
 - Ask: "What happens if this runs twice? What if it fails halfway?"
+- Beware "no-op illusion": `mkdir -p` returns 0 even if path is a file
 - Nix modules are idempotent by design - focus on imperative sections
 - Does NOT own: rollback (blast-radius), retries (resilience)
--- a/skills/ops-review/lenses/nix-hygiene.md
+++ b/skills/ops-review/lenses/nix-hygiene.md
@ -0,0 +1,134 @@
+# Nix Hygiene Review Lens
+
+Review Nix/NixOS configurations for **code quality, anti-patterns, and maintainability**.
+
+## Linter Integration
+
+This lens is backed by static analysis tools. Run first:
+
+```bash
+# Dead code detection
+deadnix --fail .
+
+# Anti-pattern detection
+statix check .
+```
+
+Focus LLM review on semantic issues linters can't catch.
+
+## What to Look For
+
+### Dead Code (deadnix)
+- Unused let bindings
+- Unused function arguments (use `_` prefix if intentional)
+- Unused imports in module args
+- Unreachable code paths
+
+### Anti-Patterns (statix)
+- `with pkgs;` or `with lib;` abuse (prefer explicit references)
+- `rec {}` when `let ... in` would be cleaner
+- Manual `callPackage` instead of overlay
+- IFD (import-from-derivation) without clear justification
+- `builtins.toPath` (deprecated)
+- Empty patterns in conditionals
+
+### Module Design
+- Options without type annotations (`types.str`, `types.listOf`, etc.)
+- Missing `mkEnableOption` for boolean enable flags
+- Options without descriptions or examples
+- `config` references in `options` (evaluation order issues)
+- Circular imports between modules
+- Missing `lib.mkIf cfg.enable` guards around config blocks
+- Misuse of `lib.mkDefault` vs `lib.mkForce` (prefer mkDefault for overridable)
+- Missing `assertions` for invalid configuration detection
+
+### Flake Hygiene
+- Missing `flake.lock` (should be committed)
+- Stale lock file (very old nixpkgs revision)
+- Unused flake inputs
+- `follows` mismatches causing duplicate nixpkgs
+- Missing `nixConfig` for substituters
+
+### Overlay & Override Issues
+- `override` vs `overrideAttrs` confusion
+- Overlays that don't compose (reference `final` vs `prev` correctly)
+- `callPackage` in overlay without proper scoping
+- Fixed-point (overlay) evaluated too early
+
+### Reproducibility & Purity
+- `import <nixpkgs> {}` (impure, non-reproducible)
+- `builtins.getEnv` or `builtins.currentTime` (impure)
+- Missing `hash` on `fetchTarball`, `fetchGit`, `fetchurl`
+- Reliance on `NIX_PATH` in flake-based projects
+
+### Build Efficiency
+- `src = ./.` without filtering (rebuilds on README changes)
+- Missing `lib.cleanSource` or `nix-gitignore`
+- `builtins.readDir` on large trees at eval time
+
+### NixOS-Specific
+- `environment.systemPackages` bloat (prefer per-user or per-service)
+- `system.stateVersion` modifications (should never change after install)
+- `nixpkgs.config.allowUnfree = true` globally (prefer per-package)
+- Imperative state in `system.activationScripts` without guards
+
+## Output Format
+
+```
+[NIX-HYGIENE] <severity:HIGH|MED|LOW> <file:line>
+Issue: <anti-pattern or code quality issue>
+Linter: <deadnix|statix|manual> if applicable
+Suggest: <cleaner alternative>
+Evidence: <pattern found>
+```
+
+## Common Fixes
+
+```nix
+# Bad: with abuse
+{ pkgs, ... }: with pkgs; [ git vim curl ]
+
+# Good: explicit references (no 'with')
+{ pkgs, ... }: [ pkgs.git pkgs.vim pkgs.curl ]
+
+# Bad: missing type
+options.myOption = mkOption { default = ""; };
+
+# Good: typed option
+options.myOption = mkOption {
+  type = types.str;
+  default = "";
+  description = "What this option does";
+};
+
+# Bad: unused binding
+let
+  foo = "unused";  # deadnix will flag
+  bar = "used";
+in bar
+
+# Good: remove or prefix with _
+let
+  _foo = "intentionally unused";
+  bar = "used";
+in bar
+```
+
+## False Positive Risks
+
+- `_` prefixed args are intentionally unused (function signature compatibility)
+- Some `with` usage is acceptable in small scopes (e.g., `meta = with lib; { ... }`)
+- IFD may be justified for code generation or complex derivations
+- Flake inputs may be used transitively via `follows`
+- `rec {}` is standard in `mkDerivation` for referencing `version` in `src`
+- `allowUnfree = true` globally may be deliberate policy for workstations
+- `systemPackages` bloat is acceptable for single-user immutable systems
+
+## Guidelines
+
+- **HIGH** = evaluation errors, circular deps, type mismatches, stale security-critical lock
+- **MED** = anti-patterns, missing types, dead code, module design issues
+- **LOW** = style preferences, minor cleanup, verbose but working code
+- Run deadnix and statix first - don't duplicate their findings
+- Focus on semantic issues: module boundaries, design patterns, maintainability
+- `with` is not inherently evil - context matters
--- a/skills/ops-review/lenses/observability.md
+++ b/skills/ops-review/lenses/observability.md
@ -0,0 +1,116 @@
+# Observability Review Lens
+
+Review operational infrastructure for **visibility, monitoring, and debuggability**.
+
+## What to Look For
+
+### Silent Failures
+- Commands with stderr redirected to /dev/null without reason
+- Missing exit code checks after critical operations
+- catch/except blocks that swallow errors silently
+- Cron jobs without output capture or MAILTO
+- systemd services without logging (StandardOutput/StandardError)
+
+### Health Checks
+- Docker: Missing HEALTHCHECK instruction
+- Compose: Missing `healthcheck:` for services with dependencies
+- Kubernetes: Missing readiness/liveness probes
+- systemd: Missing `ExecStartPre` validation or `Type=notify`
+- No startup verification before declaring "ready"
+
+### Logging Quality
+- Print statements instead of structured logging
+- Missing log levels (debug/info/warn/error)
+- Logs without timestamps
+- No context in error messages (what failed, with what input?)
+- Secrets/PII in log output
+
+### Correlation & Tracing
+- Multi-step scripts without operation IDs
+- Distributed operations without trace/correlation IDs
+- Missing request IDs in API services
+- No way to follow a request across components
+
+### Metrics & Alerting
+- Long-running services without metrics endpoint
+- Missing duration/latency tracking for operations
+- No alerting on critical failures
+- Batch jobs without success/failure metrics
+- Missing SLI/SLO instrumentation
+- Scheduled jobs without heartbeat/dead-man's-switch (job didn't run at all)
+
+### Resource Visibility
+- No monitoring for disk space, inodes, file descriptors
+- Missing memory/OOM visibility (services killed silently)
+- Log directories without rotation policies
+- No resource limit alerts before exhaustion
+
+### Debugging Capability
+- No way to enable verbose/debug mode
+- Missing dry-run for complex operations
+- No state inspection commands
+- Logs that don't include enough context to reproduce issues
+- Missing version/commit hash in startup logs (was the deploy successful?)
+- No config dump on startup (redacted) to verify active configuration
+
+## Output Format
+
+```
+[OBSERVABILITY] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what's invisible or silent>
+Impact: <debugging difficulty, missed failures>
+Suggest: <add logging, health check, metrics>
+Evidence: <pattern found>
+```
+
+## Common Fixes
+
+```dockerfile
+# Docker health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s \
+  CMD curl -f http://localhost:8080/health || exit 1
+```
+
+```yaml
+# Compose health check
+healthcheck:
+  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
+  interval: 30s
+  timeout: 10s
+  retries: 3
+  start_period: 40s
+```
+
+```ini
+# systemd logging and notification
+[Service]
+Type=notify
+StandardOutput=journal
+StandardError=journal
+ExecStartPre=/usr/bin/test -f /etc/myapp/config.yaml
+```
+
+```bash
+# Script with operation ID
+OP_ID="${OP_ID:-$(date +%s)-$$}"
+log() { echo "[$(date -Iseconds)] [$OP_ID] $*" >&2; }
+log "Starting backup operation"
+```
+
+## False Positive Risks
+
+- Intentionally quiet commands in pipelines (intermediate steps)
+- Services with external health monitoring (not self-checks)
+- Development/test environments where full observability is overhead
+- K8s ignores Dockerfile HEALTHCHECK; probes are the real control
+- Cron without MAILTO when jobs emit metrics instead
+- Utility/sidecar containers that don't need full instrumentation
+
+## Guidelines
+
+- **HIGH** = silent failures in production, no health checks on critical services
+- **MED** = missing structured logging, no correlation IDs
+- **LOW** = could improve debugging, missing nice-to-have metrics
+- Ask: "If this fails at 3 AM, how would we know? How would we debug?"
+- Containers should be observable from outside (health, logs, metrics)
+- Every operation should be traceable from start to finish
--- a/skills/ops-review/lenses/orchestration.md
+++ b/skills/ops-review/lenses/orchestration.md
@ -0,0 +1,144 @@
+# Orchestration Review Lens
+
+Review operational infrastructure for **execution ordering, dependencies, and coupling**.
+
+## What to Look For
+
+### Implicit Dependencies
+- Scripts assuming prior state without checking
+- Services started without verifying dependencies are ready
+- Missing `After=`/`Requires=` in systemd units
+- Docker Compose without `depends_on` or health-based waiting
+- Kubernetes without init containers for prerequisites
+
+### Startup Ordering
+- Database migrations run after app starts
+- Config files expected before generation step runs
+- Secrets not available when service initializes
+- Race conditions between parallel starts
+
+### Shutdown Ordering
+- Missing `Before=` in systemd for reverse dependency order
+- Missing `preStop` hooks in Kubernetes
+- Database stops before app finishes flushing
+- No drain period before termination
+
+### Circular Dependencies
+- Service A requires B, B requires A
+- Deadlocks in systemd ordering
+- Flake inputs with circular `follows`
+- Scripts that call each other in loops
+
+### Unclear Prerequisites
+- No documentation of what must run first
+- Missing README or runbook for deployment order
+- Makefile targets without dependency declarations
+- Scripts without pre-flight checks
+
+### Coupling Issues
+- Hard-coded hostnames/ports instead of service discovery
+- Direct database access from multiple services (shared state)
+- File-based coupling (service A writes, B reads)
+- Implicit timing assumptions ("service B is always slower")
+
+### NixOS/systemd Specific
+- Missing `Wants=` for optional dependencies
+- `After=` without corresponding `Requires=` (ordering without guarantee)
+- Activation scripts with implicit ordering
+- Missing `PartOf=` for lifecycle coupling
+
+### Docker/Kubernetes Specific
+- `depends_on` without `condition: service_healthy`
+- Missing `restartPolicy` for transient dependency failures
+- Init containers without proper failure handling
+- Jobs without `ttlSecondsAfterFinished`
+
+### CI/CD Pipelines
+- GitHub Actions jobs without `needs:` for dependencies
+- Deployment before build/test completion
+- Missing artifact upload/download between jobs
+- Parallel jobs accessing same resources without locks
+
+### Job Concurrency
+- Scheduled jobs without concurrency control (flock, forbidConcurrency)
+- Multiple replicas running migrations simultaneously
+- Missing leader election for singleton tasks
+- Thundering herd on dependency recovery (add backoff/jitter)
+
+## Output Format
+
+```
+[ORCHESTRATION] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what ordering/dependency is unclear or broken>
+Impact: <race condition, startup failure, implicit coupling>
+Suggest: <explicit dependency, health check, documentation>
+Evidence: <pattern found>
+```
+
+## Common Fixes
+
+```ini
+# systemd explicit ordering with guarantee
+[Unit]
+After=postgresql.service
+Requires=postgresql.service
+Wants=redis.service  # optional dependency
+```
+
+```yaml
+# Docker Compose health-based dependency
+services:
+  app:
+    depends_on:
+      db:
+        condition: service_healthy
+      cache:
+        condition: service_started
+```
+
+```yaml
+# Kubernetes init container
+initContainers:
+  - name: wait-for-db
+    image: busybox
+    command: ['sh', '-c', 'until nc -z db 5432; do sleep 1; done']
+```
+
+```makefile
+# Makefile with explicit dependencies
+deploy: build test migrate
+	./deploy.sh
+
+migrate: db-ready
+	./run-migrations.sh
+```
+
+```bash
+# Script with pre-flight check
+#!/bin/bash
+set -euo pipefail
+
+# Verify prerequisites
+command -v jq >/dev/null || { echo "jq required"; exit 1; }
+[ -f /etc/app/config.yaml ] || { echo "Config missing"; exit 1; }
+curl -sf http://db:5432/health || { echo "DB not ready"; exit 1; }
+```
+
+## False Positive Risks
+
+- Intentionally loose coupling for flexibility
+- Services with internal retry logic that handle missing dependencies
+- Development environments with simplified ordering
+- Stateless services that can start in any order
+- Service mesh (Istio/Linkerd) handles sidecar injection automatically
+- Event-driven systems designed for eventual consistency
+- Init container `nc -z` redundant if app has robust retry logic
+
+## Guidelines
+
+- **HIGH** = race conditions, circular deps, startup failures in production
+- **MED** = implicit ordering, missing health checks, undocumented prerequisites
+- **LOW** = could be more explicit, minor coupling concerns
+- Ask: "What happens if this starts before its dependency? What's the contract?"
+- Explicit is better than implicit - document and enforce ordering
+- Health checks > timing assumptions
--- a/skills/ops-review/lenses/resilience.md
+++ b/skills/ops-review/lenses/resilience.md
@ -0,0 +1,140 @@
+# Resilience Review Lens
+
+Review operational infrastructure for **runtime fault tolerance and graceful degradation**.
+
+## What to Look For
+
+### Timeouts
+- Network calls without timeout (curl, wget, API clients)
+- Database connections without connect/query timeout
+- HTTP clients with no deadline or infinite timeout
+- Missing `TimeoutStartSec`/`TimeoutStopSec` in systemd
+
+### Retries & Backoff
+- Retry logic without exponential backoff
+- Missing jitter (thundering herd on recovery)
+- Infinite retry loops without circuit breaker
+- No max retry limit
+
+### Circuit Breakers
+- External API calls without failure threshold
+- Database connections that retry forever on outage
+- Missing fallback behavior when dependency unavailable
+- No degraded mode for non-critical features
+
+### Graceful Shutdown
+- No SIGTERM handler (abrupt termination)
+- Missing drain period for in-flight requests
+- Database connections not closed on shutdown
+- systemd `KillMode=control-group` without `ExecStop`
+- Missing `stop_grace_period` in Docker Compose
+
+### Resource Limits
+- systemd: Missing `MemoryMax`, `CPUQuota`, `TasksMax`
+- Docker: Missing `mem_limit`, `cpus`, `pids_limit`
+- Kubernetes: Missing resource requests/limits
+- No `ulimit` for file descriptors in high-connection services
+- Missing `LimitNOFILE` in systemd for network services
+
+### Connection Management
+- No connection pooling for databases
+- Missing connection limits (max connections)
+- No idle timeout for connection pools
+- Connections held across retries (stale connections)
+
+### Rate Limiting
+- No rate limiting on API endpoints
+- Missing backpressure handling for queues
+- Unbounded work queues that grow under load
+
+### Health Checks & Self-Healing
+- Missing liveness probes (deadlocked process not restarted)
+- Missing readiness probes (traffic sent before initialization)
+- Aggressive liveness probes that fail on dependency outages (restart loops)
+- Missing `WatchdogSec` in systemd for self-healing
+- No startup probe/warmup period before traffic
+
+### DNS & Network
+- DNS caching forever (fails on failover/IP changes)
+- No DNS resolution timeout
+- Missing TCP keepalives for detecting dead connections
+
+### Storage & Logging
+- Unbounded logging filling disk (missing log rotation)
+- Docker without `max-size` log option
+- Missing disk space checks before write-heavy operations
+
+## Output Format
+
+```
+[RESILIENCE] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what fails under stress or partial outage>
+Impact: <cascade failure, resource exhaustion, hung process>
+Suggest: <timeout, backoff, circuit breaker, limit>
+Evidence: <pattern found>
+```
+
+## Common Fixes
+
+```bash
+# curl with timeout
+curl --connect-timeout 5 --max-time 30 "$url"
+
+# wget with timeout
+wget --timeout=30 --tries=3 "$url"
+```
+
+```ini
+# systemd resource limits and timeouts
+[Service]
+TimeoutStartSec=30
+TimeoutStopSec=30
+MemoryMax=512M
+CPUQuota=50%
+TasksMax=100
+LimitNOFILE=65535
+```
+
+```yaml
+# Docker Compose limits
+services:
+  app:
+    mem_limit: 512m
+    cpus: 0.5
+    pids_limit: 100
+    stop_grace_period: 30s
+```
+
+```python
+# Python retry with backoff
+import backoff
+
+@backoff.on_exception(
+    backoff.expo,
+    requests.exceptions.RequestException,
+    max_tries=5,
+    jitter=backoff.full_jitter
+)
+def call_api():
+    return requests.get(url, timeout=30)
+```
+
+## False Positive Risks
+
+- Local-only operations don't need network timeouts
+- Batch jobs may intentionally run without time limits
+- Development environments don't need production resource limits
+- Some services are designed to wait indefinitely (message queues)
+- WebSockets/SSE/long-polling have different timeout semantics
+- Service mesh (Istio/Envoy) may handle retries at infrastructure layer
+- systemd default stop behavior may be sufficient (no ExecStop needed)
+
+## Guidelines
+
+- **HIGH** = no timeout on external calls, missing graceful shutdown in production
+- **MED** = no backoff/jitter, missing resource limits, no connection pooling
+- **LOW** = could be more defensive, missing nice-to-have limits
+- Ask: "What happens when the database is slow? When the API is down?"
+- Every external call needs a timeout - no exceptions
+- **Retry safety**: Only suggest retries for read-only or known-idempotent operations
+- Does NOT own: re-run safety (idempotency), change safety (blast-radius)
--- a/skills/ops-review/lenses/supply-chain.md
+++ b/skills/ops-review/lenses/supply-chain.md
@ -0,0 +1,113 @@
+# Supply Chain Review Lens
+
+Review operational infrastructure for **dependency provenance, pinning, and integrity**.
+
+## What to Look For
+
+### Unpinned Dependencies
+- Docker: `FROM image:latest` or no tag (implicit latest)
+- npm/pip: Missing lockfiles, `*` or `^` versions in production
+- GitHub Actions: `uses: org/action@main` instead of SHA
+- Gitea Actions: Same pattern, unpinned branch refs
+- Helm: `version: "*"` or missing version constraints
+
+### Nix-Specific Pinning
+- Missing `flake.lock` (flakes should always have lockfile)
+- `fetchurl`/`fetchzip` without SRI hash (`sha256`, `hash`)
+- `builtins.fetchGit` without `rev` (floating HEAD)
+- `builtins.fetchTarball` without `sha256` (common nixpkgs fetch)
+- `fetchFromGitHub` without `hash` attribute
+- IFD (import-from-derivation) fetching unpinned sources
+
+### Terraform/Tofu Pinning
+- Providers missing `version` constraint in `required_providers`
+- Modules sourcing git URLs without `?ref=<sha>` or `?ref=<tag>`
+- Missing `.terraform.lock.hcl` (provider checksums)
+
+### Container Provenance
+- Base images from untrusted registries
+- Missing digest pinning: `image:tag` vs `image@sha256:...`
+- Multi-stage builds losing provenance in final stage
+- No signature verification (cosign, Notary)
+
+### CI/CD Pipeline Risks
+- Actions from unverified publishers
+- Workflow injection via `${{ github.event.* }}` in run blocks
+- Secrets exposed to untrusted PRs (pull_request_target)
+- Missing OIDC for cloud auth (long-lived credentials instead)
+- Missing `permissions:` block (GITHUB_TOKEN has write access by default)
+
+### Build-Time Network Access
+- Dockerfile `RUN curl/wget` fetching unpinned resources
+- `go get` during Docker build without `go.sum` enforcement
+- Nix `__noChroot = true` allowing network during build
+- npm/pip install without lockfile enforcement in CI
+
+### Binary/Artifact Integrity
+- Downloads without checksum verification
+- curl/wget piped to sh without hash check
+- Missing GPG signature verification
+- Unsigned packages from PPAs/third-party repos
+
+### Substituters & Registries
+- Nix: Untrusted substituters without signature verification
+- Docker: Pulling from HTTP registries
+- npm/pip: Private registries without auth/TLS
+- Missing dependency confusion protections (scoped packages)
+
+## Output Format
+
+```
+[SUPPLY-CHAIN] <severity:HIGH|MED|LOW> <file:line>
+Issue: <what's unpinned or unverified>
+Risk: <what could be injected or changed>
+Suggest: <pin to SHA/hash, add verification>
+Evidence: <unpinned reference found>
+```
+
+## Common Fixes
+
+```dockerfile
+# Docker: Pin to digest
+FROM node:20@sha256:abc123...
+
+# Or at minimum, pin major version
+FROM node:20.10.0-alpine
+```
+
+```yaml
+# GitHub Actions: Pin to SHA
+uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29  # v4.1.6
+```
+
+```nix
+# Nix: Always include hash
+fetchFromGitHub {
+  owner = "...";
+  repo = "...";
+  rev = "v1.2.3";  # or full SHA
+  hash = "sha256-...";
+}
+```
+
+```bash
+# Downloads: Verify before executing
+curl -fsSL https://example.com/install.sh -o install.sh
+echo "expected_sha256  install.sh" | sha256sum -c -
+bash install.sh
+```
+
+## False Positive Risks
+
+- Development/CI images where latest is intentional for testing
+- Internal trusted registries with controlled update policies
+- Nix flakes auto-update workflows with proper review
+
+## Guidelines
+
+- **HIGH** = unpinned production deps, unverified downloads, curl|sh
+- **MED** = unpinned CI actions, missing lockfiles, unverified registries
+- **LOW** = dev dependencies unpinned, internal tooling
+- Every external dependency is a trust decision
+- Prefer: SHA > tag > branch > latest
+- Nix flakes with `flake.lock` are good; verify lock is committed