feat: complete ops-review skill with all 10 lenses

Phase 2 lenses (reliability):
- idempotency: safe re-run, atomic ops, convergence
- supply-chain: pinning, provenance, build-time network
- observability: health checks, logging, metrics

Phase 3 lenses (architecture):
- nix-hygiene: statix/deadnix patterns, module design
- resilience: timeouts, retries, resource limits
- orchestration: ordering, dependencies, coupling

All lenses validated via orch consensus (gemini, gpt, flash-or).
Testing delegated to target repos: dotfiles-je5, prox-setup-kqg.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
dan 2026-01-01 21:02:39 -08:00
parent a8ab3c1b1b
commit fa97fca041
8 changed files with 821 additions and 7 deletions

@@ -34,10 +34,10 @@
{"id":"skills-8y6","title":"Define skill versioning strategy","description":"Git SHA alone is insufficient. Need tuple approach:\n\n- skill_source_rev: git SHA (if available)\n- skill_content_hash: hash of SKILL.md + scripts\n- runtime_ref: flake.lock hash or Nix store path\n\nQuestions to resolve:\n- Do Protos pin to versions (stable but maintenance) or float on latest (risky)?\n- How to handle breaking changes in skills?\n- Record in wisp trace vs proto definition?\n\nFrom consensus: both models flagged versioning instability as high severity.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T19:49:30.839064445-05:00","updated_at":"2025-12-23T20:55:04.439779336-05:00","closed_at":"2025-12-23T20:55:04.439779336-05:00","close_reason":"ADRs revised with orch consensus feedback"}
{"id":"skills-9af","title":"spec-review: Add spike/research task handling","description":"Tasks like 'Investigate X' can linger without clear outcomes.\n\nAdd to REVIEW_TASKS:\n- Flag research/spike tasks\n- Require timebox and concrete outputs (decision record, prototype, risks)\n- Pattern for handling unknowns","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:26.887719136-08:00","updated_at":"2025-12-15T14:08:13.441095034-08:00","closed_at":"2025-12-15T14:08:13.441095034-08:00"}
{"id":"skills-9bc","title":"Investigate pre-compression hook for worklogs","description":"## Revised Understanding\n\nClaude Code already persists full conversation history in `~/.claude/projects/\u003cproject\u003e/\u003csession-id\u003e.jsonl`. Pre-compact hooks aren't needed for data capture.\n\n## Question\nWhat's the ideal workflow for generating worklogs from session data?\n\n## Options\n\n### 1. Post-session script\n- Run after exiting Claude Code\n- Reads most recent session JSONL\n- Generates worklog from conversation content\n- Pro: Async, doesn't interrupt flow\n- Con: May forget to run it\n\n### 2. On-demand slash command\n- `/worklog-from-session` or similar\n- Reads current session's JSONL file\n- Generates worklog with full context\n- Pro: Explicit control\n- Con: Still need to remember\n\n### 3. Pre-compact reminder\n- Hook prints reminder: \"Consider running /worklog\"\n- Doesn't automate, just nudges\n- Pro: Simple, non-intrusive\n- Con: Easy to dismiss\n\n### 4. Async batch processing\n- Process old sessions whenever\n- All data persists in JSONL files\n- Pro: No urgency, can do later\n- Con: Context may be stale\n\n## Data Format\nSession files contain:\n- User messages with timestamp\n- Assistant responses with model info\n- Tool calls and results\n- Git branch, cwd, version info\n\n## Next Steps\n- Decide preferred workflow\n- Build script to parse session JSONL → worklog format","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-17T14:32:32.568430817-08:00","updated_at":"2025-12-17T15:56:38.864916015-08:00","closed_at":"2025-12-17T15:56:38.864916015-08:00","close_reason":"Pivoted: worklogs may be redundant given full conversation persistence. New approach: make conversations searchable directly."}
{"id":"skills-9cu","title":"ops-review skill","description":"Multi-lens review skill for operational infrastructure (Nix, shell, Docker, CI/CD).\n\nBased on code-review pattern with linter-first hybrid architecture.\n\n## Phases\n- Phase 1: Skeleton + Core Safety (secrets, shell-safety, blast-radius, privilege)\n- Phase 2: Reliability (idempotency, supply-chain, observability)\n- Phase 3: Architecture (nix-hygiene, resilience, orchestration)\n\n## Design\nSee specs/ops-review/plan.md\n\n## Success Criteria\n- Review dotfiles/ and find real issues\n- Review prox-setup/ and find real issues\n- \u003c10% false positive rate on Phase 1\n- Quick mode \u003c30s","status":"open","priority":1,"issue_type":"epic","created_at":"2026-01-01T16:55:15.772440374-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:15.772440374-05:00"}
{"id":"skills-9cu","title":"ops-review skill","description":"Multi-lens review skill for operational infrastructure (Nix, shell, Docker, CI/CD).\n\nBased on code-review pattern with linter-first hybrid architecture.\n\n## Phases\n- Phase 1: Skeleton + Core Safety (secrets, shell-safety, blast-radius, privilege)\n- Phase 2: Reliability (idempotency, supply-chain, observability)\n- Phase 3: Architecture (nix-hygiene, resilience, orchestration)\n\n## Design\nSee specs/ops-review/plan.md\n\n## Success Criteria\n- Review dotfiles/ and find real issues\n- Review prox-setup/ and find real issues\n- \u003c10% false positive rate on Phase 1\n- Quick mode \u003c30s","status":"closed","priority":1,"issue_type":"epic","created_at":"2026-01-01T16:55:15.772440374-05:00","created_by":"dan","updated_at":"2026-01-02T00:02:23.095920957-05:00","closed_at":"2026-01-02T00:02:23.095920957-05:00","close_reason":"All 10 lenses implemented with orch consensus. Testing delegated to target repos (dotfiles-je5, prox-setup-kqg)."}
{"id":"skills-9cu.1","title":"Create skill skeleton","description":"Create directory structure and base files:\n- skills/ops-review/SKILL.md (workflow, modeled on code-review)\n- skills/ops-review/README.md (user docs)\n- skills/ops-review/lenses/README.md (lens index)\n\nBlocks all lens work.","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:22.084083175-05:00","created_by":"dan","updated_at":"2026-01-01T17:08:20.384800582-05:00","closed_at":"2026-01-01T17:08:20.384800582-05:00","close_reason":"Created skeleton: SKILL.md, README.md, lenses/README.md","dependencies":[{"issue_id":"skills-9cu.1","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:22.095950548-05:00","created_by":"dan"}]}
{"id":"skills-9cu.10","title":"Lens: resilience","description":"Create resilience.md lens for fault tolerance:\n- Missing timeouts on network calls\n- No retries with backoff\n- Missing circuit breakers\n- No graceful shutdown (SIGTERM)\n- Missing resource limits\n\nBoundary: Owns runtime tolerance, NOT change safety","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.876125632-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:00.876125632-05:00","dependencies":[{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.878008563-05:00","created_by":"dan"},{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.881250755-05:00","created_by":"dan"}]}
{"id":"skills-9cu.11","title":"Lens: orchestration","description":"Create orchestration.md lens for execution ordering:\n- Unclear prerequisites\n- Missing order documentation\n- Circular dependencies\n- Assumed prior state\n- Implicit coupling\n\nMost complex - needs cross-file context","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:01.098528225-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:01.098528225-05:00","dependencies":[{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:01.100559128-05:00","created_by":"dan"},{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:01.104046552-05:00","created_by":"dan"}]}
{"id":"skills-9cu.10","title":"Lens: resilience","description":"Create resilience.md lens for fault tolerance:\n- Missing timeouts on network calls\n- No retries with backoff\n- Missing circuit breakers\n- No graceful shutdown (SIGTERM)\n- Missing resource limits\n\nBoundary: Owns runtime tolerance, NOT change safety","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.876125632-05:00","created_by":"dan","updated_at":"2026-01-02T00:00:31.02324893-05:00","closed_at":"2026-01-02T00:00:31.02324893-05:00","close_reason":"Lens created with orch consensus: added health checks/liveness, DNS caching, storage/logging, retry safety warning","dependencies":[{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.878008563-05:00","created_by":"dan"},{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.881250755-05:00","created_by":"dan"}]}
{"id":"skills-9cu.11","title":"Lens: orchestration","description":"Create orchestration.md lens for execution ordering:\n- Unclear prerequisites\n- Missing order documentation\n- Circular dependencies\n- Assumed prior state\n- Implicit coupling\n\nMost complex - needs cross-file context","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:01.098528225-05:00","created_by":"dan","updated_at":"2026-01-02T00:02:09.377316231-05:00","closed_at":"2026-01-02T00:02:09.377316231-05:00","close_reason":"Lens created with orch consensus: added shutdown ordering, CI/CD pipelines, job concurrency, thundering herd","dependencies":[{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:01.100559128-05:00","created_by":"dan"},{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:01.104046552-05:00","created_by":"dan"}]}
{"id":"skills-9cu.12","title":"Integration: flake.nix + ai-skills.nix","description":"Add ops-review to deployment:\n- Add to flake.nix availableSkills\n- Update modules/ai-skills.nix for ops lens deployment\n- Deploy to ~/.config/lenses/ops/","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.324752872-05:00","created_by":"dan","updated_at":"2026-01-01T18:34:37.960786687-05:00","closed_at":"2026-01-01T18:34:37.960786687-05:00","close_reason":"Added ops-review to flake.nix availableSkills, updated ai-skills.nix with description and lens deployment to ~/.config/lenses/ops/","dependencies":[{"issue_id":"skills-9cu.12","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.339878541-05:00","created_by":"dan"},{"issue_id":"skills-9cu.12","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:13.34278836-05:00","created_by":"dan"}]}
{"id":"skills-9cu.13","title":"Validation: test on dotfiles","description":"Run Phase 1 lenses on ~/proj/dotfiles:\n- Verify findings are real issues\n- Check false positive rate \u003c10%\n- Document any needed lens refinements","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.489473975-05:00","created_by":"dan","updated_at":"2026-01-01T20:45:55.525956162-05:00","closed_at":"2026-01-01T20:45:55.525956162-05:00","close_reason":"Tested on dotfiles - found 7 shell-safety issues (SC2155), 1 blast-radius issue (prune without dry-run). Lenses working correctly.","dependencies":[{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.490574316-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.2","type":"blocks","created_at":"2026-01-01T16:56:13.492551051-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.3","type":"blocks","created_at":"2026-01-01T16:56:13.494453305-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.4","type":"blocks","created_at":"2026-01-01T16:56:13.496395361-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.5","type":"blocks","created_at":"2026-01-01T16:56:13.49824655-05:00","created_by":"dan"}]}
{"id":"skills-9cu.14","title":"Validation: test on prox-setup","description":"Run Phase 1 lenses on ~/proj/prox-setup:\n- Verify findings are real issues\n- Check false positive rate \u003c10%\n- Document any needed lens refinements","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.676548941-05:00","created_by":"dan","updated_at":"2026-01-01T21:46:34.25998-05:00","closed_at":"2026-01-01T21:46:34.25998-05:00","close_reason":"Reassigned to prox-setup repo - repo teams own their own testing","dependencies":[{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.677846482-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.2","type":"blocks","created_at":"2026-01-01T16:56:13.680528791-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.3","type":"blocks","created_at":"2026-01-01T16:56:13.683748368-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.4","type":"blocks","created_at":"2026-01-01T16:56:13.68689222-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.5","type":"blocks","created_at":"2026-01-01T16:56:13.689241654-05:00","created_by":"dan"}]}
@@ -45,10 +45,10 @@
{"id":"skills-9cu.3","title":"Lens: shell-safety","description":"Create shell-safety.md lens (shellcheck-backed):\n- Missing set -euo pipefail\n- Unquoted variables (SC2086)\n- Unsafe command substitution\n- Missing error handling\n- Hardcoded paths\n\nLinter integration: shellcheck JSON output","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.596966874-05:00","created_by":"dan","updated_at":"2026-01-01T17:16:27.274701375-05:00","closed_at":"2026-01-01T17:16:27.274701375-05:00","close_reason":"Created shell-safety.md lens with temp file safety, input validation, set -e nuance, guard snippets. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.3","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.598340159-05:00","created_by":"dan"},{"issue_id":"skills-9cu.3","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.600733142-05:00","created_by":"dan"}]}
{"id":"skills-9cu.4","title":"Lens: blast-radius","description":"Create blast-radius.md lens for change safety:\n- Destructive ops without confirmation\n- Missing dry-run mode\n- No rollback strategy\n- Bulk ops without batching\n- Missing pre-flight checks\n\nLLM-primary: understanding implications","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.792059661-05:00","created_by":"dan","updated_at":"2026-01-01T17:24:07.972638831-05:00","closed_at":"2026-01-01T17:24:07.972638831-05:00","close_reason":"Created blast-radius.md with targeting/scoping, empty var expansion, env gates, scope in output, mitigation downgrades. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.4","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.793564277-05:00","created_by":"dan"},{"issue_id":"skills-9cu.4","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.796234701-05:00","created_by":"dan"}]}
{"id":"skills-9cu.5","title":"Lens: privilege","description":"Create privilege.md lens for least-privilege:\n- Unnecessary sudo/root\n- Containers as root\n- chmod 777 patterns\n- Missing capability drops\n- Docker socket mounting\n- systemd without sandboxing","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.996280533-05:00","created_by":"dan","updated_at":"2026-01-01T18:30:25.980656507-05:00","closed_at":"2026-01-01T18:30:25.980656507-05:00","close_reason":"Created privilege.md with network binding, setuid/setgid, K8s specifics, compensating controls, curl|sudo bash. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.5","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.999435334-05:00","created_by":"dan"},{"issue_id":"skills-9cu.5","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:36.004010491-05:00","created_by":"dan"}]}
{"id":"skills-9cu.6","title":"Lens: idempotency","description":"Create idempotency.md lens for safe re-execution:\n- Scripts that break on re-run\n- Missing existence checks\n- Non-atomic operations\n- Check-then-act race conditions\n- Missing cleanup on failure\n\nBoundary: Owns convergence, NOT rollback or retries","status":"in_progress","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.04397031-05:00","created_by":"dan","updated_at":"2026-01-01T21:45:56.719192669-05:00","dependencies":[{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.061027066-05:00","created_by":"dan"},{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.065409149-05:00","created_by":"dan"}]}
{"id":"skills-9cu.7","title":"Lens: supply-chain","description":"Create supply-chain.md lens for provenance:\n- Unpinned versions (latest tags)\n- Actions not pinned to SHA\n- Missing flake.lock/SRI hashes\n- Unsigned artifacts\n- Untrusted registries","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.317966318-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.317966318-05:00","dependencies":[{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.319754113-05:00","created_by":"dan"},{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.322943568-05:00","created_by":"dan"}]}
{"id":"skills-9cu.8","title":"Lens: observability","description":"Create observability.md lens for visibility:\n- Silent failures\n- Missing health checks\n- Incomplete metrics\n- Missing structured logging\n- No correlation IDs","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.562009474-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.562009474-05:00","dependencies":[{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.564394694-05:00","created_by":"dan"},{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.571005731-05:00","created_by":"dan"}]}
{"id":"skills-9cu.9","title":"Lens: nix-hygiene","description":"Create nix-hygiene.md lens (statix/deadnix-backed):\n- Dead code (unused bindings)\n- Anti-patterns (with lib abuse, IFD)\n- Module boundary violations\n- Overlay issues\n- Missing option types\n\nLinter integration: statix + deadnix JSON","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.623672452-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:00.623672452-05:00","dependencies":[{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.638729349-05:00","created_by":"dan"},{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.643063075-05:00","created_by":"dan"}]}
{"id":"skills-9cu.6","title":"Lens: idempotency","description":"Create idempotency.md lens for safe re-execution:\n- Scripts that break on re-run\n- Missing existence checks\n- Non-atomic operations\n- Check-then-act race conditions\n- Missing cleanup on failure\n\nBoundary: Owns convergence, NOT rollback or retries","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.04397031-05:00","created_by":"dan","updated_at":"2026-01-01T22:01:48.652398594-05:00","closed_at":"2026-01-01T22:01:48.652398594-05:00","close_reason":"Lens created with orch consensus feedback: added optimistic locking, non-deterministic naming, delete idempotency, false positive risks","dependencies":[{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.061027066-05:00","created_by":"dan"},{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.065409149-05:00","created_by":"dan"}]}
{"id":"skills-9cu.7","title":"Lens: supply-chain","description":"Create supply-chain.md lens for provenance:\n- Unpinned versions (latest tags)\n- Actions not pinned to SHA\n- Missing flake.lock/SRI hashes\n- Unsigned artifacts\n- Untrusted registries","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.317966318-05:00","created_by":"dan","updated_at":"2026-01-01T22:03:26.655269107-05:00","closed_at":"2026-01-01T22:03:26.655269107-05:00","close_reason":"Lens created with orch consensus: added Terraform/Tofu, build-time network access, GH Actions permissions, builtins.fetchTarball","dependencies":[{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.319754113-05:00","created_by":"dan"},{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.322943568-05:00","created_by":"dan"}]}
{"id":"skills-9cu.8","title":"Lens: observability","description":"Create observability.md lens for visibility:\n- Silent failures\n- Missing health checks\n- Incomplete metrics\n- Missing structured logging\n- No correlation IDs","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.562009474-05:00","created_by":"dan","updated_at":"2026-01-01T22:05:03.351508622-05:00","closed_at":"2026-01-01T22:05:03.351508622-05:00","close_reason":"Lens created with orch consensus: added resource visibility, heartbeats, version/build metadata, log rotation","dependencies":[{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.564394694-05:00","created_by":"dan"},{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.571005731-05:00","created_by":"dan"}]}
{"id":"skills-9cu.9","title":"Lens: nix-hygiene","description":"Create nix-hygiene.md lens (statix/deadnix-backed):\n- Dead code (unused bindings)\n- Anti-patterns (with lib abuse, IFD)\n- Module boundary violations\n- Overlay issues\n- Missing option types\n\nLinter integration: statix + deadnix JSON","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.623672452-05:00","created_by":"dan","updated_at":"2026-01-01T23:58:43.868830539-05:00","closed_at":"2026-01-01T23:58:43.868830539-05:00","close_reason":"Lens created with orch consensus: added lib.mkIf guards, mkDefault/mkForce, reproducibility/purity, build efficiency, expanded false positives","dependencies":[{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.638729349-05:00","created_by":"dan"},{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.643063075-05:00","created_by":"dan"}]}
{"id":"skills-a0x","title":"spec-review: Add traceability requirements across artifacts","description":"Prompts don't enforce spec → plan → tasks linkage. Drift can occur without detection.\n\nAdd:\n- Require trace matrix or linkage in reviews\n- Each plan item should reference spec requirement\n- Each task should reference plan item\n- Flag unmapped items and extra scope","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:25.270581198-08:00","updated_at":"2025-12-15T14:05:48.196356786-08:00","closed_at":"2025-12-15T14:05:48.196356786-08:00"}
{"id":"skills-a23","title":"Update main README to list all 9 skills","description":"Main README.md 'Skills Included' section only lists worklog and update-spec-kit. Repo actually has 9 skills: template, worklog, update-spec-kit, screenshot-latest, niri-window-capture, tufte-press, update-opencode, web-research, web-search.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-11-30T11:58:14.042397754-08:00","updated_at":"2025-12-28T22:08:02.074758486-05:00","closed_at":"2025-12-28T22:08:02.074758486-05:00","close_reason":"Updated README with table listing all 14 skills (5 deployed, 8 available, 1 development template)","dependencies":[{"issue_id":"skills-a23","depends_on_id":"skills-4yn","type":"blocks","created_at":"2025-11-30T12:01:30.306742184-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"skills-al5","title":"Consider repo-setup-verification skill","description":"The dotfiles repo has a repo-setup-prompt.md verification checklist that could become a skill.\n\n**Source**: ~/proj/dotfiles/docs/repo-setup-prompt.md\n\n**What it does**:\n- Verifies .envrc has use_api_keys and skills loading\n- Checks .skills manifest exists with appropriate skills\n- Optionally checks beads setup\n- Verifies API keys are loaded\n\n**As a skill it could**:\n- Be invoked to audit any repo's agent setup\n- Offer to fix missing pieces\n- Provide consistent onboarding for new repos\n\n**Questions**:\n- Is this better as a skill vs a slash command?\n- Should it auto-fix or just report?\n- Does it belong in skills repo or dotfiles?","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-06T12:38:32.561337354-08:00","updated_at":"2025-12-28T22:22:57.639520516-05:00","closed_at":"2025-12-28T22:22:57.639520516-05:00","close_reason":"Decided: keep as prompt doc in dotfiles, not a skill. Claude can read it when asked. No wrapper benefit, and it's dotfiles-specific setup (not general skill). ai-tools-doctor handles version checking separately."}
@@ -122,4 +122,5 @@
{"id":"skills-wm9","title":"Research Steve Yegge's orchestration work","description":"Steve Yegge is working on something new related to AI orchestration. Research what it is and how it might inform our skills+molecules integration design.\n\nBlocks: skills-hin (ADR finalization)","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-24T02:41:47.848905848-05:00","updated_at":"2025-12-24T02:42:24.40239935-05:00","closed_at":"2025-12-24T02:42:24.40239935-05:00","close_reason":"Not needed - just parking the ADR work"}
{"id":"skills-x2l","title":"Investigate hooks for parallel orch queries","description":"When using orch skill, it would be useful to spin off multiple model queries in parallel automatically (e.g., gemini + gpt simultaneously). Explore if Claude Code hooks can trigger parallel background processes when the orch skill is invoked.","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-06T19:29:00.165752425-08:00","updated_at":"2025-12-29T15:49:43.831970326-05:00","closed_at":"2025-12-29T15:49:43.831970326-05:00","close_reason":"Investigated. Hooks are synchronous with 60s timeout - unsuitable for background orch queries. Alternatives: (1) SessionStart hook for initial consensus, (2) Explicit skill invocation, (3) PostToolUse for validation. orch consensus already runs models in parallel internally."}
{"id":"skills-x33","title":"Add tests for branch name generation","description":"File: .specify/scripts/bash/create-new-feature.sh (lines 137-181)\n\nCritical logic with NO test coverage:\n- Word filtering with stop-words\n- Acronym detection\n- Unicode/special character handling\n- Max length boundary (244 bytes)\n- Empty/single-word descriptions\n\nRisk: HIGH - affects all branch creation\n\nFix:\n- Create test suite with edge cases\n- Test stop-word filtering accuracy\n- Test boundary conditions\n\nSeverity: HIGH","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-24T02:51:00.311664646-05:00","updated_at":"2025-12-24T02:51:00.311664646-05:00"}
{"id":"skills-ybq","title":"Reorganize lens directory structure","description":"Current structure puts ops lenses as subdirectory of code-review lenses:\n\n```\n~/.config/lenses/ \u003c- code-review lenses\n~/.config/lenses/ops/ \u003c- ops-review lenses\n```\n\nThis is asymmetric. Consider:\n\nOption A: Separate top-level directories\n```\n~/.config/lenses/code-review/\n~/.config/lenses/ops-review/\n```\n\nOption B: Keep flat but with prefixes\n```\n~/.config/lenses/code-*.md\n~/.config/lenses/ops-*.md\n```\n\nOption C: Per-skill lens directories\n```\n~/.claude/skills/code-review/lenses/\n~/.claude/skills/ops-review/lenses/\n```\n\nRequires updating:\n- modules/ai-skills.nix (deployment paths)\n- skills/code-review/SKILL.md (expected paths)\n- skills/ops-review/SKILL.md (expected paths)","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T21:57:06.726997606-05:00","created_by":"dan","updated_at":"2026-01-01T21:57:06.726997606-05:00"}
{"id":"skills-yxv","title":"worklog: extract hardcoded path to variable","description":"SKILL.md repeats ~/.claude/skills/worklog/ path 4-5 times. Define SKILL_ROOT once, reference throughout. Found by bloat+smells lens review.","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-25T02:03:15.831699081-05:00","updated_at":"2025-12-27T10:05:51.532722628-05:00","closed_at":"2025-12-27T10:05:51.532722628-05:00","close_reason":"Closed"}

@@ -0,0 +1,148 @@
#+TITLE: ops-review Phase 2 Lenses: Idempotency, Supply-Chain, Observability
#+DATE: 2026-01-01
#+KEYWORDS: ops-review, lenses, idempotency, supply-chain, observability, orch-consensus, phase-2
#+COMMITS: 3
#+COMPRESSION_STATUS: uncompressed
* Session Summary
** Date: 2026-01-01 (Day 2 of ops-review skill)
** Focus Area: Phase 2 lens implementation with orch consensus validation
* Accomplishments
- [X] Created testing bead in dotfiles (dotfiles-je5) with expected findings from smoke test
- [X] Created testing bead in prox-setup (prox-setup-kqg) for repo team validation
- [X] Reassigned skills-9cu.14 (prox-setup testing) to prox-setup repo - teams own their own testing
- [X] Implemented idempotency lens with orch consensus review
- [X] Implemented supply-chain lens with orch consensus review
- [X] Implemented observability lens with orch consensus review
- [X] All three lenses enriched with feedback from gemini, gpt, flash-or
- [ ] Phase 3 lenses remaining: nix-hygiene, resilience, orchestration
* Key Decisions
** Decision 1: Repo teams own testing beads
- Context: Originally had skills-9cu.13 and skills-9cu.14 for testing on dotfiles/prox-setup
- Options considered:
1. Test in skills repo, document findings
2. Create beads in target repos, let teams run and validate
- Rationale: Teams know their repos best, creates accountability, avoids duplicate work
- Impact: Filed dotfiles-je5 and prox-setup-kqg with expected findings for comparison
** Decision 2: Orch consensus for each lens
- Context: Phase 1 established pattern of using orch consensus for lens validation
- Rationale: Multiple models catch different edge cases and false positive risks
- Impact: Each lens enriched with 3-5 additional patterns from consensus
* Problems & Solutions
| Problem | Solution | Learning |
|---------|----------|----------|
| gemini/gpt didn't receive file in first orch call | Used pipe instead of command substitution: ~cat file \| uv run orch consensus "prompt"~ | Pipe is more reliable than ~$(cat file)~ for large content |
| dotfiles dev branch had no upstream | ~git push --set-upstream origin dev~ | bd sync assumes upstream exists |
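The missing-upstream failure can be caught before running ~bd sync~. The sketch below assumes the remote is ~origin~. ~ensure_upstream~ is a hypothetical helper. ~git rev-parse --abbrev-ref --symbolic-full-name @{u}~ is the standard way to ask git for the current branch's upstream, and it exits non-zero when none is configured.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ensure_upstream [REMOTE]: push the current branch with --set-upstream only
# when no upstream is configured yet; otherwise do nothing (safe to re-run).
ensure_upstream() {
  local remote=${1:-origin} branch
  # Fails on a detached HEAD, which has no branch to track.
  branch=$(git symbolic-ref --short HEAD)
  if git rev-parse --abbrev-ref --symbolic-full-name '@{u}' >/dev/null 2>&1; then
    return 0
  fi
  git push --set-upstream "$remote" "$branch"
}
```

Calling ~ensure_upstream~ once per new branch, before ~bd sync~, turns the one-time manual fix into an idempotent step.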
* Technical Details
** Lens Additions from Orch Consensus
*** idempotency.md
From consensus feedback:
- Optimistic locking (ETags, resourceVersion) for read-modify-write races
- Non-deterministic naming (random suffixes creating duplicates)
- Delete idempotency (ensure absent pattern)
- No-op illusion warning (mkdir -p -m 0700 returns 0 on an existing directory without fixing its mode)
- False positive risks section
*** supply-chain.md
From consensus feedback:
- Terraform/Tofu provider and module pinning
- Build-time network access (__noChroot, RUN curl during build)
- GitHub Actions permissions block (GITHUB_TOKEN defaults)
- builtins.fetchTarball without sha256
*** observability.md
From consensus feedback:
- Resource visibility (disk, inodes, file descriptors, OOM)
- Heartbeats/dead-man's-switch for scheduled jobs
- Version/commit hash in startup logs
- Config dump on startup (redacted)
- Log rotation policies
- K8s ignores Dockerfile HEALTHCHECK note
** Files Created
- ~skills/ops-review/lenses/idempotency.md~ - Safe re-execution, convergence
- ~skills/ops-review/lenses/supply-chain.md~ - Dependency provenance, pinning
- ~skills/ops-review/lenses/observability.md~ - Visibility, monitoring, debuggability
** Commands Used
```bash
# Orch consensus pattern (pipe is more reliable)
cat skills/ops-review/lenses/supply-chain.md | uv run orch consensus "Review this..." gemini gpt flash-or
# Beads in other repos
cd ~/proj/dotfiles && bd create --title="..." --type=task --body="..."
bd dep add dotfiles-je5 dotfiles-x2m
# Close with reason
bd close skills-9cu.6 --reason="Lens created with orch consensus feedback: ..."
```
* Process and Workflow
** What Worked Well
- Orch consensus continues to add value - each model catches different issues
- flash-or consistently fastest with good practical feedback
- gpt provides most comprehensive lists (sometimes too comprehensive)
- gemini good at Terraform/IaC patterns
- Filing testing beads in target repos with expected results creates clear validation criteria
** What Was Challenging
- Command substitution ~$(cat file)~ didn't work reliably for passing file content to orch
- Models sometimes provide overlapping suggestions - need to filter to most impactful
* Learning and Insights
** Technical Insights
- K8s ignores Dockerfile HEALTHCHECK - probes are the real control plane
- GITHUB_TOKEN can default to write access (the permissive default on older repos and orgs) - need explicit permissions block
- builtins.fetchTarball is a common way to fetch nixpkgs without a hash (security gap)
- Dead-man's-switch pattern essential for cron - detects "job didn't run at all"
- mkdir -p returning 0 for an existing directory whose mode it never changed is a "no-op illusion"
** Process Insights
- Piping file content to orch more reliable than command substitution
- Three models is the sweet spot - more adds diminishing returns
- gemini + gpt + flash-or covers: IaC, completeness, practical ops
** Lens Design Insights
- "False Positive Risks" section essential - prevents over-flagging
- Each lens benefits from "Common Fixes" section with copy-paste solutions
- Crisp boundaries between lenses reduce duplicate findings
* Context for Future Work
** Open Questions
- Should Phase 3 lenses also go through orch consensus?
- How to handle lens overlap when reviewing (priority order?)
- Metrics for false positive rate validation
** Next Steps
- Phase 3 lenses: nix-hygiene, resilience, orchestration
- nix-hygiene backed by statix/deadnix (linter-first)
- Update lenses/README.md with new lenses
- Consider closing epic when Phase 3 complete
** Related Work
- [[file:2026-01-01-ops-review-skill-design-and-skeleton.org][ops-review Skill Design and Skeleton]] - Phase 1 work
- [[file:2025-12-28-code-review-skill-creation-worklog-cleanup.org][Code Review Skill Creation]] - Original code-review skill
- specs/ops-review/plan.md - Design document with lens specifications
* Raw Notes
- Skill deployment is just adding to claudeCodeSkills list in home/claude.nix
- Lenses auto-deploy via enableLenses = true (default)
- direnv use_api_keys provides API keys for orch in any repo with .envrc
- ops-review epic: 11/14 tasks closed, Phase 3 remaining
* Session Metrics
- Commits made: 3
- Files touched: 32
- Lines added/removed: +2014/-263
- Tests added: 0
- Lenses created: 3 (idempotency, supply-chain, observability)

@ -20,12 +20,23 @@ Review operational scripts for **safe re-execution and convergent behavior**.
- Time-of-check vs time-of-use (TOCTOU): `if [ ! -f "$lock" ]; then touch "$lock"; fi`
- mkdir/create without atomic flags (-p, IF NOT EXISTS)
- PID file races without proper locking
- Missing optimistic locking (ETags, resourceVersion) for read-modify-write
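The TOCTOU and PID-file items above share one fix: make "check" and "create" a single atomic step. A minimal bash sketch, assuming a local POSIX filesystem where `mkdir` is atomic; `acquire_lock`, `release_lock`, and the lock path are hypothetical names. Unlike `flock`, a lock directory left behind by a run killed with `kill -9` persists until someone removes it.

```bash
# Lock via mkdir: creating the directory is one atomic step that either
# succeeds or fails, with no gap between "check" and "create".
acquire_lock() {
  local lockdir=$1
  mkdir -- "$lockdir" 2>/dev/null || return 1
  # Record the holder's PID to help debug stale locks.
  printf '%s\n' "$$" > "$lockdir/pid"
}

release_lock() {
  rm -rf -- "$1"
}

# Usage (hypothetical lock path):
#   acquire_lock /run/lock/myjob.d || { echo "already running" >&2; exit 1; }
#   trap 'release_lock /run/lock/myjob.d' EXIT
```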
### Non-Deterministic Naming
- Random suffixes that create duplicates: `resource-$RANDOM`, `$(uuidgen)`
- Timestamp-based names that vary each run
- Should use content-based or stable identifiers
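A sketch of a stable identifier in bash, assuming GNU coreutils `sha256sum`; `stable_name` is a hypothetical helper. Hashing the inputs that define the resource yields the same name on every run, so a re-run can find what the previous run created instead of adding a duplicate. Truncating to 12 hex characters keeps names short at a small collision risk.

```bash
# Derive a name from the resource's defining content instead of $RANDOM
# or uuidgen: identical input always produces the identical name.
stable_name() {
  local prefix=$1 content=$2 digest
  digest=$(printf '%s' "$content" | sha256sum | cut -c1-12)
  printf '%s-%s\n' "$prefix" "$digest"
}

# Usage (hypothetical): name=$(stable_name cfg "$(cat app.conf)")
```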
### State Convergence
- Scripts that assume clean-slate (fail if state already exists)
- Missing "desired state" logic (should converge, not just create)
- Hardcoded values that conflict with existing config
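A converge-instead-of-append sketch in bash; `ensure_line` is a hypothetical helper. It checks for the exact line first, so running it any number of times leaves exactly one copy, which is the "desired state" behavior described above applied to a config file.

```bash
# Ensure LINE is present in FILE exactly once; safe to run repeatedly.
ensure_line() {
  local line=$1 file=$2
  # -q quiet, -x whole-line match, -F literal string (no regex)
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    return 0
  fi
  # Terminate a final line that lacks a newline so the append doesn't join it
  if [ -s "$file" ] && [ -n "$(tail -c1 -- "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}
```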
### Delete Idempotency
- Delete operations that fail if resource already gone
- Missing "ensure absent" pattern (should succeed if already deleted)
- Cleanup scripts that error on missing files: use `rm -f` not `rm`
### Nix/NixOS Specific
- Nix is inherently idempotent - `nixos-rebuild` converges to declared state
- Watch for imperative escape hatches: `system.activationScripts`, `systemd.services.*.preStart`
@ -66,11 +77,18 @@ CREATE TABLE IF NOT EXISTS ...
INSERT ... ON CONFLICT DO NOTHING
```
## False Positive Risks
- Commands that intentionally run every time (heartbeats, lease renewals)
- Level-triggered reconciliation that handles "already exists" gracefully
- Declarative tools (Nix, Terraform) that converge by design
## Guidelines
- **HIGH** = breaks on re-run, leaves partial state, data corruption risk
- **MED** = non-atomic writes, missing locks, TOCTOU races
- **LOW** = could be more defensive, minor convergence issues
- Ask: "What happens if this runs twice? What if it fails halfway?"
- Beware "no-op illusion": `mkdir -p -m 0700 dir` returns 0 when `dir` already exists, even though its mode was never changed
- Nix modules are idempotent by design - focus on imperative sections
- Does NOT own: rollback (blast-radius), retries (resilience)
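A sketch of checking the postcondition rather than trusting the exit code, assuming GNU coreutils (`stat -c`; BSD/macOS `stat` takes different flags). `mkdir -p -m 0700 dir` exits 0 when `dir` already exists but leaves its current mode alone, so this hypothetical `ensure_dir` helper applies the mode explicitly and then verifies it.

```bash
# Converge a directory to an exact mode and verify the result.
# MODE is three octal digits such as 700 (matches `stat -c '%a'` output).
ensure_dir() {
  local dir=$1 mode=$2
  mkdir -p -- "$dir" || return 1
  # mkdir -p -m would not touch an existing directory, so chmod explicitly.
  chmod "$mode" "$dir" || return 1
  [ -d "$dir" ] && [ "$(stat -c '%a' "$dir")" = "$mode" ]
}
```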

@ -0,0 +1,134 @@
# Nix Hygiene Review Lens
Review Nix/NixOS configurations for **code quality, anti-patterns, and maintainability**.
## Linter Integration
This lens is backed by static analysis tools. Run first:
```bash
# Dead code detection
deadnix --fail .
# Anti-pattern detection
statix check .
```
Focus LLM review on semantic issues linters can't catch.
## What to Look For
### Dead Code (deadnix)
- Unused let bindings
- Unused function arguments (use `_` prefix if intentional; deadnix skips `_`-prefixed bindings only with `--no-underscore`)
- Unused imports in module args
- Unreachable code paths
### Anti-Patterns (statix and manual review)
- `with pkgs;` or `with lib;` abuse (prefer explicit references)
- `rec {}` when `let ... in` would be cleaner
- Manual `callPackage` instead of overlay
- IFD (import-from-derivation) without clear justification
- `builtins.toPath` (deprecated)
- Empty patterns in function arguments (`{ ... }:` where `_:` suffices)
### Module Design
- Options without type annotations (`types.str`, `types.listOf`, etc.)
- Missing `mkEnableOption` for boolean enable flags
- Options without descriptions or examples
- `config` references in `options` (evaluation order issues)
- Circular imports between modules
- Missing `lib.mkIf cfg.enable` guards around config blocks
- Misuse of `lib.mkDefault` vs `lib.mkForce` (prefer mkDefault for overridable)
- Missing `assertions` for invalid configuration detection
### Flake Hygiene
- Missing `flake.lock` (should be committed)
- Stale lock file (very old nixpkgs revision)
- Unused flake inputs
- `follows` mismatches causing duplicate nixpkgs
- Missing `nixConfig` for substituters
### Overlay & Override Issues
- `override` vs `overrideAttrs` confusion
- Overlays that don't compose (reference `final` vs `prev` correctly)
- `callPackage` in overlay without proper scoping
- Fixed-point (overlay) evaluated too early
### Reproducibility & Purity
- `import <nixpkgs> {}` (impure, non-reproducible)
- `builtins.getEnv` or `builtins.currentTime` (impure)
- Missing `sha256`/`hash` on `fetchTarball` and `fetchurl`, or missing `rev` on `fetchGit`
- Reliance on `NIX_PATH` in flake-based projects
### Build Efficiency
- `src = ./.` without filtering (rebuilds on README changes)
- Missing `lib.cleanSource` or `nix-gitignore`
- `builtins.readDir` on large trees at eval time
### NixOS-Specific
- `environment.systemPackages` bloat (prefer per-user or per-service)
- `system.stateVersion` modifications (should never change after install)
- `nixpkgs.config.allowUnfree = true` globally (prefer per-package)
- Imperative state in `system.activationScripts` without guards
## Output Format
```
[NIX-HYGIENE] <severity:HIGH|MED|LOW> <file:line>
Issue: <anti-pattern or code quality issue>
Linter: <deadnix|statix|manual> if applicable
Suggest: <cleaner alternative>
Evidence: <pattern found>
```
## Common Fixes
```nix
# Bad: with abuse
{ pkgs, ... }: with pkgs; [ git vim curl ]
# Good: explicit references (no 'with')
{ pkgs, ... }: [ pkgs.git pkgs.vim pkgs.curl ]
# Bad: missing type
options.myOption = mkOption { default = ""; };
# Good: typed option
options.myOption = mkOption {
  type = types.str;
  default = "";
  description = "What this option does";
};
# Bad: unused binding
let
  foo = "unused"; # deadnix will flag
  bar = "used";
in bar
# Good: remove or prefix with _
let
  _foo = "intentionally unused";
  bar = "used";
in bar
```
## False Positive Risks
- `_` prefixed args are intentionally unused (function signature compatibility)
- Some `with` usage is acceptable in small scopes (e.g., `meta = with lib; { ... }`)
- IFD may be justified for code generation or complex derivations
- Flake inputs may be used transitively via `follows`
- `rec {}` is standard in `mkDerivation` for referencing `version` in `src`
- `allowUnfree = true` globally may be deliberate policy for workstations
- `systemPackages` bloat is acceptable for single-user immutable systems
## Guidelines
- **HIGH** = evaluation errors, circular deps, type mismatches, stale security-critical lock
- **MED** = anti-patterns, missing types, dead code, module design issues
- **LOW** = style preferences, minor cleanup, verbose but working code
- Run deadnix and statix first - don't duplicate their findings
- Focus on semantic issues: module boundaries, design patterns, maintainability
- `with` is not inherently evil - context matters

@ -0,0 +1,116 @@
# Observability Review Lens
Review operational infrastructure for **visibility, monitoring, and debuggability**.
## What to Look For
### Silent Failures
- Commands with stderr redirected to /dev/null without reason
- Missing exit code checks after critical operations
- catch/except blocks that swallow errors silently
- Cron jobs without output capture or MAILTO
- systemd services without logging (StandardOutput/StandardError)
### Health Checks
- Docker: Missing HEALTHCHECK instruction
- Compose: Missing `healthcheck:` for services with dependencies
- Kubernetes: Missing readiness/liveness probes
- systemd: Missing `ExecStartPre` validation or `Type=notify`
- No startup verification before declaring "ready"
### Logging Quality
- Print statements instead of structured logging
- Missing log levels (debug/info/warn/error)
- Logs without timestamps
- No context in error messages (what failed, with what input?)
- Secrets/PII in log output
### Correlation & Tracing
- Multi-step scripts without operation IDs
- Distributed operations without trace/correlation IDs
- Missing request IDs in API services
- No way to follow a request across components
### Metrics & Alerting
- Long-running services without metrics endpoint
- Missing duration/latency tracking for operations
- No alerting on critical failures
- Batch jobs without success/failure metrics
- Missing SLI/SLO instrumentation
- Scheduled jobs without heartbeat/dead-man's-switch (job didn't run at all)
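A minimal dead-man's-switch sketch in bash for the heartbeat item above; function names, paths, and the `send_alert` command are hypothetical. The job writes a timestamp only after it succeeds, and a separate checker alerts when that timestamp is missing or too old, which also covers a job that never ran. The checker should run outside the scheduler it watches (another host or an external monitor), or a dead cron daemon silences both.

```bash
# Job side: record the time of the last successful run.
record_heartbeat() {
  date +%s > "$1"
}

# Checker side: succeed only if a heartbeat exists and is at most
# MAX_AGE_S seconds old.
heartbeat_is_fresh() {
  local file=$1 max_age_s=$2 last now
  [ -r "$file" ] || return 1
  last=$(cat -- "$file")
  now=$(date +%s)
  [ $(( now - last )) -le "$max_age_s" ]
}

# Usage (hypothetical paths and alert command):
#   /usr/local/bin/backup.sh && record_heartbeat /var/lib/backup/heartbeat
#   heartbeat_is_fresh /var/lib/backup/heartbeat 90000 || send_alert "backup stale"
```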
### Resource Visibility
- No monitoring for disk space, inodes, file descriptors
- Missing memory/OOM visibility (services killed silently)
- Log directories without rotation policies
- No resource limit alerts before exhaustion
### Debugging Capability
- No way to enable verbose/debug mode
- Missing dry-run for complex operations
- No state inspection commands
- Logs that don't include enough context to reproduce issues
- Missing version/commit hash in startup logs (was the deploy successful?)
- No config dump on startup (redacted) to verify active configuration
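A sketch of the version-and-config startup log, assuming a simple `KEY=VALUE` config file with uppercase keys and GNU `date -Iseconds`; `redact_config`, `log_startup`, and `GIT_COMMIT` are hypothetical. Values whose keys contain PASSWORD, SECRET, TOKEN, or KEY are masked before logging. Name-based masking is a coarse heuristic, so it errs toward over-redacting.

```bash
# Mask values of secret-looking keys in KEY=VALUE input.
redact_config() {
  sed -E 's/^([A-Z0-9_]*(PASSWORD|SECRET|TOKEN|KEY)[A-Z0-9_]*)=.*/\1=<redacted>/'
}

# Log the running version and the effective (redacted) config.
log_startup() {
  local version=$1 config_file=$2
  echo "[$(date -Iseconds)] starting version=$version"
  redact_config < "$config_file" | sed 's/^/  config: /'
}

# Usage (hypothetical variable and path):
#   log_startup "${GIT_COMMIT:-unknown}" /etc/myapp/env >&2
```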
## Output Format
```
[OBSERVABILITY] <severity:HIGH|MED|LOW> <file:line>
Issue: <what's invisible or silent>
Impact: <debugging difficulty, missed failures>
Suggest: <add logging, health check, metrics>
Evidence: <pattern found>
```
## Common Fixes
```dockerfile
# Docker health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s \
CMD curl -f http://localhost:8080/health || exit 1
```
```yaml
# Compose health check
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```
```ini
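# systemd logging and notification
[Service]
# Type=notify requires the service to send READY=1 via sd_notify;
# without it, the start times out and the unit fails
Type=notify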
StandardOutput=journal
StandardError=journal
ExecStartPre=/usr/bin/test -f /etc/myapp/config.yaml
```
```bash
# Script with operation ID
OP_ID="${OP_ID:-$(date +%s)-$$}"
log() { echo "[$(date -Iseconds)] [$OP_ID] $*" >&2; }
log "Starting backup operation"
```
## False Positive Risks
- Intentionally quiet commands in pipelines (intermediate steps)
- Services with external health monitoring (not self-checks)
- Development/test environments where full observability is overhead
- Missing Dockerfile HEALTHCHECK on images run only under K8s (Kubernetes ignores it; probes are the real control)
- Cron without MAILTO when jobs emit metrics instead
- Utility/sidecar containers that don't need full instrumentation
## Guidelines
- **HIGH** = silent failures in production, no health checks on critical services
- **MED** = missing structured logging, no correlation IDs
- **LOW** = could improve debugging, missing nice-to-have metrics
- Ask: "If this fails at 3 AM, how would we know? How would we debug?"
- Containers should be observable from outside (health, logs, metrics)
- Every operation should be traceable from start to finish

@ -0,0 +1,144 @@
# Orchestration Review Lens
Review operational infrastructure for **execution ordering, dependencies, and coupling**.
## What to Look For
### Implicit Dependencies
- Scripts assuming prior state without checking
- Services started without verifying dependencies are ready
- Missing `After=`/`Requires=` in systemd units
- Docker Compose without `depends_on` or health-based waiting
- Kubernetes without init containers for prerequisites
### Startup Ordering
- Database migrations run after app starts
- Config files expected before generation step runs
- Secrets not available when service initializes
- Race conditions between parallel starts
### Shutdown Ordering
- Missing `After=`/`Before=` ordering in systemd (units stop in reverse start order, so unordered units stop in parallel)
- Missing `preStop` hooks in Kubernetes
- Database stops before app finishes flushing
- No drain period before termination
### Circular Dependencies
- Service A requires B, B requires A
- Deadlocks in systemd ordering
- Flake inputs with circular `follows`
- Scripts that call each other in loops
### Unclear Prerequisites
- No documentation of what must run first
- Missing README or runbook for deployment order
- Makefile targets without dependency declarations
- Scripts without pre-flight checks
### Coupling Issues
- Hard-coded hostnames/ports instead of service discovery
- Direct database access from multiple services (shared state)
- File-based coupling (service A writes, B reads)
- Implicit timing assumptions ("service B is always slower")
### NixOS/systemd Specific
- Missing `Wants=` for optional dependencies
- `After=` without `Requires=`/`Wants=` (ordering without a start guarantee), or `Requires=` without `After=` (both units start in parallel)
- Activation scripts with implicit ordering
- Missing `PartOf=` for lifecycle coupling
### Docker/Kubernetes Specific
- `depends_on` without `condition: service_healthy`
- Missing `restartPolicy` for transient dependency failures
- Init containers without proper failure handling
- Jobs without `ttlSecondsAfterFinished`
### CI/CD Pipelines
- GitHub Actions jobs without `needs:` for dependencies
- Deployment before build/test completion
- Missing artifact upload/download between jobs
- Parallel jobs accessing same resources without locks
### Job Concurrency
- Scheduled jobs without concurrency control (flock, CronJob `concurrencyPolicy: Forbid`)
- Multiple replicas running migrations simultaneously
- Missing leader election for singleton tasks
- Thundering herd on dependency recovery (add backoff/jitter)
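A sketch of a singleton guard for scheduled jobs, assuming util-linux `flock`; `run_exclusive` and the lock path are hypothetical, and exit code 75 (`EX_TEMPFAIL` in sysexits.h) is just a convention chosen here. `flock -n` fails immediately if another process holds the lock, so an overlapping cron run skips instead of piling up, and the kernel releases the lock when the holder exits, even after `kill -9`.

```bash
# Run a command only if no other process holds LOCKFILE; otherwise skip.
run_exclusive() {
  local lockfile=$1
  shift
  (
    # fd 9 is opened on the lock file by the redirection below.
    flock -n 9 || { echo "lock busy: $lockfile, skipping" >&2; exit 75; }
    "$@"
  ) 9>"$lockfile"
}

# Usage (hypothetical): run_exclusive /run/lock/backup.lock /usr/local/bin/backup.sh
```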
## Output Format
```
[ORCHESTRATION] <severity:HIGH|MED|LOW> <file:line>
Issue: <what ordering/dependency is unclear or broken>
Impact: <race condition, startup failure, implicit coupling>
Suggest: <explicit dependency, health check, documentation>
Evidence: <pattern found>
```
## Common Fixes
```ini
# systemd explicit ordering with guarantee
[Unit]
After=postgresql.service
Requires=postgresql.service
# Optional dependency (Wants= adds no ordering; add After=redis.service if order matters)
Wants=redis.service
```
```yaml
# Docker Compose health-based dependency
services:
  app:
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
```
```yaml
# Kubernetes init container
initContainers:
  - name: wait-for-db
    image: busybox
    command: ['sh', '-c', 'until nc -z db 5432; do sleep 1; done']
```
```makefile
# Makefile with explicit dependencies
deploy: build test migrate
	./deploy.sh
migrate: db-ready
	./run-migrations.sh
```
```bash
#!/bin/bash
# Script with pre-flight check
set -euo pipefail
# Verify prerequisites
command -v jq >/dev/null || { echo "jq required" >&2; exit 1; }
[ -f /etc/app/config.yaml ] || { echo "Config missing" >&2; exit 1; }
# Port 5432 is PostgreSQL, not HTTP, so probe it with pg_isready rather than curl
pg_isready -q -h db -p 5432 || { echo "DB not ready" >&2; exit 1; }
```
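The init-container loop above waits forever if the database never comes up. A bounded bash variant, assuming bash's built-in `SECONDS` counter; `wait_for` is a hypothetical helper, and `pg_isready` in the usage line is the PostgreSQL client utility.

```bash
# Poll CHECK until it succeeds or TIMEOUT_S seconds pass.
wait_for() {
  local timeout_s=$1
  shift
  local deadline=$(( SECONDS + timeout_s ))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout_s}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Usage: wait_for 60 pg_isready -q -h db -p 5432 || exit 1
```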
## False Positive Risks
- Intentionally loose coupling for flexibility
- Services with internal retry logic that handle missing dependencies
- Development environments with simplified ordering
- Stateless services that can start in any order
- Service mesh (Istio/Linkerd) handles sidecar injection automatically
- Event-driven systems designed for eventual consistency
- Init container `nc -z` redundant if app has robust retry logic
## Guidelines
- **HIGH** = race conditions, circular deps, startup failures in production
- **MED** = implicit ordering, missing health checks, undocumented prerequisites
- **LOW** = could be more explicit, minor coupling concerns
- Ask: "What happens if this starts before its dependency? What's the contract?"
- Explicit is better than implicit - document and enforce ordering
- Health checks > timing assumptions

@ -0,0 +1,140 @@
# Resilience Review Lens
Review operational infrastructure for **runtime fault tolerance and graceful degradation**.
## What to Look For
### Timeouts
- Network calls without timeout (curl, wget, API clients)
- Database connections without connect/query timeout
- HTTP clients with no deadline or infinite timeout
- Missing `TimeoutStartSec`/`TimeoutStopSec` in systemd
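For commands with no built-in timeout flag, coreutils `timeout` can impose one from the outside. A small sketch, assuming GNU coreutils; `with_deadline` is a hypothetical wrapper. `timeout` sends SIGTERM at the limit, `--kill-after` follows with SIGKILL if the process ignores it, and the exit status is 124 when the limit was hit.

```bash
# Run a command with a hard deadline (e.g. 30s); SIGKILL 5s after SIGTERM.
with_deadline() {
  local limit=$1
  shift
  timeout --kill-after=5s "$limit" "$@"
}

# Usage (hypothetical paths): with_deadline 300s rsync -a /srv/data/ /backup/data/
```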
### Retries & Backoff
- Retry logic without exponential backoff
- Missing jitter (thundering herd on recovery)
- Infinite retry loops without circuit breaker
- No max retry limit
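A bash sketch of retries with capped exponential backoff and full jitter; `retry` and its argument order are hypothetical. Per the retry-safety guideline below, wrap only read-only or idempotent commands.

```bash
# retry MAX_TRIES BASE_S CAP_S COMMAND...
retry() {
  local max_tries=$1 base_s=$2 cap_s=$3
  shift 3
  local attempt=1 delay
  until "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    # Exponential growth: base, 2*base, 4*base, ... capped at cap_s.
    delay=$(( base_s * (1 << (attempt - 1)) ))
    if [ "$delay" -gt "$cap_s" ]; then delay=$cap_s; fi
    # Full jitter: sleep a random whole number of seconds in [0, delay].
    sleep $(( RANDOM % (delay + 1) ))
    attempt=$(( attempt + 1 ))
  done
}

# Usage: retry 5 1 30 curl -fsS --connect-timeout 5 --max-time 30 "$url"
```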
### Circuit Breakers
- External API calls without failure threshold
- Database connections that retry forever on outage
- Missing fallback behavior when dependency unavailable
- No degraded mode for non-critical features
### Graceful Shutdown
- No SIGTERM handler (abrupt termination)
- Missing drain period for in-flight requests
- Database connections not closed on shutdown
- systemd `KillMode=control-group` without `ExecStop`
- Missing `stop_grace_period` in Docker Compose
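A sketch of graceful shutdown for a shell worker loop; `worker`, `stop_requested`, and the `sleep` stand-in for real work are hypothetical. The TERM trap only sets a flag and the loop checks it between units, so the in-flight unit completes before cleanup runs. Bash runs a trap only after the current foreground command returns, so keep each unit short relative to `TimeoutStopSec` or `stop_grace_period`.

```bash
stop_requested=0

worker() {
  local done_marker=$1
  # Only set a flag here; never do the cleanup inside the trap itself.
  trap 'stop_requested=1' TERM
  while [ "$stop_requested" -eq 0 ]; do
    sleep 0.2   # stand-in for one short unit of work (hypothetical)
  done
  # Flush/close resources here, after the last unit completed.
  echo "drained" > "$done_marker"
}
```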
### Resource Limits
- systemd: Missing `MemoryMax`, `CPUQuota`, `TasksMax`
- Docker: Missing `mem_limit`, `cpus`, `pids_limit`
- Kubernetes: Missing resource requests/limits
- No `ulimit` for file descriptors in high-connection services
- Missing `LimitNOFILE` in systemd for network services
### Connection Management
- No connection pooling for databases
- Missing connection limits (max connections)
- No idle timeout for connection pools
- Connections held across retries (stale connections)
### Rate Limiting
- No rate limiting on API endpoints
- Missing backpressure handling for queues
- Unbounded work queues that grow under load
### Health Checks & Self-Healing
- Missing liveness probes (deadlocked process not restarted)
- Missing readiness probes (traffic sent before initialization)
- Aggressive liveness probes that fail on dependency outages (restart loops)
- Missing `WatchdogSec` in systemd for self-healing
- No startup probe/warmup period before traffic
### DNS & Network
- DNS caching forever (fails on failover/IP changes)
- No DNS resolution timeout
- Missing TCP keepalives for detecting dead connections
### Storage & Logging
- Unbounded logging filling disk (missing log rotation)
- Docker without `max-size` log option
- Missing disk space checks before write-heavy operations
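A pre-flight free-space check sketch for write-heavy steps; `require_free_kib` is a hypothetical helper. `df -P` prints stable POSIX columns, and with `-k` column 4 is the available space in 1024-byte blocks.

```bash
# Fail early if DIR's filesystem has less than NEED_KIB KiB available.
require_free_kib() {
  local dir=$1 need_kib=$2 avail_kib
  avail_kib=$(df -Pk -- "$dir" | awk 'NR == 2 { print $4 }')
  if [ -z "$avail_kib" ] || [ "$avail_kib" -lt "$need_kib" ]; then
    echo "insufficient space in $dir: ${avail_kib:-?} KiB free, $need_kib KiB needed" >&2
    return 1
  fi
}

# Usage (hypothetical path): require_free_kib /var/backups 10485760 || exit 1
```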
## Output Format
```
[RESILIENCE] <severity:HIGH|MED|LOW> <file:line>
Issue: <what fails under stress or partial outage>
Impact: <cascade failure, resource exhaustion, hung process>
Suggest: <timeout, backoff, circuit breaker, limit>
Evidence: <pattern found>
```
## Common Fixes
```bash
# curl with timeout
curl --connect-timeout 5 --max-time 30 "$url"
# wget with timeout
wget --timeout=30 --tries=3 "$url"
```
```ini
# systemd resource limits and timeouts
[Service]
TimeoutStartSec=30
TimeoutStopSec=30
MemoryMax=512M
CPUQuota=50%
TasksMax=100
LimitNOFILE=65535
```
```yaml
# Docker Compose limits
services:
  app:
    mem_limit: 512m
    cpus: 0.5
    pids_limit: 100
    stop_grace_period: 30s
```
```python
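# Python retry with backoff (uses the third-party `backoff` and `requests` packages)
import backoff
import requests

@backoff.on_exception(
    backoff.expo,
    requests.exceptions.RequestException,
    max_tries=5,
    jitter=backoff.full_jitter,
)
def call_api(url):
    resp = requests.get(url, timeout=30)
    # Raise HTTPError on 4xx/5xx so error responses are retried, not returned
    resp.raise_for_status()
    return resp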
```
## False Positive Risks
- Local-only operations don't need network timeouts
- Batch jobs may intentionally run without time limits
- Development environments don't need production resource limits
- Some services are designed to wait indefinitely (message queues)
- WebSockets/SSE/long-polling have different timeout semantics
- Service mesh (Istio/Envoy) may handle retries at infrastructure layer
- systemd default stop behavior may be sufficient (no ExecStop needed)
## Guidelines
- **HIGH** = no timeout on external calls, missing graceful shutdown in production
- **MED** = no backoff/jitter, missing resource limits, no connection pooling
- **LOW** = could be more defensive, missing nice-to-have limits
- Ask: "What happens when the database is slow? When the API is down?"
- Every external call needs a timeout - no exceptions
- **Retry safety**: Only suggest retries for read-only or known-idempotent operations
- Does NOT own: re-run safety (idempotency), change safety (blast-radius)

@ -0,0 +1,113 @@
# Supply Chain Review Lens
Review operational infrastructure for **dependency provenance, pinning, and integrity**.
## What to Look For
### Unpinned Dependencies
- Docker: `FROM image:latest` or no tag (implicit latest)
- npm/pip: Missing lockfiles, `*` or `^` versions in production
- GitHub Actions: `uses: org/action@main` instead of SHA
- Gitea Actions: Same pattern, unpinned branch refs
- Helm: `version: "*"` or missing version constraints
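A rough reviewer aid in bash (grep with ERE); both function names are hypothetical. It lists Dockerfile `FROM` lines without an `@sha256:` digest and workflow `uses:` refs not pinned to a 40-character commit SHA. Expect false positives such as multi-stage `FROM builder`, local `./` actions, and `docker://` refs, so treat the output as leads to check, not findings.

```bash
# FROM lines (case-insensitive) that lack a digest, with line numbers.
unpinned_from_lines() {
  grep -inE '^[[:space:]]*FROM[[:space:]]' -- "$1" | grep -v '@sha256:'
}

# `uses:` lines whose ref is not a full 40-hex-character commit SHA.
unpinned_action_refs() {
  grep -nE '^[[:space:]]*(-[[:space:]]*)?uses:' -- "$1" \
    | grep -vE '@[0-9a-f]{40}([[:space:]]|$)'
}

# Usage: unpinned_from_lines Dockerfile; unpinned_action_refs .github/workflows/ci.yml
```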
### Nix-Specific Pinning
- Missing `flake.lock` (flakes should always have lockfile)
- `fetchurl`/`fetchzip` without SRI hash (`sha256`, `hash`)
- `builtins.fetchGit` without `rev` (floating HEAD)
- `builtins.fetchTarball` without `sha256` (common nixpkgs fetch)
- `fetchFromGitHub` without `hash` attribute
- IFD (import-from-derivation) fetching unpinned sources
### Terraform/Tofu Pinning
- Providers missing `version` constraint in `required_providers`
- Modules sourcing git URLs without `?ref=<sha>` or `?ref=<tag>`
- Missing `.terraform.lock.hcl` (provider checksums)
### Container Provenance
- Base images from untrusted registries
- Missing digest pinning: `image:tag` vs `image@sha256:...`
- Multi-stage builds losing provenance in final stage
- No signature verification (cosign, Notary)
### CI/CD Pipeline Risks
- Actions from unverified publishers
- Workflow injection via `${{ github.event.* }}` in run blocks
- Secrets exposed to untrusted PRs (pull_request_target)
- Missing OIDC for cloud auth (long-lived credentials instead)
- Missing `permissions:` block (GITHUB_TOKEN may default to write access, e.g. on older repos and orgs)
### Build-Time Network Access
- Dockerfile `RUN curl/wget` fetching unpinned resources
- `go get` during Docker build without `go.sum` enforcement
- Nix `__noChroot = true` allowing network during build
- npm/pip install without lockfile enforcement in CI
### Binary/Artifact Integrity
- Downloads without checksum verification
- curl/wget piped to sh without hash check
- Missing GPG signature verification
- Unsigned packages from PPAs/third-party repos
### Substituters & Registries
- Nix: Untrusted substituters without signature verification
- Docker: Pulling from HTTP registries
- npm/pip: Private registries without auth/TLS
- Missing dependency confusion protections (scoped packages)
## Output Format
```
[SUPPLY-CHAIN] <severity:HIGH|MED|LOW> <file:line>
Issue: <what's unpinned or unverified>
Risk: <what could be injected or changed>
Suggest: <pin to SHA/hash, add verification>
Evidence: <unpinned reference found>
```
## Common Fixes
```dockerfile
# Docker: Pin to digest
FROM node:20@sha256:abc123...
# Or at minimum, pin an exact version tag (tags can be re-pushed; only a digest is immutable)
FROM node:20.10.0-alpine
```
```yaml
# GitHub Actions: Pin to SHA
uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4.1.6
```
```nix
# Nix: Always include hash
fetchFromGitHub {
  owner = "...";
  repo = "...";
  rev = "v1.2.3"; # or full SHA
  hash = "sha256-...";
}
```
```bash
# Downloads: Verify before executing
curl -fsSL https://example.com/install.sh -o install.sh
# sha256sum -c expects "<hash>  <file>" (two spaces); run only if the check passes
echo "expected_sha256  install.sh" | sha256sum -c - && bash install.sh
```
## False Positive Risks
- Development/CI images where latest is intentional for testing
- Internal trusted registries with controlled update policies
- Nix flakes auto-update workflows with proper review
## Guidelines
- **HIGH** = unpinned production deps, unverified downloads, curl|sh
- **MED** = unpinned CI actions, missing lockfiles, unverified registries
- **LOW** = dev dependencies unpinned, internal tooling
- Every external dependency is a trust decision
- Prefer: SHA > tag > branch > latest
- Nix flakes with `flake.lock` are good; verify lock is committed