feat: add ops-review skill with Phase 1 lenses
Multi-lens review skill for operational infrastructure (Nix, shell, Docker, CI/CD). Modeled on code-review with linter-first hybrid architecture. Phase 1 lenses (core safety): - secrets: credential exposure, Nix store, Docker layers, CI masking - shell-safety: shellcheck-backed, temp files, guard snippets - blast-radius: targeting/scoping, dry-run, rollback - privilege: least-privilege, containers, systemd sandboxing Design reviewed via orch consensus (sonar, flash-or, gemini, gpt). Lenses deploy to ~/.config/lenses/ops/ via home-manager. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent 503053638a
commit fb882a9434
@@ -34,6 +34,21 @@
{"id":"skills-8y6","title":"Define skill versioning strategy","description":"Git SHA alone is insufficient. Need tuple approach:\n\n- skill_source_rev: git SHA (if available)\n- skill_content_hash: hash of SKILL.md + scripts\n- runtime_ref: flake.lock hash or Nix store path\n\nQuestions to resolve:\n- Do Protos pin to versions (stable but maintenance) or float on latest (risky)?\n- How to handle breaking changes in skills?\n- Record in wisp trace vs proto definition?\n\nFrom consensus: both models flagged versioning instability as high severity.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T19:49:30.839064445-05:00","updated_at":"2025-12-23T20:55:04.439779336-05:00","closed_at":"2025-12-23T20:55:04.439779336-05:00","close_reason":"ADRs revised with orch consensus feedback"}
{"id":"skills-9af","title":"spec-review: Add spike/research task handling","description":"Tasks like 'Investigate X' can linger without clear outcomes.\n\nAdd to REVIEW_TASKS:\n- Flag research/spike tasks\n- Require timebox and concrete outputs (decision record, prototype, risks)\n- Pattern for handling unknowns","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:26.887719136-08:00","updated_at":"2025-12-15T14:08:13.441095034-08:00","closed_at":"2025-12-15T14:08:13.441095034-08:00"}
{"id":"skills-9bc","title":"Investigate pre-compression hook for worklogs","description":"## Revised Understanding\n\nClaude Code already persists full conversation history in `~/.claude/projects/\u003cproject\u003e/\u003csession-id\u003e.jsonl`. Pre-compact hooks aren't needed for data capture.\n\n## Question\nWhat's the ideal workflow for generating worklogs from session data?\n\n## Options\n\n### 1. Post-session script\n- Run after exiting Claude Code\n- Reads most recent session JSONL\n- Generates worklog from conversation content\n- Pro: Async, doesn't interrupt flow\n- Con: May forget to run it\n\n### 2. On-demand slash command\n- `/worklog-from-session` or similar\n- Reads current session's JSONL file\n- Generates worklog with full context\n- Pro: Explicit control\n- Con: Still need to remember\n\n### 3. Pre-compact reminder\n- Hook prints reminder: \"Consider running /worklog\"\n- Doesn't automate, just nudges\n- Pro: Simple, non-intrusive\n- Con: Easy to dismiss\n\n### 4. Async batch processing\n- Process old sessions whenever\n- All data persists in JSONL files\n- Pro: No urgency, can do later\n- Con: Context may be stale\n\n## Data Format\nSession files contain:\n- User messages with timestamp\n- Assistant responses with model info\n- Tool calls and results\n- Git branch, cwd, version info\n\n## Next Steps\n- Decide preferred workflow\n- Build script to parse session JSONL → worklog format","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-17T14:32:32.568430817-08:00","updated_at":"2025-12-17T15:56:38.864916015-08:00","closed_at":"2025-12-17T15:56:38.864916015-08:00","close_reason":"Pivoted: worklogs may be redundant given full conversation persistence. New approach: make conversations searchable directly."}
{"id":"skills-9cu","title":"ops-review skill","description":"Multi-lens review skill for operational infrastructure (Nix, shell, Docker, CI/CD).\n\nBased on code-review pattern with linter-first hybrid architecture.\n\n## Phases\n- Phase 1: Skeleton + Core Safety (secrets, shell-safety, blast-radius, privilege)\n- Phase 2: Reliability (idempotency, supply-chain, observability)\n- Phase 3: Architecture (nix-hygiene, resilience, orchestration)\n\n## Design\nSee specs/ops-review/plan.md\n\n## Success Criteria\n- Review dotfiles/ and find real issues\n- Review prox-setup/ and find real issues\n- \u003c10% false positive rate on Phase 1\n- Quick mode \u003c30s","status":"open","priority":1,"issue_type":"epic","created_at":"2026-01-01T16:55:15.772440374-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:15.772440374-05:00"}
{"id":"skills-9cu.1","title":"Create skill skeleton","description":"Create directory structure and base files:\n- skills/ops-review/SKILL.md (workflow, modeled on code-review)\n- skills/ops-review/README.md (user docs)\n- skills/ops-review/lenses/README.md (lens index)\n\nBlocks all lens work.","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:22.084083175-05:00","created_by":"dan","updated_at":"2026-01-01T17:08:20.384800582-05:00","closed_at":"2026-01-01T17:08:20.384800582-05:00","close_reason":"Created skeleton: SKILL.md, README.md, lenses/README.md","dependencies":[{"issue_id":"skills-9cu.1","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:22.095950548-05:00","created_by":"dan"}]}
{"id":"skills-9cu.10","title":"Lens: resilience","description":"Create resilience.md lens for fault tolerance:\n- Missing timeouts on network calls\n- No retries with backoff\n- Missing circuit breakers\n- No graceful shutdown (SIGTERM)\n- Missing resource limits\n\nBoundary: Owns runtime tolerance, NOT change safety","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.876125632-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:00.876125632-05:00","dependencies":[{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.878008563-05:00","created_by":"dan"},{"issue_id":"skills-9cu.10","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.881250755-05:00","created_by":"dan"}]}
{"id":"skills-9cu.11","title":"Lens: orchestration","description":"Create orchestration.md lens for execution ordering:\n- Unclear prerequisites\n- Missing order documentation\n- Circular dependencies\n- Assumed prior state\n- Implicit coupling\n\nMost complex - needs cross-file context","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:01.098528225-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:01.098528225-05:00","dependencies":[{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:01.100559128-05:00","created_by":"dan"},{"issue_id":"skills-9cu.11","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:01.104046552-05:00","created_by":"dan"}]}
{"id":"skills-9cu.12","title":"Integration: flake.nix + ai-skills.nix","description":"Add ops-review to deployment:\n- Add to flake.nix availableSkills\n- Update modules/ai-skills.nix for ops lens deployment\n- Deploy to ~/.config/lenses/ops/","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.324752872-05:00","created_by":"dan","updated_at":"2026-01-01T18:34:37.960786687-05:00","closed_at":"2026-01-01T18:34:37.960786687-05:00","close_reason":"Added ops-review to flake.nix availableSkills, updated ai-skills.nix with description and lens deployment to ~/.config/lenses/ops/","dependencies":[{"issue_id":"skills-9cu.12","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.339878541-05:00","created_by":"dan"},{"issue_id":"skills-9cu.12","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:13.34278836-05:00","created_by":"dan"}]}
{"id":"skills-9cu.13","title":"Validation: test on dotfiles","description":"Run Phase 1 lenses on ~/proj/dotfiles:\n- Verify findings are real issues\n- Check false positive rate \u003c10%\n- Document any needed lens refinements","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.489473975-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:13.489473975-05:00","dependencies":[{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.490574316-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.2","type":"blocks","created_at":"2026-01-01T16:56:13.492551051-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.3","type":"blocks","created_at":"2026-01-01T16:56:13.494453305-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.4","type":"blocks","created_at":"2026-01-01T16:56:13.496395361-05:00","created_by":"dan"},{"issue_id":"skills-9cu.13","depends_on_id":"skills-9cu.5","type":"blocks","created_at":"2026-01-01T16:56:13.49824655-05:00","created_by":"dan"}]}
{"id":"skills-9cu.14","title":"Validation: test on prox-setup","description":"Run Phase 1 lenses on ~/proj/prox-setup:\n- Verify findings are real issues\n- Check false positive rate \u003c10%\n- Document any needed lens refinements","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:56:13.676548941-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:13.676548941-05:00","dependencies":[{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:13.677846482-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.2","type":"blocks","created_at":"2026-01-01T16:56:13.680528791-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.3","type":"blocks","created_at":"2026-01-01T16:56:13.683748368-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.4","type":"blocks","created_at":"2026-01-01T16:56:13.68689222-05:00","created_by":"dan"},{"issue_id":"skills-9cu.14","depends_on_id":"skills-9cu.5","type":"blocks","created_at":"2026-01-01T16:56:13.689241654-05:00","created_by":"dan"}]}
{"id":"skills-9cu.2","title":"Lens: secrets","description":"Create secrets.md lens for credential hygiene:\n- Hardcoded secrets, API keys, tokens\n- SOPS config issues\n- Secrets in logs/error messages\n- Secrets via CLI args\n- Missing encryption\n\nLinter integration: gitleaks patterns","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.394704404-05:00","created_by":"dan","updated_at":"2026-01-01T17:12:01.063844363-05:00","closed_at":"2026-01-01T17:12:01.063844363-05:00","close_reason":"Created secrets.md lens with Nix store, Docker layer, CI masking checks. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.2","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.400663129-05:00","created_by":"dan"},{"issue_id":"skills-9cu.2","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.404368195-05:00","created_by":"dan"}]}
{"id":"skills-9cu.3","title":"Lens: shell-safety","description":"Create shell-safety.md lens (shellcheck-backed):\n- Missing set -euo pipefail\n- Unquoted variables (SC2086)\n- Unsafe command substitution\n- Missing error handling\n- Hardcoded paths\n\nLinter integration: shellcheck JSON output","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.596966874-05:00","created_by":"dan","updated_at":"2026-01-01T17:16:27.274701375-05:00","closed_at":"2026-01-01T17:16:27.274701375-05:00","close_reason":"Created shell-safety.md lens with temp file safety, input validation, set -e nuance, guard snippets. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.3","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.598340159-05:00","created_by":"dan"},{"issue_id":"skills-9cu.3","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.600733142-05:00","created_by":"dan"}]}
{"id":"skills-9cu.4","title":"Lens: blast-radius","description":"Create blast-radius.md lens for change safety:\n- Destructive ops without confirmation\n- Missing dry-run mode\n- No rollback strategy\n- Bulk ops without batching\n- Missing pre-flight checks\n\nLLM-primary: understanding implications","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.792059661-05:00","created_by":"dan","updated_at":"2026-01-01T17:24:07.972638831-05:00","closed_at":"2026-01-01T17:24:07.972638831-05:00","close_reason":"Created blast-radius.md with targeting/scoping, empty var expansion, env gates, scope in output, mitigation downgrades. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.4","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.793564277-05:00","created_by":"dan"},{"issue_id":"skills-9cu.4","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:35.796234701-05:00","created_by":"dan"}]}
{"id":"skills-9cu.5","title":"Lens: privilege","description":"Create privilege.md lens for least-privilege:\n- Unnecessary sudo/root\n- Containers as root\n- chmod 777 patterns\n- Missing capability drops\n- Docker socket mounting\n- systemd without sandboxing","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-01T16:55:35.996280533-05:00","created_by":"dan","updated_at":"2026-01-01T18:30:25.980656507-05:00","closed_at":"2026-01-01T18:30:25.980656507-05:00","close_reason":"Created privilege.md with network binding, setuid/setgid, K8s specifics, compensating controls, curl|sudo bash. Reviewed via orch consensus.","dependencies":[{"issue_id":"skills-9cu.5","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:35.999435334-05:00","created_by":"dan"},{"issue_id":"skills-9cu.5","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:36.004010491-05:00","created_by":"dan"}]}
{"id":"skills-9cu.6","title":"Lens: idempotency","description":"Create idempotency.md lens for safe re-execution:\n- Scripts that break on re-run\n- Missing existence checks\n- Non-atomic operations\n- Check-then-act race conditions\n- Missing cleanup on failure\n\nBoundary: Owns convergence, NOT rollback or retries","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.04397031-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.04397031-05:00","dependencies":[{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.061027066-05:00","created_by":"dan"},{"issue_id":"skills-9cu.6","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.065409149-05:00","created_by":"dan"}]}
{"id":"skills-9cu.7","title":"Lens: supply-chain","description":"Create supply-chain.md lens for provenance:\n- Unpinned versions (latest tags)\n- Actions not pinned to SHA\n- Missing flake.lock/SRI hashes\n- Unsigned artifacts\n- Untrusted registries","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.317966318-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.317966318-05:00","dependencies":[{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.319754113-05:00","created_by":"dan"},{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.322943568-05:00","created_by":"dan"}]}
{"id":"skills-9cu.8","title":"Lens: observability","description":"Create observability.md lens for visibility:\n- Silent failures\n- Missing health checks\n- Incomplete metrics\n- Missing structured logging\n- No correlation IDs","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.562009474-05:00","created_by":"dan","updated_at":"2026-01-01T16:55:49.562009474-05:00","dependencies":[{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.564394694-05:00","created_by":"dan"},{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.571005731-05:00","created_by":"dan"}]}
{"id":"skills-9cu.9","title":"Lens: nix-hygiene","description":"Create nix-hygiene.md lens (statix/deadnix-backed):\n- Dead code (unused bindings)\n- Anti-patterns (with lib abuse, IFD)\n- Module boundary violations\n- Overlay issues\n- Missing option types\n\nLinter integration: statix + deadnix JSON","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.623672452-05:00","created_by":"dan","updated_at":"2026-01-01T16:56:00.623672452-05:00","dependencies":[{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.638729349-05:00","created_by":"dan"},{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.643063075-05:00","created_by":"dan"}]}
{"id":"skills-a0x","title":"spec-review: Add traceability requirements across artifacts","description":"Prompts don't enforce spec → plan → tasks linkage. Drift can occur without detection.\n\nAdd:\n- Require trace matrix or linkage in reviews\n- Each plan item should reference spec requirement\n- Each task should reference plan item\n- Flag unmapped items and extra scope","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:25.270581198-08:00","updated_at":"2025-12-15T14:05:48.196356786-08:00","closed_at":"2025-12-15T14:05:48.196356786-08:00"}
{"id":"skills-a23","title":"Update main README to list all 9 skills","description":"Main README.md 'Skills Included' section only lists worklog and update-spec-kit. Repo actually has 9 skills: template, worklog, update-spec-kit, screenshot-latest, niri-window-capture, tufte-press, update-opencode, web-research, web-search.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-11-30T11:58:14.042397754-08:00","updated_at":"2025-12-28T22:08:02.074758486-05:00","closed_at":"2025-12-28T22:08:02.074758486-05:00","close_reason":"Updated README with table listing all 14 skills (5 deployed, 8 available, 1 development template)","dependencies":[{"issue_id":"skills-a23","depends_on_id":"skills-4yn","type":"blocks","created_at":"2025-11-30T12:01:30.306742184-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"skills-al5","title":"Consider repo-setup-verification skill","description":"The dotfiles repo has a repo-setup-prompt.md verification checklist that could become a skill.\n\n**Source**: ~/proj/dotfiles/docs/repo-setup-prompt.md\n\n**What it does**:\n- Verifies .envrc has use_api_keys and skills loading\n- Checks .skills manifest exists with appropriate skills\n- Optionally checks beads setup\n- Verifies API keys are loaded\n\n**As a skill it could**:\n- Be invoked to audit any repo's agent setup\n- Offer to fix missing pieces\n- Provide consistent onboarding for new repos\n\n**Questions**:\n- Is this better as a skill vs a slash command?\n- Should it auto-fix or just report?\n- Does it belong in skills repo or dotfiles?","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-06T12:38:32.561337354-08:00","updated_at":"2025-12-28T22:22:57.639520516-05:00","closed_at":"2025-12-28T22:22:57.639520516-05:00","close_reason":"Decided: keep as prompt doc in dotfiles, not a skill. Claude can read it when asked. No wrapper benefit, and it's dotfiles-specific setup (not general skill). ai-tools-doctor handles version checking separately."}
@@ -0,0 +1,160 @@
#+TITLE: ops-review Skill Design, Orch Consensus Planning, and Skeleton Implementation
#+DATE: 2026-01-01
#+KEYWORDS: ops-review, skill-design, orch-consensus, lenses, infrastructure-review, nix, shell-safety, secrets
#+COMMITS: 0 (uncommitted work in progress)
#+COMPRESSION_STATUS: uncompressed
|
* Session Summary
** Date: 2026-01-01
** Focus Area: Designing and implementing the ops-review skill for infrastructure code analysis
|
* Accomplishments
- [X] Explored the dotfiles and prox-setup repos to map the actual landscape of ops artifacts
- [X] Designed the ops-review skill with 10 lenses across 3 phases
- [X] Ran orch consensus (sonar, flash-or, gemini, gpt) on the initial plan
- [X] Incorporated consensus feedback: linter-first hybrid architecture, crisp lens boundaries
- [X] Created a comprehensive plan.md in specs/ops-review/
- [X] Created bd epic (skills-9cu) with 14 child tasks and their dependency graph
- [X] Built the skill skeleton: SKILL.md, README.md, lenses/README.md
- [X] Drafted the secrets.md lens and reviewed it with orch consensus
- [X] Incorporated consensus feedback on Nix store exposure, Docker layer persistence, and CI masking
- [X] Filed a follow-up issue in dotfiles for gitleaks availability (dotfiles-x2m)
- [ ] Remaining Phase 1 lenses: shell-safety, blast-radius, privilege
|
* Key Decisions
** Decision 1: Linter-first hybrid architecture
- Context: How should ops-review analyze infrastructure code?
- Options considered:
  1. Pure LLM analysis - flexible but prone to hallucinating syntax
  2. Pure linter - deterministic but misses semantic issues
  3. Hybrid (linters run first, the LLM interprets their output) - best of both
- Rationale: All 4 consensus models agreed that LLMs hallucinate syntax but excel at understanding intent. Static tools catch syntax errors; the LLM finds logic bugs.
- Impact: Each lens integrates with specific tools (shellcheck, statix, gitleaks)
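A minimal sketch of the linter-first step, assuming a hypothetical driver (`collect_findings`, `run_shellcheck`) that runs each linter, saves its JSON report, and leaves interpretation to the LLM lens pass. Only shellcheck is wired in; statix and gitleaks would slot in the same way, but their invocations are not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical linter-first driver: deterministic findings are saved as JSON
# before any LLM lens runs, so the LLM interprets real tool output.

# run_shellcheck REPORT FILE...: write shellcheck's json1 report to REPORT.
run_shellcheck() {
  local report=$1
  shift
  if ! command -v shellcheck >/dev/null 2>&1; then
    echo "shellcheck not installed; writing empty report" >&2
    printf '{"comments":[]}\n' >"$report"
    return 0
  fi
  # shellcheck exits 1 when it finds issues (expected here); 2 or higher
  # means a file could not be processed or the invocation was wrong.
  local rc=0
  shellcheck -f json1 "$@" >"$report" || rc=$?
  ((rc <= 1))
}

# collect_findings OUTDIR FILE...: run the linters, print the report paths.
collect_findings() {
  local outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  run_shellcheck "$outdir/shellcheck.json" "$@" || return 1
  # statix/gitleaks reports would be appended here in the same shape.
  printf '%s\n' "$outdir/shellcheck.json"
}
```

The LLM pass would then receive the source files together with the printed report paths, rather than being asked to rediscover syntax problems itself.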
|
** Decision 2: 10 lenses across 3 phases
- Context: How many lenses should there be, and how should they be prioritized?
- Initial proposal: 8 lenses
- Consensus feedback: add privilege (least-privilege) and supply-chain (pinning)
- Phase 1 (quick mode): secrets, shell-safety, blast-radius, privilege
- Phase 2: idempotency, supply-chain, observability
- Phase 3: nix-hygiene, resilience, orchestration
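The phase split maps directly onto lens selection. A minimal sketch, assuming each lens is a Markdown file named after the lens under ~/.config/lenses/ops/ (the deploy path from the architecture notes); `lenses_for_mode`, `lens_paths`, and `LENS_DIR` are hypothetical names, not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical phase -> lens mapping; --quick would select "quick" (Phase 1).
PHASE1=(secrets shell-safety blast-radius privilege)
PHASE2=(idempotency supply-chain observability)
PHASE3=(nix-hygiene resilience orchestration)
LENS_DIR=${LENS_DIR:-$HOME/.config/lenses/ops}

# lenses_for_mode MODE: print one lens name per line.
lenses_for_mode() {
  case $1 in
    quick) printf '%s\n' "${PHASE1[@]}" ;;
    full) printf '%s\n' "${PHASE1[@]}" "${PHASE2[@]}" "${PHASE3[@]}" ;;
    *)
      echo "unknown mode: $1 (expected quick or full)" >&2
      return 2
      ;;
  esac
}

# lens_paths MODE: resolve lens names to files under $LENS_DIR.
lens_paths() {
  local names lens
  names=$(lenses_for_mode "$1") || return
  while IFS= read -r lens; do
    printf '%s/%s.md\n' "$LENS_DIR" "$lens"
  done <<<"$names"
}
```

Keeping the phase lists as data means adding a Phase 2 lens later is a one-word change rather than a new code path.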
|
** Decision 3: Crisp lens boundaries to avoid duplicate findings
- Problem: the resilience, blast-radius, and idempotency lenses overlap
- Solution: define an ownership table
  - idempotency: safe re-runs, convergence, atomic writes
  - resilience: runtime fault tolerance, timeouts, retries
  - blast-radius: change safety, dry-run, rollback
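To make the idempotency boundary concrete, here is a minimal sketch (a hypothetical helper, `write_atomic`, not taken from the skill) of a file update the idempotency lens would accept: it converges on re-run and never leaves a half-written file. Retries and rollback are deliberately absent, since the table assigns those to resilience and blast-radius.

```bash
#!/usr/bin/env bash
# write_atomic DEST CONTENT: converge DEST to CONTENT plus a trailing newline.
write_atomic() {
  local dest=$1 content=$2 tmp
  # Existence/content check: re-running with the same input is a no-op.
  if [[ -f $dest && $(<"$dest") == "$content" ]]; then
    return 0
  fi
  # Temp file in the same directory, so the final mv is an atomic rename on
  # one filesystem. Note: mktemp creates the file with mode 0600.
  tmp=$(mktemp "${dest}.XXXXXX") || return 1
  if ! printf '%s\n' "$content" >"$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  mv -f "$tmp" "$dest"
}
```

A reader of this function never observes a truncated file, and a second run with the same content does not touch the file at all.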
|
** Decision 4: Nix-specific checks as first-class concerns
- Context: Nix has a unique security model (the store is world-readable)
- Insight from consensus: secrets embedded in .nix strings become readable in /nix/store
- Added to secrets lens: an explicit Nix store exposure check
- Remediation: use sops-nix/agenix and reference the decrypted secret's runtime path instead of embedding it as a string
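The Nix store exposure check could be approximated with a grep heuristic. This is a minimal sketch under stated assumptions: `nix_store_exposure` is a hypothetical helper and the list of secret-like attribute names is illustrative. It flags such attributes when they receive a string literal, and ignores references to runtime paths such as `config.sops.secrets.<name>.path` (sops-nix) or `config.age.secrets.<name>.path` (agenix).

```bash
#!/usr/bin/env bash
# nix_store_exposure FILE: print file:line for secret-like attributes that
# are assigned a string literal. Only names that END in one of the listed
# words match, so e.g. passwordFile (a runtime path) is not flagged.
# Exit status follows grep: 0 when something was flagged.
nix_store_exposure() {
  local attr='^[[:space:]]*([A-Za-z0-9_-]+\.)*[A-Za-z0-9_-]*'
  local name='(password|passwd|token|secret|apikey)'
  grep -nHiE "${attr}${name}[[:space:]]*=[[:space:]]*\"" "$1"
}
```

Assignments like `password = config.age.secrets.db.path;` pass because the value is an expression rather than a quoted literal; the LLM pass is still needed for cases the regex cannot see, such as a literal built through `let` bindings.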
|
* Problems & Solutions
| Problem | Solution | Learning |
|---------+----------+----------|
| Initial lens drafts too long (60+ lines) | Modeled on existing code-review lenses (~45 lines) | A consistent format matters for usability |
| Overlapping lens scopes | Created a "Crisp Boundaries" table in the plan | Define ownership explicitly upfront |
| Unclear which lenses are actually needed | Explored real repos (dotfiles, prox-setup) | Ground the design in actual artifacts |
| False positive risk in secrets lens | Added explicit exemptions (Nix hashes, public keys) | Apply a two-signal rule to generic matches |
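One way to read the two-signal rule, sketched as a hypothetical per-line matcher (`two_signal_match`): a generic match is reported only when a secret-like name and a long opaque value appear on the same line, after exempting known-public material such as `sha256-` SRI hashes and SSH or age public keys. The patterns and the 20-character threshold are illustrative assumptions, not the lens's actual rules. Requires bash 4+ for `${line,,}`.

```bash
#!/usr/bin/env bash
# two_signal_match LINE: exit 0 if LINE should be reported (bash 4+).
two_signal_match() {
  local line=$1
  # Exemptions: Nix SRI hashes and public keys are not secrets.
  local sri_re='sha(256|384|512)-[A-Za-z0-9+/]{20,}={0,2}'
  local pubkey_re='(ssh-ed25519|ssh-rsa|ecdsa-sha2-nistp[0-9]+)[[:space:]]+AAAA|age1[0-9a-z]{20,}'
  if [[ $line =~ $sri_re || $line =~ $pubkey_re ]]; then
    return 1
  fi
  # Signal 1: a secret-like name (matched against the lowercased line).
  local name_re='secret|token|passw(or)?d|api[_-]?key'
  # Signal 2: a long run of token-like characters.
  local value_re='[A-Za-z0-9+/_=-]{20,}'
  [[ ${line,,} =~ $name_re && $line =~ $value_re ]]
}
```

Either signal alone is common in harmless text (a comment mentioning "token", a random-looking ID), which is why requiring both cuts false positives.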
|
* Technical Details

** Code Changes
- Total files created: 5
- Key files:
  - `specs/ops-review/plan.md` (261 lines) - Comprehensive design document
  - `skills/ops-review/SKILL.md` (188 lines) - Agent workflow instructions
  - `skills/ops-review/README.md` (96 lines) - User documentation
  - `skills/ops-review/lenses/README.md` (85 lines) - Lens index
  - `skills/ops-review/lenses/secrets.md` (53 lines) - First lens
|
** Commands Used
#+begin_src bash
# Explored actual infrastructure repos
# (done via the Task tool's Explore subagent, not a shell command)

# Ran multi-model consensus for plan review
uv run orch consensus "Review this ops-review skill design..." sonar flash-or gemini gpt

# Created bd epic with hierarchical children
bd create "ops-review skill" --type=epic -p 1 --description "..."
bd create "Lens: secrets" --parent skills-9cu -p 1 --deps skills-9cu.1

# Visualized dependency graph
bd graph skills-9cu

# Checked available work
bd ready
#+end_src
|
** Architecture Notes
- The skill follows the code-review pattern: each lens is a focused prompt
- Lenses deploy to ~/.config/lenses/ops/ via home-manager
- Quick mode (--quick) runs only the Phase 1 lenses, for CI and pre-commit use
- Cross-file awareness comes from grep-based reference mapping (shell `source`, Nix imports)
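The grep-based reference mapping could look roughly like this; `sourced_files` and `nix_relative_refs` are hypothetical helpers. They see only literal paths: anything computed at runtime is reported verbatim (quotes stripped) for the LLM to interpret, and a relative `.nix` path is picked up wherever it appears in a file, not only inside `imports`.

```bash
#!/usr/bin/env bash
# sourced_files SCRIPT: print the path argument of each `source`/`.` line.
sourced_files() {
  grep -E '^[[:space:]]*(source|\.)[[:space:]]+[^[:space:];]+' "$1" |
    sed -E 's/^[[:space:]]*(source|\.)[[:space:]]+([^[:space:];]+).*/\2/' |
    tr -d "\"'"
}

# nix_relative_refs FILE: print relative .nix paths (e.g. entries in imports).
nix_relative_refs() {
  grep -oE '\.\.?/[A-Za-z0-9_./-]+\.nix' "$1"
}
```

The resulting file list tells a lens which neighbouring files to load as context before judging, for example, whether a variable used in one script is validated in a library it sources.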
|
* Process and Workflow

** What Worked Well
- Exploring real repos first grounded the design in actual needs
- Running orch consensus with 4 models surfaced gaps (Nix store, Docker layers)
- Creating children under the epic with bd create --parent yields a clean hierarchy
- Visualizing the dependency graph helped verify task ordering

** What Was Challenging
- Balancing lens completeness against the ~45-line target format
- Deciding which checks are linter-backed and which are LLM-primary
- Managing context across a long design session
|
|
||||||
|
* Learning and Insights

** Technical Insights
- Nix store world-readability is a critical security consideration
- Docker ENV/ARG values persist in image layers/history even if later deleted
- CI masking (::add-mask::) is often overlooked
- shellcheck, statix, gitleaks provide structured JSON output for integration

** Process Insights
- orch consensus is valuable for pressure-testing designs
- High temperature for brainstorming, low temperature for analysis decisions
- bd hierarchical children (.1, .2, etc.) work well for epic breakdown

** Architectural Insights
- Linter-first hybrid is an emerging pattern (doc-review also uses it)
- Lens boundaries must be explicit to avoid duplicate findings
- Platform-specific remediation matters (sops-nix vs BuildKit secrets)

* Context for Future Work

** Open Questions
- Should ops-review have its own lens directory or share with code-review?
- How to handle cross-repo awareness (dotfiles uses sops, prox-setup uses passage)?
- Should we run linters in parallel before the LLM pass?

** Next Steps
- Complete Phase 1 lenses: shell-safety, blast-radius, privilege
- Integration: add to flake.nix, update ai-skills.nix
- Validation: test on dotfiles and prox-setup repos
- Ensure gitleaks available (dotfiles-x2m)

** Related Work
- [[file:2025-12-28-code-review-skill-creation-worklog-cleanup.org][Code Review Skill Creation]] - Original lens pattern
- [[file:2025-12-04-doc-review-skill-design.org][Doc-Review Skill Design]] - Hybrid architecture precedent
- [[file:2025-12-26-multi-lens-code-review-workflow-testing.org][Multi-Lens Code Review Testing]] - LLM-in-the-loop pattern

* Raw Notes
- Dotfiles repo: 100+ Nix modules, 90+ shell scripts, SOPS secrets, Gitea Actions
- Prox-setup repo: 88 Python scripts (Proxmox API), 41 shell scripts, Docker Compose
- Models consulted: sonar, flash-or, gemini, gpt (all 4 supported the design)
- Key insight from GPT: "Require two signals for MED/HIGH when not using known token format"
- All models emphasized: don't flag Nix hashes (sha256-, narHash, vendorHash)

* Session Metrics
- Commits made: 0 (work in progress)
- Files created: 5
- Lines added: ~683 (plan.md + skill files + lens)
- bd issues created: 16 (1 epic + 14 children + 1 in dotfiles)
- orch consensus runs: 2

@@ -15,7 +15,9 @@
   availableSkills = [
     "bd-issue-tracking"
     "code-review"
+    "doc-review"
     "niri-window-capture"
+    "ops-review"
     "orch"
     "screenshot-latest"
     "spec-review"

@@ -12,6 +12,7 @@ let
   Available skills:
   - code-review: Multi-lens code review with issue filing
   - niri-window-capture: Invisibly capture window screenshots
+  - ops-review: Multi-lens ops/infrastructure review
   - screenshot-latest: Find latest screenshots
   - tufte-press: Generate study card JSON
   - worklog: Create org-mode worklogs

@@ -94,6 +95,11 @@ in {
         source = "${cfg.skillsPath}/code-review/lenses";
         recursive = true;
       };
+      # Ops lenses in separate subdirectory
+      ".config/lenses/ops" = {
+        source = "${cfg.skillsPath}/ops-review/lenses";
+        recursive = true;
+      };
     })
 
   # Workflows (beads protos)

121
skills/ops-review/README.md
Normal file
@@ -0,0 +1,121 @@
# ops-review

Multi-lens review for operational infrastructure. Finds security issues, shell script bugs, and reliability problems in your Nix configs, shell scripts, Docker files, and CI/CD pipelines.

## Quick Start

**Claude Code / OpenCode:**
```
/ops-review bin/deploy.sh
```

The agent reviews your ops files and presents findings for approval before filing any issues.

## What It Reviews

| Artifact | Examples |
|----------|----------|
| Nix/NixOS | `flake.nix`, `modules/*.nix`, home-manager configs |
| Shell Scripts | `bin/*.sh`, `setup_*.sh`, `deploy.sh` |
| Containers | `Dockerfile`, `docker-compose.yml` |
| CI/CD | `.github/workflows/*.yml`, `.gitea/workflows/*.yml` |
| Services | systemd units, cron jobs |

## How It Works

**Linter-first hybrid**: Static tools catch syntax issues, LLM finds semantic problems.

```
shellcheck ──┐
statix ──────┼──► LLM interprets + finds logic bugs ──► Findings
hadolint ────┘
```

## Available Lenses

### Phase 1: Core Safety (quick mode)
- **secrets** - Hardcoded credentials, SOPS issues
- **shell-safety** - `set -euo pipefail`, quoting, error handling
- **blast-radius** - Destructive ops, missing dry-run
- **privilege** - Unnecessary sudo, root containers

### Phase 2: Reliability
- **idempotency** - Safe re-run, atomic operations
- **supply-chain** - Unpinned versions, missing hashes
- **observability** - Silent failures, missing health checks

### Phase 3: Architecture
- **nix-hygiene** - Dead code, anti-patterns
- **resilience** - Timeouts, retries, resource limits
- **orchestration** - Execution order, prerequisites

## Usage Examples

```
# Review a single script
/ops-review deploy.sh

# Review a directory
/ops-review bin/

# Quick mode (Phase 1 only, fast)
/ops-review --quick bin/

# Review recently changed ops files (git diff)
/ops-review
```

## Example Output

```
## Review Summary: bin/deploy.sh

| Severity | Count |
|----------|-------|
| HIGH | 2 |
| MED | 3 |

### Top Issues

1. [SECRETS] HIGH bin/deploy.sh:45
   Issue: API token passed as CLI argument
   Suggest: Use environment variable instead

2. [BLAST-RADIUS] HIGH bin/deploy.sh:78
   Issue: rm -rf with variable that could be empty
   Suggest: Add guard: [ -n "$DIR" ] || exit 1

Would you like me to file any of these as beads issues?
```

## Prerequisites

For full functionality, install these linters:

```nix
# NixOS (configuration.nix)
environment.systemPackages = with pkgs; [ shellcheck statix deadnix hadolint gitleaks ];

# home-manager
home.packages = with pkgs; [ shellcheck statix deadnix hadolint gitleaks ];
```

The skill works without them but provides richer analysis with linter output.

## Configuration

No configuration required. The skill auto-detects file types and applies appropriate lenses.

## Integration

- **Issue Tracking**: Files findings as beads issues (`bd create`)
- **CI/CD**: Use `--quick` mode for pre-commit/pipeline gates

## See Also

- [code-review](../code-review/README.md) - Application code review
- [doc-review](../doc-review/README.md) - Documentation quality

## License

MIT
246
skills/ops-review/SKILL.md
Normal file
@@ -0,0 +1,246 @@
---
name: ops-review
description: Run multi-lens ops review on infrastructure files. Analyzes Nix, shell scripts, Docker, CI/CD for secrets, shell-safety, blast-radius, privilege, idempotency, supply-chain, observability, nix-hygiene, resilience, and orchestration. Interactive - asks before filing issues.
---

# Ops Review Skill

Run focused infrastructure analysis using multiple review lenses. Uses a linter-first hybrid approach: static tools for syntax, LLM for semantics. Findings are synthesized and presented for your approval before any issues are filed.

## When to Use

Invoke this skill when:
- "Review my infrastructure"
- "Run ops review on bin/"
- "Check this script for issues"
- "Analyze my Nix configs"
- `/ops-review`

## Arguments

The skill accepts an optional target:
- `/ops-review` - Reviews recently changed ops files (git diff)
- `/ops-review bin/` - Reviews specific directory
- `/ops-review deploy.sh` - Reviews specific file
- `/ops-review --quick` - Phase 1 lenses only (fast, <30s)

## Target Artifacts

| Category | File Patterns |
|----------|---------------|
| Nix/NixOS | `*.nix`, `flake.nix`, `flake.lock` |
| Shell Scripts | `*.sh`, files with a shell shebang (`#!/bin/bash`, `#!/usr/bin/env bash`, `#!/bin/sh`) |
| Python Automation | `*.py` in ops contexts (scripts/, setup/, deploy/) |
| Container Configs | `Dockerfile`, `docker-compose.yml`, `*.dockerfile` |
| CI/CD | `.github/workflows/*.yml`, `.gitea/workflows/*.yml` |
| Service Configs | `*.service`, `*.timer`, systemd units |
| Secrets | `.sops.yaml`, `secrets.yaml`, SOPS-encrypted files |

## Architecture: Linter-First Hybrid

```
Stage 1: Static Tools (fast, deterministic)
  ├── shellcheck for shell scripts
  ├── statix + deadnix for Nix
  ├── hadolint for Dockerfiles
  └── yamllint for YAML configs

Stage 2: LLM Analysis (semantic, contextual)
  ├── Interprets tool output in context
  ├── Finds logic bugs tools miss
  ├── Synthesizes cross-file issues
  └── Suggests actionable fixes
```

## Available Lenses

Lenses are focused review prompts located in `~/.config/lenses/ops/`:

### Phase 1: Core Safety (--quick mode)

| Lens | Focus |
|------|-------|
| `secrets.md` | Hardcoded credentials, SOPS issues, secrets in logs |
| `shell-safety.md` | `set -euo pipefail`, quoting, error handling (shellcheck-backed) |
| `blast-radius.md` | Destructive ops, missing dry-run, no rollback |
| `privilege.md` | Unnecessary sudo, root containers, `chmod 777` |

### Phase 2: Reliability

| Lens | Focus |
|------|-------|
| `idempotency.md` | Safe re-run, existence checks, atomic operations |
| `supply-chain.md` | Unpinned versions, missing SRI hashes, action SHAs |
| `observability.md` | Silent failures, missing health checks, no logging |

### Phase 3: Architecture

| Lens | Focus |
|------|-------|
| `nix-hygiene.md` | Dead code, anti-patterns, module boundaries (statix-backed) |
| `resilience.md` | Timeouts, retries, graceful shutdown, resource limits |
| `orchestration.md` | Execution order, prerequisites, implicit coupling |

## Workflow

### Phase 1: Target Selection
1. Parse the target argument (default: git diff of uncommitted ops files)
2. Identify files by category (Nix, shell, Docker, etc.)
3. Show file list to user for confirmation
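
Steps 1 and 2 can be sketched in bash. This is a minimal sketch and not part of the skill. `classify_ops_file` and `list_changed_ops_files` are hypothetical helper names. The patterns mirror the Target Artifacts table, and the sketch assumes it runs from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helpers sketching Phase 1 (target selection); not shipped with the skill.

# Print the ops category for a path, following the Target Artifacts table.
classify_ops_file() {
  local path="$1" base
  base="$(basename -- "$path")"
  case "$base" in
    *.nix | flake.lock) echo nix; return ;;
    Dockerfile | *.dockerfile | docker-compose.yml) echo container; return ;;
    *.service | *.timer) echo service; return ;;
    .sops.yaml | secrets.yaml) echo secrets; return ;;
    *.sh) echo shell; return ;;
  esac
  case "$path" in
    .github/workflows/*.yml | .gitea/workflows/*.yml) echo ci; return ;;
    scripts/*.py | setup/*.py | deploy/*.py | */scripts/*.py | */setup/*.py | */deploy/*.py)
      echo python; return ;;
  esac
  # Extension-less scripts: fall back to the shebang line.
  if [[ -f "$path" ]] && head -n 1 -- "$path" | grep -Eq '^#!.*/(env[[:space:]]+)?(ba)?sh([[:space:]]|$)'; then
    echo shell
  else
    echo other
  fi
}

# List uncommitted (staged + unstaged) ops files as "category<TAB>path".
# Assumes the current directory is the repository root; untracked files are not included.
list_changed_ops_files() {
  local f category
  git diff --name-only HEAD -- | while IFS= read -r f; do
    [[ -e "$f" ]] || continue          # skip deleted files
    category="$(classify_ops_file "$f")"
    if [[ "$category" != other ]]; then
      printf '%s\t%s\n' "$category" "$f"
    fi
  done
}
```

`list_changed_ops_files | cut -f2` then gives the plain file list to show the user in step 3.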
### Phase 2: Pre-Pass (Static Tools)
Run appropriate linters based on file type:
```bash
# Shell scripts
shellcheck --format=json script.sh

# Nix files
statix check --format=json file.nix
deadnix --output-format=json file.nix

# Dockerfiles
hadolint --format json Dockerfile
```
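
The pre-pass dispatch can be sketched as a small wrapper around the same linter commands listed above. It assumes files have already been labeled with categories such as `shell`, `nix`, or `container`; these labels are hypothetical. Tools that are not installed are skipped rather than treated as errors, since the skill also works without them. `run_prepass` and `_run_if_present` are illustrative names.

```bash
#!/usr/bin/env bash
# Hypothetical pre-pass dispatcher; the linter invocations match the commands listed above.

# Run a linter if it is installed; otherwise note the skip on stderr and continue.
_run_if_present() {
  local tool="$1"
  if ! command -v "$tool" >/dev/null 2>&1; then
    printf 'skip: %s not installed\n' "$tool" >&2
    return 0
  fi
  # Linters exit non-zero when they report findings; treat that as evidence, not failure.
  "$@" || true
}

# run_prepass CATEGORY FILE  -> linter JSON on stdout (if any linter applies)
run_prepass() {
  local category="$1" file="$2"
  case "$category" in
    shell)
      _run_if_present shellcheck --format=json "$file" ;;
    nix)
      _run_if_present statix check --format=json "$file"
      _run_if_present deadnix --output-format=json "$file" ;;
    container)
      # hadolint lints Dockerfiles, not Compose files.
      if [[ "$(basename -- "$file")" != docker-compose.yml ]]; then
        _run_if_present hadolint --format json "$file"
      fi ;;
    *)
      : ;;   # no static tool for this category; the LLM pass still runs
  esac
}
```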
### Phase 3: Lens Execution
For each lens, analyze the target files with tool output in context:

1. Read the lens prompt from `~/.config/lenses/ops/{lens}.md`
2. Include relevant linter output as evidence
3. Apply the lens to find semantic issues tools miss
4. Collect findings in structured format

**Finding Format:**
```
[TAG] <severity:HIGH|MED|LOW> <file:line>
  Issue: <what's wrong>
  Suggest: <how to fix>
  Evidence: <why it matters>
```
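
Before synthesis, you can check that lens output follows this format by parsing the header line with a bash regex. This is a sketch only. `parse_finding_header` is a hypothetical helper, and it assumes tags are upper-case words joined by hyphens (for example `SHELL-SAFETY`).

```bash
#!/usr/bin/env bash
# Hypothetical validator for the "[TAG] SEVERITY file:line" finding header.
# Prints TAG, SEVERITY, FILE and LINE tab-separated; returns 1 if the header is malformed.
parse_finding_header() {
  local re='^\[([[:upper:]][[:upper:]-]*)] (HIGH|MED|LOW) ([^[:space:]]+):([0-9]+)$'
  [[ "$1" =~ $re ]] || return 1
  printf '%s\t%s\t%s\t%s\n' \
    "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
}
```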
### Phase 4: Synthesis
After all lenses complete:
1. Deduplicate overlapping findings (same issue from multiple lenses)
2. Group related issues
3. Rank by severity and confidence
4. Generate summary report
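
Steps 1 and 3 can be sketched in bash. The sketch assumes each finding has been flattened to a tab-separated line `SEVERITY<TAB>TAG<TAB>FILE:LINE<TAB>ISSUE`, which is an intermediate format not defined by the skill. `rank_findings` is hypothetical. It treats findings at the same `file:line` as duplicates and keeps the most severe one. That is a simplification, since different issues can share a line. It also ranks by severity only; confidence is not modeled.

```bash
#!/usr/bin/env bash
# Hypothetical dedupe-and-rank step over tab-separated findings read from stdin:
#   SEVERITY<TAB>TAG<TAB>FILE:LINE<TAB>ISSUE
rank_findings() {
  awk -F'\t' '
    BEGIN { rank["HIGH"] = 1; rank["MED"] = 2; rank["LOW"] = 3 }
    {
      key = $3                              # FILE:LINE identifies a duplicate
      if (!(key in best) || rank[$1] < rank[sev[key]]) {
        best[key] = $0                      # keep the most severe report
        sev[key] = $1
      }
    }
    END { for (k in best) print rank[sev[k]] "\t" best[k] }
  ' | LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k4,4 | cut -f2-
}
```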
### Phase 5: Interactive Review
Present findings to user:
1. Show executive summary (counts by severity)
2. List top issues with details
3. Ask: "Which findings should I file as issues?"

**User can respond:**
- "File all" - creates beads issues for everything
- "File HIGH only" - filters by severity
- "File 1, 3, 5" - specific findings
- "None" - just keep the report
- "Let me review first" - show full details
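
The user's reply can be mapped to a subset of the ranked findings. This is a minimal sketch. It assumes findings are stored one per line as `SEVERITY<TAB>...`, numbered in the order they were presented. `select_findings` is hypothetical and only understands the `all`, `none`, severity, and number-list replies.

```bash
#!/usr/bin/env bash
# Hypothetical filter: select_findings SELECTION < ranked-findings.tsv
select_findings() {
  local selection="$1"
  case "$selection" in
    all)  cat ;;
    none) : ;;
    HIGH | MED | LOW)
      awk -F'\t' -v sev="$selection" '$1 == sev' ;;
    *)
      # Numbers separated by commas and/or spaces, e.g. "1, 3, 5" (1-based).
      awk -v picks="$selection" '
        BEGIN {
          n = split(picks, parts, /[^0-9]+/)
          for (i = 1; i <= n; i++) if (parts[i] != "") want[parts[i] + 0] = 1
        }
        (NR in want)
      ' ;;
  esac
}
```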
### Phase 6: Issue Filing (if requested)
For approved findings:
1. Create beads issues with `bd create`
2. Include lens tag, severity, file location
3. Link related issues if applicable
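
Step 1 can be sketched so that it prints, rather than runs, one `bd create` command per approved finding, letting the agent show the exact command first. It uses only the `-p` and `--description` flags from the `bd create` commands in the worklog above. The priority mapping (HIGH=1, MED=2, LOW=3) and the title layout are assumptions, and `bd_create_cmd` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a shell-quoted `bd create` command for one finding.
# Usage: bd_create_cmd SEVERITY TAG FILE:LINE ISSUE
bd_create_cmd() {
  local severity="$1" tag="$2" location="$3" issue="$4" priority
  case "$severity" in
    HIGH) priority=1 ;;
    MED)  priority=2 ;;
    *)    priority=3 ;;   # LOW and anything unexpected
  esac
  # %q quotes each argument so the printed line can be pasted or eval'd safely.
  printf 'bd create %q -p %s --description %q\n' \
    "[$tag] $issue ($location)" "$priority" \
    "ops-review $tag finding, severity $severity, at $location"
}
```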
## Output

The skill produces:
1. **Console summary** - immediate feedback
2. **Beads issues** - if user approves filing

## Severity Rubric

| Severity | Criteria |
|----------|----------|
| **HIGH** | Exploitable vulnerability, data loss risk, will break on next run |
| **MED** | Reliability issue, tech debt, violation of best practice |
| **LOW** | Polish, maintainability, defense-in-depth improvement |

Context matters: the same issue may be HIGH in production and LOW in a homelab.

## Example Session

```
User: /ops-review bin/deploy.sh

Agent: I'll review bin/deploy.sh with ops lenses.

[Running shellcheck...]
[Running secrets lens...]
[Running shell-safety lens...]
[Running blast-radius lens...]
[Running privilege lens...]

## Review Summary: bin/deploy.sh

| Severity | Count |
|----------|-------|
| HIGH | 2 |
| MED | 3 |
| LOW | 1 |

### Top Issues

1. [SECRETS] HIGH bin/deploy.sh:45
   Issue: API token passed as command-line argument (visible in process list)
   Suggest: Use environment variable or file with restricted permissions

2. [BLAST-RADIUS] HIGH bin/deploy.sh:78
   Issue: rm -rf with variable that could be empty
   Suggest: Add guard: [ -n "$DIR" ] || exit 1

3. [SHELL-SAFETY] MED bin/deploy.sh:12
   Issue: Missing 'set -euo pipefail'
   Suggest: Add at top of script for fail-fast behavior

Would you like me to file any of these as beads issues?
Options: all, HIGH only, specific numbers (1,2,3), or none
```

## Quick Mode

Use `--quick` for fast pre-commit checks:
- Runs only Phase 1 lenses (secrets, shell-safety, blast-radius, privilege)
- Target: <30 seconds
- Ideal for CI gates

## Cross-File Awareness

Before review, build a reference map:
- **Shell**: `source`, `.` includes, invoked scripts
- **Nix**: imports, flake inputs
- **CI**: referenced scripts, env vars, secret names
- **Compose**: service dependencies, volumes, env files
- **systemd**: ExecStart targets, dependencies

This enables finding issues in the seams between components.
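
A grep-based first pass at the reference map might look like this for the Shell and Nix rows. It is a heuristic sketch. `shell_refs` and `nix_refs` are hypothetical. Neither resolves variables such as `$DIR`, and `nix_refs` only finds relative path literals, so flake inputs and `<nixpkgs>`-style lookups are missed.

```bash
#!/usr/bin/env bash
# Hypothetical reference-map helpers (heuristic, line-based).

# Files pulled in with `source FILE` or `. FILE` at the start of a line.
shell_refs() {
  grep -Eo '^[[:space:]]*(source|\.)[[:space:]]+[^[:space:];&|]+' -- "$1" |
    awk '{ print $2 }'
}

# Relative path literals such as ./hardware.nix or ../common/base.nix.
nix_refs() {
  grep -Eo '\.\.?/[A-Za-z0-9._/-]+' -- "$1" | LC_ALL=C sort -u
}
```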
## Guidelines

1. **Linter-First** - Always run static tools before LLM analysis
2. **Evidence Over Opinion** - Cite linter output and specific lines
3. **Actionable Suggestions** - Every finding needs a clear fix
4. **Respect User Time** - Summarize first, details on request
5. **No Spam** - Don't file issues without explicit approval
6. **Context Matters** - Homelab ≠ production severity

## Process Checklist

1. [ ] Parse target (files/directory/diff)
2. [ ] Confirm scope with user if large (>10 files)
3. [ ] Run static tools (shellcheck, statix, etc.)
4. [ ] Build reference map for cross-file awareness
5. [ ] Run each lens, collecting findings
6. [ ] Deduplicate and rank findings
7. [ ] Present summary to user
8. [ ] Ask which findings to file
9. [ ] Create beads issues for approved findings
10. [ ] Report issue IDs created

## Integration

- **Lenses**: Read from `~/.config/lenses/ops/*.md`
- **Issue Tracking**: Uses `bd create` for beads issues
- **Static Tools**: shellcheck, statix, deadnix, hadolint
76
skills/ops-review/lenses/README.md
Normal file
@@ -0,0 +1,76 @@
# ops-review Lenses

Focused review prompts for operational infrastructure analysis.

## Architecture

**Linter-first hybrid**: Each lens works with static tool output when available.

```
Static Tools (syntax)     LLM Lens (semantics)
─────────────────────     ────────────────────
shellcheck ─────────────► shell-safety.md
statix + deadnix ───────► nix-hygiene.md
hadolint ───────────────► (container checks)
gitleaks patterns ──────► secrets.md
```

## Available Lenses

### Phase 1: Core Safety

| Lens | Focus | Linter |
|------|-------|--------|
| [secrets.md](secrets.md) | Credentials, SOPS, secrets in logs | gitleaks |
| [shell-safety.md](shell-safety.md) | Error handling, quoting, pipefail | shellcheck |
| [blast-radius.md](blast-radius.md) | Destructive ops, rollback, dry-run | LLM-primary |
| [privilege.md](privilege.md) | Least privilege, sudo, capabilities | LLM-primary |

### Phase 2: Reliability

| Lens | Focus | Linter |
|------|-------|--------|
| [idempotency.md](idempotency.md) | Safe re-run, atomic ops | LLM-primary |
| [supply-chain.md](supply-chain.md) | Pinning, SRI hashes, provenance | LLM-primary |
| [observability.md](observability.md) | Logging, health checks, metrics | LLM-primary |

### Phase 3: Architecture

| Lens | Focus | Linter |
|------|-------|--------|
| [nix-hygiene.md](nix-hygiene.md) | Dead code, anti-patterns, modules | statix, deadnix |
| [resilience.md](resilience.md) | Timeouts, retries, limits | LLM-primary |
| [orchestration.md](orchestration.md) | Ordering, prerequisites, coupling | LLM-primary |

## Lens Boundaries

To avoid duplicate findings:

| Lens | Owns | Does NOT Own |
|------|------|--------------|
| **idempotency** | Safe re-run, convergence, atomic writes | Rollback, retries |
| **resilience** | Runtime fault tolerance, timeouts, retries | Change safety, re-run |
| **blast-radius** | Change safety, dry-run, rollback, batching | Runtime behavior |

## Output Format

All lenses use consistent output:

```
[TAG] <severity:HIGH|MED|LOW> <file:line>
  Issue: <what's wrong>
  Suggest: <how to fix>
  Evidence: <why it matters>
```

## Severity Guidelines

| Severity | Criteria |
|----------|----------|
| **HIGH** | Exploitable vulnerability, data loss, will break on re-run |
| **MED** | Reliability issue, tech debt, best practice violation |
| **LOW** | Polish, maintainability, defense-in-depth |

## Deployment

Lenses are deployed to `~/.config/lenses/ops/` via home-manager.
67
skills/ops-review/lenses/blast-radius.md
Normal file
@@ -0,0 +1,67 @@
# Blast Radius Review Lens

Review operational scripts for **change safety, risk containment, and reversibility**.

## What to Look For

### Targeting & Scoping
- Wrong or ambient context: relying on current kubectl context, AWS profile, gcloud project
- Missing explicit flags: `--namespace`, `--context`, `--region`, `--project`
- No environment gates: prod operations without `CONFIRM_PROD=1` or `--env prod`
- Hardcoded production targets without verification
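
Here is a minimal guard for the environment-gate bullet. It assumes the target environment arrives as a plain string, for example from a hypothetical `--env` flag. `CONFIRM_PROD` is the variable named above. `require_prod_confirmation` and `kubectl_scoped` are hypothetical helpers. Passing `--context` and `--namespace` explicitly keeps kubectl from inheriting whatever context happens to be active.

```bash
#!/usr/bin/env bash
# Hypothetical environment gate: production targets need CONFIRM_PROD=1.
require_prod_confirmation() {
  local env="$1"
  if [[ "$env" == prod* && "${CONFIRM_PROD:-0}" != 1 ]]; then
    printf 'refusing to target %s without CONFIRM_PROD=1\n' "$env" >&2
    return 1
  fi
}

# Hypothetical wrapper: always pass an explicit context and namespace to kubectl.
kubectl_scoped() {
  local context="$1" namespace="$2"
  if [[ -z "$context" || -z "$namespace" ]]; then
    echo "kubectl_scoped: context and namespace are required" >&2
    return 1
  fi
  shift 2
  kubectl --context "$context" --namespace "$namespace" "$@"
}
```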
### Destructive Operations
- `rm -rf`, `DROP TABLE`, `docker system prune` without confirmation
- Empty variable expansion: `rm -rf $DIR/` when DIR could be empty (use `${DIR:?}`)
- Bulk deletes without limits or batching
- Irreversible operations that are not preceded by a backup/snapshot
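
For the empty-variable case, a guarded delete might look like the sketch below, using a hypothetical `clean_dir` helper. The explicit checks return an error instead of deleting. `${dir:?}` stays in place as a second line of defense so the glob can never expand to `/*`. Note that `*` does not match dotfiles.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete the contents of a directory, refusing unsafe targets.
clean_dir() {
  local dir="$1"
  if [[ -z "$dir" || "$dir" == / || ! -d "$dir" ]]; then
    printf 'clean_dir: refusing to clean %q\n' "$dir" >&2
    return 1
  fi
  # ${dir:?} would abort instead of expanding to "/*" if dir were ever empty here.
  rm -rf -- "${dir:?}"/*
}
```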
### Missing Dry-Run Mode
- Scripts that modify state without `--dry-run` or `--check` flag
- No preview before execution (`kubectl diff`, `terraform plan`)
- Destructive defaults (should require explicit `--apply` or `--force`)
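
This sketch makes dry-run the default. State-changing commands go through a `run` wrapper that only prints them unless `--apply` was passed, and an explicit `--dry-run` always wins. `parse_apply_flag`, `run`, and `DRY_RUN` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run plumbing: preview by default, execute only with --apply.
DRY_RUN=1

parse_apply_flag() {
  local apply=0 arg
  for arg in "$@"; do
    case "$arg" in
      --apply)   apply=1 ;;
      --dry-run) DRY_RUN=1; return 0 ;;   # an explicit dry-run always wins
    esac
  done
  DRY_RUN=$(( apply ? 0 : 1 ))
}

# Print the command in dry-run mode; run it otherwise.
run() {
  if (( DRY_RUN )); then
    printf '[dry-run] %s\n' "$*"
  else
    "$@"
  fi
}
```

A script would call `parse_apply_flag "$@"` once, then wrap each mutation, e.g. `run systemctl restart myapp` (`myapp` is a placeholder).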
### Rollback & Recovery
- No backup/snapshot before risky changes
- Missing rollback instructions or automation
- Note: NixOS keeps system generations, so changes applied through `nixos-rebuild` can be reverted with `nixos-rebuild switch --rollback`; flag scripts that change system state outside it
- Database migrations without down/rollback path
### Pre-flight Checks
- Missing connectivity/auth verification before bulk operations
- No target verification (`kubectl config current-context`, `aws sts get-caller-identity`)
- Missing dependency checks (required tools, permissions, disk space)
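
Pre-flight checks can be explicit helpers that run before any bulk work. `require_tools` and `require_free_kb` are hypothetical, and the disk check reads the "Available" column of POSIX `df -Pk` output. Target verification commands such as `kubectl config current-context` or `aws sts get-caller-identity` would be printed or compared alongside these.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers: fail before doing any work.

# Fail if any required command is missing.
require_tools() {
  local missing=() tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if (( ${#missing[@]} > 0 )); then
    printf 'missing required tools: %s\n' "${missing[*]}" >&2
    return 1
  fi
}

# Fail if the filesystem holding PATH has less than NEED_KB kilobytes available.
require_free_kb() {
  local path="$1" need_kb="$2" avail_kb
  avail_kb="$(df -Pk "$path" | awk 'NR == 2 { print $4 }')"
  if [[ -z "$avail_kb" ]] || (( avail_kb < need_kb )); then
    printf 'not enough space on %s: %s KB available, %s KB needed\n' \
      "$path" "${avail_kb:-?}" "$need_kb" >&2
    return 1
  fi
}
```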
### Bulk Operations
- All-at-once without batching or progressive rollout
- No pause/resume capability for long-running operations
- Missing locking to prevent concurrent runs (`flock`)
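
Locking and batching can be combined in two steps:

1. Take a non-blocking `flock` on a lock file, so a second run exits instead of overlapping.
2. Feed targets to the real work in small batches, pausing between them.

`with_lock` and `run_in_batches` are hypothetical, and `flock` comes from util-linux. Stopping at the first failed batch is one possible rollout policy, not the only one.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for bulk operations.

# with_lock LOCK_FILE CMD [ARGS...]: run CMD only if no other process holds the lock.
with_lock() {
  local lock_file="$1"; shift
  (
    flock -n 9 || { printf 'another run holds %s\n' "$lock_file" >&2; exit 1; }
    "$@"
  ) 9>"$lock_file"
}

# run_in_batches SIZE PAUSE_SECONDS CMD TARGET...: call CMD with up to SIZE targets at a time.
run_in_batches() {
  local size="$1" pause="$2" cmd="$3" i
  shift 3
  for (( i = 1; i <= $#; i += size )); do
    "$cmd" "${@:i:size}" || return 1      # stop the rollout at the first failing batch
    if (( i + size <= $# )); then
      sleep "$pause"                      # time to notice problems before the next batch
    fi
  done
}
```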
## Output Format

```
[BLAST-RADIUS] <severity:HIGH|MED|LOW> <file:line>
  Issue: <what could go wrong>
  Scope: <single file | service | host | fleet | cluster>
  Suggest: <add dry-run, confirmation, backup, scoping, etc.>
  Evidence: <destructive command or pattern identified>
```

## Mitigations That Reduce Severity

If these are present, consider downgrading:
- Explicit backup/snapshot step immediately prior
- Dry-run/plan output with explicit apply gate
- Narrow scope (specific namespace, labeled resources)
- Confirmation prompt for interactive use
- Running on ephemeral/test resources

## Guidelines

- **HIGH** = data loss or outage, broad scope, no recovery path
- **MED** = risky operation without safety nets, narrow scope
- **LOW** = missing best practice, ephemeral/test targets
- Focus on *implications*: what's the worst case? Can we recover?
- Context matters: `rm -rf /tmp/cache` is LOW, `rm -rf /data/$VAR` is HIGH
- Consider: unattended (cron/CI) operations need stricter gates
- Nix/NixOS: acknowledge generation rollback when applicable
93
skills/ops-review/lenses/privilege.md
Normal file
@@ -0,0 +1,93 @@
# Privilege Review Lens

Review operational infrastructure for **least-privilege violations and excessive permissions**.

## What to Look For

### Root & Sudo Usage
- Scripts running as root when not necessary
- `sudo` for operations that don't require it
- `curl ... | sudo bash` - dangerous remote execution pattern
- `NOPASSWD` sudo rules with broad commands or wildcards
- Missing privilege drop after initial setup
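
Two small patterns address the first and third bullets. The first refuses to run the whole script as root. The second replaces `curl ... | sudo bash` with download, checksum verification, and only then execution. The sketch assumes GNU coreutils `sha256sum`, and `require_non_root` and `verify_sha256` are hypothetical. The expected hash should come from the upstream release, not from the same download.

```bash
#!/usr/bin/env bash
# Hypothetical privilege helpers.

# Refuse to continue as root; individual steps that need root should use sudo explicitly.
# The uid argument defaults to the current user and exists to make the check testable.
require_non_root() {
  local uid="${1:-$(id -u)}"
  if (( uid == 0 )); then
    echo "run this as an unprivileged user; only the steps that need root should use sudo" >&2
    return 1
  fi
}

# verify_sha256 FILE EXPECTED_HEX: succeed only if FILE matches the pinned hash.
verify_sha256() {
  local file="$1" expected="$2"
  printf '%s  %s\n' "$expected" "$file" | sha256sum --check --status -
}
```

Instead of piping to `sudo bash`, the flow would be:

1. `curl -fsSLo install.sh "$URL"`
2. `verify_sha256 install.sh "$PINNED_SHA256"`
3. Review the script.
4. Run only the parts that need root under `sudo`.

`URL` and `PINNED_SHA256` are placeholders.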
### Container Privileges
- Containers running as root (`USER` not set)
- `privileged: true` in Docker/Compose/Kubernetes
- Docker socket mounting (`/var/run/docker.sock`)
- Missing capability drops (`--cap-drop=ALL`)
- Host namespace usage: `--pid=host`, `--network=host`, `--ipc=host`
- K8s: `allowPrivilegeEscalation: true`, `hostPath` mounts, missing `runAsNonRoot`

### File & Binary Permissions
- `chmod 777` or `chmod 666` (world-writable)
- Secrets/keys with permissions broader than `0600`
- setuid/setgid bits on custom binaries (`chmod u+s`, `chmod g+s`)
- Writable paths in root's `$PATH` or systemd unit locations
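
The "`0600` or tighter" rule for secrets can be scripted. The sketch assumes GNU `find`, where `-perm /077` means any group or other bit is set, and GNU `stat -c`, as on NixOS or most Linux hosts. `check_secret_perms` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: report secret files that grant any group/other access.
check_secret_perms() {
  local f bad=0
  for f in "$@"; do
    # -maxdepth 0 tests only the named path; -perm /077 matches any g/o permission bit.
    if [[ -n "$(find "$f" -maxdepth 0 -perm /077)" ]]; then
      printf 'too permissive: %s (mode %s, want 600 or tighter)\n' "$f" "$(stat -c %a "$f")" >&2
      bad=1
    fi
  done
  return "$bad"
}
```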
### Network Binding
- Services binding `0.0.0.0` when `127.0.0.1` suffices
- Database/admin ports exposed globally in Docker Compose
- Binding low ports (<1024) as root instead of using capabilities

### systemd Sandboxing
- Missing `ProtectSystem=strict` (or `full` if strict breaks the app)
- Missing `ProtectHome=yes`, `PrivateTmp=yes`, `NoNewPrivileges=yes`
- `User=root` when service could run unprivileged
- Missing `CapabilityBoundingSet=` restrictions
- For low ports: use `AmbientCapabilities=CAP_NET_BIND_SERVICE` (and keep that capability in `CapabilityBoundingSet=`) instead of root

### Nix/NixOS Specific
- Secrets in Nix store (world-readable!) - use sops-nix/agenix instead
- Services without `DynamicUser=yes` when applicable
- Missing `StateDirectory=`, `CacheDirectory=` (proper isolation)
- Overly permissive `security.sudo.extraRules`

## Output Format

```
[PRIVILEGE] <severity:HIGH|MED|LOW> <file:line>
  Issue: <what excessive permission exists>
  Suggest: <specific least-privilege alternative>
  Evidence: <permission pattern or config found>
```

## Compensating Controls

Downgrade severity if these are present:
- Container: `cap_drop=ALL` + specific `cap_add`, `read_only=true`, `no-new-privileges`
- systemd: `ProtectSystem`, `PrivateTmp`, capability restrictions
- Explicit justification comment for necessary privileges

## Common Fixes

```yaml
# Docker Compose: least privilege (these keys go under a service definition)
user: "1000:1000"
read_only: true
security_opt:
  - no-new-privileges:true
cap_drop: [ALL]
cap_add: [NET_BIND_SERVICE] # only what's needed
```

```ini
# systemd: Hardened service
[Service]
User=myservice
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
NoNewPrivileges=yes
# Keep only the capability needed to bind ports <1024; an ambient capability must
# also be in the bounding set. Use an empty CapabilityBoundingSet= if none is needed.
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
```

## Guidelines

- **HIGH** = root/privileged without justification, docker.sock mount, world-writable sensitive files
- **MED** = missing sandboxing, broad sudo, root with some restrictions
- **LOW** = could be tighter but has compensating controls
- Ask: "What's the minimum permission needed?"
- Consider compensating controls before flagging HIGH
- Nix store is world-readable - secrets there are HIGH severity
53
skills/ops-review/lenses/secrets.md
Normal file
@@ -0,0 +1,53 @@
# Secrets Review Lens

Review operational infrastructure for **credential exposure and secrets hygiene**.

## What to Look For

### Hardcoded Credentials & Store Leaks
- API keys, tokens, passwords in source files
- SSH private keys (`BEGIN PRIVATE KEY`, `BEGIN RSA PRIVATE KEY`, `BEGIN OPENSSH PRIVATE KEY`)
- **Nix**: Secrets in `.nix` strings, `writeText`, `environment.etc.*.text` (world-readable in /nix/store)
- **Docker**: Secrets in `ENV` or `ARG` instructions (persist in image layers/history)

### Secrets in Unsafe Channels
- Credentials passed as CLI arguments (visible in `ps`)
- Secrets in `export` statements in shell scripts (inherited by every child process)
- Tokens in URLs, query parameters, or connection strings
- Docker `build-arg` for sensitive values

### Logging & CI Exposure
- `set -x` in scripts that handle credentials
- Secrets echoed to stdout/stderr or logs
- Missing CI secret masking (GitHub `::add-mask::`, GitLab masked vars)
- Debug flags that leak secrets (`curl -v`, `--debug`)
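
Two defensive patterns cover the logging bullets. The first suspends `set -x` around commands that touch credentials. The second registers a value with the GitHub Actions `::add-mask::` workflow command before it can reach the log. `with_xtrace_off` and `mask_in_ci` are hypothetical. The `GITHUB_ACTIONS=true` check assumes a GitHub-Actions-compatible runner.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for keeping secrets out of logs.

# Run a command with xtrace disabled, restoring the previous xtrace state afterwards.
with_xtrace_off() {
  local was_on=0 rc
  [[ $- == *x* ]] && was_on=1
  { set +x; } 2>/dev/null           # silence the trace of turning tracing off
  "$@"
  rc=$?
  if (( was_on )); then set -x; fi
  return "$rc"
}

# Ask GitHub Actions to redact this value from all later log output.
mask_in_ci() {
  if [[ "${GITHUB_ACTIONS:-}" == true ]]; then
    printf '::add-mask::%s\n' "$1"
  fi
}
```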
### SOPS & Encryption Issues
- Plaintext files that should use SOPS (secrets.yaml, credentials.json)
- Missing `.sops.yaml` when encrypted files are present
- Overly broad access in SOPS `creation_rules` (too many recipient keys or a catch-all `path_regex`)

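
SOPS-encrypted YAML files carry a top-level `sops:` metadata key, which makes the plaintext case cheap to check. This rough sketch covers YAML only. The file-name patterns are assumptions based on the examples above, and `find_plaintext_secret_yaml` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: print secrets-looking YAML files under DIR that lack
# SOPS metadata (no top-level "sops:" key), i.e. are probably plaintext.
find_plaintext_secret_yaml() {
  local root=${1:-.} f
  while IFS= read -r f; do
    grep -q '^sops:' -- "$f" || printf '%s\n' "$f"
  done < <(find "$root" -type f \( -name 'secrets*.yaml' -o -name 'secrets*.yml' \))
}
```
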
## Linter Integration

```bash
gitleaks detect --source . --report-format json --report-path gitleaks-report.json
```

## Output Format

```
[SECRETS] <severity:HIGH|MED|LOW> <file:line>
Issue: <what credential is exposed and via what channel>
Suggest: <sops-nix, Docker BuildKit secrets, env file with mode 0600, etc.>
Evidence: <pattern match or context>
```

## Guidelines

- **HIGH** = credential in repo, Nix store, Docker layer, or logs
- **MED** = credential in risky channel (CLI arg, build arg, unmasked CI)
- **LOW** = missing encryption best practice
- **Keywords**: `*_KEY`, `*_TOKEN`, `*_SECRET`, `*_PASSWORD`, `*_CREDENTIAL`
- **Ignore**: Nix hashes (`sha256-`, `narHash`, `vendorHash`), public keys, checksums, UUIDs, placeholders (`REPLACE_ME`, `changeme`, `example`)
- **Nix remediation**: Use `sops-nix` or `agenix` and reference the decrypted file by its runtime path instead of embedding the secret as a string
- **Docker remediation**: Use BuildKit `--mount=type=secret`, avoid `ENV` for secrets

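
The keyword and ignore lists can seed a first-pass grep before the LLM examines context. This is a rough sketch. `scan_secret_assignments` is hypothetical. It only catches `NAME=` and `NAME:` assignments, and the placeholder filter also drops any line whose path contains one of those words.

```bash
#!/usr/bin/env bash
# Hypothetical keyword scan: report assignments to *_KEY / *_TOKEN /
# *_SECRET / *_PASSWORD / *_CREDENTIAL names as file:line:text, then drop
# lines containing an obvious placeholder value.
scan_secret_assignments() {
  grep -rHnE '[[:alnum:]]_(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)[[:alnum:]_]*[[:space:]]*[:=]' -- "$@" |
    grep -viE 'REPLACE_ME|changeme|example'
}
```
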

76
skills/ops-review/lenses/shell-safety.md
Normal file
@ -0,0 +1,76 @@
# Shell Safety Review Lens

Review shell scripts for **robustness, error handling, and defensive patterns**.

## What to Look For

### Error Handling
- Missing error strategy: `set -euo pipefail` OR explicit checks per command
- Note: `set -e` has edge cases. It is ignored in `if`/`while` conditions and in all but the last command of `&&`/`||` lists (including inside functions called there), and by default bash clears it inside command substitutions (see `shopt -s inherit_errexit`). Explicit checks are often safer
- Unchecked return codes from critical operations (file ops, network, root commands)
- Missing `trap` for cleanup on exit/error
- Pipes hiding exit codes without `pipefail` or `PIPESTATUS` checks

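
The `set -e` and `pipefail` points are easy to verify directly. The following is a small demonstration, not a recommended script layout. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Two behaviours behind the "set -e has edge cases" note. Each demo body is
# a ( ... ) subshell so its option changes do not leak into the caller.

# errexit is ignored for the whole body of a function used as an `if`
# condition, so the failing `false` below does not stop execution.
demo_errexit_in_condition() (
  set -e
  step() { false; echo "reached after false"; }
  if step; then echo "condition succeeded"; fi
)

# Without pipefail a pipeline's status is that of its last command;
# with pipefail a failing earlier stage is reported too.
demo_pipefail() (
  set +e
  set +o pipefail
  false | true; echo "without pipefail: $?"
  set -o pipefail
  false | true; echo "with pipefail: $?"
)
```
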
### Variable & Input Safety
- Unquoted variables in commands (SC2086: word splitting and globbing)
- Variables used before assignment or without defaults (`${VAR:-default}`)
- Missing input validation: required args, file existence, numeric checks
- `read` without `-r` (SC2162: backslashes are mangled); without `IFS=`, leading and trailing whitespace is also stripped

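
The following small demonstration makes SC2086 and the validation point concrete. `count_args` and `is_uint` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Why SC2086 matters: an unquoted expansion is split on whitespace (and
# globbed), so one value can become several arguments.
count_args() { echo "$#"; }

# Minimal input validation: succeed only for a non-empty string made
# entirely of ASCII digits.
is_uint() {
  case $1 in
    '' | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

name="my file.txt"
# Deliberately unquoted for the demo:
# shellcheck disable=SC2086
unquoted=$(count_args $name)   # 2: "my" and "file.txt"
quoted=$(count_args "$name")   # 1
```
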
### Command Safety
- Unsafe `cd` without checking: use `cd dir || exit 1`
- `rm -rf` with unguarded variables: use `${VAR:?}` or explicit checks
- Dangerous primitives: `eval`, `source` of non-constant paths, `curl | sh`
- Missing `--` to separate options from arguments

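
For `eval` and missing `--`, the usual alternative is to build argv as a bash array. This is a minimal sketch, and `search_files` is a hypothetical wrapper around `grep`.

```bash
#!/usr/bin/env bash
# Instead of assembling a command string for `eval`, collect argv in an
# array: each element stays exactly one argument and is never re-parsed,
# so values like '$(echo pwned)' or '-v' remain inert data.
search_files() {
  local pattern=$1
  shift
  # -e marks the pattern explicitly (even if it starts with "-");
  # -- ends option parsing so file names beginning with "-" are safe.
  local -a cmd=(grep -n -e "$pattern" -- "$@")
  "${cmd[@]}"
}
```
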
### Temp Files & Atomicity
- Hardcoded temp paths (`/tmp/foo`) instead of `mktemp`
- Predictable temp names (`/tmp/script.$$`): use `mktemp` (or `mktemp -d` for a directory)
- Missing cleanup of temp files on exit

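
For the temp-file items, the usual shape is `mktemp -d` plus an `EXIT` trap. This is a minimal sketch, and `with_tempdir` is a hypothetical name. Running the body in a subshell keeps the trap and the `cd` from affecting the caller.

```bash
#!/usr/bin/env bash
# Run a command inside a private temporary directory that is removed when
# the subshell exits, including after an error. The EXIT trap does not
# change the subshell's exit status, which is that of "$@".
with_tempdir() (
  tmp=$(mktemp -d) || exit 1
  trap 'rm -rf -- "$tmp"' EXIT
  cd -- "$tmp" || exit 1
  "$@"
)
```
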
## Linter Integration

```bash
shellcheck -x --format=json --severity=style script.sh
```

Key codes for ops safety:
- SC2086: Double quote to prevent globbing and word splitting
- SC2164: Use `cd ... || exit`
- SC2015: `A && B || C` is not if-then-else (C also runs when A succeeds but B fails)
- SC2162: `read` without `-r`
- SC2155: Declare and assign separately (masked return values)

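
SC2155 and SC2015 are easiest to judge when the masked status is visible. The following demonstration uses hypothetical function names.

```bash
#!/usr/bin/env bash
# SC2155: in `local out=$(cmd)` the status is that of `local` (0),
# so cmd's failure is masked. Declaring first keeps the real status.
masked_status() {
  # shellcheck disable=SC2155,SC2034
  local out=$(false)
  echo "$?"
}
unmasked_status() {
  local out rc=0
  out=$(false) || rc=$?
  echo "$rc"
}

# SC2015: `A && B || C` is not if/then/else. A (true) succeeds,
# B (false) fails, and C runs anyway.
and_or_chain() { true && false || echo "C ran"; }
# An explicit if/else only reaches the else branch when A fails.
if_then_else() { if true; then false; else echo "C ran"; fi; }
```
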
## Output Format

```
[SHELL] <severity:HIGH|MED|LOW> <file:line>
Issue: <what's unsafe and why>
Suggest: <specific fix with code example>
Evidence: <shellcheck code or pattern matched>
```

## Common Fixes

```bash
# Safe rm pattern: abort with a message if TARGET is unset or empty
: "${TARGET:?TARGET must be set}"
rm -rf -- "$TARGET"

# Safe cd pattern
cd -- "$dir" || { echo "cd failed: $dir" >&2; exit 1; }

# Safe read loop: IFS= keeps surrounding whitespace, -r keeps backslashes
while IFS= read -r line; do
  printf '%s\n' "$line"   # process one line here
done < "$file"
```

## Guidelines

- **HIGH** = data loss risk, silent failure, or injection vector
- **MED** = defensive pattern missing, potential edge-case bugs
- **LOW** = style, portability, maintainability
- Respect shell dialect: `local`, `[[ ]]`, and `pipefail` are not portable to every POSIX `sh`, so check the shebang before suggesting them
- Prioritize scripts running as root or handling sensitive operations
- Consider: will this break if run twice? With empty input? As a cron job?

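
The "will this break if run twice?" question often comes down to appends. The following is a minimal idempotent version. It assumes that exact whole-line matching defines "already present". `ensure_line` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Idempotent append: add LINE to FILE only if an identical line is not
# already there, so a second run leaves a single copy. A missing FILE is
# created by the append.
ensure_line() {
  local line=$1 file=$2
  # -q quiet, -x whole-line match, -F fixed string (no regex surprises).
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}
```
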

260
specs/ops-review/plan.md
Normal file
@ -0,0 +1,260 @@
# ops-review Skill Design

A multi-lens review skill for operational infrastructure, modeled on code-review.

## Problem Statement

Ops artifacts (Nix configs, shell scripts, Python automation, Docker Compose, CI/CD) accumulate technical debt and security issues just like application code. Unlike application code, however, they rarely get systematic review.

## Target Artifacts

Based on actual infrastructure in dotfiles and prox-setup:

| Category | Examples |
|----------|----------|
| **Nix/NixOS** | `flake.nix`, `modules/*.nix`, home-manager configs |
| **Shell Scripts** | `bin/*.sh`, `setup_*.sh`, `fix_*.sh`, `deploy.sh` |
| **Python Automation** | Proxmox API scripts, multi-stage deployments |
| **Container Configs** | `docker-compose.yml`, `Dockerfile` |
| **CI/CD** | `.gitea/workflows/*.yml`, `.github/actions/*.yml` |
| **Service Configs** | systemd units, Ory configs, SOPS files |

## Architecture: Linter-First Hybrid

**Consensus from model review**: Use deterministic tools as primary signals, LLM for interpretation and semantic analysis.

```
Stage 1: Static Tools (fast, deterministic)
├── shellcheck for shell scripts
├── statix + deadnix for Nix
├── hadolint for Dockerfiles
└── yamllint for YAML configs

Stage 2: LLM Analysis (semantic, contextual)
├── Interprets tool output in context
├── Finds logic bugs tools miss
├── Synthesizes cross-file issues
└── Suggests actionable fixes
```

**Why**: LLMs can hallucinate syntax details but excel at understanding intent and impact; tools catch syntax errors reliably but miss semantics.

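
Stage 1 needs a mapping from file to tool. The following minimal sketch of that dispatch classifies by file name only. `linters_for` is hypothetical, and the tool list mirrors the diagram.

```bash
#!/usr/bin/env bash
# Hypothetical Stage 1 dispatcher: print the static tool(s) for a path, or
# return 1 if none is configured. In `case`, `*` also matches "/", so
# patterns apply to paths at any depth.
linters_for() {
  case $1 in
    *.sh | *.bash)                            echo "shellcheck" ;;
    *.nix)                                    echo "statix deadnix" ;;
    Dockerfile | */Dockerfile | *.Dockerfile) echo "hadolint" ;;
    *.yml | *.yaml)                           echo "yamllint" ;;
    *)                                        return 1 ;;
  esac
}
```

A caller might loop over `git ls-files` and run each tool on the matching paths. Scripts without a `.sh` extension would need a shebang check, which this sketch omits.
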
## Proposed Lenses (10 total)

### Core Safety (Phase 1)

#### 1. secrets
**Focus**: Credential hygiene
- Hardcoded secrets, API keys, tokens
- SOPS config issues
- Secrets in logs or error messages
- Secrets passed via CLI args (visible in process list)
- Missing encryption for sensitive data

#### 2. shell-safety
**Focus**: Shell script robustness (backed by shellcheck)
- Missing `set -euo pipefail`
- Unquoted variables (SC2086)
- Unsafe command substitution
- Missing error handling
- Hardcoded paths that should be parameters

#### 3. blast-radius
**Focus**: Change safety and risk containment
- Destructive operations without confirmation
- Missing dry-run mode
- No rollback strategy
- Bulk operations without batching
- Missing pre-flight checks
- No canary/progressive approach

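
A dry-run mode can be as small as a wrapper that every destructive command goes through. This sketch assumes that scripts opt in through a `DRY_RUN=1` environment variable. That variable and `run` are hypothetical conventions, not something the lens prescribes. `%q` is bash-specific.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: with DRY_RUN=1, print each command
# (shell-quoted with bash's %q) instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'DRY-RUN:'
    printf ' %q' "$@"
    printf '\n'
  else
    "$@"
  fi
}
```
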
#### 4. privilege
**Focus**: Least privilege violations
- Unnecessary sudo/root usage
- Containers running as root
- Overly permissive file modes (chmod 777)
- Missing capability drops
- Docker socket mounting
- systemd units without sandboxing (ProtectSystem, PrivateTmp)

### Reliability (Phase 2)

#### 5. idempotency
**Focus**: Safe re-execution and convergence
- Scripts that break on re-run
- Missing existence checks (create-if-not-exists)
- Non-atomic operations (partial failure states)
- Check-then-act race conditions
- Missing cleanup on failure

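
For the non-atomic-operations item, a common fix for config files is to write to a temp file and then rename it. The following is a minimal sketch. `atomic_write` is hypothetical and reads the new content from stdin.

```bash
#!/usr/bin/env bash
# Atomic replace: write stdin to a temp file in the SAME directory as
# TARGET, then rename it over TARGET. A rename within one filesystem is
# atomic, so readers see the old or the new file, never a partial one.
atomic_write() {
  local target=$1 dir tmp
  dir=$(dirname -- "$target")
  tmp=$(mktemp "$dir/.tmp.XXXXXX") || return 1
  if cat > "$tmp" && mv -f -- "$tmp" "$target"; then
    return 0
  fi
  rm -f -- "$tmp"   # clean up the partial temp file on failure
  return 1
}
```

Because `mktemp` creates the file with mode 0600, the replaced file ends up with mode 0600. Add a `chmod` before the `mv` if the target needs wider permissions.
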
#### 6. supply-chain
**Focus**: Dependency provenance and pinning
- Unpinned versions (`latest` tags, floating refs)
- GitHub/Gitea actions not pinned to SHA
- Missing Nix flake.lock or SRI hashes
- Unsigned artifacts
- Untrusted substituters/registries

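
The SHA-pinning item can be checked with grep. This heuristic sketch assumes unquoted `uses:` values. Local `./path` actions are skipped because they have no `@ref`. Digest-pinned `docker://image@sha256:...` references would be reported as false positives. `unpinned_actions` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical supply-chain check: print file:line:text for `uses:`
# references whose @ref is not a full 40-character commit SHA.
unpinned_actions() {
  grep -HnE 'uses:[[:space:]]*[^[:space:]]+@' -- "$@" |
    grep -vE '@[0-9a-f]{40}([[:space:]]|$)'
}
```
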
#### 7. observability
**Focus**: Visibility into system state
- Silent failures (no logging/alerting)
- Missing health checks (Docker healthcheck, systemd ExecStartPre)
- Incomplete metrics coverage
- Missing structured logging
- No correlation IDs in multi-step scripts

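
For the correlation-ID item, a multi-step script can stamp every log line with one ID per run. In this sketch, `log_event` and the `RUN_ID` default format are assumptions. `date +%s` is a GNU/BSD extension rather than POSIX.

```bash
#!/usr/bin/env bash
# Hypothetical logging helper: each line on stderr carries a UTC timestamp
# and a per-run correlation ID, followed by key=value fields.
# RUN_ID defaults to "<epoch seconds>-<PID>" unless already set.
RUN_ID=${RUN_ID:-"$(date -u +%s)-$$"}

log_event() {
  printf '%s run_id=%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$RUN_ID" "$*" >&2
}
```
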
### Architecture (Phase 3)

#### 8. nix-hygiene
**Focus**: Nix-specific quality (backed by statix/deadnix)
- Dead code (unused let bindings, imports)
- Anti-patterns: `with lib;` abuse, import-from-derivation (IFD) without justification
- Module boundary violations
- Overlay/override issues
- Missing type annotations on options

#### 9. resilience
**Focus**: Runtime fault tolerance
- Missing timeouts on network calls
- No retries with backoff/jitter
- Missing circuit breakers for API calls
- No graceful shutdown handling (SIGTERM)
- Missing resource limits (systemd MemoryMax, Docker mem_limit)

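
Retries with backoff and jitter fit in a short shell function. The knob names in this sketch are hypothetical: `RETRY_MAX_ATTEMPTS` (default 5) and `RETRY_BASE_DELAY` in whole seconds (default 1). The jitter is additive, a random amount up to the current delay, which is one of several common jitter schemes.

```bash
#!/usr/bin/env bash
# Retry "$@" with exponential backoff plus random jitter.
# Returns 0 on success, otherwise the last attempt's exit code.
retry() {
  local max=${RETRY_MAX_ATTEMPTS:-5} base=${RETRY_BASE_DELAY:-1}
  local attempt=1 rc delay
  while :; do
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 0 ] || [ "$attempt" -ge "$max" ]; then
      return "$rc"
    fi
    # base * 2^(attempt-1), plus jitter drawn from [0, that delay].
    delay=$(( base * (1 << (attempt - 1)) ))
    delay=$(( delay + RANDOM % (delay + 1) ))
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

Retries only help when each attempt is bounded (e.g. `curl --max-time 10`), and they should only wrap operations that are safe to repeat.
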
#### 10. orchestration
**Focus**: Execution ordering and coupling (formerly dependency-chains)
- Unclear prerequisites
- Missing documentation of execution order
- Circular dependencies
- Scripts assuming prior state without checking
- Implicit coupling between components

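
For scripts that assume prior state, a pre-flight function can check prerequisites up front and report all gaps at once. In this sketch, `preflight` and its `CMD... -- FILE...` calling convention are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify required commands (before "--")
# and files (after it) exist; report every gap, return 1 if any.
preflight() {
  local missing=0 arg
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    if ! command -v "$1" >/dev/null 2>&1; then
      echo "missing command: $1" >&2
      missing=1
    fi
    shift
  done
  if [ "$#" -gt 0 ]; then shift; fi   # drop the "--" separator
  for arg in "$@"; do
    if [ ! -e "$arg" ]; then
      echo "missing file: $arg" >&2
      missing=1
    fi
  done
  return "$missing"
}
```
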
## Crisp Boundaries

To avoid duplicate findings across overlapping lenses:

| Lens | Owns | Does NOT Own |
|------|------|--------------|
| **idempotency** | Safe re-run, convergence, atomic writes, create-if-not-exists | Rollback (blast-radius), retries (resilience) |
| **resilience** | Runtime fault tolerance, timeouts, retries, graceful shutdown | Change safety (blast-radius), re-run safety (idempotency) |
| **blast-radius** | Change safety, dry-run, rollback, confirmation gates, batching | Runtime behavior (resilience), re-run (idempotency) |

## Skill Structure

```
skills/ops-review/
├── SKILL.md # Agent instructions (workflow)
├── README.md # User documentation
└── lenses/
    ├── README.md # Lens index
    ├── secrets.md
    ├── shell-safety.md
    ├── blast-radius.md
    ├── privilege.md
    ├── idempotency.md
    ├── supply-chain.md
    ├── observability.md
    ├── nix-hygiene.md
    ├── resilience.md
    └── orchestration.md
```

Lenses deploy to `~/.config/lenses/ops/` via home-manager.

## Workflow

### Standard Mode
1. **Target selection** - files/directory to review
2. **Pre-pass** - Run static tools (shellcheck, statix, etc.)
3. **Reference mapping** - Build lightweight call graph (source, imports, ExecStart)
4. **Lens execution** - One pass per lens, tool output in context
5. **Synthesis** - Dedupe across lenses, rank by severity
6. **Interactive review** - User approves findings
7. **Issue filing** - `bd create` for approved items

### Quick Mode (`--quick`)
Runs Phase 1 lenses only: secrets, shell-safety, blast-radius, privilege.
Ideal for pre-commit or CI gates.

## Output Format

Per-lens findings:
```
[LENS-TAG] <severity:HIGH|MED|LOW> <file:line>
Issue: <what's wrong>
Suggest: <how to fix>
Evidence: <why it matters>
```

### Severity Rubric

| Severity | Criteria |
|----------|----------|
| **HIGH** | Exploitable vulnerability, data loss risk, or will break on next run |
| **MED** | Reliability issue, tech debt, or violation of best practice |
| **LOW** | Polish, maintainability, or defense-in-depth improvement |

Context matters: the same issue may be HIGH in production and LOW in a homelab.

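
Once findings use the format above, the synthesis step's "rank by severity" can be done mechanically. This sketch assumes that each header line renders the severity as a bare `HIGH`, `MED`, or `LOW` in the second whitespace-separated field (e.g. `[SECRETS] HIGH hosts/web.nix:12`). `rank_findings` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical synthesis helper: read finding header lines on stdin and
# print them HIGH first, then MED, then LOW, keeping input order within
# each severity (input line number is the tie-breaker).
rank_findings() {
  awk '{
    rank = ($2 == "HIGH") ? 0 : (($2 == "MED") ? 1 : 2)
    printf "%d\t%d\t%s\n", rank, NR, $0
  }' | sort -k1,1n -k2,2n | cut -f3-
}
```
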
## Cross-File Awareness

Build a simple reference map before review:
- **Shell**: `source`, `.` includes, invoked scripts
- **Nix**: imports, flake inputs
- **CI**: referenced scripts, env vars, secret names
- **Compose**: service dependencies, volumes, env files
- **systemd**: ExecStart targets, dependencies

This enables finding issues in the seams between components.

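
A grep-level reference map for the shell row might look like the following rough sketch, which uses a hypothetical `shell_refs` helper. It reads only literal `source`/`.` lines and prints targets as written, without expanding variables or following invoked scripts.

```bash
#!/usr/bin/env bash
# Hypothetical reference-map pass: print "FILE -> TARGET" for each line
# that starts with `source TARGET` or `. TARGET` (after optional indent).
shell_refs() {
  awk '
    match($0, /^[ \t]*(source|\.)[ \t]+/) {
      rest = substr($0, RSTART + RLENGTH)
      split(rest, parts, /[ \t;]/)
      print FILENAME " -> " parts[1]
    }
  ' "$@"
}
```
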
## Implementation Phases

### Phase 1: Safety Net (High ROI, Low Ambiguity)
1. **secrets** - Non-negotiable, prevents catastrophes
2. **shell-safety** - Most brittle artifact type, shellcheck-backed
3. **blast-radius** - Where LLMs shine (understanding implications)
4. **privilege** - Highly actionable, high impact

### Phase 2: Reliability Layer
5. **idempotency** - Essential for setup/deploy scripts
6. **supply-chain** - Critical for reproducibility
7. **observability** - Easy to check, high debugging value

### Phase 3: Architecture Polish
8. **nix-hygiene** - statix/deadnix backed, LLM explains
9. **resilience** - Needs nuance to avoid bad advice
10. **orchestration** - Most complex, needs full context

## Design Decisions

1. **Linter-first, LLM-second**: Static tools for syntax, LLM for semantics
2. **Crisp lens boundaries**: Each rule has one primary owner
3. **Severity tied to impact**: Not all violations are equal
4. **Quick mode**: Phase 1 for pre-commit/CI
5. **Cross-file awareness**: Grep-based reference mapping
6. **Escape hatches**: Intentional patterns can be marked as such and suppressed

## Success Criteria

- Can review dotfiles/ and find real issues
- Can review prox-setup/ and find real issues
- Findings are actionable, not noise
- Phase 1 lenses have <10% false positive rate
- Integrates with existing bd issue tracking
- Quick mode runs in <30 seconds

## Open Questions (Resolved)

| Question | Resolution |
|----------|------------|
| Nix: statix/deadnix or pure LLM? | **Hybrid**: Tools first, LLM interprets |
| Shell: integrate shellcheck? | **Yes**: Treat as compiler, LLM groups/prioritizes |
| Multi-file dependencies? | **Grep-based reference map** pre-pass |
| Quick mode? | **Yes**: Phase 1 lenses only |
| Prioritize across artifact types? | **By risk**: secrets/destructive ops first, not file type |

## References

- [Google SRE Book](https://sre.google/sre-book/table-of-contents/)
- [OWASP DevSecOps Guideline](https://owasp.org/www-project-devsecops-guideline/)
- Consensus review: sonar, flash-or, gemini, gpt (2025-01-01)