dan/skills

dan fb882a9434 feat: add ops-review skill with Phase 1 lenses

Multi-lens review skill for operational infrastructure (Nix, shell,
Docker, CI/CD). Modeled on code-review with linter-first hybrid
architecture.

Phase 1 lenses (core safety):
- secrets: credential exposure, Nix store, Docker layers, CI masking
- shell-safety: shellcheck-backed, temp files, guard snippets
- blast-radius: targeting/scoping, dry-run, rollback
- privilege: least-privilege, containers, systemd sandboxing

Design reviewed via orch consensus (sonar, flash-or, gemini, gpt).
Lenses deploy to ~/.config/lenses/ops/ via home-manager.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-01 17:36:24 -08:00


ops-review Skill Design, Orch Consensus Planning, and Skeleton Implementation


Session Summary

Date: 2026-01-01

Focus Area: Designing and implementing the ops-review skill for infrastructure code analysis

Accomplishments

Explored dotfiles and prox-setup repos to understand actual ops artifact landscape
Designed ops-review skill with 10 lenses across 3 phases
Ran orch consensus (sonar, flash-or, gemini, gpt) on initial plan
Incorporated consensus feedback: linter-first hybrid architecture, crisp lens boundaries
Created comprehensive plan.md in specs/ops-review/
Created bd epic (skills-9cu) with 14 child tasks, proper dependency graph
Built skill skeleton: SKILL.md, README.md, lenses/README.md
Drafted secrets.md lens with orch consensus review
Incorporated Nix store exposure, Docker layer persistence, CI masking feedback
Filed follow-up issue in dotfiles for gitleaks availability (dotfiles-x2m)
Not yet done: remaining Phase 1 lenses (shell-safety, blast-radius, privilege)

Key Decisions

Decision 1: Linter-first hybrid architecture

Context: How should ops-review analyze infrastructure code?
Options considered:
1. Pure LLM analysis - flexible but prone to syntax hallucinations
2. Pure linter - deterministic but misses semantic issues
3. Hybrid: linters first, LLM interprets - best of both
Rationale: All 4 consensus models agreed that LLMs hallucinate syntax but excel at understanding intent. Static tools catch syntax errors; the LLM finds logic bugs.
Impact: Each lens integrates with specific tools (shellcheck, statix, gitleaks)
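The linter-first flow described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function names and the prompt layout are hypothetical, and only the `shellcheck -f json` invocation and its JSON array output are taken from the real tool.

```python
import json
import subprocess
from typing import Dict, List


def run_shellcheck(path: str) -> str:
    """Run shellcheck and return its JSON report (requires shellcheck on PATH)."""
    # shellcheck exits non-zero when it reports findings, so check=False.
    result = subprocess.run(
        ["shellcheck", "-f", "json", path],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def parse_findings(report: str) -> List[Dict]:
    """Keep only the fields the lens prompt needs from shellcheck's JSON array."""
    return [
        {"line": f["line"], "code": f"SC{f['code']}",
         "level": f["level"], "message": f["message"]}
        for f in json.loads(report)
    ]


def build_prompt(script: str, findings: List[Dict]) -> str:
    """Hypothetical prompt layout: linter facts first, then the script itself."""
    facts = "\n".join(
        f"- line {f['line']} [{f['code']}/{f['level']}]: {f['message']}"
        for f in findings
    ) or "- (no linter findings)"
    return (
        "Linter findings (treat as ground truth for syntax):\n"
        f"{facts}\n\n"
        "Review the script for intent and logic issues the linter cannot see:\n"
        f"{script}"
    )


# Sample report in the shape shellcheck emits, used here instead of a live run.
SAMPLE_REPORT = json.dumps([{
    "file": "deploy.sh", "line": 3, "endLine": 3, "column": 4, "endColumn": 8,
    "level": "info", "code": 2086,
    "message": "Double quote to prevent globbing and word splitting.",
}])
```

Passing the linter output as fixed facts keeps the LLM from guessing at syntax and leaves it to judge intent, which is the split the consensus models recommended.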

Decision 2: 10 lenses across 3 phases

Context: How many lenses and how to prioritize?
Initial proposal: 8 lenses
Consensus feedback: Add privilege (least-privilege) and supply-chain (pinning)
Phase 1 (quick mode): secrets, shell-safety, blast-radius, privilege
Phase 2: idempotency, supply-chain, observability
Phase 3: nix-hygiene, resilience, orchestration
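As a rough sketch of how the phase split could drive lens selection, the mapping below mirrors the three phases listed above. The `select_lenses` helper is hypothetical and assumes quick mode means "Phase 1 only", as described in the architecture notes.

```python
from typing import List

# Phase assignments as listed in Decision 2.
PHASES = {
    1: ["secrets", "shell-safety", "blast-radius", "privilege"],
    2: ["idempotency", "supply-chain", "observability"],
    3: ["nix-hygiene", "resilience", "orchestration"],
}


def select_lenses(quick: bool = False) -> List[str]:
    """Return lens names to run; quick mode limits the run to Phase 1."""
    phases = [1] if quick else sorted(PHASES)
    return [lens for phase in phases for lens in PHASES[phase]]
```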

Decision 3: Crisp lens boundaries to avoid duplicate findings

Problem: resilience/blast-radius/idempotency overlap
Solution: Define ownership table
- idempotency: safe re-run, convergence, atomic writes
- resilience: runtime fault tolerance, timeouts, retries
- blast-radius: change safety, dry-run, rollback
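One way to enforce the ownership table is to drop any finding reported by a lens that does not own the concern. The sketch below assumes each finding carries `lens` and `concern` fields; the concern labels are hypothetical names derived from the table above.

```python
from typing import Dict, List

# Ownership table from Decision 3; the concern keys are hypothetical labels.
OWNER = {
    "safe-rerun": "idempotency",
    "convergence": "idempotency",
    "atomic-write": "idempotency",
    "fault-tolerance": "resilience",
    "timeout": "resilience",
    "retry": "resilience",
    "change-safety": "blast-radius",
    "dry-run": "blast-radius",
    "rollback": "blast-radius",
}


def dedupe(findings: List[Dict]) -> List[Dict]:
    """Keep a finding only when the reporting lens owns its concern."""
    kept = []
    for finding in findings:
        owner = OWNER.get(finding["concern"])
        # Unknown concerns pass through so nothing is silently dropped.
        if owner is None or owner == finding["lens"]:
            kept.append(finding)
    return kept
```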

Decision 4: Nix-specific checks as first-class concerns

Context: Nix has a unique security model (the store is world-readable)
Insight from consensus: Secrets in .nix strings become readable in /nix/store
Added to secrets lens: explicit Nix store exposure check
Remediation: sops-nix/agenix with runtime paths, not embedded strings

Problems & Solutions

| Problem | Solution | Learning |
|---|---|---|
| Initial lens drafts too long (60+ lines) | Reference existing code-review lenses (~45 lines) | Consistent format matters for usability |
| Overlapping lens scopes | Created "Crisp Boundaries" table in plan | Define ownership explicitly upfront |
| What lenses are actually needed? | Explored real repos (dotfiles, prox-setup) | Ground design in actual artifacts |
| False positive risk in secrets lens | Added explicit exemptions (Nix hashes, public keys) | Two-signal rule for generic matches |

Technical Details

Code Changes

Total files created: 5
Key files:
- `specs/ops-review/plan.md` (261 lines) - Comprehensive design document
- `skills/ops-review/SKILL.md` (188 lines) - Agent workflow instructions
- `skills/ops-review/README.md` (96 lines) - User documentation
- `skills/ops-review/lenses/README.md` (85 lines) - Lens index
- `skills/ops-review/lenses/secrets.md` (53 lines) - First lens

Commands Used

```bash
# Explored actual infrastructure repos
# (via Task tool with Explore subagent)

# Ran multi-model consensus for plan review
uv run orch consensus "Review this ops-review skill design..." sonar flash-or gemini gpt

# Created bd epic with hierarchical children
bd create "ops-review skill" --type=epic -p 1 --description "..."
bd create "Lens: secrets" --parent skills-9cu -p 1 --deps skills-9cu.1

# Visualized dependency graph
bd graph skills-9cu

# Checked available work
bd ready
```

Architecture Notes

Skill follows code-review pattern: lenses as focused prompts
Lenses deploy to ~/.config/lenses/ops/ via home-manager
Quick mode (`--quick`) runs Phase 1 only for CI/pre-commit
Cross-file awareness via grep-based reference mapping (source, imports)
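The grep-based reference mapping could look roughly like the sketch below. It is an assumption-laden illustration: the regular expressions only cover shell `source`/`.` statements and relative Nix `import ./path` expressions, and the `reference_map` helper is hypothetical rather than part of the skill.

```python
import re
from typing import Dict, List

# Shell: `source path` or `. path` at the start of a line.
SHELL_SOURCE = re.compile(r"^\s*(?:source|\.)\s+([^\s;]+)", re.MULTILINE)
# Nix: `import ./path` or `import ../path` (relative paths only).
NIX_IMPORT = re.compile(r"\bimport\s+(\.{1,2}/[^\s;)]+)")


def reference_map(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each file name to the paths it references."""
    refs = {}
    for name, text in files.items():
        pattern = NIX_IMPORT if name.endswith(".nix") else SHELL_SOURCE
        refs[name] = pattern.findall(text)
    return refs
```

A map like this lets a lens pull in the sourced or imported file when a finding depends on it. Paths built from variables are not resolved here.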

Process and Workflow

What Worked Well

Exploring real repos first grounded the design in actual needs
orch consensus with 4 models surfaced gaps (Nix store, Docker layers)
bd epic with `--parent` creates clean hierarchical structure
Dependency graph visualization helped verify task ordering

What Was Challenging

Balancing lens completeness with ~45 line target format
Deciding which checks are linter-backed vs LLM-primary
Managing context across long design session

Learning and Insights

Technical Insights

Nix store world-readability is a critical security consideration
Docker ENV and ARG values persist in image metadata and build history even if they are unset or overwritten in a later instruction
CI masking (::add-mask::) is often overlooked
shellcheck, statix, gitleaks provide structured JSON output for integration

Process Insights

orch consensus is valuable for pressure-testing designs
High temperature for brainstorming, low temperature for analysis decisions
bd hierarchical children (.1, .2, etc.) work well for epic breakdown

Architectural Insights

Linter-first hybrid is an emerging pattern (doc-review also uses this)
Lens boundaries must be explicit to avoid duplicate findings
Platform-specific remediation matters (sops-nix vs BuildKit secrets)

Context for Future Work

Open Questions

Should ops-review have its own lens directory or share with code-review?
How to handle cross-repo awareness (dotfiles uses sops, prox-setup uses passage)?
Should we run linters in parallel before LLM pass?

Next Steps

Complete Phase 1 lenses: shell-safety, blast-radius, privilege
Integration: add to flake.nix, update ai-skills.nix
Validation: test on dotfiles and prox-setup repos
Ensure gitleaks available (dotfiles-x2m)

Related Work

Code Review Skill Creation - Original lens pattern
Doc-Review Skill Design - Hybrid architecture precedent
Multi-Lens Code Review Testing - LLM-in-the-loop pattern

Raw Notes

Dotfiles repo: 100+ Nix modules, 90+ shell scripts, SOPS secrets, Gitea Actions
Prox-setup repo: 88 Python scripts (Proxmox API), 41 shell scripts, Docker Compose
Models consulted: sonar, flash-or, gemini, gpt (all 4 supported the design)
Key insight from GPT: "Require two signals for MED/HIGH when not using known token format"
All models emphasized: don't flag Nix hashes (sha256-, narHash, vendorHash)
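The two-signal rule and the Nix hash exemption from these notes can be sketched as a small classifier. Everything here is illustrative: the two signals chosen (a suspicious key name and a high-entropy quoted value), the 3.5 bits-per-character threshold, and the severity labels are assumptions, not the contents of secrets.md. The two known token formats are the AWS access key ID and GitHub personal access token prefixes.

```python
import math
import re
from collections import Counter

# Known token formats: a match alone is enough to report HIGH.
KNOWN_TOKENS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),     # AWS access key ID
    re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),  # GitHub personal access token
]
# Exemptions named in the notes: Nix hashes and public keys.
EXEMPT = re.compile(r"sha256-|narHash|vendorHash|ssh-ed25519 |ssh-rsa ")
SUSPICIOUS_KEY = re.compile(r"(?i)(password|passwd|secret|token|api[_-]?key)\s*[=:]")
QUOTED_VALUE = re.compile(r"""["']([^"']{16,})["']""")


def entropy(value: str) -> float:
    """Shannon entropy in bits per character."""
    counts = Counter(value)
    return -sum(n / len(value) * math.log2(n / len(value)) for n in counts.values())


def classify(line: str) -> str:
    """Return HIGH, LOW, or NONE for one line of config or script text."""
    if any(pattern.search(line) for pattern in KNOWN_TOKENS):
        return "HIGH"
    if EXEMPT.search(line):
        return "NONE"
    name_signal = bool(SUSPICIOUS_KEY.search(line))
    match = QUOTED_VALUE.search(line)
    # Assumed threshold: 3.5 bits/char separates random strings from words.
    value_signal = bool(match) and entropy(match.group(1)) > 3.5
    if name_signal and value_signal:
        return "HIGH"
    return "LOW" if name_signal or value_signal else "NONE"
```

A single generic signal is capped at LOW, so a line such as `password = "changeme-changeme"` is surfaced without being escalated, while a Nix `vendorHash` line is skipped entirely.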

Session Metrics

Commits made: 0 (work in progress)
Files created: 5
Lines added: ~683 (plan.md + skill files + lens)
bd issues created: 16 (1 epic + 14 children + 1 in dotfiles)
orch consensus runs: 2
