diff --git a/skills.nix b/skills.nix
index ccaf230..c18bd13 100644
--- a/skills.nix
+++ b/skills.nix
@@ -12,6 +12,8 @@
   screenshot-latest = "Find latest screenshots";
   spec-review = "Technical specification review";
   tufte-press = "Generate study card JSON";
+  test-review = "Audit test quality for flakiness, tautologies, and anti-patterns";
+  verify-work = "The Gatekeeper: Run project build/tests before claiming success";
   worklog = "Create structured worklogs";
   update-spec-kit = "Update spec-kit ecosystem";
   update-opencode = "Update OpenCode via Nix";
diff --git a/skills/test-review/README.md b/skills/test-review/README.md
new file mode 100644
index 0000000..7fe46a0
--- /dev/null
+++ b/skills/test-review/README.md
@@ -0,0 +1,27 @@
+# Test Review Skill (The Auditor)
+
+This skill focuses on the **quality** of your test code.
+
+While `verify-work` checks if tests *pass*, `test-review` checks if tests are *good*. A test suite can pass 100% and still be useless if it contains tautologies, brittle mocks, or race conditions.
+
+## What It Checks
+
+1. **Flakiness:** Arbitrary sleeps, global state, race conditions.
+2. **Tautologies:** Tests that always pass (`expect(true).toBe(true)`).
+3. **Brittle Mocks:** Mocking internals instead of behavior.
+4. **Structure:** Arrange-Act-Assert (AAA) pattern violations.
+5. **PBT Opportunities:** Suggesting Property-Based Testing for suitable logic.
+
+## Usage
+
+Ask the agent to review specific test files or the whole suite:
+
+* "Review `src/auth.test.ts` for anti-patterns"
+* "Audit my test suite quality"
+* "Are these tests brittle?"
+
+## Philosophy
+
+"Test behavior, not implementation."
+"Wait for conditions, not time."
+"Don't mock what you don't own."
diff --git a/skills/test-review/SKILL.md b/skills/test-review/SKILL.md
new file mode 100644
index 0000000..159fd2e
--- /dev/null
+++ b/skills/test-review/SKILL.md
@@ -0,0 +1,57 @@
+---
+name: test-review
+description: Audit test quality using multiple lenses. Analyzes for flakiness, tautologies, structure, and mocking anti-patterns. Distinct from running tests (verify-work).
+---
+
+# Test Review Skill (The Auditor)
+
+Run focused analysis on test files to ensure they are robust and meaningful.
+
+## When to Use
+
+Invoke this skill when:
+- "Review these tests"
+- "Audit my test suite"
+- "Are these tests brittle?"
+- "Check for test anti-patterns"
+
+## Available Lenses
+
+Lenses are located in `skills/test-review/lenses/`:
+
+| Lens | Focus |
+|------|-------|
+| `flakiness.md` | Timing issues, sleeps, global state, race conditions |
+| `lies.md` | Tautologies, meaningless assertions, swallowed errors |
+| `structure.md` | Arrange-Act-Assert violations, single-action check |
+| `mocking.md` | "Don't mock what you don't own", implementation vs behavior |
+
+## Workflow
+
+### Phase 1: Target Selection
+1. Identify test files to review (e.g., `*.test.ts`, `test_*.py`).
+2. Show file list to user for confirmation.
+
+### Phase 2: Lens Execution
+For each lens, analyze the target test files:
+1. Apply the lens criteria.
+2. Collect findings (Severity: HIGH/MED/LOW).
+
+### Phase 3: Synthesis
+1. Deduplicate findings.
+2. Group related issues.
+3. Present summary report.
+
+### Phase 4: Interactive Review
+Ask: "Which findings should I file as issues?"
+
+## Guidelines
+
+1. **Behavior Over Implementation:** Ensure tests verify what the code *does*, not how it *looks*.
+2. **Dynamic vs Static:** Remind the user that this is a static quality audit; use `verify-work` to confirm the tests actually *pass*.
+3. **No Flakiness:** Be ruthless about `sleep` and `setTimeout`.
+
+## Integration
+
+- **Issue Tracking:** Uses `bd create` for beads issues.
+- **Verification:** Works as part of the Triad (Verify -> Test-Review -> Code-Review).
\ No newline at end of file
diff --git a/skills/test-review/lenses/README.md b/skills/test-review/lenses/README.md
new file mode 100644
index 0000000..0321020
--- /dev/null
+++ b/skills/test-review/lenses/README.md
@@ -0,0 +1,9 @@
+# Lenses for Test Review
+
+These lenses are used by the `test-review` skill to audit test quality.
+
+## Installation
+
+For global availability, these should be copied to `~/.config/lenses/test/` (or similar path configured in your agent settings).
+
+The `test-review` skill will also look for them in the skill's local `lenses/` directory if they are not found globally.
diff --git a/skills/test-review/lenses/flakiness.md b/skills/test-review/lenses/flakiness.md
new file mode 100644
index 0000000..5468d87
--- /dev/null
+++ b/skills/test-review/lenses/flakiness.md
@@ -0,0 +1,50 @@
+# Lens: Flakiness Auditor
+
+Analyze the tests for timing dependencies and non-deterministic behavior.
+
+## Checklist
+
+### 1. Arbitrary Sleeps
+* **❌ Flag:** `setTimeout`, `sleep(100)`, `time.Sleep(time.Second)`.
+* **Why:** Slow machines fail, fast machines wait too long. It is a "guess" at timing.
+* **Fix:** Use condition-based waiting (e.g., `waitFor(() => condition)`); see the example below.
+
+### 2. Global State Mutation
+* **❌ Flag:** Tests that modify global variables, environment variables, or singleton states without cleanup.
+* **Why:** Causes order-dependent failures.
+* **Fix:** Use `beforeEach`/`afterEach` or generate unique IDs per test.
+
+### 3. Unmocked Network/IO
+* **❌ Flag:** Tests that make real network calls or write to system-wide paths (e.g., `/tmp/test`).
+* **Why:** External failures cause test flakiness.
+* **Fix:** Mock network boundaries or use isolated temp directories.
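+
+## Example
+
+A minimal sketch of condition-based waiting (Jest-style; `startJob` and
+`jobStatus` are hypothetical stand-ins for the system under test):
+
+```ts
+// Tiny poll helper; a real suite might use Testing Library's waitFor instead.
+async function waitFor(cond: () => boolean, timeoutMs = 5000): Promise<void> {
+  const deadline = Date.now() + timeoutMs;
+  while (!cond()) {
+    if (Date.now() > deadline) throw new Error("condition not met in time");
+    await new Promise((r) => setTimeout(r, 10)); // short poll, not a guess
+  }
+}
+
+// ❌ Flaky: guesses how long the job takes.
+test("job finishes (flaky)", async () => {
+  startJob(); // hypothetical system under test
+  await new Promise((r) => setTimeout(r, 500)); // arbitrary sleep
+  expect(jobStatus()).toBe("done");
+});
+
+// ✅ Robust: waits for the condition itself, bounded by a hard timeout.
+test("job finishes", async () => {
+  startJob();
+  await waitFor(() => jobStatus() === "done");
+  expect(jobStatus()).toBe("done");
+});
+```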
diff --git a/skills/test-review/lenses/lies.md b/skills/test-review/lenses/lies.md
new file mode 100644
index 0000000..5b2b59d
--- /dev/null
+++ b/skills/test-review/lenses/lies.md
@@ -0,0 +1,44 @@
+# Lens: Tautology Auditor
+
+Check if tests are "lying" or providing false confidence by asserting things that are always true.
+
+## Checklist
+
+### 1. Mock Assertions
+* **❌ Flag:** `expect(mock).toHaveBeenCalled()`.
+* **Why:** You are testing that the mock framework works, not that the code is correct.
+* **Fix:** Assert on the *result* of the call or its *observable side effects*.
+
+### 2. Logic Duplication
+* **❌ Flag:** `expect(add(a,b)).toBe(a+b)`.
+* **Why:** If the logic is wrong in both, the test passes.
+* **Fix:** Use hardcoded expected values (`expect(add(2,2)).toBe(4)`).
+
+### 3. Swallowed Errors
+* **❌ Flag:** `try { ... } catch (e) {}` inside a test without re-throwing or failing.
+* **Why:** Silences actual failures.
+* **Fix:** Use the test runner's built-in error handling or `expect().toThrow()` (see the example below).
+
+### 4. Meaningless Assertions
+* **❌ Flag:** `expect(result).toBeDefined()` when a specific value (e.g., a status code) could be asserted.
+* **Fix:** Assert on the *specific value* (`expect(result.status).toBe(200)`).
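+
+## Example
+
+A minimal sketch contrasting a lying test with a meaningful one (Jest-style;
+`parseEmail` is a hypothetical function under test):
+
+```ts
+// ❌ Lies: the catch block swallows the failure, so this always passes.
+test("parseEmail rejects garbage (lying)", () => {
+  try {
+    parseEmail("not-an-email"); // hypothetical function under test
+    expect(true).toBe(true); // tautology
+  } catch (e) {}
+});
+
+// ✅ Honest: asserts the specific behavior with a hardcoded expectation.
+test("parseEmail rejects garbage", () => {
+  expect(() => parseEmail("not-an-email")).toThrow();
+});
+```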
"Don't Mock What You Don't Own" +* **❌ Flag:** Mocking `axios`, `fs`, `react-router` or other 3rd party libraries directly. +* **Why:** Brittle. Changes in the library break your tests even if your code is fine. +* **Fix:** Suggest a wrapper (Adapter pattern) and mock your own wrapper. + +### 2. Implementation vs Behavior +* **❌ Flag:** Tests that depend on private variables or internal method names. +* **Fix:** Test only the public API. + +### 3. Complex Mock Setup +* **❌ Flag:** Mock setup that is > 50% of the test file. +* **Fix:** Suggest an integration test with real (managed) dependencies or a simpler mock strategy. diff --git a/skills/test-review/lenses/structure.md b/skills/test-review/lenses/structure.md new file mode 100644 index 0000000..1b7e316 --- /dev/null +++ b/skills/test-review/lenses/structure.md @@ -0,0 +1,18 @@ +# Lens: Structure Auditor + +Check if tests follow a clear, maintainable structure. + +## Checklist + +### 1. Arrange-Act-Assert (AAA) +* **Check:** Does the test have clear setup, execution, and verification phases? +* **❌ Flag:** Interleaved acting and asserting. +* **Fix:** Refactor into `// Arrange`, `// Act`, `// Assert` blocks. + +### 2. Single Action per Test +* **❌ Flag:** Tests that test 5 different things in one block. +* **Fix:** Split into multiple smaller tests. + +### 3. Readability +* **Check:** Does the test description match the assertion? +* **Fix:** Use descriptive names like `test('should reject invalid emails')` instead of `test('email test')`. diff --git a/skills/verify-work/README.md b/skills/verify-work/README.md new file mode 100644 index 0000000..80f6b0d --- /dev/null +++ b/skills/verify-work/README.md @@ -0,0 +1,46 @@ +# Verify Work Skill (The Gatekeeper) + +A skill that enforces "The Iron Law": **No completion claims without fresh verification evidence.** + +It prevents the "Hallucinated Success" problem where an AI agent claims "I fixed it" because the code *looks* right, even though it fails to compile or pass tests. + +## Features + +* **Universal Verification:** Automatically detects project type (Nix, Node, Rust, Python, Go, Make) and runs the appropriate checks. +* **Zero-Config:** Works out of the box for standard project structures. +* **Strict Gate:** Instructs the agent that it is *forbidden* from claiming success if verification fails. + +## Usage + +### For Agents +The agent invokes this skill whenever it believes a task is done. + +```bash +# Agent runs this +./skills/verify-work/scripts/verify.sh +``` + +### For Humans +You can run the script manually to verify the current directory: + +```bash +# Verify the current project +~/.claude/skills/verify-work/scripts/verify.sh +``` + +## Supported Project Types + +1. **Nix:** `flake.nix` -> `nix flake check` +2. **Rust:** `Cargo.toml` -> `cargo test && cargo build` +3. **Node/TS:** `package.json` -> `npm test`, `npm run build` +4. **Python:** `pyproject.toml`/`requirements.txt` -> `pytest` or `unittest` +5. **Go:** `go.mod` -> `go test ./...` +6. **Makefile:** Target `test`, `build`, or `all` + +## Philosophy + +This skill is the "Gatekeeper". Ideally, it runs *after* code changes but *before* `code-review` or `test-review`. + +1. **Verify-Work:** Does it compile/pass? (If No -> STOP). +2. **Test-Review:** Are the tests meaningful? +3. **Code-Review:** Is the code maintainable? 
diff --git a/skills/test-review/lenses/structure.md b/skills/test-review/lenses/structure.md
new file mode 100644
index 0000000..1b7e316
--- /dev/null
+++ b/skills/test-review/lenses/structure.md
@@ -0,0 +1,38 @@
+# Lens: Structure Auditor
+
+Check if tests follow a clear, maintainable structure.
+
+## Checklist
+
+### 1. Arrange-Act-Assert (AAA)
+* **Check:** Does the test have clear setup, execution, and verification phases?
+* **❌ Flag:** Interleaved acting and asserting.
+* **Fix:** Refactor into `// Arrange`, `// Act`, `// Assert` blocks (see the example below).
+
+### 2. Single Action per Test
+* **❌ Flag:** Tests that check five different things in one block.
+* **Fix:** Split into multiple smaller tests.
+
+### 3. Readability
+* **Check:** Does the test description match the assertion?
+* **Fix:** Use descriptive names like `test('should reject invalid emails')` instead of `test('email test')`.
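+
+## Example
+
+A minimal sketch of the AAA shape (Jest-style; `Cart` is a hypothetical class
+under test):
+
+```ts
+test("should total the prices of all items", () => {
+  // Arrange: build the object under test in a known state.
+  const cart = new Cart();
+  cart.add({ sku: "a", price: 3 });
+  cart.add({ sku: "b", price: 4 });
+
+  // Act: exactly one action.
+  const total = cart.total();
+
+  // Assert: verify the observable result, matching the test's description.
+  expect(total).toBe(7);
+});
+```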
diff --git a/skills/verify-work/README.md b/skills/verify-work/README.md
new file mode 100644
index 0000000..80f6b0d
--- /dev/null
+++ b/skills/verify-work/README.md
@@ -0,0 +1,46 @@
+# Verify Work Skill (The Gatekeeper)
+
+A skill that enforces "The Iron Law": **No completion claims without fresh verification evidence.**
+
+It prevents the "Hallucinated Success" problem where an AI agent claims "I fixed it" because the code *looks* right, even though it fails to compile or pass tests.
+
+## Features
+
+* **Universal Verification:** Automatically detects project type (Nix, Node, Rust, Python, Go, Make) and runs the appropriate checks.
+* **Zero-Config:** Works out of the box for standard project structures.
+* **Strict Gate:** Instructs the agent that it is *forbidden* from claiming success if verification fails.
+
+## Usage
+
+### For Agents
+The agent invokes this skill whenever it believes a task is done.
+
+```bash
+# Agent runs this
+./skills/verify-work/scripts/verify.sh
+```
+
+### For Humans
+You can run the script manually to verify the current directory:
+
+```bash
+# Verify the current project
+~/.claude/skills/verify-work/scripts/verify.sh
+```
+
+## Supported Project Types
+
+1. **Nix:** `flake.nix` -> `nix flake check`
+2. **Rust:** `Cargo.toml` -> `cargo test && cargo build`
+3. **Node/TS:** `package.json` -> `npm test`, `npm run build`
+4. **Python:** `pyproject.toml`/`requirements.txt` -> `pytest` or `unittest`
+5. **Go:** `go.mod` -> `go test ./...`
+6. **Makefile:** Target `test`, `build`, or `all`
+
+## Philosophy
+
+This skill is the "Gatekeeper". Ideally, it runs *after* code changes but *before* `code-review` or `test-review`.
+
+1. **Verify-Work:** Does it compile/pass? (If No -> STOP).
+2. **Test-Review:** Are the tests meaningful?
+3. **Code-Review:** Is the code maintainable?
diff --git a/skills/verify-work/SKILL.md b/skills/verify-work/SKILL.md
new file mode 100644
index 0000000..03a6700
--- /dev/null
+++ b/skills/verify-work/SKILL.md
@@ -0,0 +1,72 @@
+---
+name: verify-work
+description: Use BEFORE claiming completion of any task. Enforces "The Iron Law": No completion claims without fresh verification evidence. Runs project-specific build and test commands to prove correctness.
+---
+
+# Verify Work (The Gatekeeper)
+
+## The Iron Law
+
+```
+NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
+```
+
+You are the **Gatekeeper**. Your job is to prevent "Hallucinated Success".
+
+**Core Principle:** A claim of "Done" is a lie unless backed by a green checkmark from the compiler/test runner *in this very turn*.
+
+## When to Use
+
+**MANDATORY usage before:**
+- Claiming a task is "Done" or "Complete"
+- Creating a Pull Request
+- Committing code (unless explicitly WIP)
+- Delegating to another agent (handing off "working" code)
+- Expressing satisfaction ("Great, it works!")
+
+**Trigger phrases:**
+- "Verify my work"
+- "Run checks"
+- "Is this ready?"
+- "Check if I broke anything"
+
+## The Process
+
+1. **Detect & Run:** Execute the verification script.
+   ```bash
+   ./skills/verify-work/scripts/verify.sh
+   ```
+2. **Analyze Output:**
+   * **Exit Code 0:** ✅ PASS.
+   * **Exit Code != 0:** ❌ FAIL.
+3. **Report:**
+   * **IF PASS:** "Verification Passed. [Quote output summary]. Task is complete."
+   * **IF FAIL:** "Verification Failed. [Quote specific error]. Task is NOT complete."
+
+## Handling Failures
+
+If verification fails, you are **FORBIDDEN** from claiming success.
+* **Correct:** "I attempted the fix, but verification failed with [Error]. I need to fix this."
+* **Incorrect:** "Fixed the bug! (But tests failed)." -> This is a lie. If tests failed, the bug is not fixed.
+
+## Common Rationalizations (DO NOT USE)
+
+| Rationalization | Reality |
+|-----------------|---------|
+| "It was a small change" | Small changes break builds. Verify. |
+| "I'm confident" | Confidence is not evidence. Verify. |
+| "The linter passed" | Linters aren't compilers. Verify. |
+| "I'll run it later" | Then you are not "Done" now. |
+| "It should work" | "Should" is the language of failure. Verify. |
+
+## Script Behavior
+
+The `verify.sh` script automatically detects the project type:
+* **Nix:** `nix flake check`
+* **Node:** `npm test && npm run build`
+* **Rust:** `cargo test && cargo build`
+* **Python:** `pytest`
+* **Go:** `go test ./...`
+* **Generic:** Checks for `Makefile` (targets `test`, `build`, or `all`)
+
+You do not need to know the specific command; the script handles it.
diff --git a/skills/verify-work/scripts/verify.sh b/skills/verify-work/scripts/verify.sh
new file mode 100755
index 0000000..b0acfb2
--- /dev/null
+++ b/skills/verify-work/scripts/verify.sh
@@ -0,0 +1,145 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# verify.sh - The "One Script" to verify any project
+# Part of the verify-work skill
+
+# Colors for output
+GREEN='\033[0;32m'
+RED='\033[0;31m'
+YELLOW='\033[1;33m'
+NC='\033[0m' # No Color
+
+log_info() { echo -e "${GREEN}[VERIFY]${NC} $1"; }
+log_warn() { echo -e "${YELLOW}[VERIFY]${NC} $1"; }
+log_err() { echo -e "${RED}[VERIFY]${NC} $1" >&2; }
+
+# Run a command and report whether it passed or failed
+run_cmd() {
+    local cmd="$1"
+    local desc="$2"
+    log_info "Running $desc..."
+    if eval "$cmd"; then
+        echo -e "${GREEN}✓ $desc passed${NC}"
+        return 0
+    else
+        echo -e "${RED}✗ $desc failed${NC}"
+        return 1
+    fi
+}
+
+# 1. Detect the project root
+# We usually run from the project root already, but resolve it to be safe.
+PROJECT_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
+cd "$PROJECT_ROOT"
+
+log_info "Verifying project in: $PROJECT_ROOT"
+
+# 2. Heuristics for verification
+# Checked from most specific to least specific.
+
+# --- NIX ---
+if [ -f "flake.nix" ]; then
+    log_info "Detected Nix Flake"
+    # 'nix flake check' is the gold standard for flake verification.
+    if run_cmd "nix flake check --keep-going --print-build-logs" "Nix Flake Check"; then
+        exit 0
+    fi
+    # A failing flake check is a hard fail; do not fall back to a
+    # plain build of the default package and call that verified.
+    exit 1
+fi
+
+# --- RUST ---
+if [ -f "Cargo.toml" ]; then
+    log_info "Detected Rust Project"
+    if run_cmd "cargo test" "Rust Tests" && run_cmd "cargo build" "Rust Build"; then
+        exit 0
+    fi
+    exit 1
+fi
+
+# --- NODE/JS/TS ---
+if [ -f "package.json" ]; then
+    log_info "Detected Node Project"
+    if ! command -v jq >/dev/null; then
+        log_err "jq is required to inspect package.json scripts"
+        exit 1
+    fi
+
+    # Check if 'test' script exists
+    if jq -e '.scripts.test' package.json >/dev/null; then
+        if ! run_cmd "npm test" "NPM Tests"; then
+            exit 1
+        fi
+    else
+        log_warn "No 'test' script found in package.json"
+    fi
+
+    # Check if 'build' script exists
+    if jq -e '.scripts.build' package.json >/dev/null; then
+        if ! run_cmd "npm run build" "NPM Build"; then
+            exit 1
+        fi
+    fi
+
+    # If we ran at least one thing, success. If neither existed, warn.
+    if ! jq -e '.scripts.test' package.json >/dev/null && ! jq -e '.scripts.build' package.json >/dev/null; then
+        log_warn "No test or build scripts found. Verification inconclusive."
+        # We don't fail, but we warn.
+    fi
+    exit 0
+fi
+
+# --- PYTHON ---
+if [ -f "pyproject.toml" ] || [ -f "requirements.txt" ] || [ -f "setup.py" ]; then
+    log_info "Detected Python Project"
+    if command -v pytest >/dev/null; then
+        if run_cmd "pytest" "Pytest"; then
+            exit 0
+        fi
+        exit 1
+    else
+        log_warn "pytest not found in PATH. Trying unittest..."
+        if run_cmd "python3 -m unittest discover" "Unittest"; then
+            exit 0
+        fi
+        exit 1
+    fi
+fi
+
+# --- GO ---
+if [ -f "go.mod" ]; then
+    log_info "Detected Go Project"
+    if run_cmd "go test ./..." "Go Tests"; then
+        exit 0
+    fi
+    exit 1
+fi
+
+# --- MAKEFILE ---
+if [ -f "Makefile" ]; then
+    log_info "Detected Makefile"
+    # Try 'test' target
+    if grep -q "^test:" Makefile; then
+        if ! run_cmd "make test" "Make Test"; then
+            exit 1
+        fi
+    fi
+    # Try 'build' or 'all' target
+    if grep -q "^build:" Makefile; then
+        if ! run_cmd "make build" "Make Build"; then
+            exit 1
+        fi
+    elif grep -q "^all:" Makefile; then
+        if ! run_cmd "make all" "Make All"; then
+            exit 1
+        fi
+    fi
+    exit 0
+fi
+
+# --- UNKNOWN ---
+log_err "Could not detect project type (no flake.nix, package.json, Cargo.toml, etc.)"
+log_err "Please create a known configuration or add a 'test' script."
+exit 1