feat(skills): Add test-review and verify-work skills
- skills/verify-work: The Gatekeeper (runs verify.sh)
- skills/test-review: The Auditor (checks test quality)
- skills.nix: Registered new skills
parent 5f1a9a4d3d
commit 97a2806d47
skills.nix
@@ -12,6 +12,8 @@
 screenshot-latest = "Find latest screenshots";
 spec-review = "Technical specification review";
 tufte-press = "Generate study card JSON";
+test-review = "Audit test quality for flakiness, tautologies, and anti-patterns";
+verify-work = "The Gatekeeper: Run project build/tests before claiming success";
 worklog = "Create structured worklogs";
 update-spec-kit = "Update spec-kit ecosystem";
 update-opencode = "Update OpenCode via Nix";
skills/test-review/README.md (new file, 27 lines)
@@ -0,0 +1,27 @@
# Test Review Skill (The Auditor)

This skill focuses on the **quality** of your test code.

While `verify-work` checks whether tests *pass*, `test-review` checks whether tests are *good*. A test suite can pass 100% of the time and still be useless if it contains tautologies, brittle mocks, or race conditions.
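For example, both of the following tests pass, but only the second can ever fail (a minimal Jest-style sketch; `getStatus` is a hypothetical function under test):

```typescript
import { getStatus } from './status'; // hypothetical module under test

// ❌ Tautology: passes no matter what the code does.
test('status check', () => {
  expect(true).toBe(true);
});

// ✅ Meaningful: fails if getStatus() stops returning 'ok'.
test('reports ok status', () => {
  expect(getStatus()).toBe('ok');
});
```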
## What It Checks

1. **Flakiness:** Arbitrary sleeps, global state, race conditions.
2. **Tautologies:** Tests that always pass (`expect(true).toBe(true)`).
3. **Brittle Mocks:** Mocking internals instead of behavior.
4. **Structure:** Arrange-Act-Assert (AAA) pattern violations.
5. **PBT Opportunities:** Suggesting Property-Based Testing for suitable logic.
## Usage

Ask the agent to review specific test files or the whole suite:

* "Review `src/auth.test.ts` for anti-patterns"
* "Audit my test suite quality"
* "Are these tests brittle?"

## Philosophy

"Test behavior, not implementation."
"Wait for conditions, not time."
"Don't mock what you don't own."
skills/test-review/SKILL.md (new file, 57 lines)
@@ -0,0 +1,57 @@
---
name: test-review
description: Audit test quality using multiple lenses. Analyzes for flakiness, tautologies, structure, and mocking anti-patterns. Distinct from running tests (verify-work).
---

# Test Review Skill (The Auditor)

Run focused analysis on test files to ensure they are robust and meaningful.

## When to Use

Invoke this skill when:
- "Review these tests"
- "Audit my test suite"
- "Are these tests brittle?"
- "Check for test anti-patterns"
## Available Lenses

Lenses are located in `skills/test-review/lenses/`:

| Lens | Focus |
|------|-------|
| `flakiness.md` | Timing issues, sleeps, global state, race conditions |
| `lies.md` | Tautologies, meaningless assertions, swallowed errors |
| `structure.md` | Arrange-Act-Assert violations, single-action check |
| `mocking.md` | "Don't mock what you don't own", implementation vs behavior |
## Workflow

### Phase 1: Target Selection
1. Identify test files to review (e.g., `*.test.ts`, `test_*.py`).
2. Show the file list to the user for confirmation.

### Phase 2: Lens Execution
For each lens, analyze the target test files:
1. Apply the lens criteria.
2. Collect findings (Severity: HIGH/MED/LOW).

### Phase 3: Synthesis
1. Deduplicate findings.
2. Group related issues.
3. Present a summary report.

### Phase 4: Interactive Review
Ask: "Which findings should I file as issues?"
## Guidelines

1. **Behavior Over Implementation:** Ensure tests verify what the code *does*, not how it *looks*.
2. **Dynamic vs Static:** Remind the user that this is a static quality audit. Use `verify-work` to confirm that the tests actually *pass*.
3. **No Flakiness:** Be ruthless about `sleep` and `setTimeout`.

## Integration

- **Issue Tracking:** Uses `bd create` for beads issues.
- **Verification:** Works as part of the Triad (Verify -> Test-Review -> Code-Review).
skills/test-review/lenses/README.md (new file, 9 lines)
@@ -0,0 +1,9 @@
# Lenses for Test Review

These lenses are used by the `test-review` skill to audit test quality.

## Installation

For global availability, copy these to `~/.config/lenses/test/` (or a similar path configured in your agent settings).

If they are not found globally, the `test-review` skill also looks for them in the skill's local `lenses/` directory.
skills/test-review/lenses/flakiness.md (new file, 20 lines)
@@ -0,0 +1,20 @@
# Lens: Flakiness Auditor

Analyze the tests for timing dependencies and non-deterministic behavior.

## Checklist

### 1. Arbitrary Sleeps
* **❌ Flag:** `setTimeout`, `sleep(100)`, `time.Sleep(time.Second)`.
* **Why:** A fixed delay is a guess at timing: on slow machines it is too short and the test fails; on fast machines it wastes time.
* **Fix:** Use condition-based waiting (e.g., `waitFor(() => condition)`).
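A minimal sketch of the fix, assuming a Jest-style runner and Testing Library's `waitFor` (any condition-polling helper works; `cache` is hypothetical):

```typescript
import { waitFor } from '@testing-library/dom';

// ❌ Guessed timing: too short on a loaded CI machine, wasted time everywhere else.
test('cache warms up (flaky)', async () => {
  void cache.warmUp();
  await new Promise((resolve) => setTimeout(resolve, 100));
  expect(cache.size()).toBeGreaterThan(0);
});

// ✅ Condition-based: waits exactly as long as needed, up to a timeout.
test('cache warms up', async () => {
  void cache.warmUp();
  await waitFor(() => expect(cache.size()).toBeGreaterThan(0));
});
```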
### 2. Global State Mutation
* **❌ Flag:** Tests that modify global variables, environment variables, or singleton state without cleanup.
* **Why:** Causes order-dependent failures.
* **Fix:** Use `beforeEach`/`afterEach` or generate unique IDs per test.
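A sketch of the cleanup pattern for an environment variable (Jest-style hooks):

```typescript
const ORIGINAL_API_URL = process.env.API_URL;

beforeEach(() => {
  process.env.API_URL = 'http://localhost:9999'; // isolated per-test value
});

afterEach(() => {
  // Restore the original value so test order cannot matter.
  if (ORIGINAL_API_URL === undefined) {
    delete process.env.API_URL;
  } else {
    process.env.API_URL = ORIGINAL_API_URL;
  }
});
```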
### 3. Unmocked Network/IO
* **❌ Flag:** Tests that make real network calls or write to system-wide paths (e.g., `/tmp/test`).
* **Why:** External failures cause test flakiness.
* **Fix:** Mock network boundaries or use isolated temp directories.
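For filesystem isolation, give each test its own temp directory instead of a shared path (Node sketch):

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

let workDir: string;

beforeEach(() => {
  // Unique directory per test: parallel runs cannot collide.
  workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'test-'));
});

afterEach(() => {
  fs.rmSync(workDir, { recursive: true, force: true });
});
```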
skills/test-review/lenses/lies.md (new file, 24 lines)
@@ -0,0 +1,24 @@
# Lens: Tautology Auditor

Check if tests are "lying" or providing false confidence by asserting things that are always true.

## Checklist

### 1. Mock Assertions
* **❌ Flag:** `expect(mock).toHaveBeenCalled()` as the only assertion.
* **Why:** You are testing that the mock framework works, not that the code is correct.
* **Fix:** Assert on the *result* of the call or its *observable side effects*.
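A sketch of the difference (`notify` is a hypothetical function under test):

```typescript
const user = { email: 'ada@example.com' };

// ❌ Only proves the mock was invoked, not that the right thing happened.
test('notifies the user (weak)', async () => {
  const sendEmail = jest.fn().mockResolvedValue({ delivered: false });
  await notify(user, sendEmail);
  expect(sendEmail).toHaveBeenCalled(); // passes even though delivery failed
});

// ✅ Asserts on the observable result of the call.
test('notifies the user', async () => {
  const sendEmail = jest.fn().mockResolvedValue({ delivered: true });
  const result = await notify(user, sendEmail);
  expect(result.delivered).toBe(true);
});
```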
### 2. Logic Duplication
* **❌ Flag:** `expect(add(a,b)).toBe(a+b)`.
* **Why:** If the same logic is wrong in both the code and the test, the test still passes.
* **Fix:** Use hardcoded expected values (`expect(add(2,2)).toBe(4)`).
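When hardcoded cases feel too thin, the PBT suggestion from the README applies: assert a *property* the code must satisfy rather than re-deriving the answer. A sketch using the fast-check library (one option among several):

```typescript
import fc from 'fast-check';

// No recomputation of the sum; only properties that must always hold.
test('add is commutative and has identity 0', () => {
  fc.assert(
    fc.property(fc.integer(), fc.integer(), (a, b) => {
      expect(add(a, b)).toBe(add(b, a));
      expect(add(a, 0)).toBe(a);
    }),
  );
});
```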
### 3. Swallowed Errors
* **❌ Flag:** `try { ... } catch (e) {}` inside a test without re-throwing or failing.
* **Why:** Silences actual failures.
* **Fix:** Use the test runner's built-in error handling or `expect().toThrow()`.
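A sketch of the pattern (`parseConfig` is hypothetical):

```typescript
// ❌ A thrown error disappears; the test passes anyway.
test('parses config (lying)', () => {
  try {
    parseConfig('not valid json');
  } catch (e) {
    // swallowed: the failure is silenced
  }
});

// ✅ Fails loudly if the behavior changes.
test('rejects invalid config', () => {
  expect(() => parseConfig('not valid json')).toThrow();
});
```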
### 4. Meaningless Assertions
* **❌ Flag:** `expect(result).toBeDefined()` when a specific value (e.g., a status code) is expected.
* **Fix:** Assert on the *specific value* (`expect(result.status).toBe(200)`).
skills/test-review/lenses/mocking.md (new file, 18 lines)
@@ -0,0 +1,18 @@
# Lens: Mocking Auditor

Analyze how external dependencies are handled.

## Checklist

### 1. "Don't Mock What You Don't Own"
* **❌ Flag:** Mocking `axios`, `fs`, `react-router`, or other 3rd-party libraries directly.
* **Why:** Brittle. Changes in the library break your tests even if your code is fine.
* **Fix:** Suggest a wrapper (Adapter pattern) and mock your own wrapper.
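A sketch of the Adapter approach (`HttpClient` and the file layout are hypothetical):

```typescript
// http-client.ts — a thin wrapper you own around the 3rd-party library.
import axios from 'axios';

export interface HttpClient {
  getJson(url: string): Promise<unknown>;
}

export const axiosClient: HttpClient = {
  async getJson(url) {
    const response = await axios.get(url);
    return response.data;
  },
};

// In tests: fake your own interface, never axios itself.
const fakeClient: HttpClient = {
  getJson: async () => ({ id: 1, name: 'Ada' }),
};
```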
### 2. Implementation vs Behavior
* **❌ Flag:** Tests that depend on private variables or internal method names.
* **Fix:** Test only the public API.
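A sketch of the contrast (`UserCache` is hypothetical):

```typescript
// ❌ Reaches into internals: breaks on any refactor, even a correct one.
test('caches the user (brittle)', async () => {
  const cache = new UserCache(async (id: string) => ({ id }));
  await cache.get('alice');
  expect((cache as any).store.size).toBe(1); // private field name
});

// ✅ Observes behavior through the public API only.
test('fetches each user from the backend once', async () => {
  let backendCalls = 0;
  const cache = new UserCache(async (id: string) => {
    backendCalls++;
    return { id };
  });
  await cache.get('alice');
  await cache.get('alice');
  expect(backendCalls).toBe(1);
});
```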
### 3. Complex Mock Setup
* **❌ Flag:** Mock setup that makes up more than 50% of the test file.
* **Fix:** Suggest an integration test with real (managed) dependencies or a simpler mock strategy.
skills/test-review/lenses/structure.md (new file, 18 lines)
@@ -0,0 +1,18 @@
# Lens: Structure Auditor

Check if tests follow a clear, maintainable structure.

## Checklist

### 1. Arrange-Act-Assert (AAA)
* **Check:** Does the test have clear setup, execution, and verification phases?
* **❌ Flag:** Interleaved acting and asserting.
* **Fix:** Refactor into `// Arrange`, `// Act`, `// Assert` blocks.
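A sketch of the shape (`Cart` is hypothetical):

```typescript
test('applies a discount to the cart total', () => {
  // Arrange
  const cart = new Cart();
  cart.add({ price: 100 });

  // Act
  cart.applyDiscount(0.1);

  // Assert
  expect(cart.total()).toBe(90);
});
```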
### 2. Single Action per Test
* **❌ Flag:** Tests that check five different things in one block.
* **Fix:** Split into multiple smaller tests.
### 3. Readability
* **Check:** Does the test description match the assertion?
* **Fix:** Use descriptive names like `test('should reject invalid emails')` instead of `test('email test')`.
skills/verify-work/README.md (new file, 46 lines)
@@ -0,0 +1,46 @@
# Verify Work Skill (The Gatekeeper)

A skill that enforces "The Iron Law": **No completion claims without fresh verification evidence.**

It prevents the "Hallucinated Success" problem, where an AI agent claims "I fixed it" because the code *looks* right, even though it fails to compile or pass tests.
## Features

* **Universal Verification:** Automatically detects the project type (Nix, Node, Rust, Python, Go, Make) and runs the appropriate checks.
* **Zero-Config:** Works out of the box for standard project structures.
* **Strict Gate:** Instructs the agent that it is *forbidden* from claiming success if verification fails.
## Usage

### For Agents
The agent invokes this skill whenever it believes a task is done.

```bash
# Agent runs this
./skills/verify-work/scripts/verify.sh
```

### For Humans
You can run the script manually to verify the current directory:

```bash
# Verify the current project
~/.claude/skills/verify-work/scripts/verify.sh
```
## Supported Project Types

1. **Nix:** `flake.nix` -> `nix flake check`
2. **Rust:** `Cargo.toml` -> `cargo test && cargo build`
3. **Node/TS:** `package.json` -> `npm test`, `npm run build`
4. **Python:** `pyproject.toml`/`requirements.txt` -> `pytest` or `unittest`
5. **Go:** `go.mod` -> `go test ./...`
6. **Makefile:** Target `test`, `build`, or `all`
## Philosophy

This skill is the "Gatekeeper". Ideally, it runs *after* code changes but *before* `code-review` or `test-review`.

1. **Verify-Work:** Does it compile/pass? (If No -> STOP).
2. **Test-Review:** Are the tests meaningful?
3. **Code-Review:** Is the code maintainable?
skills/verify-work/SKILL.md (new file, 72 lines)
@@ -0,0 +1,72 @@
---
name: verify-work
description: Use BEFORE claiming completion of any task. Enforces "The Iron Law": No completion claims without fresh verification evidence. Runs project-specific build and test commands to prove correctness.
---

# Verify Work (The Gatekeeper)

## The Iron Law

```
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
```

You are the **Gatekeeper**. Your job is to prevent "Hallucinated Success".

**Core Principle:** A claim of "Done" is a lie unless backed by a green checkmark from the compiler/test runner *in this very turn*.

## When to Use

**MANDATORY usage before:**
- Claiming a task is "Done" or "Complete"
- Creating a Pull Request
- Committing code (unless explicitly WIP)
- Delegating to another agent (handing off "working" code)
- Expressing satisfaction ("Great, it works!")

**Trigger phrases:**
- "Verify my work"
- "Run checks"
- "Is this ready?"
- "Check if I broke anything"
## The Process

1. **Detect & Run:** Execute the verification script.
   ```bash
   ./skills/verify-work/scripts/verify.sh
   ```
2. **Analyze Output:**
   * **Exit Code 0:** ✅ PASS.
   * **Exit Code != 0:** ❌ FAIL.
3. **Report:**
   * **IF PASS:** "Verification Passed. [Quote output summary]. Task is complete."
   * **IF FAIL:** "Verification Failed. [Quote specific error]. Task is NOT complete."
## Handling Failures

If verification fails, you are **FORBIDDEN** from claiming success.
* **Correct:** "I attempted the fix, but verification failed with [Error]. I need to fix this."
* **Incorrect:** "Fixed the bug! (But tests failed)." -> This is a lie. If tests failed, the bug is not fixed.
## Common Rationalizations (DO NOT USE)

| Rationalization | Reality |
|-----------------|---------|
| "It was a small change" | Small changes break builds. Verify. |
| "I'm confident" | Confidence is not evidence. Verify. |
| "The linter passed" | Linters aren't compilers. Verify. |
| "I'll run it later" | Then you are not "Done" now. |
| "It should work" | "Should" is the language of failure. Verify. |
## Script Behavior

The `verify.sh` script automatically detects the project type:
* **Nix:** `nix flake check` or `nix build`
* **Node:** `npm test && npm run build`
* **Rust:** `cargo test && cargo build`
* **Python:** `pytest`
* **Go:** `go test ./...`
* **Generic:** Checks for `Makefile` (target `test` or `build`)

You do not need to know the specific command; the script handles it.
skills/verify-work/scripts/verify.sh (new executable file, 141 lines)
@@ -0,0 +1,141 @@
#!/usr/bin/env bash
set -euo pipefail

# verify.sh - The "One Script" to verify any project
# Part of the verify-work skill

# Colors for output
GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

log_info() { echo -e "${GREEN}[VERIFY]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[VERIFY]${NC} $1"; }
log_err() { echo -e "${RED}[VERIFY]${NC} $1" >&2; }

# Run a command with a human-readable description and report pass/fail.
run_cmd() {
    local cmd="$1"
    local desc="$2"
    log_info "Running $desc..."
    if eval "$cmd"; then
        echo -e "${GREEN}✓ $desc passed${NC}"
        return 0
    else
        echo -e "${RED}✗ $desc failed${NC}"
        return 1
    fi
}

# 1. Detect the project root.
# We usually run from the project root already, but resolve it to be safe.
PROJECT_ROOT=$(git rev-parse --show-toplevel 2>/dev/null || pwd)
cd "$PROJECT_ROOT"

log_info "Verifying project in: $PROJECT_ROOT"

# 2. Verification heuristics, checked from most specific to least specific.

# --- NIX ---
if [ -f "flake.nix" ]; then
    log_info "Detected Nix Flake"
    # 'nix flake check' is the gold standard; a failure here is a hard fail.
    if run_cmd "nix flake check --keep-going --print-build-logs" "Nix Flake Check"; then
        exit 0
    fi
    exit 1
fi

# --- RUST ---
if [ -f "Cargo.toml" ]; then
    log_info "Detected Rust Project"
    if run_cmd "cargo test" "Rust Tests" && run_cmd "cargo build" "Rust Build"; then
        exit 0
    fi
    exit 1
fi

# --- NODE/JS/TS ---
if [ -f "package.json" ]; then
    log_info "Detected Node Project"

    # jq is required to inspect package.json scripts.
    if ! command -v jq >/dev/null; then
        log_err "jq is required to inspect package.json but was not found in PATH."
        exit 1
    fi

    # Check if a 'test' script exists
    if jq -e '.scripts.test' package.json >/dev/null; then
        if ! run_cmd "npm test" "NPM Tests"; then
            exit 1
        fi
    else
        log_warn "No 'test' script found in package.json"
    fi

    # Check if a 'build' script exists
    if jq -e '.scripts.build' package.json >/dev/null; then
        if ! run_cmd "npm run build" "NPM Build"; then
            exit 1
        fi
    fi

    # If we ran at least one check, this is a success; if neither script existed, warn.
    if ! jq -e '.scripts.test' package.json >/dev/null && ! jq -e '.scripts.build' package.json >/dev/null; then
        log_warn "No test or build scripts found. Verification inconclusive."
        # We don't fail, but we warn.
    fi
    exit 0
fi

# --- PYTHON ---
if [ -f "pyproject.toml" ] || [ -f "requirements.txt" ] || [ -f "setup.py" ]; then
    log_info "Detected Python Project"
    if command -v pytest >/dev/null; then
        if run_cmd "pytest" "Pytest"; then
            exit 0
        fi
        exit 1
    else
        log_warn "pytest not found in PATH. Trying unittest..."
        if run_cmd "python3 -m unittest discover" "Unittest"; then
            exit 0
        fi
        exit 1
    fi
fi

# --- GO ---
if [ -f "go.mod" ]; then
    log_info "Detected Go Project"
    if run_cmd "go test ./..." "Go Tests"; then
        exit 0
    fi
    exit 1
fi

# --- MAKEFILE ---
if [ -f "Makefile" ]; then
    log_info "Detected Makefile"
    # Try the 'test' target
    if grep -q "^test:" Makefile; then
        if ! run_cmd "make test" "Make Test"; then
            exit 1
        fi
    fi
    # Try the 'build' or 'all' target
    if grep -q "^build:" Makefile; then
        if ! run_cmd "make build" "Make Build"; then
            exit 1
        fi
    elif grep -q "^all:" Makefile; then
        if ! run_cmd "make all" "Make All"; then
            exit 1
        fi
    fi
    exit 0
fi

# --- UNKNOWN ---
log_err "Could not detect project type (no flake.nix, package.json, Cargo.toml, etc.)"
log_err "Please create a known configuration or add a 'test' script."
exit 1