docs: research idle/alice quality gate mechanism

Comprehensive analysis of emes idle/alice plugin:
- Hook chain (6 hooks, Stop is key blocker)
- State management via jwz (topic-based messaging)
- alice agent (read-only Opus reviewer)
- Circuit breakers against infinite loops

Conclusion: alice pattern is overkill for code-review (we ARE the reviewer). More useful: "review reminder" hook that checks if code-review was run before exit on significant changes.

Closes: skills-9jk

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 16:43:46 -08:00 · 2026-01-09 16:43:46 -08:00 · 239c758dc7
parent 1b943742bd
commit 239c758dc7
2 changed files with 221 additions and 1 deletions
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@@ -53,7 +53,7 @@
 {"id":"skills-9cu.7","title":"Lens: supply-chain","description":"Create supply-chain.md lens for provenance:\n- Unpinned versions (latest tags)\n- Actions not pinned to SHA\n- Missing flake.lock/SRI hashes\n- Unsigned artifacts\n- Untrusted registries","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.317966318-05:00","created_by":"dan","updated_at":"2026-01-01T22:03:26.655269107-05:00","closed_at":"2026-01-01T22:03:26.655269107-05:00","close_reason":"Lens created with orch consensus: added Terraform/Tofu, build-time network access, GH Actions permissions, builtins.fetchTarball","dependencies":[{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.319754113-05:00","created_by":"dan"},{"issue_id":"skills-9cu.7","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.322943568-05:00","created_by":"dan"}]}
 {"id":"skills-9cu.8","title":"Lens: observability","description":"Create observability.md lens for visibility:\n- Silent failures\n- Missing health checks\n- Incomplete metrics\n- Missing structured logging\n- No correlation IDs","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-01T16:55:49.562009474-05:00","created_by":"dan","updated_at":"2026-01-01T22:05:03.351508622-05:00","closed_at":"2026-01-01T22:05:03.351508622-05:00","close_reason":"Lens created with orch consensus: added resource visibility, heartbeats, version/build metadata, log rotation","dependencies":[{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:55:49.564394694-05:00","created_by":"dan"},{"issue_id":"skills-9cu.8","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:55:49.571005731-05:00","created_by":"dan"}]}
 {"id":"skills-9cu.9","title":"Lens: nix-hygiene","description":"Create nix-hygiene.md lens (statix/deadnix-backed):\n- Dead code (unused bindings)\n- Anti-patterns (with lib abuse, IFD)\n- Module boundary violations\n- Overlay issues\n- Missing option types\n\nLinter integration: statix + deadnix JSON","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-01T16:56:00.623672452-05:00","created_by":"dan","updated_at":"2026-01-01T23:58:43.868830539-05:00","closed_at":"2026-01-01T23:58:43.868830539-05:00","close_reason":"Lens created with orch consensus: added lib.mkIf guards, mkDefault/mkForce, reproducibility/purity, build efficiency, expanded false positives","dependencies":[{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu","type":"parent-child","created_at":"2026-01-01T16:56:00.638729349-05:00","created_by":"dan"},{"issue_id":"skills-9cu.9","depends_on_id":"skills-9cu.1","type":"blocks","created_at":"2026-01-01T16:56:00.643063075-05:00","created_by":"dan"}]}
-{"id":"skills-9jk","title":"Research: emes idle quality gate for code-review","description":"Evaluate whether code-review skill should use idle-style quality gate (block exit until review approved). Would enforce review completion mechanically.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-09T10:59:25.094378206-08:00","created_by":"dan","updated_at":"2026-01-09T10:59:25.094378206-08:00","dependencies":[{"issue_id":"skills-9jk","depends_on_id":"skills-6x1","type":"blocks","created_at":"2026-01-09T10:59:33.267948785-08:00","created_by":"dan"}]}
+{"id":"skills-9jk","title":"Research: emes idle quality gate for code-review","description":"Evaluate whether code-review skill should use idle-style quality gate (block exit until review approved). Would enforce review completion mechanically.","status":"in_progress","priority":3,"issue_type":"task","created_at":"2026-01-09T10:59:25.094378206-08:00","created_by":"dan","updated_at":"2026-01-09T16:41:08.228529392-08:00","dependencies":[{"issue_id":"skills-9jk","depends_on_id":"skills-6x1","type":"blocks","created_at":"2026-01-09T10:59:33.267948785-08:00","created_by":"dan"}]}
 {"id":"skills-a0x","title":"spec-review: Add traceability requirements across artifacts","description":"Prompts don't enforce spec → plan → tasks linkage. Drift can occur without detection.\n\nAdd:\n- Require trace matrix or linkage in reviews\n- Each plan item should reference spec requirement\n- Each task should reference plan item\n- Flag unmapped items and extra scope","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:25.270581198-08:00","updated_at":"2025-12-15T14:05:48.196356786-08:00","closed_at":"2025-12-15T14:05:48.196356786-08:00"}
 {"id":"skills-a23","title":"Update main README to list all 9 skills","description":"Main README.md 'Skills Included' section only lists worklog and update-spec-kit. Repo actually has 9 skills: template, worklog, update-spec-kit, screenshot-latest, niri-window-capture, tufte-press, update-opencode, web-research, web-search.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-11-30T11:58:14.042397754-08:00","updated_at":"2025-12-28T22:08:02.074758486-05:00","closed_at":"2025-12-28T22:08:02.074758486-05:00","close_reason":"Updated README with table listing all 14 skills (5 deployed, 8 available, 1 development template)","dependencies":[{"issue_id":"skills-a23","depends_on_id":"skills-4yn","type":"blocks","created_at":"2025-11-30T12:01:30.306742184-08:00","created_by":"daemon","metadata":"{}"}]}
 {"id":"skills-al5","title":"Consider repo-setup-verification skill","description":"The dotfiles repo has a repo-setup-prompt.md verification checklist that could become a skill.\n\n**Source**: ~/proj/dotfiles/docs/repo-setup-prompt.md\n\n**What it does**:\n- Verifies .envrc has use_api_keys and skills loading\n- Checks .skills manifest exists with appropriate skills\n- Optionally checks beads setup\n- Verifies API keys are loaded\n\n**As a skill it could**:\n- Be invoked to audit any repo's agent setup\n- Offer to fix missing pieces\n- Provide consistent onboarding for new repos\n\n**Questions**:\n- Is this better as a skill vs a slash command?\n- Should it auto-fix or just report?\n- Does it belong in skills repo or dotfiles?","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-06T12:38:32.561337354-08:00","updated_at":"2025-12-28T22:22:57.639520516-05:00","closed_at":"2025-12-28T22:22:57.639520516-05:00","close_reason":"Decided: keep as prompt doc in dotfiles, not a skill. Claude can read it when asked. No wrapper benefit, and it's dotfiles-specific setup (not general skill). ai-tools-doctor handles version checking separately."}
--- a/docs/research/idle-alice-quality-gate.md
+++ b/docs/research/idle-alice-quality-gate.md
@@ -0,0 +1,220 @@
+# idle/alice Quality Gate Analysis
+
+> **Date:** 2026-01-09
+> **Status:** Research complete
+> **Related:** [skills-9jk](../../.beads/), [ADR-005](../adr/005-dual-publish-plugin-architecture.md)
+
+## Overview
+
+**alice** (package name: idle) is a Claude Code plugin that mechanically enforces code quality by blocking agent exit until an independent reviewer (the alice agent) approves the work.
+
+- **Repo:** https://github.com/evil-mind-evil-sword/idle
+- **Language:** Zig
+- **Author:** femtomc
+- **License:** AGPL-3.0
+
+## How It Works
+
+### Activation
+
+Opt-in per-prompt via `#alice` prefix:
+```
+#alice implement user authentication with JWT
+```
+
+The `UserPromptSubmit` hook detects this prefix and sets review state via jwz.
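Review note: the prefix check described above can be sketched as follows. This is a minimal illustration, not alice's actual code (which is written in Zig). It assumes the hook receives a JSON object on stdin with the prompt text under a `prompt` key; the function name `review_requested` is hypothetical.

```python
import json

REVIEW_PREFIX = "#alice"  # opt-in prefix named in the doc


def review_requested(hook_input: str) -> bool:
    """Return True when the submitted prompt opts in to review."""
    payload = json.loads(hook_input)
    # Assumption: the prompt text arrives under a "prompt" key.
    prompt = payload.get("prompt", "")
    # Compare the first whitespace-delimited word so "#alicex" does not match.
    words = prompt.lstrip().split(maxsplit=1)
    return bool(words) and words[0] == REVIEW_PREFIX
```

Matching on the whole first word rather than `startswith` avoids enabling review for prompts that merely begin with similar text.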
+
+### Hook Chain
+
+alice uses 6 Claude Code hooks:
+
+| Hook | Purpose | Timeout |
+|------|---------|---------|
+| `SessionStart` | Initialize session state | 5s |
+| `UserPromptSubmit` | Detect `#alice` prefix, enable review | 5s |
+| `Stop` | **Block exit until approved** | 30s |
+| `PostToolUse` | Track tool usage | 5s |
+| `SubagentStop` | Validate alice posted decision | 5s |
+| `SessionEnd` | Cleanup | 5s |
+
+### The Stop Hook (Core Mechanism)
+
+When the agent tries to exit:
+
+```
+1. Load jwz store
+2. Query "review:state:{session_id}" - is review enabled?
+3. If not enabled → approve immediately
+4. Query "alice:status:{session_id}" - did alice approve?
+5. If decision == "COMPLETE" → reset state, allow exit
+6. Otherwise → BLOCK, instruct agent to spawn alice
+```
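Review note: the six steps above amount to a small decision function. The sketch below is illustrative only. `store` is a hypothetical in-memory stand-in for jwz (topic name mapped to its latest message), and the `enabled` key is an assumed field name, not one taken from the alice source.

```python
from typing import Mapping


def stop_decision(store: Mapping[str, dict], session_id: str) -> str:
    """Return "allow" or "block" for one stop attempt."""
    review = store.get(f"review:state:{session_id}")
    if not review or not review.get("enabled"):
        return "allow"  # review was never requested for this session

    status = store.get(f"alice:status:{session_id}")
    if status and status.get("decision") == "COMPLETE":
        return "allow"  # approved; the real hook also resets state at this point

    return "block"  # the agent is instructed to spawn alice
```

Note that a missing `alice:status` message and an `ISSUES` decision both lead to a block; only an explicit `COMPLETE` releases the agent.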
+
+### hooks.json Structure
+
+```json
+{
+  "hooks": {
+    "SessionStart": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "alice hook session-start",
+            "timeout": 5
+          }
+        ]
+      }
+    ],
+    "Stop": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "alice hook stop",
+            "timeout": 30
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+Each hook invokes the `alice` CLI with a subcommand. The CLI checks/updates state in jwz.
+
+## State Management (jwz)
+
+**jwz** is an append-only topic-based messaging system:
+
+- Stores messages in `.jwz/messages.jsonl` (git-mergeable)
+- SQLite cache for FTS5 search
+- Auto-captures git context (commit, branch, dirty status)
+- Topics like `review:state:{session}`, `alice:status:{session}`
+
+Key jwz commands:
+```bash
+jwz post <topic> -m <message>     # Post message
+jwz read <topic>                   # Read topic
+jwz search <query>                 # Full-text search
+```
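Review note: to make "append-only, topic-based" concrete, here is a toy model of the idea in Python. It is not jwz's implementation or on-disk format. The record fields `topic` and `body` and the function names `post` and `read` are assumptions chosen for illustration.

```python
import json
from pathlib import Path


def post(log: Path, topic: str, body: str) -> None:
    """Append one message; existing lines are never rewritten."""
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"topic": topic, "body": body}) + "\n")


def read(log: Path, topic: str) -> list[str]:
    """Return the bodies posted to `topic`, oldest first."""
    if not log.exists():
        return []
    with log.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    return [r["body"] for r in records if r["topic"] == topic]
```

Because every write is a new line at the end of the file, two branches that each append messages tend to merge cleanly, which is presumably what "git-mergeable" refers to.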
+
+## The alice Agent
+
+alice is a **read-only Opus-based reviewer**:
+
+- **Model:** Claude Opus
+- **Access:** Read-only (no file modifications)
+- **Tools:** Read, Grep, Glob, Bash (restricted to `tissue` and `jwz`)
+- **Philosophy:** "Work for the user, not the agent"
+
+### Review Methodology
+
+1. Compare deliverables against **user's actual words** (not agent claims)
+2. Assume errors exist in complex work
+3. Steel-man the strongest case, then attack it
+4. Seek second opinions from Codex/Gemini
+5. Post decision: `COMPLETE` or `ISSUES`
+
+### Decision Output
+
+alice posts to `alice:status:{session_id}`:
+```json
+{
+  "decision": "COMPLETE" | "ISSUES",
+  "summary": "...",
+  "reasoning": "...",
+  "second_opinions": [...],
+  "message_to_agent": "..."
+}
+```
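Review note: a consumer of this message would likely want to reject anything other than the two documented decisions. A minimal sketch, assuming the message is stored as a JSON string; `parse_decision` is a hypothetical helper, not part of alice.

```python
import json

VALID_DECISIONS = {"COMPLETE", "ISSUES"}  # the two values documented above


def parse_decision(raw: str) -> dict:
    """Parse an alice status message and reject unknown decisions."""
    message = json.loads(raw)
    decision = message.get("decision")
    if decision not in VALID_DECISIONS:
        raise ValueError(f"unexpected decision: {decision!r}")
    return message
```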
+
+## Circuit Breakers
+
+Three safeguards against infinite loops:
+
+1. **Stale Review Detection:** Same review blocks ≥3 times → fail open
+2. **No-ID Blocks:** alice never posts decision → 3 blocks → fail open
+3. **State Persistence:** Counters stored in jwz for recovery
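Review note: the fail-open behaviour can be sketched as a counter that is carried between stop attempts. This is one reading of the "3 blocks, then fail open" rule above, not alice's code. The function name and the choice to reset the counter to zero after an exit is allowed are assumptions.

```python
MAX_BLOCKS = 3  # threshold taken from the list above


def next_action(block_count: int, approved: bool) -> tuple[str, int]:
    """Return (action, updated_block_count) for one stop attempt."""
    if approved:
        return "allow", 0
    if block_count >= MAX_BLOCKS:
        # Fail open: after repeated blocks, let the agent exit anyway.
        return "allow", 0
    return "block", block_count + 1
```

Under this reading, three consecutive unapproved stop attempts are blocked and the fourth is allowed through. Persisting `block_count` outside the process (jwz, in alice's case) is what lets the breaker survive across separate hook invocations.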
+
+## Key Design Principles
+
+From emes architecture:
+
+| Principle | Implementation |
+|-----------|----------------|
+| **Pull over push** | Agent retrieves context on-demand, not upfront |
+| **Safety over policy** | Critical guardrails via hooks, not prompts |
+| **Pointer over payload** | Messages contain references (IDs), not full content |
+
+## Dependencies
+
+**Required:**
+- `jwz` - State management
+- `tissue` - Issue tracking
+- `jq` - JSON parsing in hooks
+
+**Optional (for consensus):**
+- `codex` - OpenAI CLI
+- `gemini` - Google CLI
+
+## Applicability to Our Skills
+
+### code-review Skill
+
+**Current state:** Interactive - runs lenses, presents findings, asks before filing issues.
+
+**Potential enhancement:** Add quality gate that blocks exit until findings are addressed.
+
+**Challenges:**
+1. We don't have jwz - would need state management
+2. Our review IS the quality gate (not a separate reviewer)
+3. Different use case: code-review reviews code, alice reviews agent work
+
+**Options:**
+
+| Approach | Pros | Cons |
+|----------|------|------|
+| **A: Adopt jwz** | Full emes compatibility | Another dependency, Zig tool |
+| **B: Use beads** | Already have it | Not designed for transient session state |
+| **C: Simple file state** | Minimal, portable | DIY circuit breakers |
+| **D: Hook-only (stateless)** | Simplest | No persistence across tool calls |
+
+### Recommendation
+
+For code-review, the alice pattern is overkill. Our skill already does the review - we don't need a second reviewer to review the review.
+
+**More useful pattern:** Use a `Stop` hook to remind the agent to run code-review before exiting if significant code changes were made. This is a "did you remember to review?" gate, not a "did review pass?" gate.
+
+Example:
+```json
+{
+  "hooks": {
+    "Stop": [{
+      "hooks": [{
+        "type": "command",
+        "command": "check-review-reminder.sh",
+        "timeout": 5
+      }]
+    }]
+  }
+}
+```
+
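+The script:
+1. Checks whether significant code changes exist (e.g. via `git diff`)
+2. Checks whether code-review was invoked this session
+3. If there are changes but no review, exits with code 2 so the stop is blocked and the reminder (written to stderr) is fed back to the agent; other non-zero exit codes are treated as non-blocking errors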
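Review note: the decision logic of such a reminder script could look like the sketch below, written in Python rather than shell for clarity. It assumes change size is measured from `git diff --numstat` output and that "significant" means at least `threshold` changed lines. The threshold value and both function names are hypothetical. Exit code 2 is used because Claude Code's hook documentation describes it as the blocking code.

```python
BLOCK_EXIT_CODE = 2  # blocking exit code for Claude Code hooks


def count_changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        added, deleted = parts[0], parts[1]
        # Binary files report "-" for both counts; skip them.
        if added.isdigit() and deleted.isdigit():
            total += int(added) + int(deleted)
    return total


def reminder_exit_code(changed_lines: int, review_ran: bool, threshold: int = 20) -> int:
    """Return the exit code the reminder hook should use."""
    if changed_lines >= threshold and not review_ran:
        return BLOCK_EXIT_CODE
    return 0
```

How `review_ran` is determined is left open here; it depends on which state option (A to D in the table above) is chosen.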
+
+## Open Questions
+
+1. Should we adopt jwz for cross-skill state coordination?
+2. Is the "review reminder" pattern valuable enough to implement?
+3. Could ops-review benefit from similar gating?
+4. How do hooks interact with our dual-publish strategy?
+
+## References
+
+- [alice/idle repo](https://github.com/evil-mind-evil-sword/idle)
+- [jwz repo](https://github.com/evil-mind-evil-sword/jwz)
+- [Claude Code Hooks Docs](https://code.claude.com/docs/en/hooks)