# idle/alice Quality Gate Analysis

- Date: 2026-01-09
- Status: Research complete
- Related: skills-9jk, ADR-005

## Overview
alice (package name: idle) is a Claude Code plugin that mechanically enforces code quality by blocking agent exit until an independent reviewer (the alice agent) approves the work.
- Repo: https://github.com/evil-mind-evil-sword/idle
- Language: Zig
- Author: femtomc
- License: AGPL-3.0
## How It Works

### Activation

Opt-in per-prompt via the `#alice` prefix:

```
#alice implement user authentication with JWT
```
The UserPromptSubmit hook detects this prefix and sets review state via jwz.
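As a rough sketch of what that hook does conceptually (the real logic lives in the alice CLI, written in Zig), assuming Claude Code's standard hook input of JSON on stdin with `prompt` and `session_id` fields; the `"enabled"` message body is a placeholder, not alice's actual payload:

```bash
#!/usr/bin/env bash
# Sketch of the UserPromptSubmit check (illustrative; real logic is in the alice CLI).
input=$(cat)                                     # Claude Code passes hook input as JSON on stdin
prompt=$(echo "$input" | jq -r '.prompt')
session_id=$(echo "$input" | jq -r '.session_id')

if [[ "$prompt" == "#alice"* ]]; then
  # Enable review for this session; the message body is a placeholder.
  jwz post "review:state:${session_id}" -m "enabled"
fi
exit 0  # UserPromptSubmit never blocks in this sketch
```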
### Hook Chain
alice uses 6 Claude Code hooks:
| Hook | Purpose | Timeout |
|---|---|---|
| `SessionStart` | Initialize session state | 5s |
| `UserPromptSubmit` | Detect `#alice` prefix, enable review | 5s |
| `Stop` | Block exit until approved | 30s |
| `PostToolUse` | Track tool usage | 5s |
| `SubagentStop` | Validate alice posted decision | 5s |
| `SessionEnd` | Cleanup | 5s |
### The Stop Hook (Core Mechanism)

When the agent tries to exit, the Stop hook runs these checks (sketched below):

1. Load the jwz store
2. Query `review:state:{session_id}` - is review enabled?
3. If not enabled → approve immediately
4. Query `alice:status:{session_id}` - did alice approve?
5. If decision == `COMPLETE` → reset state, allow exit
6. Otherwise → BLOCK, and instruct the agent to spawn alice
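The same decision logic as a bash sketch (the real implementation is `alice hook stop`; assumptions here: `jwz read` prints the latest message for a topic as JSON, and exiting with code 2 plus a stderr message is how the hook blocks and feeds instructions back to the agent):

```bash
#!/usr/bin/env bash
# Sketch of the Stop hook decision (illustrative; real logic is `alice hook stop`).
input=$(cat)
session_id=$(echo "$input" | jq -r '.session_id')

# Steps 1-3: no review state recorded for this session -> approve immediately.
state=$(jwz read "review:state:${session_id}" 2>/dev/null)
if [[ -z "$state" ]]; then
  exit 0
fi

# Steps 4-5: alice posted COMPLETE -> allow exit (the real hook also resets state).
decision=$(jwz read "alice:status:${session_id}" 2>/dev/null | jq -r '.decision' 2>/dev/null)
if [[ "$decision" == "COMPLETE" ]]; then
  exit 0
fi

# Step 6: block, and tell the agent what to do next.
echo "Review pending: spawn the alice agent and wait for a COMPLETE decision on alice:status:${session_id}" >&2
exit 2
```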
### hooks.json Structure

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "alice hook session-start",
            "timeout": 5
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "alice hook stop",
            "timeout": 30
          }
        ]
      }
    ]
  }
}
```
Each hook invokes the alice CLI with a subcommand. The CLI checks/updates state in jwz.
### State Management (jwz)

jwz is an append-only, topic-based messaging system:

- Stores messages in `.jwz/messages.jsonl` (git-mergeable)
- SQLite cache for FTS5 search
- Auto-captures git context (commit, branch, dirty status)
- Topics like `review:state:{session}`, `alice:status:{session}`
Key jwz commands:
```bash
jwz post <topic> -m <message>   # Post message
jwz read <topic>                # Read topic
jwz search <query>              # Full-text search
```
### The alice Agent

alice is a read-only Opus-based reviewer:

- Model: Claude Opus
- Access: Read-only (no file modifications)
- Tools: Read, Grep, Glob, Bash (restricted to `tissue` and `jwz`)
- Philosophy: "Work for the user, not the agent"
### Review Methodology
- Compare deliverables against user's actual words (not agent claims)
- Assume errors exist in complex work
- Steel-man the strongest case, then attack it
- Seek second opinions from Codex/Gemini
- Post decision: `COMPLETE` or `ISSUES`
### Decision Output

alice posts to `alice:status:{session_id}`:

```json
{
  "decision": "COMPLETE" | "ISSUES",
  "summary": "...",
  "reasoning": "...",
  "second_opinions": [...],
  "message_to_agent": "..."
}
```
### Circuit Breakers

Three safeguards against infinite loops (see the sketch after this list):
- Stale Review Detection: Same review blocks ≥3 times → fail open
- No-ID Blocks: alice never posts decision → 3 blocks → fail open
- State Persistence: Counters stored in jwz for recovery
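A sketch of how the block counter could work; the `review:blocks:*` topic name and the one-message-per-line output of `jwz read` are assumptions, not alice's actual schema:

```bash
#!/usr/bin/env bash
# Illustrative circuit breaker inside the Stop hook (topic name and read format are assumptions).
session_id="$1"
blocks_topic="review:blocks:${session_id}"       # hypothetical counter topic

# Count previous blocks for this review (assumes one message per line from `jwz read`).
block_count=$(jwz read "$blocks_topic" 2>/dev/null | wc -l)

if (( block_count >= 3 )); then
  echo "circuit breaker: ${block_count} blocks without a decision, failing open" >&2
  exit 0                                         # fail open rather than trap the agent
fi

jwz post "$blocks_topic" -m "blocked"            # persist the counter in jwz for recovery
echo "Review not approved yet; spawn alice before exiting" >&2
exit 2
```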
## Key Design Principles
From emes architecture:
| Principle | Implementation |
|---|---|
| Pull over push | Agent retrieves context on-demand, not upfront |
| Safety over policy | Critical guardrails via hooks, not prompts |
| Pointer over payload | Messages contain references (IDs), not full content |
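As a concrete illustration of "pointer over payload" (topic name and IDs below are hypothetical), a finding references the tissue issue and commit that hold the detail instead of embedding the full report in the message:

```bash
# Pointer, not payload: the message names where the detail lives.
# Topic name and issue/commit IDs are hypothetical.
jwz post "review:findings:${SESSION_ID}" -m "issue=tissue-42 commit=abc1234"
```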
## Dependencies

Required:

- `jwz` - State management
- `tissue` - Issue tracking
- `jq` - JSON parsing in hooks

Optional (for consensus):

- `codex` - OpenAI CLI
- `gemini` - Google CLI
## Use Case: Adversarial Agent Review
alice is NOT about code review - it's about adversarial review of agent work in automation contexts.
### The Problem alice Solves
When agents work autonomously (unattended, CI/CD, batch jobs):
- Agent may claim "done" when work is incomplete
- Agent may miss requirements or make incorrect assumptions
- Agent may take shortcuts that don't serve user intent
- No human in the loop to catch mistakes
### alice's Solution
An independent adversarial reviewer (Opus) that:
- Works for the user, not the agent - grounds truth in user's actual words
- Assumes errors exist - actively looks for problems
- Steel-mans then attacks - gives benefit of doubt, then systematically challenges
- Seeks external opinions - uses Codex/Gemini for second opinions
- Mechanically blocks exit - can't be talked out of it via prompt
### When to Use the alice Pattern
| Scenario | Why alice helps |
|---|---|
| Autonomous/unattended runs | No human watching - need automated QA |
| CI/CD with agents | Quality gate before merge |
| Complex multi-step features | Verify each deliverable meets requirements |
| Refactoring | Ensure nothing broke |
| ops-jrz1 deployment | Remote server, less oversight |
### When NOT to Use
- Interactive sessions with human oversight
- Simple, low-risk changes
- Exploratory/research work (no deliverable to review)
## Applicability to Our Workflow

### Potential Use Cases
- **Autonomous runs on ops-jrz1**
  - Agent implements feature on VPS
  - alice reviews before agent exits
  - Issues filed to tissue if problems found
- **Batch processing**
  - Agent processes multiple tasks
  - alice spot-checks work quality
- **High-stakes changes**
  - Security-sensitive code
  - Infrastructure changes
  - Production deployments
### Integration Options
| Approach | Pros | Cons |
|---|---|---|
| A: Adopt alice directly | Battle-tested, full features | Requires jwz, tissue, Zig deps |
| B: Build our own | Tailored to our needs, use beads | Dev effort, reinventing wheel |
| C: Hybrid | Use alice concepts, our infra | Best of both, some integration work |
| D: orch-as-reviewer | Already have orch for multi-model | Different purpose, not adversarial |
### Hybrid Approach (Recommended)
Use alice's concepts with our infrastructure:
- Stop hook - Block exit until review passes
- beads for state - Track review status per session
- orch for second opinions - We already have multi-model consensus
- Adversarial prompt - Adapt alice's methodology
Example `hooks.json`:

```json
{
  "hooks": {
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": "review-gate check",
        "timeout": 30
      }]
    }]
  }
}
```
`review-gate` would (see the sketch after this list):
- Check if review mode is enabled (beads flag or env var)
- If enabled, check for approval in beads
- If unapproved, block and instruct agent to spawn reviewer
- Circuit breaker after N failures
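A minimal sketch of `review-gate check` under those requirements; `REVIEW_MODE`, the marker files, and the state directory are placeholders to be swapped for real beads lookups once we decide how approval is recorded:

```bash
#!/usr/bin/env bash
# Sketch of the proposed `review-gate check` Stop hook (placeholder design, not an existing tool).
STATE_DIR="${REVIEW_GATE_STATE_DIR:-.review-gate}"
mkdir -p "$STATE_DIR"

# Review mode enabled? (env var here; could equally be a beads flag)
if [[ "${REVIEW_MODE:-off}" != "on" ]]; then
  exit 0
fi

# Approval recorded? Placeholder marker file written by the reviewer agent;
# the real design would query beads instead.
if [[ -f "$STATE_DIR/approved" ]]; then
  rm -f "$STATE_DIR/approved" "$STATE_DIR/blocks"
  exit 0
fi

# Circuit breaker: fail open after 3 consecutive blocks.
blocks=$(( $(cat "$STATE_DIR/blocks" 2>/dev/null || echo 0) + 1 ))
echo "$blocks" > "$STATE_DIR/blocks"
if (( blocks > 3 )); then
  echo "review-gate: circuit breaker tripped, failing open" >&2
  exit 0
fi

# Block and instruct the agent to spawn the adversarial reviewer.
echo "review-gate: no approval recorded; spawn the adversarial reviewer, then try exiting again" >&2
exit 2
```

Everything in `$STATE_DIR` is throwaway plumbing; the point is the Stop-hook shape: allow on approval, fail open after repeated blocks, otherwise block with instructions.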
### Reviewer Agent Design
Inspired by alice but using our tools:
```markdown
# Adversarial Reviewer

You review agent work for the USER, not the agent.

## Methodology

1. Read the user's original request (not agent's summary)
2. Examine all changes made (git diff, file reads)
3. Assume errors exist - find them
4. Use orch for second opinions on non-trivial work
5. Post decision to beads

## Decision

- APPROVED: Work meets user's actual request
- ISSUES: Problems found (file beads issues)

## Tools Available

- Read, Grep, Glob (read-only)
- orch (second opinions)
- bd (issue tracking)
```
## Open Questions
- Do we need jwz or can beads handle session state?
- Should the reviewer be a separate skill or plugin?
- How do we handle the "review the reviewer" problem?
- What's the circuit breaker threshold (3 like alice)?
- Should this be opt-in (`#review`) or always-on for certain contexts?