# idle/alice Quality Gate Analysis
> **Date:** 2026-01-09
> **Status:** Research complete
> **Related:** [skills-9jk](../../.beads/), [ADR-005](../adr/005-dual-publish-plugin-architecture.md)
## Overview
**alice** (package name: idle) is a Claude Code plugin that mechanically gates agent work: it blocks the agent from exiting until an independent reviewer (the alice agent) approves the work.
- **Repo:** https://github.com/evil-mind-evil-sword/idle
- **Language:** Zig
- **Author:** femtomc
- **License:** AGPL-3.0
## How It Works
### Activation
Opt-in per-prompt via `#alice` prefix:
```
#alice implement user authentication with JWT
```
The `UserPromptSubmit` hook detects this prefix and sets review state via jwz.
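The prefix check can be sketched as follows. This is a minimal illustration in Python, not alice's actual Zig implementation. It assumes the hook payload is JSON with the prompt text in a `prompt` field, and that only a leading `#alice` token counts as opt-in; the function name `wants_review` is hypothetical.

```python
import json

REVIEW_PREFIX = "#alice"


def wants_review(hook_payload: str) -> bool:
    """Return True when the submitted prompt opts in to review.

    Assumption: the payload is a JSON object with a "prompt" string.
    """
    payload = json.loads(hook_payload)
    prompt = payload.get("prompt", "")
    # Split off the first whitespace-delimited token; an empty prompt yields [].
    tokens = prompt.split(None, 1)
    # Only a leading "#alice" token opts in; a mention mid-sentence does not.
    return bool(tokens) and tokens[0] == REVIEW_PREFIX
```

In the real plugin, a positive match is what leads to the review state being recorded in jwz.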
### Hook Chain
alice uses 6 Claude Code hooks:
| Hook | Purpose | Timeout |
|------|---------|---------|
| `SessionStart` | Initialize session state | 5s |
| `UserPromptSubmit` | Detect `#alice` prefix, enable review | 5s |
| `Stop` | **Block exit until approved** | 30s |
| `PostToolUse` | Track tool usage | 5s |
| `SubagentStop` | Validate alice posted decision | 5s |
| `SessionEnd` | Cleanup | 5s |
### The Stop Hook (Core Mechanism)
When the agent tries to exit:
```
1. Load jwz store
2. Query "review:state:{session_id}" - is review enabled?
3. If not enabled → approve immediately
4. Query "alice:status:{session_id}" - did alice approve?
5. If decision == "COMPLETE" → reset state, allow exit
6. Otherwise → BLOCK, instruct agent to spawn alice
```
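The six steps above can be sketched as a single decision function. This is an illustration only: a plain dict stands in for the jwz store, and the `enabled` field inside the review state message is an assumed shape, not something confirmed from the alice source.

```python
import json


def stop_decision(store: dict, session_id: str) -> dict:
    """Decide whether the agent may exit.

    Assumption: `store` maps a topic name to its latest message,
    serialized as a JSON string. The real store is jwz.
    """
    state_topic = f"review:state:{session_id}"
    status_topic = f"alice:status:{session_id}"

    # Steps 2-3: if review was never enabled, allow exit immediately.
    state = json.loads(store.get(state_topic, "{}"))
    if not state.get("enabled", False):
        return {"allow": True, "reason": "review not enabled"}

    # Steps 4-5: an explicit COMPLETE decision resets state and allows exit.
    status = json.loads(store.get(status_topic, "{}"))
    if status.get("decision") == "COMPLETE":
        store[state_topic] = json.dumps({"enabled": False})
        store.pop(status_topic, None)
        return {"allow": True, "reason": "alice approved"}

    # Step 6: anything else (ISSUES, or no decision yet) blocks the exit.
    return {"allow": False, "reason": "spawn alice and wait for a decision"}
```

Note that a missing decision and an `ISSUES` decision are treated the same way here: both block.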
### hooks.json Structure
```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "alice hook session-start",
            "timeout": 5
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "alice hook stop",
            "timeout": 30
          }
        ]
      }
    ]
  }
}
```
Each hook invokes the `alice` CLI with a subcommand. The CLI checks/updates state in jwz.
## State Management (jwz)
**jwz** is an append-only topic-based messaging system:
- Stores messages in `.jwz/messages.jsonl` (git-mergeable)
- SQLite cache for FTS5 search
- Auto-captures git context (commit, branch, dirty status)
- Topics like `review:state:{session}`, `alice:status:{session}`
Key jwz commands:
```bash
jwz post <topic> -m <message> # Post message
jwz read <topic> # Read topic
jwz search <query> # Full-text search
```
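To make the append-only, topic-based model concrete, here is a small in-memory sketch. It is not jwz's implementation or on-disk format: the class name `TopicLog` and the message fields (`id`, `topic`, `body`, `ts`) are hypothetical.

```python
import json
import time


class TopicLog:
    """In-memory stand-in for an append-only, topic-based message log."""

    def __init__(self):
        self._messages = []  # existing entries are never edited, only appended

    def post(self, topic: str, body: str) -> dict:
        """Append a message to a topic and return it."""
        message = {
            "id": len(self._messages) + 1,
            "topic": topic,
            "body": body,
            "ts": time.time(),
        }
        self._messages.append(message)
        return message

    def read(self, topic: str) -> list:
        """Return every message posted to a topic, oldest first."""
        return [m for m in self._messages if m["topic"] == topic]

    def latest(self, topic: str):
        """Return the newest message on a topic, or None if it is empty."""
        messages = self.read(topic)
        return messages[-1] if messages else None

    def to_jsonl(self) -> str:
        """Serialize one JSON object per line, the shape a .jsonl file uses."""
        return "\n".join(json.dumps(m) for m in self._messages)
```

Because writers only add lines and never rewrite earlier ones, a log in this shape tends to merge cleanly in git, which is presumably why jwz stores messages as JSONL.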
## The alice Agent
alice is a **read-only Opus-based reviewer**:
- **Model:** Claude Opus
- **Access:** Read-only (no file modifications)
- **Tools:** Read, Grep, Glob, Bash (restricted to `tissue` and `jwz`)
- **Philosophy:** "Work for the user, not the agent"
### Review Methodology
1. Compare deliverables against **user's actual words** (not agent claims)
2. Assume errors exist in complex work
3. Steel-man the strongest case, then attack it
4. Seek second opinions from Codex/Gemini
5. Post decision: `COMPLETE` or `ISSUES`
### Decision Output
alice posts to `alice:status:{session_id}`:
```json
{
  "decision": "COMPLETE" | "ISSUES",
  "summary": "...",
  "reasoning": "...",
  "second_opinions": [...],
  "message_to_agent": "..."
}
```
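A consumer of this message needs to reject anything that is not one of the two decisions. The sketch below shows one way to parse and validate it, using the field names from the block above. The `ReviewDecision` type and `parse_decision` function are hypothetical, and treating every field except `decision` as optional is an assumption.

```python
import json
from dataclasses import dataclass, field

VALID_DECISIONS = {"COMPLETE", "ISSUES"}


@dataclass
class ReviewDecision:
    decision: str
    summary: str = ""
    reasoning: str = ""
    second_opinions: list = field(default_factory=list)
    message_to_agent: str = ""


def parse_decision(raw: str) -> ReviewDecision:
    """Parse a status message, rejecting anything but COMPLETE or ISSUES."""
    data = json.loads(raw)
    decision = data.get("decision")
    if decision not in VALID_DECISIONS:
        raise ValueError(f"unexpected decision: {decision!r}")
    return ReviewDecision(
        decision=decision,
        summary=data.get("summary", ""),
        reasoning=data.get("reasoning", ""),
        second_opinions=list(data.get("second_opinions", [])),
        message_to_agent=data.get("message_to_agent", ""),
    )
```

Failing loudly on an unknown decision keeps a malformed message from being mistaken for approval.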
## Circuit Breakers
Three safeguards against infinite loops:
1. **Stale Review Detection:** If the same review blocks exit 3 or more times, the hook fails open (allows exit)
2. **No-ID Blocks:** If alice never posts a decision, the hook fails open after 3 blocks
3. **State Persistence:** Counters are stored in jwz so they can be recovered
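The counting logic behind the first two safeguards can be sketched as below. A dict stands in for the counters persisted in jwz, and the `"no-id"` key for reviews without a posted decision is a hypothetical name. The sketch reads "3 blocks, then fail open" as: three attempts are blocked and the fourth is let through; alice's exact off-by-one behavior is not confirmed here.

```python
MAX_BLOCKS = 3  # threshold from the list above


def should_block(counters: dict, review_id) -> bool:
    """Return True to block the exit, False to fail open.

    Assumption: `counters` maps a review id to the number of times that
    review has already blocked. Reviews with no id share one counter.
    """
    key = review_id or "no-id"
    blocks = counters.get(key, 0)
    if blocks >= MAX_BLOCKS:
        # Circuit breaker trips: reset the counter and let the agent exit.
        counters[key] = 0
        return False
    counters[key] = blocks + 1
    return True
```

Failing open trades strictness for liveness: a stuck review cannot trap the agent forever, at the cost of occasionally letting unreviewed work through.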
## Key Design Principles
From emes architecture:
| Principle | Implementation |
|-----------|----------------|
| **Pull over push** | Agent retrieves context on-demand, not upfront |
| **Safety over policy** | Critical guardrails via hooks, not prompts |
| **Pointer over payload** | Messages contain references (IDs), not full content |
## Dependencies
**Required:**
- `jwz` - State management
- `tissue` - Issue tracking
- `jq` - JSON parsing in hooks
**Optional (for consensus):**
- `codex` - OpenAI CLI
- `gemini` - Google CLI
## Use Case: Adversarial Agent Review
alice is NOT about code review - it's about **adversarial review of agent work** in automation contexts.
### The Problem alice Solves
When agents work autonomously (unattended, CI/CD, batch jobs):
- Agent may claim "done" when work is incomplete
- Agent may miss requirements or make incorrect assumptions
- Agent may take shortcuts that don't serve user intent
- No human in the loop to catch mistakes
### alice's Solution
An independent adversarial reviewer (Opus) that:
1. **Works for the user, not the agent** - grounds truth in user's actual words
2. **Assumes errors exist** - actively looks for problems
3. **Steel-mans then attacks** - gives benefit of doubt, then systematically challenges
4. **Seeks external opinions** - uses Codex/Gemini for second opinions
5. **Mechanically blocks exit** - can't be talked out of it via prompt
### When to Use alice Pattern
| Scenario | Why alice helps |
|----------|-----------------|
| **Autonomous/unattended runs** | No human watching - need automated QA |
| **CI/CD with agents** | Quality gate before merge |
| **Complex multi-step features** | Verify each deliverable meets requirements |
| **Refactoring** | Ensure nothing broke |
| **ops-jrz1 deployment** | Remote server, less oversight |
### When NOT to Use
- Interactive sessions with human oversight
- Simple, low-risk changes
- Exploratory/research work (no deliverable to review)
## Applicability to Our Workflow
### Potential Use Cases
1. **Autonomous runs on ops-jrz1**
   - Agent implements feature on VPS
   - alice reviews before agent exits
   - Issues filed to tissue if problems found
2. **Batch processing**
   - Agent processes multiple tasks
   - alice spot-checks work quality
3. **High-stakes changes**
   - Security-sensitive code
   - Infrastructure changes
   - Production deployments
### Integration Options
| Approach | Pros | Cons |
|----------|------|------|
| **A: Adopt alice directly** | Battle-tested, full features | Requires jwz, tissue, Zig deps |
| **B: Build our own** | Tailored to our needs, use beads | Dev effort, reinventing wheel |
| **C: Hybrid** | alice concepts on our infra, best of both | Some integration work |
| **D: orch-as-reviewer** | Already have orch for multi-model | Different purpose, not adversarial |
### Hybrid Approach (Recommended)
Use alice's **concepts** with our **infrastructure**:
1. **Stop hook** - Block exit until review passes
2. **beads for state** - Track review status per session
3. **orch for second opinions** - We already have multi-model consensus
4. **Adversarial prompt** - Adapt alice's methodology
Example hooks.json:
```json
{
  "hooks": {
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": "review-gate check",
        "timeout": 30
      }]
    }]
  }
}
```
`review-gate` would:
1. Check if review mode is enabled (beads flag or env var)
2. If enabled, check for approval in beads
3. If unapproved, block and instruct agent to spawn reviewer
4. Circuit breaker after N failures
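A rough sketch of those four steps follows, written as a pure function so it can be tested without a live session. Everything here is illustrative: the `REVIEW_GATE` environment variable, the `approved` set standing in for beads, and `MAX_BLOCKS = 3` are assumptions. The output shape assumes the Claude Code convention that a Stop hook printing JSON with `"decision": "block"` and a `"reason"` keeps the agent running; verify it against the hooks docs before relying on it.

```python
import json

MAX_BLOCKS = 3  # hypothetical value for N


def review_gate(payload: dict, env: dict, approved: set, blocks: dict) -> str:
    """Return the text a Stop hook would print: "" to allow, JSON to block.

    Assumptions: `payload` is the hook input with a "session_id" field,
    `approved` holds session ids with a recorded approval (beads stand-in),
    and `blocks` counts how often each session has been blocked.
    """
    session = payload.get("session_id", "")

    # 1. Review mode off: allow exit.
    if env.get("REVIEW_GATE") != "1":
        return ""

    # 2. Approval already recorded: allow exit.
    if session in approved:
        return ""

    # 4. Circuit breaker: fail open once this session has blocked N times.
    if blocks.get(session, 0) >= MAX_BLOCKS:
        return ""

    # 3. Unapproved: block and tell the agent what to do next.
    blocks[session] = blocks.get(session, 0) + 1
    return json.dumps({
        "decision": "block",
        "reason": "Review pending: spawn the reviewer agent and wait for its decision.",
    })
```

A real `review-gate check` command would read the payload from stdin, look up approval in beads, and print the returned string.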
### Reviewer Agent Design
Inspired by alice but using our tools:
```markdown
# Adversarial Reviewer
You review agent work for the USER, not the agent.
## Methodology
1. Read the user's original request (not agent's summary)
2. Examine all changes made (git diff, file reads)
3. Assume errors exist - find them
4. Use orch for second opinions on non-trivial work
5. Post decision to beads
## Decision
- APPROVED: Work meets user's actual request
- ISSUES: Problems found (file beads issues)
## Tools Available
- Read, Grep, Glob (read-only)
- orch (second opinions)
- bd (issue tracking)
```
## Open Questions
1. Do we need jwz or can beads handle session state?
2. Should the reviewer be a separate skill or plugin?
3. How do we handle the "review the reviewer" problem?
4. What's the circuit breaker threshold (3 like alice)?
5. Should this be opt-in (`#review`) or always-on for certain contexts?
## References
- [alice/idle repo](https://github.com/evil-mind-evil-sword/idle)
- [jwz repo](https://github.com/evil-mind-evil-sword/jwz)
- [Claude Code Hooks Docs](https://code.claude.com/docs/en/hooks)