idle/alice Quality Gate Analysis

Date: 2026-01-09
Status: Research complete
Related: skills-9jk, ADR-005

Overview

alice (package name: idle) is a Claude Code plugin that mechanically enforces the quality of agent work: it blocks the agent from exiting until an independent reviewer (the alice agent) approves the result.

How It Works

Activation

Opt-in per-prompt via #alice prefix:

#alice implement user authentication with JWT

The UserPromptSubmit hook detects this prefix and sets review state via jwz.
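The prefix check can be sketched as follows. This is a minimal illustration, not alice's actual implementation: the function name `parse_prompt`, the tolerance for leading whitespace, and the stripping of the prefix from the returned prompt are all assumptions.

```python
ALICE_PREFIX = "#alice"  # opt-in marker described above


def parse_prompt(prompt: str) -> tuple[bool, str]:
    """Return (review_enabled, prompt_without_prefix).

    Hypothetical helper: alice's real hook may treat whitespace or the
    remaining prompt text differently.
    """
    stripped = prompt.lstrip()
    # Require an exact "#alice" token so "#alicex ..." does not opt in.
    if stripped == ALICE_PREFIX or stripped.startswith(ALICE_PREFIX + " "):
        return True, stripped[len(ALICE_PREFIX):].lstrip()
    return False, prompt
```

A hook built on this would post the resulting flag to the review:state:{session_id} topic when the first element is true.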

Hook Chain

alice uses 6 Claude Code hooks:

| Hook | Purpose | Timeout |
|------|---------|---------|
| SessionStart | Initialize session state | 5s |
| UserPromptSubmit | Detect #alice prefix, enable review | 5s |
| Stop | Block exit until approved | 30s |
| PostToolUse | Track tool usage | 5s |
| SubagentStop | Validate alice posted decision | 5s |
| SessionEnd | Cleanup | 5s |

The Stop Hook (Core Mechanism)

When agent tries to exit:

1. Load jwz store
2. Query "review:state:{session_id}" - is review enabled?
3. If not enabled → approve immediately
4. Query "alice:status:{session_id}" - did alice approve?
5. If decision == "COMPLETE" → reset state, allow exit
6. Otherwise → BLOCK, instruct agent to spawn alice
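
The six steps above can be sketched as a pure function. This is an illustration under stated assumptions: a plain dict stands in for the jwz store (topic mapped to its latest message), and the `enabled` field and the `action`/`reason` return shape are hypothetical names, not alice's real schema.

```python
def stop_decision(store: dict[str, dict], session_id: str) -> dict:
    """Decide whether the agent may exit, following the steps above.

    `store` is a stand-in for jwz: topic -> latest message (assumption).
    """
    state_topic = f"review:state:{session_id}"
    status_topic = f"alice:status:{session_id}"

    # Steps 2-3: review not enabled for this session, so approve immediately.
    review = store.get(state_topic)
    if not review or not review.get("enabled"):
        return {"action": "allow", "reason": "review not enabled"}

    # Steps 4-5: alice approved, so reset state and allow exit.
    status = store.get(status_topic)
    if status and status.get("decision") == "COMPLETE":
        store[state_topic] = {"enabled": False}
        store.pop(status_topic, None)
        return {"action": "allow", "reason": "alice approved"}

    # Step 6: anything else blocks and tells the agent what to do next.
    return {"action": "block", "reason": "Spawn the alice agent to review this work."}
```

Keeping the decision separate from I/O makes the gate easy to test without a live session or a real store.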

hooks.json Structure

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "alice hook session-start",
            "timeout": 5
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "alice hook stop",
            "timeout": 30
          }
        ]
      }
    ]
  }
}

Each hook invokes the alice CLI with a subcommand. The CLI checks/updates state in jwz.

State Management (jwz)

jwz is an append-only topic-based messaging system:

  • Stores messages in .jwz/messages.jsonl (git-mergeable)
  • SQLite cache for FTS5 search
  • Auto-captures git context (commit, branch, dirty status)
  • Topics like review:state:{session}, alice:status:{session}

Key jwz commands:

jwz post <topic> -m <message>     # Post message
jwz read <topic>                   # Read topic
jwz search <query>                 # Full-text search
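
To make the append-only, topic-based model concrete, here is a small Python sketch of a JSONL topic log. It is not jwz: the class name `TopicLog`, the record fields (`topic`, `message`, `ts`), and the `latest` helper are assumptions, and the SQLite/FTS5 cache and git context capture are left out.

```python
import json
import time
from pathlib import Path
from typing import Optional


class TopicLog:
    """Append-only message log, one JSON object per line (hypothetical layout)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def post(self, topic: str, message: dict) -> None:
        """Append a message; existing lines are never rewritten."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        record = {"topic": topic, "message": message, "ts": time.time()}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def read(self, topic: str) -> list[dict]:
        """Return every message posted to a topic, oldest first."""
        if not self.path.exists():
            return []
        messages = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    record = json.loads(line)
                    if record["topic"] == topic:
                        messages.append(record["message"])
        return messages

    def latest(self, topic: str) -> Optional[dict]:
        """Return the most recent message on a topic, or None."""
        messages = self.read(topic)
        return messages[-1] if messages else None
```

Because writers only ever append whole lines, two branches that both added messages tend to merge cleanly in git, which is presumably why the format is described as git-mergeable.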

The alice Agent

alice is a read-only Opus-based reviewer:

  • Model: Claude Opus
  • Access: Read-only (no file modifications)
  • Tools: Read, Grep, Glob, Bash (restricted to tissue and jwz)
  • Philosophy: "Work for the user, not the agent"

Review Methodology

  1. Compare deliverables against user's actual words (not agent claims)
  2. Assume errors exist in complex work
  3. Steel-man the strongest case, then attack it
  4. Seek second opinions from Codex/Gemini
  5. Post decision: COMPLETE or ISSUES

Decision Output

alice posts to alice:status:{session_id}:

{
  "decision": "COMPLETE" | "ISSUES",
  "summary": "...",
  "reasoning": "...",
  "second_opinions": [...],
  "message_to_agent": "..."
}
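
A consumer of this message should reject anything that is not one of the two decisions. The sketch below shows one way to validate it in Python; the `AliceDecision` dataclass, `parse_decision`, and the default values for missing fields are assumptions, and the element type of `second_opinions` is not specified by the source.

```python
import json
from dataclasses import dataclass, field

VALID_DECISIONS = {"COMPLETE", "ISSUES"}


@dataclass
class AliceDecision:
    decision: str
    summary: str = ""
    reasoning: str = ""
    second_opinions: list = field(default_factory=list)
    message_to_agent: str = ""


def parse_decision(raw: str) -> AliceDecision:
    """Parse and validate a decision message; raise ValueError if malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("decision message must be a JSON object")
    if data.get("decision") not in VALID_DECISIONS:
        raise ValueError(f"unknown decision: {data.get('decision')!r}")
    return AliceDecision(
        decision=data["decision"],
        summary=data.get("summary", ""),
        reasoning=data.get("reasoning", ""),
        second_opinions=list(data.get("second_opinions", [])),
        message_to_agent=data.get("message_to_agent", ""),
    )
```

Treating a malformed message as "no decision" rather than as approval keeps the gate closed when the reviewer misbehaves.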

Circuit Breakers

Three safeguards against infinite loops:

  1. Stale Review Detection: Same review blocks ≥3 times → fail open
  2. No-ID Blocks: alice never posts decision → 3 blocks → fail open
  3. State Persistence: Counters stored in jwz for recovery
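
The fail-open counter behind the first two safeguards can be sketched like this. The exact boundary is an assumption: here the gate blocks three times for a given key and lets the fourth attempt through, then resets. The key would be a review ID for stale-review detection or a session ID for the no-ID case, and the dict stands in for counters persisted in jwz.

```python
MAX_BLOCKS = 3  # threshold described above


def should_block(counters: dict[str, int], key: str, max_blocks: int = MAX_BLOCKS) -> bool:
    """Record a block attempt for `key`; return False once the limit is reached.

    Returning False means "fail open": let the agent exit rather than loop.
    """
    count = counters.get(key, 0)
    if count >= max_blocks:
        counters.pop(key, None)  # reset so a later review starts fresh
        return False
    counters[key] = count + 1
    return True
```

For example, four consecutive calls with the same key yield block, block, block, fail open.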

Key Design Principles

From emes architecture:

| Principle | Implementation |
|-----------|----------------|
| Pull over push | Agent retrieves context on-demand, not upfront |
| Safety over policy | Critical guardrails via hooks, not prompts |
| Pointer over payload | Messages contain references (IDs), not full content |

Dependencies

Required:

  • jwz - State management
  • tissue - Issue tracking
  • jq - JSON parsing in hooks

Optional (for consensus):

  • codex - OpenAI CLI
  • gemini - Google CLI

Use Case: Adversarial Agent Review

alice is NOT about code review - it's about adversarial review of agent work in automation contexts.

The Problem alice Solves

When agents work autonomously (unattended, CI/CD, batch jobs):

  • Agent may claim "done" when work is incomplete
  • Agent may miss requirements or make incorrect assumptions
  • Agent may take shortcuts that don't serve user intent
  • No human in the loop to catch mistakes

alice's Solution

An independent adversarial reviewer (Opus) that:

  1. Works for the user, not the agent - grounds truth in user's actual words
  2. Assumes errors exist - actively looks for problems
  3. Steel-mans then attacks - gives benefit of doubt, then systematically challenges
  4. Seeks external opinions - uses Codex/Gemini for second opinions
  5. Mechanically blocks exit - can't be talked out of it via prompt

When to Use alice Pattern

| Scenario | Why alice helps |
|----------|-----------------|
| Autonomous/unattended runs | No human watching - need automated QA |
| CI/CD with agents | Quality gate before merge |
| Complex multi-step features | Verify each deliverable meets requirements |
| Refactoring | Ensure nothing broke |
| ops-jrz1 deployment | Remote server, less oversight |

When NOT to Use

  • Interactive sessions with human oversight
  • Simple, low-risk changes
  • Exploratory/research work (no deliverable to review)

Applicability to Our Workflow

Potential Use Cases

  1. Autonomous runs on ops-jrz1
    • Agent implements feature on VPS
    • alice reviews before agent exits
    • Issues filed to tissue if problems found

  2. Batch processing
    • Agent processes multiple tasks
    • alice spot-checks work quality

  3. High-stakes changes
    • Security-sensitive code
    • Infrastructure changes
    • Production deployments

Integration Options

| Approach | Pros | Cons |
|----------|------|------|
| A: Adopt alice directly | Battle-tested, full features | Requires jwz, tissue, Zig deps |
| B: Build our own | Tailored to our needs, uses beads | Dev effort, reinventing the wheel |
| C: Hybrid | Best of both: alice concepts on our infra | Some integration work |
| D: orch-as-reviewer | Already have orch for multi-model | Different purpose, not adversarial |

Recommendation: option C (hybrid). Use alice's concepts with our infrastructure:

  1. Stop hook - Block exit until review passes
  2. beads for state - Track review status per session
  3. orch for second opinions - We already have multi-model consensus
  4. Adversarial prompt - Adapt alice's methodology

Example hooks.json:

{
  "hooks": {
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": "review-gate check",
        "timeout": 30
      }]
    }]
  }
}

review-gate would:

  1. Check if review mode is enabled (beads flag or env var)
  2. If enabled, check for approval in beads
  3. If unapproved, block and instruct agent to spawn reviewer
  4. Circuit breaker after N failures
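
A rough Python sketch of what `review-gate check` might compute, with several assumptions flagged. `REVIEW_GATE` is a hypothetical environment variable; the beads lookup is not shown and is passed in as a boolean; and the output shape assumes Claude Code's Stop hook contract, in which the hook receives a JSON object containing `session_id` on stdin and can block by printing a JSON object with `"decision": "block"` and a `"reason"`. Verify that contract against the hooks reference before relying on it.

```python
import json
from typing import Mapping, Optional


def gate_response(
    hook_input: Mapping[str, object],
    env: Mapping[str, str],
    approved: bool,
    blocks_so_far: int,
    max_blocks: int = 3,
) -> Optional[str]:
    """Return the JSON to print on stdout, or None to let the agent exit.

    `approved` and `blocks_so_far` would come from beads (lookup not shown).
    """
    # 1. Review mode off (hypothetical REVIEW_GATE flag): allow exit.
    if env.get("REVIEW_GATE") != "1":
        return None
    # 2. Reviewer already approved: allow exit.
    if approved:
        return None
    # 4. Circuit breaker: fail open after N blocks to avoid an infinite loop.
    if blocks_so_far >= max_blocks:
        return None
    # 3. Unapproved: block and tell the agent to spawn the reviewer.
    session_id = hook_input.get("session_id", "unknown")
    return json.dumps({
        "decision": "block",
        "reason": f"Review required for session {session_id}: spawn the reviewer agent.",
    })
```

A thin entry point would parse stdin with `json.load(sys.stdin)`, call `gate_response` with `os.environ`, and print the result when it is not None.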

Reviewer Agent Design

Inspired by alice but using our tools:

# Adversarial Reviewer

You review agent work for the USER, not the agent.

## Methodology
1. Read the user's original request (not agent's summary)
2. Examine all changes made (git diff, file reads)
3. Assume errors exist - find them
4. Use orch for second opinions on non-trivial work
5. Post decision to beads

## Decision
- APPROVED: Work meets user's actual request
- ISSUES: Problems found (file beads issues)

## Tools Available
- Read, Grep, Glob (read-only)
- orch (second opinions)
- bd (issue tracking)

Open Questions

  1. Do we need jwz or can beads handle session state?
  2. Should the reviewer be a separate skill or plugin?
  3. How do we handle the "review the reviewer" problem?
  4. What's the circuit breaker threshold (3 like alice)?
  5. Should this be opt-in (#review) or always-on for certain contexts?
