
dan 4773abe56f docs: correct alice framing - adversarial agent review for automation

alice is for reviewing AGENT work in unattended/autonomous contexts,
not code review. Key use cases:
- Autonomous runs on ops-jrz1
- CI/CD pipelines with agents
- High-stakes changes without human oversight

Added hybrid approach recommendation: use alice concepts (Stop hook,
adversarial methodology) with our infrastructure (beads, orch).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-09 16:45:49 -08:00


idle/alice Quality Gate Analysis

Date: 2026-01-09
Status: Research complete
Related: skills-9jk, ADR-005

Overview

alice (package name: idle) is a Claude Code plugin that mechanically gates agent work: it blocks agent exit until an independent reviewer (the alice agent) approves what was delivered.

Repo: https://github.com/evil-mind-evil-sword/idle
Language: Zig
Author: femtomc
License: AGPL-3.0

How It Works

Activation

Opt-in per-prompt via #alice prefix:

#alice implement user authentication with JWT

The UserPromptSubmit hook detects this prefix and sets review state via jwz.
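The prefix check can be sketched as below. This is a minimal illustration, not alice's actual implementation (which is written in Zig): the function name `split_review_prefix` and the rule that the prefix must be followed by whitespace or end of input are assumptions.

```python
from typing import Tuple

PREFIX = "#alice"  # opt-in marker described above


def split_review_prefix(prompt: str) -> Tuple[bool, str]:
    """Return (review_enabled, prompt_without_prefix).

    Hypothetical helper: the exact matching rules alice uses are not
    documented here, so this assumes the prefix must start the prompt
    and be followed by whitespace or end of input.
    """
    stripped = prompt.lstrip()
    if not stripped.startswith(PREFIX):
        return (False, prompt)
    rest = stripped[len(PREFIX):]
    if rest and not rest[0].isspace():
        # Something like "#alicex" is a different token, not the prefix.
        return (False, prompt)
    return (True, rest.strip())
```

A hook built on this would record the enabled flag in the state store and pass the remaining prompt through unchanged.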

Hook Chain

alice uses 6 Claude Code hooks:

Hook	Purpose	Timeout
`SessionStart`	Initialize session state	5s
`UserPromptSubmit`	Detect `#alice` prefix, enable review	5s
`Stop`	Block exit until approved	30s
`PostToolUse`	Track tool usage	5s
`SubagentStop`	Validate alice posted decision	5s
`SessionEnd`	Cleanup	5s

The Stop Hook (Core Mechanism)

When agent tries to exit:

1. Load jwz store
2. Query "review:state:{session_id}" - is review enabled?
3. If not enabled → approve immediately
4. Query "alice:status:{session_id}" - did alice approve?
5. If decision == "COMPLETE" → reset state, allow exit
6. Otherwise → BLOCK, instruct agent to spawn alice
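
The six steps above can be sketched as a pure decision function. This is an illustration under stated assumptions, not alice's code: the store is modelled as a plain dict keyed by topic, and the `enabled` field on the review state is hypothetical. Only the topic names and the `COMPLETE` decision value come from the description above.

```python
from typing import MutableMapping, Optional, Tuple

Decision = Tuple[str, Optional[str]]  # ("allow", None) or ("block", reason)


def stop_decision(store: MutableMapping[str, dict], session_id: str) -> Decision:
    """Decide whether the agent may exit, following steps 2-6."""
    state_key = f"review:state:{session_id}"
    review = store.get(state_key)
    # Steps 2-3: no review requested, so let the agent exit.
    if not review or not review.get("enabled"):  # "enabled" is an assumed field
        return ("allow", None)
    # Steps 4-5: reviewer approved, so reset state and let the agent exit.
    status = store.get(f"alice:status:{session_id}")
    if status and status.get("decision") == "COMPLETE":
        store[state_key] = {"enabled": False}
        return ("allow", None)
    # Step 6: anything else blocks, including a missing decision.
    return ("block", "Review required: spawn the alice agent before exiting.")
```

Note that a missing decision and an `ISSUES` decision are treated the same way here: both block.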

hooks.json Structure

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "alice hook session-start",
            "timeout": 5
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "alice hook stop",
            "timeout": 30
          }
        ]
      }
    ]
  }
}

Each hook invokes the alice CLI with a subcommand. The CLI checks/updates state in jwz.

State Management (jwz)

jwz is an append-only topic-based messaging system:

Stores messages in .jwz/messages.jsonl (git-mergeable)
SQLite cache for FTS5 search
Auto-captures git context (commit, branch, dirty status)
Topics like review:state:{session}, alice:status:{session}

Key jwz commands:

jwz post <topic> -m <message>     # Post message
jwz read <topic>                   # Read topic
jwz search <query>                 # Full-text search
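
To make the append-only, topic-based model concrete, here is a small sketch of the idea in Python. It is not jwz's implementation or on-disk schema: the `TopicLog` class and the record fields (`topic`, `message`, `ts`) are assumptions, and the SQLite/FTS5 cache and git context capture are left out.

```python
import json
import time
from pathlib import Path
from typing import List, Optional


class TopicLog:
    """Append-only message log: one JSON object per line (JSONL)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def post(self, topic: str, message: str) -> None:
        # Records are only ever appended, never rewritten, which keeps
        # the file friendly to line-based merges.
        record = {"topic": topic, "message": message, "ts": time.time()}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def read(self, topic: str) -> List[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            records = [json.loads(line) for line in f if line.strip()]
        return [r for r in records if r["topic"] == topic]

    def latest(self, topic: str) -> Optional[dict]:
        # Current state of a topic is simply its most recent message.
        messages = self.read(topic)
        return messages[-1] if messages else None
```

With this model, "resetting" state means posting a newer message to the same topic rather than editing an old one.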

The alice Agent

alice is a read-only Opus-based reviewer:

Model: Claude Opus
Access: Read-only (no file modifications)
Tools: Read, Grep, Glob, Bash (restricted to tissue and jwz)
Philosophy: "Work for the user, not the agent"

Review Methodology

Compare deliverables against user's actual words (not agent claims)
Assume errors exist in complex work
Steel-man the strongest case, then attack it
Seek second opinions from Codex/Gemini
Post decision: COMPLETE or ISSUES

Decision Output

alice posts to alice:status:{session_id}:

{
  "decision": "COMPLETE" | "ISSUES",
  "summary": "...",
  "reasoning": "...",
  "second_opinions": [...],
  "message_to_agent": "..."
}
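
A consumer of this message would likely want to validate it before trusting the decision. The sketch below assumes all five fields are required and that `second_opinions` is a list; neither is confirmed by the alice source, and `parse_decision` is a hypothetical helper.

```python
import json

VALID_DECISIONS = {"COMPLETE", "ISSUES"}
# Assumption: every field shown in the schema above is mandatory.
REQUIRED_FIELDS = (
    "decision", "summary", "reasoning", "second_opinions", "message_to_agent",
)


def parse_decision(raw: str) -> dict:
    """Parse a decision message, raising ValueError if it is malformed."""
    msg = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(msg, dict):
        raise ValueError("decision message must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in msg]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if msg["decision"] not in VALID_DECISIONS:
        raise ValueError(f"unknown decision: {msg['decision']!r}")
    if not isinstance(msg["second_opinions"], list):
        raise ValueError("second_opinions must be a list")
    return msg
```

Treating a malformed message as "no decision" keeps the gate closed rather than approving by accident.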

Circuit Breakers

Three safeguards against infinite loops:

Stale Review Detection: Same review blocks ≥3 times → fail open
No-ID Blocks: alice never posts decision → 3 blocks → fail open
State Persistence: Counters stored in jwz for recovery
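
The fail-open counter can be sketched as follows. This is an interpretation, not alice's code: it assumes the gate blocks up to three times for a given key and allows exit on the next attempt, and it uses a plain dict where alice would persist counters in jwz. The key could be a review ID, or a per-session placeholder for the no-ID case.

```python
from typing import MutableMapping

MAX_BLOCKS = 3  # threshold named above


def should_block(counters: MutableMapping[str, int], key: str) -> bool:
    """Return True to block exit, or False to fail open.

    Assumed semantics: block at most MAX_BLOCKS times per key, then
    let the agent exit and clear the counter.
    """
    blocks_so_far = counters.get(key, 0)
    if blocks_so_far >= MAX_BLOCKS:
        counters.pop(key, None)  # reset so a later review starts clean
        return False
    counters[key] = blocks_so_far + 1
    return True
```

Failing open trades strictness for liveness: a stuck reviewer cannot trap an unattended agent forever, at the cost of occasionally letting unreviewed work through.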

Key Design Principles

From the emes architecture:

Principle	Implementation
Pull over push	Agent retrieves context on-demand, not upfront
Safety over policy	Critical guardrails via hooks, not prompts
Pointer over payload	Messages contain references (IDs), not full content

Dependencies

Required:

jwz - State management
tissue - Issue tracking
jq - JSON parsing in hooks

Optional (for consensus):

codex - OpenAI CLI
gemini - Google CLI

Use Case: Adversarial Agent Review

alice is NOT about code review - it's about adversarial review of agent work in automation contexts.

The Problem alice Solves

When agents work autonomously (unattended, CI/CD, batch jobs):

Agent may claim "done" when work is incomplete
Agent may miss requirements or make incorrect assumptions
Agent may take shortcuts that don't serve user intent
No human in the loop to catch mistakes

alice's Solution

An independent adversarial reviewer (Opus) that:

Works for the user, not the agent - grounds truth in user's actual words
Assumes errors exist - actively looks for problems
Steel-mans then attacks - gives benefit of doubt, then systematically challenges
Seeks external opinions - uses Codex/Gemini for second opinions
Mechanically blocks exit - can't be talked out of it via prompt

When to Use alice Pattern

Scenario	Why alice helps
Autonomous/unattended runs	No human watching - need automated QA
CI/CD with agents	Quality gate before merge
Complex multi-step features	Verify each deliverable meets requirements
Refactoring	Ensure nothing broke
ops-jrz1 deployment	Remote server, less oversight

When NOT to Use

Interactive sessions with human oversight
Simple, low-risk changes
Exploratory/research work (no deliverable to review)

Applicability to Our Workflow

Potential Use Cases

1. Autonomous runs on ops-jrz1
   - Agent implements feature on VPS
   - alice reviews before agent exits
   - Issues filed to tissue if problems found
2. Batch processing
   - Agent processes multiple tasks
   - alice spot-checks work quality
3. High-stakes changes
   - Security-sensitive code
   - Infrastructure changes
   - Production deployments

Integration Options

Approach	Pros	Cons
A: Adopt alice directly	Battle-tested, full features	Requires jwz, tissue, Zig deps
B: Build our own	Tailored to our needs, use beads	Dev effort, reinventing wheel
C: Hybrid	Best of both: alice concepts on our infra	Some integration work
D: orch-as-reviewer	Already have orch for multi-model	Different purpose, not adversarial

Hybrid Approach (Recommended)

Use alice's concepts with our infrastructure:

Stop hook - Block exit until review passes
beads for state - Track review status per session
orch for second opinions - We already have multi-model consensus
Adversarial prompt - Adapt alice's methodology

Example hooks.json:

{
  "hooks": {
    "Stop": [{
      "hooks": [{
        "type": "command",
        "command": "review-gate check",
        "timeout": 30
      }]
    }]
  }
}

review-gate would:

Check if review mode is enabled (beads flag or env var)
If enabled, check for approval in beads
If unapproved, block and instruct agent to spawn reviewer
Circuit breaker after N failures
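
A possible shape for the core of `review-gate check` is sketched below. Everything here is hypothetical: the `REVIEW_GATE` environment variable, the `is_approved` callback (standing in for a beads lookup whose API is not specified in this document), and the `blocks_so_far` argument. The response shape assumes the Claude Code convention that a Stop hook blocks by printing a JSON object with `decision` set to `block` and a `reason`; check the hooks reference before relying on it.

```python
import os
from typing import Callable, Mapping, Optional


def review_gate(
    event: Mapping[str, object],
    is_approved: Callable[[str], bool],
    env: Mapping[str, str] = os.environ,
    blocks_so_far: int = 0,
    max_blocks: int = 3,
) -> Optional[dict]:
    """Return a hook response that blocks exit, or None to allow it."""
    # 1. Review mode off (hypothetical opt-in variable): allow exit.
    if env.get("REVIEW_GATE") != "1":
        return None
    # 2. Approval already recorded for this session: allow exit.
    session_id = str(event.get("session_id", ""))
    if is_approved(session_id):
        return None
    # 4. Circuit breaker: fail open after max_blocks blocked attempts.
    if blocks_so_far >= max_blocks:
        return None
    # 3. Otherwise block and tell the agent what to do next.
    return {
        "decision": "block",
        "reason": "Review not approved. Spawn the reviewer agent, then retry.",
    }
```

A thin command-line wrapper would read the hook event as JSON from stdin, call `review_gate`, and print the response with `json.dumps` when it is not `None`. Keeping the decision logic free of I/O makes it easy to unit test without a running agent.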

Reviewer Agent Design

Inspired by alice but using our tools:

# Adversarial Reviewer

You review agent work for the USER, not the agent.

## Methodology
1. Read the user's original request (not agent's summary)
2. Examine all changes made (git diff, file reads)
3. Assume errors exist - find them
4. Use orch for second opinions on non-trivial work
5. Post decision to beads

## Decision
- APPROVED: Work meets user's actual request
- ISSUES: Problems found (file beads issues)

## Tools Available
- Read, Grep, Glob (read-only)
- orch (second opinions)
- bd (issue tracking)

Open Questions

Do we need jwz or can beads handle session state?
Should the reviewer be a separate skill or plugin?
How do we handle the "review the reviewer" problem?
What's the circuit breaker threshold (3 like alice)?
Should this be opt-in (#review) or always-on for certain contexts?
