docs: cross-agent enforcement architecture design

Comprehensive design covering:
- Abstract layers (message passing, memory, enforcement)
- Four enforcement strategies:
  - Hook-based (Claude/Gemini)
  - Orchestrator-enforced (OpenCode/Codex)
  - Validator sidecar (universal)
  - Proxy-based (API interception)
- Circuit breakers (semantic drift, three-strike, budget)
- Adversarial reviewer pattern
- State flow diagram
- Implementation phases

Based on web research via orch (gemini --websearch).

Addresses: skills-8sj

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 19:51:09 -08:00 · 75c5edb86c
parent 8c033eedd1
commit 75c5edb86c
2 changed files with 472 additions and 1 deletions
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@@ -41,7 +41,7 @@
 {"id":"skills-8d9","title":"Add conversational patterns to orch skill","description":"## Context\nThe orch skill currently documents consensus and single-shot chat, but doesn't\nteach agents how to use orch for multi-turn conversations with external AIs.\n\n## Goal\nAdd documentation and patterns for agent-driven conversations where the calling\nagent (Claude Code) orchestrates multi-turn dialogues using orch primitives.\n\n## Patterns to document\n\n### Session-based multi-turn\n```bash\n# Initial query\nRESPONSE=$(orch chat \"Analyze this\" --model claude --format json)\nSESSION=$(echo \"$RESPONSE\" | jq -r .session_id)\n\n# Continue conversation\norch chat \"Elaborate on X\" --model claude --session $SESSION\n\n# Inspect state\norch sessions info $SESSION\norch sessions show $SESSION --last 2 --format text\n```\n\n### Cross-model dialogue\n```bash\n# Get one model's take\nCLAUDE=$(orch chat \"Review this\" --model claude --format json)\nCLAUDE_SAYS=$(echo \"$CLAUDE\" | jq -r '.responses[0].content')\n\n# Ask another model to respond\norch chat \"Claude said: $CLAUDE_SAYS\n\nWhat's your perspective?\" --model gemini\n```\n\n### When to use conversations vs consensus\n- Consensus: quick parallel opinions on a decision\n- Conversation: deeper exploration, follow-up questions, iterative refinement\n\n## Files\n- skills/orch/SKILL.md\n\n## Related\n- orch-c3r: Design: Session introspection for agent-driven conversations (in orch repo)","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-18T19:57:28.201494288-08:00","updated_at":"2025-12-29T15:34:16.254181578-05:00","closed_at":"2025-12-29T15:34:16.254181578-05:00","close_reason":"Added conversational patterns section to orch SKILL.md: sessions, cross-model dialogue, iterative refinement, consensus vs chat guidance."}
 {"id":"skills-8ma","title":"worklog skill: remove org-mode references, use markdown instead","description":"The worklog skill currently references org-mode format (.org files) in the template and instructions. Update to use markdown (.md) instead:\n\n1. Update ~/.claude/skills/worklog/templates/worklog-template.org → worklog-template.md\n2. Convert org-mode syntax to markdown (#+TITLE → # Title, * → ##, etc.)\n3. Update skill instructions to reference .md files\n4. Update suggest-filename.sh to output .md extension\n\nContext: org-mode is less widely supported than markdown in tooling and editors.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-31T08:43:55.761429693-05:00","created_by":"dan","updated_at":"2026-01-02T00:13:05.338810905-05:00","closed_at":"2026-01-02T00:13:05.338810905-05:00","close_reason":"Migrated worklog skill from org-mode to markdown. Template, scripts, and SKILL.md updated. Backward compatible with existing .org files."}
 {"id":"skills-8nl","title":"Fix: Gemini path restrictions for skills (skills-bo8)","description":"Concrete fix for Gemini not reading ~/.claude/skills/.\n\n## Problem\nGemini's ReadFile tool restricts to workspace directories.\nSymlinked ~/.claude/skills/ is blocked.\n\n## Options\n\n### A: Copy skills into workspace\n- Add skills/ to project repos\n- Pro: Works immediately\n- Con: Duplication, sync issues\n\n### B: Shell workaround in skill\n- Use \\`cat\\` instead of ReadFile\n- Pro: No duplication\n- Con: Fragile, skill must know about limitation\n\n### C: Configure Gemini allowed paths\n- Research if Gemini has path config\n- Pro: Clean solution\n- Con: May not exist\n\n### D: MCP server for skills\n- Skills exposed via MCP\n- Pro: Agent-agnostic\n- Con: Complexity, user said not interested in MCP\n\n## Deliverable\nWorking solution for Gemini to read skills","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-09T19:32:57.683370528-08:00","created_by":"dan","updated_at":"2026-01-09T19:35:28.05552467-08:00","closed_at":"2026-01-09T19:35:28.05552467-08:00","close_reason":"Fix found: Gemini includeDirectories setting"}
-{"id":"skills-8sj","title":"Design: Cross-agent enforcement architecture","description":"Unified design for cross-agent quality gates and coordination.\n\nConsolidates: skills-3gk, skills-3ja, skills-thk, skills-6fu\n\n## Abstract Layers\n\n### 1. Message Passing Layer\n- **Purpose:** Async agent coordination, session handoffs\n- **Interface:** post(topic, message), read(topic), reply(id, message)\n- **Requirements:** Append-only, git-mergeable, agent-attributed\n\n### 2. Memory Layer  \n- **Purpose:** Persistent work items, review state\n- **Interface:** create(issue), update(id, state), query(filters)\n- **Requirements:** Cross-session, dependency tracking, searchable\n\n### 3. Enforcement Layer\n- **Purpose:** Quality gates, completion blocking\n- **Interface:** check_gate(session) -\u003e allow/block, register_reviewer(session)\n\n## Enforcement Strategies\n\n| Agent | Mechanism | Strength |\n|-------|-----------|----------|\n| Claude Code | Stop hook | Mechanical |\n| Gemini CLI | Stop hook | Mechanical |\n| OpenCode | Orchestrator | Protocol |\n| Codex | Orchestrator | Protocol |\n| Any | Wrapper script | External |\n\n## State Schema\n\n```\nreview_state:\n  session_id: string\n  status: pending | in_review | approved | rejected\n  worker_agent: string\n  reviewer_agent: string\n  issues_found: [issue_ids]\n  approved_at: timestamp\n```\n\n## Circuit Breakers\n- Semantic drift detection\n- Three-strike tool failures  \n- Budget/time limits\n\n## Deliverables\n1. Architecture diagram\n2. Interface definitions\n3. State schema\n4. Hook configuration templates\n5. Orchestrator flow pseudocode","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-09T19:33:31.801684987-08:00","created_by":"dan","updated_at":"2026-01-09T19:33:31.801684987-08:00"}
+{"id":"skills-8sj","title":"Design: Cross-agent enforcement architecture","description":"Unified design for cross-agent quality gates and coordination.\n\nConsolidates: skills-3gk, skills-3ja, skills-thk, skills-6fu\n\n## Abstract Layers\n\n### 1. Message Passing Layer\n- **Purpose:** Async agent coordination, session handoffs\n- **Interface:** post(topic, message), read(topic), reply(id, message)\n- **Requirements:** Append-only, git-mergeable, agent-attributed\n\n### 2. Memory Layer  \n- **Purpose:** Persistent work items, review state\n- **Interface:** create(issue), update(id, state), query(filters)\n- **Requirements:** Cross-session, dependency tracking, searchable\n\n### 3. Enforcement Layer\n- **Purpose:** Quality gates, completion blocking\n- **Interface:** check_gate(session) -\u003e allow/block, register_reviewer(session)\n\n## Enforcement Strategies\n\n| Agent | Mechanism | Strength |\n|-------|-----------|----------|\n| Claude Code | Stop hook | Mechanical |\n| Gemini CLI | Stop hook | Mechanical |\n| OpenCode | Orchestrator | Protocol |\n| Codex | Orchestrator | Protocol |\n| Any | Wrapper script | External |\n\n## State Schema\n\n```\nreview_state:\n  session_id: string\n  status: pending | in_review | approved | rejected\n  worker_agent: string\n  reviewer_agent: string\n  issues_found: [issue_ids]\n  approved_at: timestamp\n```\n\n## Circuit Breakers\n- Semantic drift detection\n- Three-strike tool failures  \n- Budget/time limits\n\n## Deliverables\n1. Architecture diagram\n2. Interface definitions\n3. State schema\n4. Hook configuration templates\n5. Orchestrator flow pseudocode","status":"in_progress","priority":2,"issue_type":"task","created_at":"2026-01-09T19:33:31.801684987-08:00","created_by":"dan","updated_at":"2026-01-09T19:37:55.87091058-08:00"}
 {"id":"skills-8v0","title":"Consolidate skill list definitions (flake.nix + ai-skills.nix)","description":"Skill list duplicated in:\n- flake.nix (lines 15-27)\n- modules/ai-skills.nix (lines 8-18)\n\nIssues:\n- Manual sync required when adding skills\n- No validation that referenced skills exist\n\nFix:\n- Single source of truth for skill list\n- Consider generating one from the other\n\nSeverity: MEDIUM","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-24T02:51:14.432158871-05:00","updated_at":"2026-01-03T12:06:23.731969973-08:00","closed_at":"2026-01-03T12:06:23.731969973-08:00","close_reason":"Created skills.nix as single source of truth for skill names and descriptions. Updated flake.nix and Home Manager module to use it."}
 {"id":"skills-8y6","title":"Define skill versioning strategy","description":"Git SHA alone is insufficient. Need tuple approach:\n\n- skill_source_rev: git SHA (if available)\n- skill_content_hash: hash of SKILL.md + scripts\n- runtime_ref: flake.lock hash or Nix store path\n\nQuestions to resolve:\n- Do Protos pin to versions (stable but maintenance) or float on latest (risky)?\n- How to handle breaking changes in skills?\n- Record in wisp trace vs proto definition?\n\nFrom consensus: both models flagged versioning instability as high severity.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T19:49:30.839064445-05:00","updated_at":"2025-12-23T20:55:04.439779336-05:00","closed_at":"2025-12-23T20:55:04.439779336-05:00","close_reason":"ADRs revised with orch consensus feedback"}
 {"id":"skills-9af","title":"spec-review: Add spike/research task handling","description":"Tasks like 'Investigate X' can linger without clear outcomes.\n\nAdd to REVIEW_TASKS:\n- Flag research/spike tasks\n- Require timebox and concrete outputs (decision record, prototype, risks)\n- Pattern for handling unknowns","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:26.887719136-08:00","updated_at":"2025-12-15T14:08:13.441095034-08:00","closed_at":"2025-12-15T14:08:13.441095034-08:00"}
--- a/docs/design/cross-agent-enforcement-architecture.md
+++ b/docs/design/cross-agent-enforcement-architecture.md
@@ -0,0 +1,471 @@
+# Cross-Agent Enforcement Architecture
+
+> **Date:** 2026-01-09
+> **Status:** Draft
+> **Issue:** skills-8sj
+> **Research:** Web research via orch (gemini --websearch)
+
+## Executive Summary
+
+Design for quality gates and agent coordination that works across Claude Code, Gemini CLI, OpenCode, and Codex - regardless of which agent plays orchestrator, worker, or reviewer.
+
+**Core principle:** External enforcement, not internal trust. The orchestrator/infrastructure enforces gates, not agent prompts.
+
+---
+
+## Abstract Layers
+
+### Layer 1: Message Passing
+
+**Purpose:** Async agent coordination, session handoffs, status updates
+
+| Requirement | Description |
+|-------------|-------------|
+| Append-only | No overwrites, git-mergeable |
+| Agent-attributed | Know which agent posted (model, role) |
+| Topic-based | Namespaced conversations |
+| Git-native | Lives in repo, auto-commits |
+
+**Interface:**
+```
+post(topic, message, metadata) -> message_id
+read(topic, since?) -> [messages]
+reply(message_id, message) -> message_id
+```
+
+**Implementation options:**
+- jwz (emes) - Zig, full-featured
+- JSONL files with watcher
+- beads extension
+
+**Recommended format:** JSONL with sentinel trick for merge-safety
+```jsonl
+{"id": "01ABC", "topic": "review:session-123", "agent": "claude", "body": "Starting review", "ts": 1736456789}
+{"id": "01ABD", "topic": "review:session-123", "agent": "gemini", "body": "APPROVED", "ts": 1736456800}
+<!-- SENTINEL -->
+```
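The three-call interface above could be backed by a single append-only file. The following is a minimal sketch under stated assumptions, not the jwz or beads implementation: the `MessageLog` class, the use of `uuid4` for ids, and the `reply_to` field are hypothetical choices made for illustration. Lines that are not JSON objects (such as a sentinel) are skipped on read.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Iterator, Optional


class MessageLog:
    """Append-only JSONL message log (hypothetical helper for illustration)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.touch(exist_ok=True)

    def _records(self) -> Iterator[dict]:
        """Yield every JSON record, skipping blank and non-JSON sentinel lines."""
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line.startswith("{"):
                    yield json.loads(line)

    def post(self, topic: str, message: str, metadata: Optional[dict] = None) -> str:
        """Append one message and return its id; existing lines are never rewritten."""
        record = {
            "id": uuid.uuid4().hex,  # assumption: any unique id scheme would do
            "topic": topic,
            "body": message,
            "ts": time.time(),
            **(metadata or {}),
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return record["id"]

    def read(self, topic: str, since: Optional[float] = None) -> list[dict]:
        """Return messages on a topic, optionally only those newer than `since`."""
        return [
            r for r in self._records()
            if r["topic"] == topic and (since is None or r["ts"] > since)
        ]

    def reply(self, message_id: str, message: str, metadata: Optional[dict] = None) -> str:
        """Post to the parent's topic, recording the parent id in `reply_to`."""
        parents = [r for r in self._records() if r["id"] == message_id]
        if not parents:
            raise KeyError(f"unknown message id: {message_id}")
        meta = {**(metadata or {}), "reply_to": message_id}
        return self.post(parents[0]["topic"], message, meta)
```

Because `post` only ever appends, two agents writing on separate branches produce files that differ only by added lines, which is the property the append-only requirement is after.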
+
+---
+
+### Layer 2: Memory
+
+**Purpose:** Persistent work items, review state, dependencies
+
+| Requirement | Description |
+|-------------|-------------|
+| Cross-session | Survives compaction, restarts |
+| Dependency tracking | Issue X blocks issue Y |
+| Queryable | Find by status, type, assignee |
+| Git-native | Versioned with code |
+
+**Interface:**
+```
+create(issue) -> issue_id
+update(id, fields) -> issue
+query(filters) -> [issues]
+close(id, reason) -> issue
+```
+
+**Implementation:** beads (already have this)
+
+**Review state schema:**
+```yaml
+review_state:
+  session_id: string
+  status: pending | in_review | approved | rejected
+  worker_agent: string      # e.g., "claude-sonnet-4.5"
+  reviewer_agent: string    # e.g., "gemini-pro"
+  issues_found: [issue_ids] # beads issues if rejected
+  attempts: number          # for circuit breaker
+  created_at: timestamp
+  updated_at: timestamp
+```
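To make the schema concrete, here is a hedged sketch of the same record as a Python dataclass. The `TRANSITIONS` table is an assumption: the schema lists the four statuses but does not say which moves between them are legal, so the table encodes one plausible reading (a rejected review can go back into review; an approved one is final). Counting an attempt each time a review starts is likewise an illustrative choice.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(str, Enum):
    PENDING = "pending"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Assumed legal transitions; the schema itself does not define these.
TRANSITIONS = {
    Status.PENDING: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.APPROVED, Status.REJECTED},
    Status.REJECTED: {Status.IN_REVIEW},  # worker fixed issues, review again
    Status.APPROVED: set(),               # terminal
}


@dataclass
class ReviewState:
    session_id: str
    worker_agent: str
    reviewer_agent: str
    status: Status = Status.PENDING
    issues_found: list = field(default_factory=list)
    attempts: int = 0
    created_at: float = field(default_factory=time.time)
    updated_at: float = field(default_factory=time.time)

    def transition(self, new_status: Status, issues: Optional[list] = None) -> None:
        """Move to `new_status`, rejecting moves the table does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"illegal transition: {self.status.value} -> {new_status.value}"
            )
        if new_status is Status.IN_REVIEW:
            self.attempts += 1  # feeds the review-attempt circuit breaker
        if new_status is Status.REJECTED:
            self.issues_found = list(issues or [])
        self.status = new_status
        self.updated_at = time.time()
```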
+
+---
+
+### Layer 3: Enforcement
+
+**Purpose:** Quality gates that block completion until approved
+
+**Key insight:** Enforcement must be EXTERNAL to the worker agent. You cannot trust an agent to enforce its own gates.
+
+---
+
+## Enforcement Strategies
+
+### Strategy A: Hook-Based (Claude Code, Gemini CLI)
+
+For agents with lifecycle hooks, use a Stop hook to block exit.
+
+```json
+{
+  "hooks": {
+    "Stop": [{
+      "hooks": [{
+        "type": "command",
+        "command": "review-gate check",
+        "timeout": 30
+      }]
+    }]
+  }
+}
+```
+
+**review-gate CLI:**
+```bash
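+#!/bin/bash
+# review-gate check
+#
+# Exit codes follow Claude Code's hook convention: 0 allows the stop,
+# 2 blocks it and feeds stderr back to the agent. Other non-zero codes
+# are treated as non-blocking errors, so `exit 1` would not block.
+# Verify the equivalent convention for Gemini CLI before relying on it.
+
+SESSION_ID="${CLAUDE_SESSION_ID:-$(cat .state/session_id)}"
+STATE=$(bd show "review:$SESSION_ID" --format json 2>/dev/null)
+
+if [ -z "$STATE" ]; then
+  # No review registered - allow exit
+  exit 0
+fi
+
+STATUS=$(echo "$STATE" | jq -r '.status')
+
+case "$STATUS" in
+  approved)
+    exit 0  # Allow exit
+    ;;
+  *)
+    # rejected, pending, in_review, or any unexpected value: fail closed
+    echo "BLOCKED: Review not approved. Status: $STATUS" >&2
+    echo "Spawn reviewer with: /review" >&2
+    exit 2  # Block exit
+    ;;
+esac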
+```
+
+**Strength:** Mechanical - agent cannot bypass
+
+---
+
+### Strategy B: Orchestrator-Enforced (OpenCode, Codex, any)
+
+For agents without hooks, orchestrator controls the session.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  ORCHESTRATOR (shell script or agent)                        │
+│                                                              │
+│  1. Start worker session                                     │
+│  2. Worker does task, signals "done"                         │
+│  3. Orchestrator checks: Is review registered?               │
+│     NO  → Accept done, exit                                  │
+│     YES → Spawn reviewer, wait for verdict                   │
+│  4. If APPROVED → Exit                                       │
+│  5. If REJECTED → Feed issues back to worker, loop           │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**Implementation:**
+```bash
+#!/bin/bash
+# orchestrator.sh
+
+: "${TASK:?TASK must be set}"
+SESSION="worker-123"
+MAX_REVIEW_ATTEMPTS=3
+
+# Start worker
+opencode --session "$SESSION" --prompt "$TASK"
+
+# No review registered - accept "done"
+if ! bd show "review:$SESSION" >/dev/null 2>&1; then
+  exit 0
+fi
+
+for attempt in $(seq 1 "$MAX_REVIEW_ATTEMPTS"); do
+  # Spawn reviewer
+  claude --session reviewer-123 --prompt "Review work in session $SESSION.
+    Post APPROVED or REJECTED with issues to beads."
+
+  # Check verdict
+  VERDICT=$(bd show "review:$SESSION" --format json | jq -r '.status')
+  if [ "$VERDICT" = "approved" ]; then
+    exit 0
+  fi
+
+  # Feed issues back to worker, then loop for another review
+  ISSUES=$(bd list --blocked-by "review:$SESSION")
+  opencode --session "$SESSION" --prompt "Fix these issues: $ISSUES"
+done
+
+echo "Review not approved after $MAX_REVIEW_ATTEMPTS attempts; escalate to human" >&2
+exit 1
+```
+
+**Strength:** Works with any agent that can run in a script
+
+---
+
+### Strategy C: Validator Sidecar (Universal)
+
+Agent outputs to staging area. Validator checks before "done" is accepted.
+
+```
+Worker Agent                    Validator
+     │                              │
+     ├─── writes to staging/ ──────►│
+     │                              │
+     ├─── signals "done" ──────────►│ checks artifacts
+     │                              │
+     │◄── PASS or FAIL + reasons ───┤
+     │                              │
+     └─── if FAIL, retry ──────────►│
+```
+
+**Validator checks:**
+- Does output file exist?
+- Does code compile/lint?
+- Do tests pass?
+- Does schema validate?
+
+**Strength:** Works without any agent cooperation - pure infrastructure
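As a rough illustration of the validator side, the sketch below runs three of the listed checks against a staging directory and returns PASS or FAIL with reasons. The function name `validate`, the `required` parameter, and the choice of Python and JSON as the artifact types are assumptions for the example; running tests or a linter would slot in as further checks of the same shape.

```python
import json
from pathlib import Path


def validate(staging: Path, required: list) -> tuple:
    """Check staged artifacts; return (passed, reasons). Hypothetical sketch."""
    reasons = []

    # Does the output file exist?
    for name in required:
        if not (staging / name).is_file():
            reasons.append(f"missing output file: {name}")

    # Does code compile? compile() only parses; it never executes the source.
    for path in sorted(staging.rglob("*.py")):
        try:
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except SyntaxError as exc:
            reasons.append(f"does not compile: {path.name} (line {exc.lineno})")

    # Does structured output parse?
    for path in sorted(staging.rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            reasons.append(f"invalid JSON in {path.name}: {exc.msg}")

    return (not reasons, reasons)
```

The validator never asks the worker whether it is finished; it only inspects what is on disk, which is what keeps the check independent of agent cooperation.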
+
+---
+
+### Strategy D: Proxy-Based (API interception)
+
+Intercept critical tool calls, require approval before execution.
+
+```
+Agent ──► Proxy ──► Tool
+            │
+            ▼
+    [If critical tool]
+            │
+            ▼
+    Pause, notify human
+            │
+            ▼
+    Wait for approval
+            │
+            ▼
+    Execute or reject
+```
+
+**Critical tools:** write_file, bash (rm, deploy), send_email, git push
+
+**Strength:** Agent perceives slow response or permission error - no special logic needed
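A minimal in-process sketch of the proxy idea follows. The `ToolProxy` class, the `approve` callback, and the way bash commands are classified by prefix are all assumptions made for illustration; a real deployment would sit at the API or tool-server boundary and would need a more careful command classifier than a prefix match.

```python
from typing import Any, Callable

# Tools that always need approval, and bash command prefixes that do.
CRITICAL_TOOLS = {"write_file", "send_email"}
CRITICAL_BASH_PREFIXES = ("rm ", "deploy", "git push")


class ToolProxy:
    """Sits between the agent and its tools; gates critical calls on approval."""

    def __init__(self, tools: dict, approve: Callable[[str, dict], bool]) -> None:
        self.tools = tools      # tool name -> callable
        self.approve = approve  # blocks until a human decides; True means go

    def is_critical(self, tool: str, args: dict) -> bool:
        if tool == "bash":
            command = str(args.get("command", "")).lstrip()
            return command.startswith(CRITICAL_BASH_PREFIXES)
        return tool in CRITICAL_TOOLS

    def call(self, tool: str, **args: Any) -> Any:
        if self.is_critical(tool, args) and not self.approve(tool, args):
            # The agent just sees an ordinary permission error.
            raise PermissionError(f"{tool} rejected by approver")
        return self.tools[tool](**args)
```

From the agent's side a gated call either takes longer than usual or fails with a permission error, so no review-specific logic is needed in the agent itself.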
+
+---
+
+## Circuit Breakers
+
+Prevent infinite loops when worker keeps failing review.
+
+### Semantic Drift Detection
+
+```python
+def check_semantic_drift(thoughts: list[str], threshold=0.95) -> bool:
+    """Return True if agent is stuck repeating itself."""
+    if len(thoughts) < 3:
+        return False
+
+    embeddings = embed(thoughts[-3:])
+    similarities = pairwise_cosine(embeddings)
+
+    return all(sim > threshold for sim in similarities)
+```
+
+**Action:** Inject "You are stuck. Try a completely different approach."
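The drift check relies on `embed` and `pairwise_cosine`, which the design leaves undefined. The sketch below fills them in with a deliberately crude stand-in (lowercase word counts instead of a real embedding model) so the control flow can be run end to end. The bag-of-words `embed` and the `cosine` helper are assumptions for illustration only; with a real embedding model the 0.95 threshold would need to be recalibrated.

```python
import math
from collections import Counter
from itertools import combinations


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def check_semantic_drift(thoughts: list, threshold: float = 0.95) -> bool:
    """Return True if the last three thoughts are all near-duplicates."""
    if len(thoughts) < 3:
        return False
    vectors = [embed(t) for t in thoughts[-3:]]
    # Every pair among the last three must exceed the threshold.
    return all(cosine(a, b) > threshold for a, b in combinations(vectors, 2))
```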
+
+### Three-Strike Rule
+
+```python
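+from collections import defaultdict
+
+# tool name -> error signatures seen so far. inject() is a placeholder
+# for however the harness adds a message to the agent's context.
+tool_errors = defaultdict(list)
+
+def on_tool_error(tool: str, args: dict, error: str):
+    sig = hash(f"{tool}:{args}:{error}")
+    tool_errors[tool].append(sig)
+
+    # Trip only when the last three errors for this tool are identical
+    if tool_errors[tool][-3:].count(sig) >= 3:
+        inject("STOP. Same error 3 times. Use a different tool/approach.")
+        tool_errors[tool].clear()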
+```
+
+### Budget Limits
+
+```python
+MAX_REVIEW_ATTEMPTS = 3
+MAX_TOKENS_PER_TASK = 50000
+MAX_TIME_PER_TASK = 1800  # 30 minutes
+
+def check_limits(session):
+    if session.review_attempts >= MAX_REVIEW_ATTEMPTS:
+        escalate_to_human(f"Review failed {MAX_REVIEW_ATTEMPTS} times")
+        return ABORT
+
+    if session.tokens_used >= MAX_TOKENS_PER_TASK:
+        escalate_to_human("Token budget exceeded")
+        return ABORT
+
+    if session.elapsed >= MAX_TIME_PER_TASK:
+        escalate_to_human("Time limit exceeded")
+        return ABORT
+
+    return CONTINUE
+```
+
+---
+
+## Adversarial Review Pattern
+
+The reviewer works for the USER, not the worker agent.
+
+### Reviewer Prompt Template
+
+```markdown
+# Adversarial Reviewer
+
+You are reviewing work done by another agent. Your job is to find problems.
+
+## Ground Truth
+The USER requested: [original user prompt]
+
+## Your Methodology
+1. Read the user's EXACT words, not the agent's summary
+2. Examine ALL changes (git diff, file contents)
+3. ASSUME errors exist - find them
+4. Steel-man the work, then systematically attack it
+5. Use orch for second opinions on non-trivial findings
+
+## Your Tools (READ-ONLY)
+- Read, Grep, Glob (examine files)
+- Bash (git diff, git log only)
+- orch (second opinions)
+- bd (file issues)
+
+## Your Decision
+Post to beads topic "review:{session_id}":
+
+If APPROVED:
+  - Status: approved
+  - Summary: Brief confirmation
+
+If REJECTED:
+  - Status: rejected
+  - Create beads issues for each problem found
+  - Link issues to review topic
+```
+
+### Multi-Model Verification
+
+For high-stakes work, use orch for consensus:
+
+```bash
+orch consensus "Review this code change for security issues:
+
+$(git diff HEAD~1)
+
+Is this safe to deploy?" gemini claude deepseek --mode critique
+```
+
+---
+
+## State Flow Diagram
+
+```
+                    ┌─────────────────┐
+                    │  Task Received  │
+                    └────────┬────────┘
+                             │
+                    ┌────────▼────────┐
+                    │  Worker Starts  │
+                    │  (any agent)    │
+                    └────────┬────────┘
+                             │
+                    ┌────────▼────────┐
+                    │  Worker Done    │
+                    └────────┬────────┘
+                             │
+              ┌──────────────┼──────────────┐
+              │              │              │
+     ┌────────▼────────┐     │     ┌────────▼────────┐
+     │  Hook Check     │     │     │  Orchestrator   │
+     │  (Claude/Gem)   │     │     │  Check (others) │
+     └────────┬────────┘     │     └────────┬────────┘
+              │              │              │
+              └──────────────┼──────────────┘
+                             │
+                    ┌────────▼────────┐
+                    │ Review Needed?  │
+                    └────────┬────────┘
+                             │
+              ┌──────────────┼──────────────┐
+              │ NO           │              │ YES
+              │              │              │
+     ┌────────▼────────┐     │     ┌────────▼────────┐
+     │  Exit Allowed   │     │     │  Spawn Reviewer │
+     └─────────────────┘     │     └────────┬────────┘
+                             │              │
+                             │     ┌────────▼────────┐
+                             │     │ Reviewer Checks │
+                             │     └────────┬────────┘
+                             │              │
+                             │     ┌────────▼────────┐
+                             │     │   APPROVED?     │
+                             │     └────────┬────────┘
+                             │              │
+                             │   ┌──────────┼──────────┐
+                             │   │ YES      │          │ NO
+                             │   │          │          │
+                             │   ▼          │     ┌────▼────────┐
+                             │  EXIT        │     │ File Issues │
+                             │              │     │ Loop Back   │
+                             │              │     └─────────────┘
+```
+
+---
+
+## Implementation Phases
+
+### Phase 1: Hook-based for Claude/Gemini
+1. Create `review-gate` CLI
+2. Add hooks.json to skills
+3. Test with simple approve/reject flow
+
+### Phase 2: Orchestrator for others
+1. Create orchestrator.sh wrapper
+2. Integrate with beads for state
+3. Test with OpenCode/Codex
+
+### Phase 3: Circuit breakers
+1. Add attempt tracking to beads
+2. Implement budget limits
+3. Add semantic drift detection (optional)
+
+### Phase 4: Reviewer skill
+1. Create adversarial reviewer prompt
+2. Integrate with orch for second opinions
+3. Test cross-agent review scenarios
+
+---
+
+## File Structure
+
+```
+skills/review-gate/
+├── SKILL.md                 # Reviewer skill
+├── .claude-plugin/
+│   └── plugin.json
+├── skills/
+│   └── review-gate.md
+├── hooks/
+│   └── hooks.json           # Stop hook config
+├── scripts/
+│   ├── review-gate          # CLI for gate checks
+│   └── orchestrator.sh      # Wrapper for non-hook agents
+└── templates/
+    └── reviewer-prompt.md   # Adversarial reviewer template
+```
+
+---
+
+## Open Questions
+
+1. **Session ID passing:** How does Stop hook know which session to check?
+2. **Cross-agent spawning:** Can Claude spawn Gemini reviewer? Via what mechanism?
+3. **Beads schema:** Need `review:` topic type or use existing issues?
+4. **Circuit breaker storage:** In beads or separate state file?
+
+---
+
+## References
+
+- [alice/idle (emes)](https://github.com/evil-mind-evil-sword/idle)
+- [jwz (emes)](https://github.com/evil-mind-evil-sword/jwz)
+- [LangGraph](https://langchain-ai.github.io/langgraph/)
+- Web research via orch (gemini --websearch)