Research conducted 2026-01-22:
- pi-extension-ecosystem-research.md: 56 GitHub projects, 52 official examples
- pi-ui-ecosystem-research.md: TUI patterns, components, overlays
- multi-model-consensus-analysis.md: gap analysis leading to /synod design
# Multi-Model Consensus: Current State & Pi Integration Analysis

Date: 2026-01-22
Purpose: Analyze what we have in orch vs what pi needs for multi-model consensus
## What We Have: Orch CLI

### Core Capabilities

Commands:
- `orch consensus` - Parallel multi-model queries with vote/brainstorm/critique/open modes
- `orch chat` - Single-model conversation with session management
- `orch models` - List/resolve 423 available models
- `orch sessions` - Manage conversation history
Key Features:

Model Selection:
- 423 models across providers (OpenAI, Anthropic, Google, DeepSeek, Qwen, Perplexity, etc.)
- Aliases: `flash`, `gemini`, `gpt`, `claude`, `sonnet`, `opus`, `haiku`, `deepseek`, `r1`, `qwen`
- Stance modifiers: `gpt:for`, `claude:against`, `gemini:neutral`
- Cost awareness: `--allow-expensive` for opus/r1
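For illustration, the `alias:stance` syntax above can be parsed with a few lines of TypeScript. This is a sketch with hypothetical names (`parseModelSpec`, `ModelSpec`); orch's actual parser lives in its Python codebase and may behave differently:

```typescript
// Hypothetical helper illustrating the alias:stance syntax; not orch's real API.
interface ModelSpec {
  alias: string;
  stance: "for" | "against" | "neutral";
}

function parseModelSpec(spec: string): ModelSpec {
  const [alias, stance] = spec.split(":");
  if (stance === "for" || stance === "against" || stance === "neutral") {
    return { alias, stance };
  }
  // No (or unrecognized) modifier: treat the whole spec as the alias,
  // defaulting to a neutral stance and letting the registry reject bad names.
  return { alias: spec, stance: "neutral" };
}
```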
Modes:
- `vote` - Support/Oppose/Neutral verdict with reasoning
- `brainstorm` - Generate ideas without judgment
- `critique` - Find flaws and weaknesses
- `open` - Freeform responses
Context:
- File inclusion: `--file PATH` (multiple allowed)
- Stdin piping: `cat code.py | orch consensus "..."`
- Session continuity: `--session ID` for chat mode
- Web search: `--websearch` (Gemini only)
Execution:
- Parallel by default, `--serial` for sequential
- Serial strategies: neutral, refine, debate, brainstorm
- Synthesis: `--synthesize MODEL` to aggregate responses
- Timeout control: `--timeout SECS`
Output:
- Structured vote results with verdict counts
- Reasoning for each model
- Color-coded output (SUPPORT/OPPOSE/NEUTRAL)
- Session IDs for continuation
### Current Skill Integration

Location: `~/.codex/skills/orch/`

What it provides:
- Documentation of orch capabilities
- Usage patterns (second opinion, architecture decision, code review, devil's advocate, etc.)
- Model selection guidance
- Conversational patterns (session-based multi-turn, cross-model dialogue, iterative refinement)
- Combined patterns (explore then validate)
What it does NOT provide:
- Direct agent tool invocation (agent must shell out to `orch`)
- UI integration (no pickers, no inline results)
- Conversation context sharing (agent's conversation ≠ orch's conversation)
- Interactive model selection
- Add-to-context workflow
## What Pi Oracle Extension Provides

From `shitty-extensions/oracle.ts`

UI Features:
- Interactive model picker overlay
- Quick keys (1-9) for fast selection
- Shows which models are authenticated/available
- Excludes current model from picker
- Formatted result display with scrolling
Context Sharing:
- Inherits full conversation context - Oracle sees the entire pi conversation
- Sends conversation history to queried model
- No need to re-explain context
Workflow:
1. User types `/oracle <prompt>`
2. Model picker appears
3. Select model with arrow keys or number
4. Oracle queries model with full conversation context + prompt
5. Result displays in scrollable overlay
6. "Add to context?" prompt - YES/NO choice
7. If YES, oracle response appends to conversation
Model Awareness:
- Only shows models with valid API keys
- Filters out current model
- Groups by provider (OpenAI, Google, Anthropic, OpenAI Codex)
Input Options:
- Direct: `/oracle -m gpt-4o <prompt>` (skips picker)
- Files: `/oracle -f file.ts <prompt>` (includes file content)
Implementation Details:
- Uses pi's `@mariozechner/pi-ai` `complete()` API
- Serializes conversation with `serializeConversation()`
- Converts to LLM format with `convertToLlm()`
- Custom TUI component for result display
- BorderedLoader during query
## Gap Analysis

### What Orch Has That Oracle Doesn't
- Multiple simultaneous queries - Oracle queries one model at a time
- Structured voting - Support/Oppose/Neutral verdicts with counts
- Multiple modes - vote/brainstorm/critique/open (Oracle is always "open")
- Stance modifiers - :for/:against/:neutral bias (devil's advocate)
- Serial strategies - refine, debate, brainstorm sequences
- Synthesis - Aggregate multiple responses into summary
- Session management - Persistent conversation threads
- 423 models - Far more models than Oracle's ~18
- Cost awareness - Explicit `--allow-expensive` gate
- Web search - Integrated search for Gemini/Perplexity
- CLI flexibility - File piping, stdin, session export
### What Oracle Has That Orch Doesn't
- Conversation context inheritance - Oracle sees full pi conversation automatically
- Interactive UI - Model picker, scrollable results, keyboard navigation
- Add-to-context workflow - Explicit YES/NO to inject response
- Current model exclusion - Automatically filters out active model
- Native pi integration - No subprocess, uses pi's AI API directly
- Quick keys - 1-9 for instant model selection
- Authenticated model filtering - Only shows models with valid keys
- Inline result display - Formatted overlay with scrolling
### What Neither Has (Opportunities)
- Side-by-side comparison - Show multiple model responses in split view
- Vote visualization - Bar chart or consensus gauge
- Response diff - Highlight disagreements between models
- Model capability awareness - Filter by vision/reasoning/coding/etc.
- Cost preview - Show estimated cost before querying
- Cached responses - Don't re-query same prompt to same model
- Response export - Save consensus to file/issue
- Model recommendations - Suggest models based on query type
- Confidence scoring - Gauge certainty in responses
- Conversation branching - Fork conversation with different models
## Pi Integration Options

### Option 1: Wrap Orch CLI as Tool

Approach: Register orch as a pi tool, shell out to CLI
Pros:
- Minimal code, reuses existing orch
- All orch features available (423 models, voting, synthesis, etc.)
- Already works with current skill
Cons:
- No conversation context sharing (pi's conversation ≠ orch's input)
- No interactive UI (no model picker, no add-to-context)
- Subprocess overhead
- Output parsing required
- Can't leverage pi's AI API
Implementation:
pi.registerTool({
name: "orch_consensus",
description: "Query multiple AI models for consensus on a question",
parameters: Type.Object({
prompt: Type.String({ description: "Question to ask" }),
models: Type.Array(Type.String(), { description: "Model aliases (flash, gemini, gpt, claude, etc.)" }),
mode: Type.Optional(Type.Enum({ vote: "vote", brainstorm: "brainstorm", critique: "critique", open: "open" })),
files: Type.Optional(Type.Array(Type.String(), { description: "Paths to include as context" })),
}),
async execute(toolCallId, params, onUpdate, ctx, signal) {
const args = ["consensus", params.prompt, ...params.models];
if (params.mode) args.push("--mode", params.mode);
if (params.files) params.files.forEach(f => args.push("--file", f));
const result = await pi.exec("orch", args);
return { content: [{ type: "text", text: result.stdout }] };
}
});
Context issue: Agent would need to manually provide conversation context:

```typescript
// Agent would have to do this:
const context = serializeConversation(ctx.sessionManager.getBranch());
const contextFile = writeToTempFile(context);
args.push("--file", contextFile);
```
### Option 2: Oracle-Style Extension with Orch Models

Approach: Port Oracle's UI/UX but use orch's model registry
Pros:
- Best UX: interactive picker, add-to-context, full conversation sharing
- Native pi integration, no subprocess
- Can query multiple models and show side-by-side
- Direct access to pi's AI API
Cons:
- Doesn't leverage orch's advanced features (voting, synthesis, serial strategies)
- Duplicate model registry (though could import from orch config)
- More code to maintain
- Loses orch's CLI flexibility (piping, session export, etc.)
Implementation:

```typescript
pi.registerCommand("consensus", {
  description: "Get consensus from multiple models",
  handler: async (args, ctx) => {
    // 1. Show model picker (multi-select)
    const models = await ctx.ui.custom(
      (tui, theme, kb, done) => new ModelPickerComponent(theme, done, { multiSelect: true })
    );

    // 2. Serialize conversation context
    const conversationHistory = serializeConversation(ctx.sessionManager.getBranch());

    // 3. Query models in parallel
    const promises = models.map(m =>
      complete(m.model, [
        ...conversationHistory.map(convertToLlm),
        { role: "user", content: args }
      ], m.apiKey)
    );

    // 4. Show results in comparison view
    const results = await Promise.all(promises);
    await ctx.ui.custom(
      (tui, theme, kb, done) => new ConsensusResultComponent(results, theme, done)
    );

    // 5. Add to context?
    const shouldAdd = await ctx.ui.confirm("Add responses to conversation context?");
    if (shouldAdd) {
      // Append all responses or synthesized summary
      ctx.sessionManager.appendMessage({
        role: "assistant",
        content: formatConsensus(results)
      });
    }
  }
});
```
Features to implement:
- Multi-select model picker (checkboxes)
- Parallel query with progress indicators
- Side-by-side result display with scrolling
- Voting mode: parse "SUPPORT/OPPOSE/NEUTRAL" from responses
- Add-to-context with synthesis option
### Option 3: Hybrid Approach

Approach: Keep orch CLI for advanced use, add Oracle-style extension for quick queries
Pros:
- Best of both worlds
- Agent can use tool for programmatic access
- User can use `/oracle` for interactive queries
- Orch handles complex scenarios (serial strategies, synthesis)
- Oracle handles quick second opinions
Cons:
- Two parallel systems to maintain
- Potential confusion about which to use
Implementation:

Tool (for agent):

```typescript
pi.registerTool({
  name: "orch_consensus",
  // ... as in Option 1, shells out to orch CLI
});
```

Command (for user):

```typescript
pi.registerCommand("oracle", {
  description: "Get second opinion from another model",
  // ... as in Option 2, native UI integration
});
```
Usage patterns:
- User types `/oracle <prompt>` → interactive picker, add-to-context flow
- Agent calls `orch_consensus()` → structured vote results in tool output
- Agent suggests: "I can get consensus from multiple models using orch_consensus if you'd like"
- User can also run `orch` directly in shell for advanced features
### Option 4: Enhanced Oracle with Orch Backend

Approach: Oracle UI that calls orch CLI under the hood
Pros:
- Leverage orch's features through nice UI
- Single source of truth (orch)
- Can expose orch modes/options in UI
Cons:
- Subprocess overhead
- Hard to share conversation context (orch doesn't expect serialized conversations)
- Awkward impedance mismatch
Implementation challenges:

```typescript
// How to pass conversation context to orch?
// Orch expects a prompt, not a conversation history

// Option A: Serialize entire conversation to temp file
const contextFile = "/tmp/pi-conversation.txt";
fs.writeFileSync(contextFile, formatConversation(history));
await pi.exec("orch", ["consensus", prompt, ...models, "--file", contextFile]);

// Option B: Inject context into prompt
const augmentedPrompt = `
Given this conversation:
${formatConversation(history)}

Answer this question: ${prompt}
`;
await pi.exec("orch", ["consensus", augmentedPrompt, ...models]);
```
Both are awkward because orch's input model doesn't match pi's conversation model.
## Recommendation

### Short Term: Option 3 (Hybrid)
Rationale:
1. Keep orch CLI for its strengths:
   - 423 models (way more than Oracle)
   - Voting/synthesis/serial strategies
   - CLI flexibility (piping, sessions, export)
   - Already works, well-tested
2. Add Oracle-style extension for its strengths:
   - Interactive UI (model picker, results display)
   - Conversation context sharing
   - Add-to-context workflow
   - Quick keys, better UX
3. Clear division of labor:
   - `/oracle` → quick second opinion, inherits conversation, nice UI
   - `orch_consensus` tool → agent programmatic access, structured voting
   - `orch` CLI → advanced features (synthesis, serial strategies, sessions)
### Long Term: Option 2 (Native Integration) + Orch as Fallback
Rationale: Eventually, we want:
- Native pi tool with full UI integration
- Access to orch's model registry (import from config)
- Voting, synthesis, comparison built into UI
- Conversation context sharing by default
But keep orch CLI for:
- Session management
- Export/archival
- Scripting/automation
- Features not yet in pi extension
## Implementation Plan

### Phase 1: Oracle Extension (Week 1)

Goal: Interactive second opinion with conversation context
Tasks:
- Port Oracle extension from shitty-extensions
- Add model aliases from orch config
- Implement model picker with multi-select
- Conversation context serialization
- Add-to-context workflow
- Test with flash/gemini/gpt/claude
Deliverable: /oracle command for quick second opinions
### Phase 2: Orch Tool Wrapper (Week 2)

Goal: Agent can invoke orch programmatically
Tasks:
- Register `orch_consensus` tool
- Map tool parameters to orch CLI args
- Serialize conversation context to temp file
- Parse orch output (vote results)
- Format for agent consumption
Deliverable: Agent can call orch for structured consensus
### Phase 3: Enhanced Oracle UI (Week 3-4)

Goal: Side-by-side comparison and voting
Tasks:
- Multi-model query in parallel
- Split-pane result display
- Vote parsing (SUPPORT/OPPOSE/NEUTRAL)
- Consensus gauge visualization
- Diff highlighting (show disagreements)
- Cost preview before query
Deliverable: Rich consensus UI with voting
### Phase 4: Advanced Features (Month 2)

Goal: Match orch's advanced features
Tasks:
- Synthesis mode (aggregate responses)
- Serial strategies (refine, debate)
- Stance modifiers (:for/:against)
- Response caching (don't re-query)
- Model recommendations based on query
- Export to file/issue
Deliverable: Feature parity with orch CLI
## Technical Details

### Model Registry Sharing

Current state: Orch has 423 models in Python config
Options:
1. Import orch config - Parse orch's model registry
2. Duplicate registry - Maintain separate TypeScript registry
3. Query orch - Call `orch models` and parse output

Recommendation: Start with (3), migrate to (1) later
```typescript
async function getOrchModels(): Promise<ModelAlias[]> {
  const { stdout } = await pi.exec("orch", ["models"]);
  return parseOrchModels(stdout);
}
```
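`parseOrchModels` is referenced but not defined above; a sketch might look like the following, assuming `orch models` prints one model per line as `<alias> <provider>/<model-id>`. That format is an assumption on our part, and the regex would need adjusting to whatever orch actually emits:

```typescript
// Assumed line format: "<alias>  <provider>/<model-id>" - adjust to orch's real output.
interface ModelAlias {
  alias: string;
  provider: string;
  id: string;
}

function parseOrchModels(stdout: string): ModelAlias[] {
  const models: ModelAlias[] = [];
  for (const line of stdout.split("\n")) {
    // Split at the first "/" so model ids containing dashes/dots still parse.
    const match = line.trim().match(/^(\S+)\s+([^\/\s]+)\/(\S+)$/);
    if (match) {
      models.push({ alias: match[1], provider: match[2], id: match[3] });
    }
  }
  return models;
}
```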
### Conversation Context Serialization

Challenge: Pi's conversation format ≠ standard chat format

Solution: Use pi's built-in `serializeConversation()` and `convertToLlm()`
```typescript
import { serializeConversation, convertToLlm } from "@mariozechner/pi-coding-agent";

const history = ctx.sessionManager.getBranch();
const serialized = serializeConversation(history);
const llmMessages = serialized.map(convertToLlm);

// Now compatible with any model's chat API
const response = await complete(model, llmMessages, apiKey);
```
### Add-to-Context Workflow

UI Flow:
1. Show consensus results
2. Prompt: "Add responses to conversation context?"
3. Options:
   - YES - Add all responses (verbose)
   - SUMMARY - Add synthesized summary (concise)
   - NO - Don't add
Implementation:

```typescript
const choice = await ctx.ui.select("Add to context?", [
  "Yes, add all responses",
  "Yes, add synthesized summary",
  "No, keep separate"
]);

if (choice === 0) {
  // Append all model responses
  for (const result of results) {
    ctx.sessionManager.appendMessage({
      role: "assistant",
      content: `[${result.modelName}]: ${result.response}`
    });
  }
} else if (choice === 1) {
  // Synthesize and append
  const summary = await synthesize(results, "gemini");
  ctx.sessionManager.appendMessage({
    role: "assistant",
    content: `[Consensus]: ${summary}`
  });
}
```
### Vote Parsing

Challenge: Extract SUPPORT/OPPOSE/NEUTRAL from freeform responses

Strategies:
1. Prompt engineering - Ask models to start response with verdict
2. Regex matching - Parse structured output
3. Secondary query - Ask "classify this response as SUPPORT/OPPOSE/NEUTRAL"

Recommendation: (1) with (3) as fallback
```typescript
const votePrompt = `${originalPrompt}

Respond with your verdict first: SUPPORT, OPPOSE, or NEUTRAL
Then explain your reasoning.`;

const response = await complete(model, [...history, { role: "user", content: votePrompt }]);
const match = response.match(/^(SUPPORT|OPPOSE|NEUTRAL)/i);
const verdict = match ? match[1].toUpperCase() : "NEUTRAL";
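Once individual verdicts are parsed, tallying them into the counts shown in orch-style vote output is straightforward. The types and names here are illustrative, not an existing API:

```typescript
// Tally parsed verdicts into SUPPORT/OPPOSE/NEUTRAL counts (illustrative sketch).
type Verdict = "SUPPORT" | "OPPOSE" | "NEUTRAL";

interface VoteResult {
  model: string;
  verdict: Verdict;
}

function tallyVotes(results: VoteResult[]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { SUPPORT: 0, OPPOSE: 0, NEUTRAL: 0 };
  for (const r of results) counts[r.verdict]++;
  return counts;
}
```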
### Cost Estimation

Orch approach: Uses pricing data in model registry

Implementation:
```typescript
interface ModelInfo {
  id: string;
  name: string;
  inputCostPer1M: number;
  outputCostPer1M: number;
}

interface Message {
  role: string;
  content: string;
}

// Rough estimate (~4 characters per token); a real implementation would use
// the model's own tokenizer.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function estimateCost(prompt: string, history: Message[], models: ModelInfo[]): number {
  const inputTokens = estimateTokens([...history, { role: "user", content: prompt }]);
  const outputTokens = 1000; // Estimate
  return models.reduce((total, m) => {
    const inputCost = (inputTokens / 1_000_000) * m.inputCostPer1M;
    const outputCost = (outputTokens / 1_000_000) * m.outputCostPer1M;
    return total + inputCost + outputCost;
  }, 0);
}

// Show before querying
const cost = estimateCost(prompt, history, selectedModels);
const confirmed = await ctx.ui.confirm(`Estimated cost: $${cost.toFixed(3)}. Continue?`);
```
## Design Questions

### 1. Should Oracle query multiple models or just one?

Current Oracle: One model at a time
Orch: Multiple models in parallel
Recommendation: Support both
- `/oracle <prompt>` → single model picker (quick second opinion)
- `/oracle-consensus <prompt>` → multi-select picker (true consensus)

Or: `/oracle` with Shift+Enter for multi-select
### 2. Should results auto-add to context or always prompt?

Current Oracle: Always prompts
Orch: No context, just output
Recommendation: Make it configurable
- Default: always prompt
- Setting: `oracle.autoAddToContext = true` to skip prompt
- ESC = don't add (quick exit)
### 3. How to handle expensive models?

Orch: Requires `--allow-expensive` flag
Recommendation: Show cost and prompt
- Model picker shows cost per model
- Selecting opus/r1 shows warning: "This is expensive ($X per query). Continue?"
- Can disable in settings
### 4. Should we cache responses?

Problem: Querying same prompt to same model multiple times wastes money
Recommendation: Short-term cache
- Cache key: `hash(model + conversation_context + prompt)`
- TTL: 5 minutes
- Show indicator: "(cached)" in results
- Option to force refresh
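A minimal in-memory sketch of such a cache follows. The key derivation and TTL handling are illustrative choices; a real implementation would hash the key rather than concatenate, and would likely bound the map's size:

```typescript
const TTL_MS = 5 * 60 * 1000; // 5-minute TTL as recommended above

interface CacheEntry {
  response: string;
  storedAt: number;
}

const cache = new Map<string, CacheEntry>();

function cacheKey(model: string, context: string, prompt: string): string {
  // A real implementation would hash; plain joining keeps the sketch simple.
  return `${model}\u0000${context}\u0000${prompt}`;
}

function getCached(key: string, now = Date.now()): string | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (now - entry.storedAt > TTL_MS) {
    cache.delete(key); // expired - force a fresh query
    return undefined;
  }
  return entry.response;
}

function putCached(key: string, response: string, now = Date.now()): void {
  cache.set(key, { response, storedAt: now });
}
```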
### 5. How to visualize consensus?
Options:
- List view (like orch) - each model's response sequentially
- Side-by-side - split screen with responses in columns
- Gauge - visual consensus meter (% support)
- Diff view - highlight agreements/disagreements
Recommendation: Progressive disclosure
- Initial: Gauge + vote counts
- Expand: List view with reasoning
- Advanced: Side-by-side diff view
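The initial "gauge + vote counts" view could be as simple as a text bar. This is purely illustrative (function name and output format are ours):

```typescript
// Render a text consensus gauge, e.g. "[##########----------] 50% support (2S/1O/1N)".
function renderGauge(support: number, oppose: number, neutral: number, width = 20): string {
  const total = support + oppose + neutral;
  if (total === 0) return `[${" ".repeat(width)}] no votes`;
  const pct = support / total;
  const filled = Math.round(pct * width);
  const bar = "#".repeat(filled) + "-".repeat(width - filled);
  return `[${bar}] ${Math.round(pct * 100)}% support (${support}S/${oppose}O/${neutral}N)`;
}
```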
## Next Steps

1. Prototype Oracle extension (today)
   - Port from shitty-extensions
   - Test with flash/gemini
   - Verify conversation context sharing
2. Design consensus UI (tomorrow)
   - Sketch multi-model result layout
   - Decide on vote visualization
   - Mock up add-to-context flow
3. Implement model picker (day 3)
   - Multi-select support
   - Quick keys (1-9 for single, checkboxes for multi)
   - Show cost/capabilities
   - Filter by authenticated models
4. Build comparison view (day 4-5)
   - Parallel query execution
   - Progress indicators
   - Side-by-side results
   - Diff highlighting
5. Add orch tool wrapper (day 6)
   - Register tool for agent use
   - Map parameters to CLI args
   - Parse vote output
6. Integration testing (day 7)
   - Test with real conversations
   - Verify context sharing works
   - Check cost estimates
   - Test with slow models (timeout handling)
## Success Metrics

Must Have:
- `/oracle` command works with conversation context
- Model picker shows authenticated models only
- Results display with add-to-context option
- Multi-model query in parallel
- Vote parsing (SUPPORT/OPPOSE/NEUTRAL)
- Cost estimation before query
Nice to Have:
- Side-by-side comparison view
- Diff highlighting for disagreements
- Response caching (5min TTL)
- Model recommendations based on query
- Export consensus to file/issue
- Serial strategies (refine, debate)
Stretch Goals:
- Synthesis mode with custom prompts
- Confidence scoring
- Conversation branching
- Historical consensus tracking
- Model capability filtering (vision/reasoning/coding)
## References
- orch CLI - Current implementation
- shitty-extensions/oracle.ts
- pi-mono extension docs
- pi-mono TUI docs