Research conducted 2026-01-22:
- pi-extension-ecosystem-research.md: 56 GitHub projects, 52 official examples
- pi-ui-ecosystem-research.md: TUI patterns, components, overlays
- multi-model-consensus-analysis.md: gap analysis leading to /synod design
# Multi-Model Consensus: Current State & Pi Integration Analysis

Date: 2026-01-22
Purpose: Analyze what we have in orch vs what pi needs for multi-model consensus
## What We Have: Orch CLI

### Core Capabilities

Commands:
- `orch consensus` - Parallel multi-model queries with vote/brainstorm/critique/open modes
- `orch chat` - Single-model conversation with session management
- `orch models` - List/resolve 423 available models
- `orch sessions` - Manage conversation history
Key Features:

Model Selection:
- 423 models across providers (OpenAI, Anthropic, Google, DeepSeek, Qwen, Perplexity, etc.)
- Aliases: `flash`, `gemini`, `gpt`, `claude`, `sonnet`, `opus`, `haiku`, `deepseek`, `r1`, `qwen`
- Stance modifiers: `gpt:for`, `claude:against`, `gemini:neutral`
- Cost awareness: `--allow-expensive` for opus/r1
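For illustration, the `alias:stance` syntax above can be parsed with a few lines of TypeScript. This is a sketch with hypothetical names (`parseModelSpec`, `ModelSpec`); orch's actual parser lives in its Python codebase and may behave differently:

```typescript
// Hypothetical helper illustrating the alias:stance syntax; not orch's real API.
interface ModelSpec {
  alias: string;
  stance: "for" | "against" | "neutral";
}

function parseModelSpec(spec: string): ModelSpec {
  const [alias, stance] = spec.split(":");
  if (stance === "for" || stance === "against" || stance === "neutral") {
    return { alias, stance };
  }
  // No (or unrecognized) modifier: treat the whole spec as the alias,
  // defaulting to a neutral stance and letting the registry reject bad names.
  return { alias: spec, stance: "neutral" };
}
```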
Modes:
- `vote` - Support/Oppose/Neutral verdict with reasoning
- `brainstorm` - Generate ideas without judgment
- `critique` - Find flaws and weaknesses
- `open` - Freeform responses
Context:
- File inclusion: `--file PATH` (multiple allowed)
- Stdin piping: `cat code.py | orch consensus "..."`
- Session continuity: `--session ID` for chat mode
- Web search: `--websearch` (Gemini only)
Execution:
- Parallel by default, `--serial` for sequential
- Serial strategies: neutral, refine, debate, brainstorm
- Synthesis: `--synthesize MODEL` to aggregate responses
- Timeout control: `--timeout SECS`
Output:
- Structured vote results with verdict counts
- Reasoning for each model
- Color-coded output (SUPPORT/OPPOSE/NEUTRAL)
- Session IDs for continuation
### Current Skill Integration

Location: `~/.codex/skills/orch/`

What it provides:
- Documentation of orch capabilities
- Usage patterns (second opinion, architecture decision, code review, devil's advocate, etc.)
- Model selection guidance
- Conversational patterns (session-based multi-turn, cross-model dialogue, iterative refinement)
- Combined patterns (explore then validate)
What it does NOT provide:
- Direct agent tool invocation (agent must shell out to `orch`)
- UI integration (no pickers, no inline results)
- Conversation context sharing (agent's conversation ≠ orch's conversation)
- Interactive model selection
- Add-to-context workflow
## What Pi Oracle Extension Provides

From `shitty-extensions/oracle.ts`

UI Features:
- Interactive model picker overlay
- Quick keys (1-9) for fast selection
- Shows which models are authenticated/available
- Excludes current model from picker
- Formatted result display with scrolling
Context Sharing:
- Inherits full conversation context - Oracle sees the entire pi conversation
- Sends conversation history to queried model
- No need to re-explain context
Workflow:
1. User types `/oracle <prompt>`
2. Model picker appears
3. Select model with arrow keys or number
4. Oracle queries model with full conversation context + prompt
5. Result displays in scrollable overlay
6. "Add to context?" prompt - YES/NO choice
7. If YES, oracle response appends to conversation
Model Awareness:
- Only shows models with valid API keys
- Filters out current model
- Groups by provider (OpenAI, Google, Anthropic, OpenAI Codex)
Input Options:
- Direct: `/oracle -m gpt-4o <prompt>` (skips picker)
- Files: `/oracle -f file.ts <prompt>` (includes file content)
Implementation Details:
- Uses pi's `@mariozechner/pi-ai` `complete()` API
- Serializes conversation with `serializeConversation()`
- Converts to LLM format with `convertToLlm()`
- Custom TUI component for result display
- BorderedLoader during query
## Gap Analysis

### What Orch Has That Oracle Doesn't
- Multiple simultaneous queries - Oracle queries one model at a time
- Structured voting - Support/Oppose/Neutral verdicts with counts
- Multiple modes - vote/brainstorm/critique/open (Oracle is always "open")
- Stance modifiers - :for/:against/:neutral bias (devil's advocate)
- Serial strategies - refine, debate, brainstorm sequences
- Synthesis - Aggregate multiple responses into summary
- Session management - Persistent conversation threads
- 423 models - Far more models than Oracle's ~18
- Cost awareness - Explicit `--allow-expensive` gate
- Web search - Integrated search for Gemini/Perplexity
- CLI flexibility - File piping, stdin, session export
### What Oracle Has That Orch Doesn't
- Conversation context inheritance - Oracle sees full pi conversation automatically
- Interactive UI - Model picker, scrollable results, keyboard navigation
- Add-to-context workflow - Explicit YES/NO to inject response
- Current model exclusion - Automatically filters out active model
- Native pi integration - No subprocess, uses pi's AI API directly
- Quick keys - 1-9 for instant model selection
- Authenticated model filtering - Only shows models with valid keys
- Inline result display - Formatted overlay with scrolling
### What Neither Has (Opportunities)
- Side-by-side comparison - Show multiple model responses in split view
- Vote visualization - Bar chart or consensus gauge
- Response diff - Highlight disagreements between models
- Model capability awareness - Filter by vision/reasoning/coding/etc.
- Cost preview - Show estimated cost before querying
- Cached responses - Don't re-query same prompt to same model
- Response export - Save consensus to file/issue
- Model recommendations - Suggest models based on query type
- Confidence scoring - Gauge certainty in responses
- Conversation branching - Fork conversation with different models
## Pi Integration Options

### Option 1: Wrap Orch CLI as Tool

Approach: Register orch as a pi tool, shell out to CLI
Pros:
- Minimal code, reuses existing orch
- All orch features available (423 models, voting, synthesis, etc.)
- Already works with current skill
Cons:
- No conversation context sharing (pi's conversation ≠ orch's input)
- No interactive UI (no model picker, no add-to-context)
- Subprocess overhead
- Output parsing required
- Can't leverage pi's AI API
Implementation:
pi.registerTool({
name: "orch_consensus",
description: "Query multiple AI models for consensus on a question",
parameters: Type.Object({
prompt: Type.String({ description: "Question to ask" }),
models: Type.Array(Type.String(), { description: "Model aliases (flash, gemini, gpt, claude, etc.)" }),
mode: Type.Optional(Type.Enum({ vote: "vote", brainstorm: "brainstorm", critique: "critique", open: "open" })),
files: Type.Optional(Type.Array(Type.String(), { description: "Paths to include as context" })),
}),
async execute(toolCallId, params, onUpdate, ctx, signal) {
const args = ["consensus", params.prompt, ...params.models];
if (params.mode) args.push("--mode", params.mode);
if (params.files) params.files.forEach(f => args.push("--file", f));
const result = await pi.exec("orch", args);
return { content: [{ type: "text", text: result.stdout }] };
}
});
Context issue: Agent would need to manually provide conversation context:

```typescript
// Agent would have to do this:
const context = serializeConversation(ctx.sessionManager.getBranch());
const contextFile = writeToTempFile(context);
args.push("--file", contextFile);
```
### Option 2: Oracle-Style Extension with Orch Models

Approach: Port Oracle's UI/UX but use orch's model registry
Pros:
- Best UX: interactive picker, add-to-context, full conversation sharing
- Native pi integration, no subprocess
- Can query multiple models and show side-by-side
- Direct access to pi's AI API
Cons:
- Doesn't leverage orch's advanced features (voting, synthesis, serial strategies)
- Duplicate model registry (though could import from orch config)
- More code to maintain
- Loses orch's CLI flexibility (piping, session export, etc.)
Implementation:

```typescript
pi.registerCommand("consensus", {
  description: "Get consensus from multiple models",
  handler: async (args, ctx) => {
    // 1. Show model picker (multi-select)
    const models = await ctx.ui.custom(
      (tui, theme, kb, done) => new ModelPickerComponent(theme, done, { multiSelect: true })
    );

    // 2. Serialize conversation context
    const conversationHistory = serializeConversation(ctx.sessionManager.getBranch());

    // 3. Query models in parallel
    const promises = models.map(m =>
      complete(m.model, [
        ...conversationHistory.map(convertToLlm),
        { role: "user", content: args }
      ], m.apiKey)
    );

    // 4. Show results in comparison view
    const results = await Promise.all(promises);
    await ctx.ui.custom(
      (tui, theme, kb, done) => new ConsensusResultComponent(results, theme, done)
    );

    // 5. Add to context?
    const shouldAdd = await ctx.ui.confirm("Add responses to conversation context?");
    if (shouldAdd) {
      // Append all responses or synthesized summary
      ctx.sessionManager.appendMessage({
        role: "assistant",
        content: formatConsensus(results)
      });
    }
  }
});
```
Features to implement:
- Multi-select model picker (checkboxes)
- Parallel query with progress indicators
- Side-by-side result display with scrolling
- Voting mode: parse "SUPPORT/OPPOSE/NEUTRAL" from responses
- Add-to-context with synthesis option
### Option 3: Hybrid Approach

Approach: Keep orch CLI for advanced use, add Oracle-style extension for quick queries
Pros:
- Best of both worlds
- Agent can use tool for programmatic access
- User can use `/oracle` for interactive queries
- Orch handles complex scenarios (serial strategies, synthesis)
- Oracle handles quick second opinions
Cons:
- Two parallel systems to maintain
- Potential confusion about which to use
Implementation:

Tool (for agent):

```typescript
pi.registerTool({
  name: "orch_consensus",
  // ... as in Option 1, shells out to orch CLI
});
```

Command (for user):

```typescript
pi.registerCommand("oracle", {
  description: "Get second opinion from another model",
  // ... as in Option 2, native UI integration
});
```
Usage patterns:
- User types `/oracle <prompt>` → interactive picker, add-to-context flow
- Agent calls `orch_consensus()` → structured vote results in tool output
- Agent suggests: "I can get consensus from multiple models using orch_consensus if you'd like"
- User can also run `orch` directly in shell for advanced features
### Option 4: Enhanced Oracle with Orch Backend

Approach: Oracle UI that calls orch CLI under the hood
Pros:
- Leverage orch's features through nice UI
- Single source of truth (orch)
- Can expose orch modes/options in UI
Cons:
- Subprocess overhead
- Hard to share conversation context (orch doesn't expect serialized conversations)
- Awkward impedance mismatch
Implementation challenges:

```typescript
// How to pass conversation context to orch?
// Orch expects a prompt, not a conversation history

// Option A: Serialize entire conversation to temp file
const contextFile = "/tmp/pi-conversation.txt";
fs.writeFileSync(contextFile, formatConversation(history));
await pi.exec("orch", ["consensus", prompt, ...models, "--file", contextFile]);

// Option B: Inject context into prompt
const augmentedPrompt = `
Given this conversation:
${formatConversation(history)}

Answer this question: ${prompt}
`;
await pi.exec("orch", ["consensus", augmentedPrompt, ...models]);
```
Both are awkward because orch's input model doesn't match pi's conversation model.
## Recommendation

### Short Term: Option 3 (Hybrid)
Rationale:
1. Keep orch CLI for its strengths:
   - 423 models (way more than Oracle)
   - Voting/synthesis/serial strategies
   - CLI flexibility (piping, sessions, export)
   - Already works, well-tested
2. Add Oracle-style extension for its strengths:
   - Interactive UI (model picker, results display)
   - Conversation context sharing
   - Add-to-context workflow
   - Quick keys, better UX
3. Clear division of labor:
   - `/oracle` → quick second opinion, inherits conversation, nice UI
   - `orch_consensus` tool → agent programmatic access, structured voting
   - `orch` CLI → advanced features (synthesis, serial strategies, sessions)
### Long Term: Option 2 (Native Integration) + Orch as Fallback
Rationale: Eventually, we want:
- Native pi tool with full UI integration
- Access to orch's model registry (import from config)
- Voting, synthesis, comparison built into UI
- Conversation context sharing by default
But keep orch CLI for:
- Session management
- Export/archival
- Scripting/automation
- Features not yet in pi extension
## Implementation Plan

### Phase 1: Oracle Extension (Week 1)

Goal: Interactive second opinion with conversation context
Tasks:
- Port Oracle extension from shitty-extensions
- Add model aliases from orch config
- Implement model picker with multi-select
- Conversation context serialization
- Add-to-context workflow
- Test with flash/gemini/gpt/claude
Deliverable: /oracle command for quick second opinions
### Phase 2: Orch Tool Wrapper (Week 2)

Goal: Agent can invoke orch programmatically
Tasks:
- Register `orch_consensus` tool
- Map tool parameters to orch CLI args
- Serialize conversation context to temp file
- Parse orch output (vote results)
- Format for agent consumption
Deliverable: Agent can call orch for structured consensus
### Phase 3: Enhanced Oracle UI (Week 3-4)

Goal: Side-by-side comparison and voting
Tasks:
- Multi-model query in parallel
- Split-pane result display
- Vote parsing (SUPPORT/OPPOSE/NEUTRAL)
- Consensus gauge visualization
- Diff highlighting (show disagreements)
- Cost preview before query
Deliverable: Rich consensus UI with voting
### Phase 4: Advanced Features (Month 2)

Goal: Match orch's advanced features
Tasks:
- Synthesis mode (aggregate responses)
- Serial strategies (refine, debate)
- Stance modifiers (:for/:against)
- Response caching (don't re-query)
- Model recommendations based on query
- Export to file/issue
Deliverable: Feature parity with orch CLI
## Technical Details

### Model Registry Sharing

Current state: Orch has 423 models in Python config
Options:
1. Import orch config - Parse orch's model registry
2. Duplicate registry - Maintain separate TypeScript registry
3. Query orch - Call `orch models` and parse output

Recommendation: Start with (3), migrate to (1) later
```typescript
async function getOrchModels(): Promise<ModelAlias[]> {
  const { stdout } = await pi.exec("orch", ["models"]);
  return parseOrchModels(stdout);
}
```
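`parseOrchModels` is referenced but not defined above; a sketch might look like the following, assuming `orch models` prints one model per line as `<alias> <provider>/<model-id>`. That format is an assumption on our part, and the regex would need adjusting to whatever orch actually emits:

```typescript
// Assumed line format: "<alias>  <provider>/<model-id>" - adjust to orch's real output.
interface ModelAlias {
  alias: string;
  provider: string;
  id: string;
}

function parseOrchModels(stdout: string): ModelAlias[] {
  const models: ModelAlias[] = [];
  for (const line of stdout.split("\n")) {
    // Split at the first "/" so model ids containing dashes/dots still parse.
    const match = line.trim().match(/^(\S+)\s+([^\/\s]+)\/(\S+)$/);
    if (match) {
      models.push({ alias: match[1], provider: match[2], id: match[3] });
    }
  }
  return models;
}
```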
### Conversation Context Serialization

Challenge: Pi's conversation format ≠ standard chat format

Solution: Use pi's built-in `serializeConversation()` and `convertToLlm()`
```typescript
import { serializeConversation, convertToLlm } from "@mariozechner/pi-coding-agent";

const history = ctx.sessionManager.getBranch();
const serialized = serializeConversation(history);
const llmMessages = serialized.map(convertToLlm);

// Now compatible with any model's chat API
const response = await complete(model, llmMessages, apiKey);
```
### Add-to-Context Workflow

UI Flow:
1. Show consensus results
2. Prompt: "Add responses to conversation context?"
3. Options:
   - YES - Add all responses (verbose)
   - SUMMARY - Add synthesized summary (concise)
   - NO - Don't add
Implementation:

```typescript
const choice = await ctx.ui.select("Add to context?", [
  "Yes, add all responses",
  "Yes, add synthesized summary",
  "No, keep separate"
]);

if (choice === 0) {
  // Append all model responses
  for (const result of results) {
    ctx.sessionManager.appendMessage({
      role: "assistant",
      content: `[${result.modelName}]: ${result.response}`
    });
  }
} else if (choice === 1) {
  // Synthesize and append
  const summary = await synthesize(results, "gemini");
  ctx.sessionManager.appendMessage({
    role: "assistant",
    content: `[Consensus]: ${summary}`
  });
}
```
### Vote Parsing

Challenge: Extract SUPPORT/OPPOSE/NEUTRAL from freeform responses

Strategies:
1. Prompt engineering - Ask models to start response with verdict
2. Regex matching - Parse structured output
3. Secondary query - Ask "classify this response as SUPPORT/OPPOSE/NEUTRAL"

Recommendation: (1) with (3) as fallback
```typescript
const votePrompt = `${originalPrompt}

Respond with your verdict first: SUPPORT, OPPOSE, or NEUTRAL
Then explain your reasoning.`;

const response = await complete(model, [...history, { role: "user", content: votePrompt }]);
const match = response.match(/^(SUPPORT|OPPOSE|NEUTRAL)/i);
const verdict = match ? match[1].toUpperCase() : "NEUTRAL";
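Once individual verdicts are parsed, tallying them into the counts shown in orch-style vote output is straightforward. The types and names here are illustrative, not an existing API:

```typescript
// Tally parsed verdicts into SUPPORT/OPPOSE/NEUTRAL counts (illustrative sketch).
type Verdict = "SUPPORT" | "OPPOSE" | "NEUTRAL";

interface VoteResult {
  model: string;
  verdict: Verdict;
}

function tallyVotes(results: VoteResult[]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { SUPPORT: 0, OPPOSE: 0, NEUTRAL: 0 };
  for (const r of results) counts[r.verdict]++;
  return counts;
}
```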
### Cost Estimation

Orch approach: Uses pricing data in model registry

Implementation:
```typescript
interface ModelInfo {
  id: string;
  name: string;
  inputCostPer1M: number;
  outputCostPer1M: number;
}

interface Message {
  role: string;
  content: string;
}

// Rough estimate (~4 characters per token); a real implementation would use
// the model's own tokenizer.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function estimateCost(prompt: string, history: Message[], models: ModelInfo[]): number {
  const inputTokens = estimateTokens([...history, { role: "user", content: prompt }]);
  const outputTokens = 1000; // Estimate
  return models.reduce((total, m) => {
    const inputCost = (inputTokens / 1_000_000) * m.inputCostPer1M;
    const outputCost = (outputTokens / 1_000_000) * m.outputCostPer1M;
    return total + inputCost + outputCost;
  }, 0);
}

// Show before querying
const cost = estimateCost(prompt, history, selectedModels);
const confirmed = await ctx.ui.confirm(`Estimated cost: $${cost.toFixed(3)}. Continue?`);
```
## Design Questions

### 1. Should Oracle query multiple models or just one?

Current Oracle: One model at a time
Orch: Multiple models in parallel
Recommendation: Support both
- `/oracle <prompt>` → single model picker (quick second opinion)
- `/oracle-consensus <prompt>` → multi-select picker (true consensus)

Or: `/oracle` with Shift+Enter for multi-select
### 2. Should results auto-add to context or always prompt?

Current Oracle: Always prompts
Orch: No context, just output
Recommendation: Make it configurable
- Default: always prompt
- Setting: `oracle.autoAddToContext = true` to skip prompt
- ESC = don't add (quick exit)
### 3. How to handle expensive models?

Orch: Requires `--allow-expensive` flag
Recommendation: Show cost and prompt
- Model picker shows cost per model
- Selecting opus/r1 shows warning: "This is expensive ($X per query). Continue?"
- Can disable in settings
### 4. Should we cache responses?

Problem: Querying same prompt to same model multiple times wastes money
Recommendation: Short-term cache
- Cache key: `hash(model + conversation_context + prompt)`
- TTL: 5 minutes
- Show indicator: "(cached)" in results
- Option to force refresh
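A minimal in-memory sketch of such a cache follows. The key derivation and TTL handling are illustrative choices; a real implementation would hash the key rather than concatenate, and would likely bound the map's size:

```typescript
const TTL_MS = 5 * 60 * 1000; // 5-minute TTL as recommended above

interface CacheEntry {
  response: string;
  storedAt: number;
}

const cache = new Map<string, CacheEntry>();

function cacheKey(model: string, context: string, prompt: string): string {
  // A real implementation would hash; plain joining keeps the sketch simple.
  return `${model}\u0000${context}\u0000${prompt}`;
}

function getCached(key: string, now = Date.now()): string | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (now - entry.storedAt > TTL_MS) {
    cache.delete(key); // expired - force a fresh query
    return undefined;
  }
  return entry.response;
}

function putCached(key: string, response: string, now = Date.now()): void {
  cache.set(key, { response, storedAt: now });
}
```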
### 5. How to visualize consensus?
Options:
- List view (like orch) - each model's response sequentially
- Side-by-side - split screen with responses in columns
- Gauge - visual consensus meter (% support)
- Diff view - highlight agreements/disagreements
Recommendation: Progressive disclosure
- Initial: Gauge + vote counts
- Expand: List view with reasoning
- Advanced: Side-by-side diff view
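The initial "gauge + vote counts" view could be as simple as a text bar. This is purely illustrative (function name and output format are ours):

```typescript
// Render a text consensus gauge, e.g. "[##########----------] 50% support (2S/1O/1N)".
function renderGauge(support: number, oppose: number, neutral: number, width = 20): string {
  const total = support + oppose + neutral;
  if (total === 0) return `[${" ".repeat(width)}] no votes`;
  const pct = support / total;
  const filled = Math.round(pct * width);
  const bar = "#".repeat(filled) + "-".repeat(width - filled);
  return `[${bar}] ${Math.round(pct * 100)}% support (${support}S/${oppose}O/${neutral}N)`;
}
```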
## Next Steps

1. Prototype Oracle extension (today)
   - Port from shitty-extensions
   - Test with flash/gemini
   - Verify conversation context sharing
2. Design consensus UI (tomorrow)
   - Sketch multi-model result layout
   - Decide on vote visualization
   - Mock up add-to-context flow
3. Implement model picker (day 3)
   - Multi-select support
   - Quick keys (1-9 for single, checkboxes for multi)
   - Show cost/capabilities
   - Filter by authenticated models
4. Build comparison view (day 4-5)
   - Parallel query execution
   - Progress indicators
   - Side-by-side results
   - Diff highlighting
5. Add orch tool wrapper (day 6)
   - Register tool for agent use
   - Map parameters to CLI args
   - Parse vote output
6. Integration testing (day 7)
   - Test with real conversations
   - Verify context sharing works
   - Check cost estimates
   - Test with slow models (timeout handling)
## Success Metrics

Must Have:
- `/oracle` command works with conversation context
- Model picker shows authenticated models only
- Results display with add-to-context option
- Multi-model query in parallel
- Vote parsing (SUPPORT/OPPOSE/NEUTRAL)
- Cost estimation before query
Nice to Have:
- Side-by-side comparison view
- Diff highlighting for disagreements
- Response caching (5min TTL)
- Model recommendations based on query
- Export consensus to file/issue
- Serial strategies (refine, debate)
Stretch Goals:
- Synthesis mode with custom prompts
- Confidence scoring
- Conversation branching
- Historical consensus tracking
- Model capability filtering (vision/reasoning/coding)
## References
- orch CLI - Current implementation
- shitty-extensions/oracle.ts
- pi-mono extension docs
- pi-mono TUI docs