Research conducted 2026-01-22:
- pi-extension-ecosystem-research.md: 56 GitHub projects, 52 official examples
- pi-ui-ecosystem-research.md: TUI patterns, components, overlays
- multi-model-consensus-analysis.md: gap analysis leading to /synod design
# Multi-Model Consensus: Current State & Pi Integration Analysis
**Date**: 2026-01-22
**Purpose**: Analyze what we have in orch vs what pi needs for multi-model consensus

---
## What We Have: Orch CLI
### Core Capabilities

**Commands**:

1. `orch consensus` - Parallel multi-model queries with vote/brainstorm/critique/open modes
2. `orch chat` - Single-model conversation with session management
3. `orch models` - List/resolve 423 available models
4. `orch sessions` - Manage conversation history

**Key Features**:
**Model Selection**:

- 423 models across providers (OpenAI, Anthropic, Google, DeepSeek, Qwen, Perplexity, etc.)
- Aliases: `flash`, `gemini`, `gpt`, `claude`, `sonnet`, `opus`, `haiku`, `deepseek`, `r1`, `qwen`
- Stance modifiers: `gpt:for`, `claude:against`, `gemini:neutral`
- Cost awareness: `--allow-expensive` for opus/r1
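The `model:stance` syntax is simple enough to sketch as a parser. This is illustrative only, not orch's actual implementation:

```typescript
type Stance = "for" | "against" | "neutral";

interface ModelSpec {
  alias: string;
  stance?: Stance;
}

// Split "gpt:against" into { alias: "gpt", stance: "against" };
// specs without a recognized ":stance" suffix carry no stance.
function parseModelSpec(spec: string): ModelSpec {
  const [alias, stance] = spec.split(":");
  if (stance === "for" || stance === "against" || stance === "neutral") {
    return { alias, stance };
  }
  return { alias };
}
```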
**Modes**:

- `vote` - Support/Oppose/Neutral verdict with reasoning
- `brainstorm` - Generate ideas without judgment
- `critique` - Find flaws and weaknesses
- `open` - Freeform responses
**Context**:

- File inclusion: `--file PATH` (multiple allowed)
- Stdin piping: `cat code.py | orch consensus "..."`
- Session continuity: `--session ID` for chat mode
- Web search: `--websearch` (Gemini only)
**Execution**:

- Parallel by default, `--serial` for sequential
- Serial strategies: neutral, refine, debate, brainstorm
- Synthesis: `--synthesize MODEL` to aggregate responses
- Timeout control: `--timeout SECS`
**Output**:

- Structured vote results with verdict counts
- Reasoning for each model
- Color-coded output (SUPPORT/OPPOSE/NEUTRAL)
- Session IDs for continuation
### Current Skill Integration

**Location**: `~/.codex/skills/orch/`

**What it provides**:

- Documentation of orch capabilities
- Usage patterns (second opinion, architecture decision, code review, devil's advocate, etc.)
- Model selection guidance
- Conversational patterns (session-based multi-turn, cross-model dialogue, iterative refinement)
- Combined patterns (explore then validate)
**What it does NOT provide**:

- Direct agent tool invocation (agent must shell out to `orch`)
- UI integration (no pickers, no inline results)
- Conversation context sharing (agent's conversation ≠ orch's conversation)
- Interactive model selection
- Add-to-context workflow

---
## What Pi Oracle Extension Provides

### From shitty-extensions/oracle.ts

**UI Features**:

- Interactive model picker overlay
- Quick keys (1-9) for fast selection
- Shows which models are authenticated/available
- Excludes current model from picker
- Formatted result display with scrolling

**Context Sharing**:

- **Inherits full conversation context** - Oracle sees the entire pi conversation
- Sends conversation history to the queried model
- No need to re-explain context
**Workflow**:

1. User types `/oracle <prompt>`
2. Model picker appears
3. Select model with arrow keys or number
4. Oracle queries model with **full conversation context + prompt**
5. Result displays in a scrollable overlay
6. **"Add to context?" prompt** - YES/NO choice
7. If YES, the oracle response is appended to the conversation
**Model Awareness**:

- Only shows models with valid API keys
- Filters out the current model
- Groups by provider (OpenAI, Google, Anthropic, OpenAI Codex)

**Input Options**:

- Direct: `/oracle -m gpt-4o <prompt>` (skips picker)
- Files: `/oracle -f file.ts <prompt>` (includes file content)
**Implementation Details**:

- Uses pi's `@mariozechner/pi-ai` complete() API
- Serializes conversation with `serializeConversation()`
- Converts to LLM format with `convertToLlm()`
- Custom TUI component for result display
- BorderedLoader during query

---
## Gap Analysis

### What Orch Has That Oracle Doesn't

1. **Multiple simultaneous queries** - Oracle queries one model at a time
2. **Structured voting** - Support/Oppose/Neutral verdicts with counts
3. **Multiple modes** - vote/brainstorm/critique/open (Oracle is always "open")
4. **Stance modifiers** - :for/:against/:neutral bias (devil's advocate)
5. **Serial strategies** - refine, debate, brainstorm sequences
6. **Synthesis** - Aggregate multiple responses into a summary
7. **Session management** - Persistent conversation threads
8. **423 models** - Far more models than Oracle's ~18
9. **Cost awareness** - Explicit `--allow-expensive` gate
10. **Web search** - Integrated search for Gemini/Perplexity
11. **CLI flexibility** - File piping, stdin, session export
### What Oracle Has That Orch Doesn't

1. **Conversation context inheritance** - Oracle sees the full pi conversation automatically
2. **Interactive UI** - Model picker, scrollable results, keyboard navigation
3. **Add-to-context workflow** - Explicit YES/NO to inject the response
4. **Current model exclusion** - Automatically filters out the active model
5. **Native pi integration** - No subprocess, uses pi's AI API directly
6. **Quick keys** - 1-9 for instant model selection
7. **Authenticated model filtering** - Only shows models with valid keys
8. **Inline result display** - Formatted overlay with scrolling
### What Neither Has (Opportunities)

1. **Side-by-side comparison** - Show multiple model responses in a split view
2. **Vote visualization** - Bar chart or consensus gauge
3. **Response diff** - Highlight disagreements between models
4. **Model capability awareness** - Filter by vision/reasoning/coding/etc.
5. **Cost preview** - Show estimated cost before querying
6. **Cached responses** - Don't re-query the same prompt to the same model
7. **Response export** - Save consensus to a file/issue
8. **Model recommendations** - Suggest models based on query type
9. **Confidence scoring** - Gauge certainty in responses
10. **Conversation branching** - Fork the conversation with different models

---
## Pi Integration Options

### Option 1: Wrap Orch CLI as Tool

**Approach**: Register `orch` as a pi tool, shell out to the CLI

**Pros**:

- Minimal code, reuses existing orch
- All orch features available (423 models, voting, synthesis, etc.)
- Already works with the current skill
**Cons**:

- No conversation context sharing (pi's conversation ≠ orch's input)
- No interactive UI (no model picker, no add-to-context)
- Subprocess overhead
- Output parsing required
- Can't leverage pi's AI API
**Implementation**:

```typescript
pi.registerTool({
  name: "orch_consensus",
  description: "Query multiple AI models for consensus on a question",
  parameters: Type.Object({
    prompt: Type.String({ description: "Question to ask" }),
    models: Type.Array(Type.String(), { description: "Model aliases (flash, gemini, gpt, claude, etc.)" }),
    mode: Type.Optional(Type.Enum({ vote: "vote", brainstorm: "brainstorm", critique: "critique", open: "open" })),
    files: Type.Optional(Type.Array(Type.String(), { description: "Paths to include as context" })),
  }),
  async execute(toolCallId, params, onUpdate, ctx, signal) {
    const args = ["consensus", params.prompt, ...params.models];
    if (params.mode) args.push("--mode", params.mode);
    if (params.files) params.files.forEach(f => args.push("--file", f));

    const result = await pi.exec("orch", args);
    return { content: [{ type: "text", text: result.stdout }] };
  }
});
```
**Context issue**: The agent would need to provide conversation context manually:

```typescript
// Agent would have to do this:
const context = serializeConversation(ctx.sessionManager.getBranch());
const contextFile = writeToTempFile(context);
args.push("--file", contextFile);
```
---

### Option 2: Oracle-Style Extension with Orch Models

**Approach**: Port Oracle's UI/UX but use orch's model registry
**Pros**:

- Best UX: interactive picker, add-to-context, full conversation sharing
- Native pi integration, no subprocess
- Can query multiple models and show them side-by-side
- Direct access to pi's AI API

**Cons**:

- Doesn't leverage orch's advanced features (voting, synthesis, serial strategies)
- Duplicate model registry (though it could import from orch config)
- More code to maintain
- Loses orch's CLI flexibility (piping, session export, etc.)
**Implementation**:

```typescript
pi.registerCommand("consensus", {
  description: "Get consensus from multiple models",
  handler: async (args, ctx) => {
    // 1. Show model picker (multi-select)
    const models = await ctx.ui.custom(
      (tui, theme, kb, done) => new ModelPickerComponent(theme, done, { multiSelect: true })
    );

    // 2. Serialize conversation context
    const conversationHistory = serializeConversation(ctx.sessionManager.getBranch());

    // 3. Query models in parallel
    const promises = models.map(m =>
      complete(m.model, [
        ...conversationHistory.map(convertToLlm),
        { role: "user", content: args }
      ], m.apiKey)
    );

    // 4. Show results in comparison view
    const results = await Promise.all(promises);
    await ctx.ui.custom(
      (tui, theme, kb, done) => new ConsensusResultComponent(results, theme, done)
    );

    // 5. Add to context?
    const shouldAdd = await ctx.ui.confirm("Add responses to conversation context?");
    if (shouldAdd) {
      // Append all responses or a synthesized summary
      ctx.sessionManager.appendMessage({
        role: "assistant",
        content: formatConsensus(results)
      });
    }
  }
});
```
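The `formatConsensus` helper in the sketch above is hypothetical; one plausible shape (the `ConsensusResult` fields are assumptions, not pi's API):

```typescript
interface ConsensusResult {
  modelName: string;
  response: string;
}

// Join each model's response under a labeled header so the combined
// text can be appended to the conversation as a single message.
function formatConsensus(results: ConsensusResult[]): string {
  return results
    .map(r => `### ${r.modelName}\n${r.response}`)
    .join("\n\n");
}
```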
**Features to implement**:

- Multi-select model picker (checkboxes)
- Parallel query with progress indicators
- Side-by-side result display with scrolling
- Voting mode: parse "SUPPORT/OPPOSE/NEUTRAL" from responses
- Add-to-context with synthesis option

---
### Option 3: Hybrid Approach

**Approach**: Keep the orch CLI for advanced use, add an Oracle-style extension for quick queries

**Pros**:

- Best of both worlds
- Agent can use the tool for programmatic access
- User can use `/oracle` for interactive queries
- Orch handles complex scenarios (serial strategies, synthesis)
- Oracle handles quick second opinions

**Cons**:

- Two parallel systems to maintain
- Potential confusion about which to use
**Implementation**:

**Tool (for agent)**:

```typescript
pi.registerTool({
  name: "orch_consensus",
  // ... as in Option 1, shells out to the orch CLI
});
```

**Command (for user)**:

```typescript
pi.registerCommand("oracle", {
  description: "Get second opinion from another model",
  // ... as in Option 2, native UI integration
});
```
**Usage patterns**:

- User types `/oracle <prompt>` → interactive picker, add-to-context flow
- Agent calls `orch_consensus()` → structured vote results in tool output
- Agent suggests: "I can get consensus from multiple models using orch_consensus if you'd like"
- User can also run `orch` directly in the shell for advanced features

---
### Option 4: Enhanced Oracle with Orch Backend

**Approach**: Oracle UI that calls the orch CLI under the hood

**Pros**:

- Leverages orch's features through a nice UI
- Single source of truth (orch)
- Can expose orch modes/options in the UI

**Cons**:

- Subprocess overhead
- Hard to share conversation context (orch doesn't expect serialized conversations)
- Awkward impedance mismatch
**Implementation challenges**:

```typescript
// How to pass conversation context to orch?
// Orch expects a prompt, not a conversation history.

// Option A: Serialize the entire conversation to a temp file
const contextFile = "/tmp/pi-conversation.txt";
fs.writeFileSync(contextFile, formatConversation(history));
await pi.exec("orch", ["consensus", prompt, ...models, "--file", contextFile]);

// Option B: Inject context into the prompt
const augmentedPrompt = `
Given this conversation:
${formatConversation(history)}

Answer this question: ${prompt}
`;
await pi.exec("orch", ["consensus", augmentedPrompt, ...models]);
```

Both are awkward because orch's input model doesn't match pi's conversation model.

---
## Recommendation

### Short Term: Option 3 (Hybrid)

**Rationale**:

1. **Keep the orch CLI** for its strengths:
   - 423 models (far more than Oracle)
   - Voting/synthesis/serial strategies
   - CLI flexibility (piping, sessions, export)
   - Already works, well-tested

2. **Add an Oracle-style extension** for its strengths:
   - Interactive UI (model picker, results display)
   - Conversation context sharing
   - Add-to-context workflow
   - Quick keys, better UX

3. **Clear division of labor**:
   - `/oracle` → quick second opinion, inherits conversation, nice UI
   - `orch_consensus` tool → agent programmatic access, structured voting
   - `orch` CLI → advanced features (synthesis, serial strategies, sessions)
### Long Term: Option 2 (Native Integration) + Orch as Fallback

**Rationale**:

Eventually, we want:

1. A native pi tool with full UI integration
2. Access to orch's model registry (import from config)
3. Voting, synthesis, and comparison built into the UI
4. Conversation context sharing by default

But keep the `orch` CLI for:

- Session management
- Export/archival
- Scripting/automation
- Features not yet in the pi extension

---
## Implementation Plan

### Phase 1: Oracle Extension (Week 1)

**Goal**: Interactive second opinion with conversation context

**Tasks**:

1. Port the Oracle extension from shitty-extensions
2. Add model aliases from orch config
3. Implement the model picker with multi-select
4. Conversation context serialization
5. Add-to-context workflow
6. Test with flash/gemini/gpt/claude

**Deliverable**: `/oracle` command for quick second opinions
### Phase 2: Orch Tool Wrapper (Week 2)

**Goal**: Agent can invoke orch programmatically

**Tasks**:

1. Register the `orch_consensus` tool
2. Map tool parameters to orch CLI args
3. Serialize conversation context to a temp file
4. Parse orch output (vote results)
5. Format for agent consumption

**Deliverable**: Agent can call orch for structured consensus
### Phase 3: Enhanced Oracle UI (Week 3-4)

**Goal**: Side-by-side comparison and voting

**Tasks**:

1. Multi-model query in parallel
2. Split-pane result display
3. Vote parsing (SUPPORT/OPPOSE/NEUTRAL)
4. Consensus gauge visualization
5. Diff highlighting (show disagreements)
6. Cost preview before query

**Deliverable**: Rich consensus UI with voting
### Phase 4: Advanced Features (Month 2)

**Goal**: Match orch's advanced features

**Tasks**:

1. Synthesis mode (aggregate responses)
2. Serial strategies (refine, debate)
3. Stance modifiers (:for/:against)
4. Response caching (don't re-query)
5. Model recommendations based on the query
6. Export to file/issue

**Deliverable**: Feature parity with the orch CLI

---
## Technical Details

### Model Registry Sharing

**Current state**: Orch has 423 models in a Python config

**Options**:

1. **Import orch config** - Parse orch's model registry
2. **Duplicate registry** - Maintain a separate TypeScript registry
3. **Query orch** - Call `orch models` and parse the output

**Recommendation**: Start with (3), migrate to (1) later
```typescript
async function getOrchModels(): Promise<ModelAlias[]> {
  const { stdout } = await pi.exec("orch", ["models"]);
  // parseOrchModels: parser for `orch models` output (format-specific, not shown)
  return parseOrchModels(stdout);
}
```
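`parseOrchModels` depends on orch's actual output format, which isn't documented here. Assuming one `alias<TAB>model-id` pair per line (an assumption to revisit against the real output), a sketch could be:

```typescript
interface ModelAlias {
  alias: string;
  id: string;
}

// Assumed format: one "alias\tmodel-id" pair per line of `orch models`.
// Adjust the split once the real output format is confirmed.
function parseOrchModels(stdout: string): ModelAlias[] {
  return stdout
    .split("\n")
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .map(line => {
      const [alias, id] = line.split("\t");
      return { alias, id: id ?? alias };
    });
}
```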
### Conversation Context Serialization

**Challenge**: Pi's conversation format ≠ standard chat format

**Solution**: Use pi's built-in `serializeConversation()` and `convertToLlm()`
```typescript
import { serializeConversation, convertToLlm } from "@mariozechner/pi-coding-agent";

const history = ctx.sessionManager.getBranch();
const serialized = serializeConversation(history);
const llmMessages = serialized.map(convertToLlm);

// Now compatible with any model's chat API
const response = await complete(model, llmMessages, apiKey);
```
### Add-to-Context Workflow

**UI Flow**:

1. Show consensus results
2. Prompt: "Add responses to conversation context?"
3. Options:
   - YES - Add all responses (verbose)
   - SUMMARY - Add a synthesized summary (concise)
   - NO - Don't add

**Implementation**:
```typescript
const choice = await ctx.ui.select("Add to context?", [
  "Yes, add all responses",
  "Yes, add synthesized summary",
  "No, keep separate"
]);

if (choice === 0) {
  // Append all model responses
  for (const result of results) {
    ctx.sessionManager.appendMessage({
      role: "assistant",
      content: `[${result.modelName}]: ${result.response}`
    });
  }
} else if (choice === 1) {
  // Synthesize and append (synthesize() is a helper that would query a
  // model to summarize the responses; not shown here)
  const summary = await synthesize(results, "gemini");
  ctx.sessionManager.appendMessage({
    role: "assistant",
    content: `[Consensus]: ${summary}`
  });
}
```
### Vote Parsing

**Challenge**: Extract SUPPORT/OPPOSE/NEUTRAL from freeform responses

**Strategies**:

1. **Prompt engineering** - Ask models to start the response with a verdict
2. **Regex matching** - Parse the structured output
3. **Secondary query** - Ask "classify this response as SUPPORT/OPPOSE/NEUTRAL"

**Recommendation**: (1) with (3) as fallback
```typescript
const votePrompt = `${originalPrompt}

Respond with your verdict first: SUPPORT, OPPOSE, or NEUTRAL.
Then explain your reasoning.`;

const response = await complete(model, [...history, { role: "user", content: votePrompt }]);

// Trim first: models often lead with whitespace or markdown before the verdict
const match = response.trim().match(/^(SUPPORT|OPPOSE|NEUTRAL)/i);
const verdict = match ? match[1].toUpperCase() : "NEUTRAL";
```
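Strategy (3) can be approximated locally before paying for a secondary query: scan the whole response for verdict keywords and take the first hit. A heuristic sketch, not from the source:

```typescript
type Verdict = "SUPPORT" | "OPPOSE" | "NEUTRAL";

// Fallback when the response doesn't start with a verdict: find the
// earliest whole-word occurrence of any verdict keyword in the text.
// Defaults to NEUTRAL when nothing matches (same default as above).
function classifyVerdict(response: string): Verdict {
  const match = response.toUpperCase().match(/\b(SUPPORT|OPPOSE|NEUTRAL)\b/);
  return (match ? match[1] : "NEUTRAL") as Verdict;
}
```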
### Cost Estimation

**Orch approach**: Uses pricing data in the model registry

**Implementation**:
```typescript
interface ModelInfo {
  id: string;
  name: string;
  inputCostPer1M: number;
  outputCostPer1M: number;
}

function estimateCost(prompt: string, history: Message[], models: ModelInfo[]): number {
  // estimateTokens: rough token count over the messages (helper, see below)
  const inputTokens = estimateTokens([...history, { role: "user", content: prompt }]);
  const outputTokens = 1000; // rough per-response estimate

  return models.reduce((total, m) => {
    const inputCost = (inputTokens / 1_000_000) * m.inputCostPer1M;
    const outputCost = (outputTokens / 1_000_000) * m.outputCostPer1M;
    return total + inputCost + outputCost;
  }, 0);
}

// Show before querying
const cost = estimateCost(prompt, history, selectedModels);
const confirmed = await ctx.ui.confirm(`Estimated cost: $${cost.toFixed(3)}. Continue?`);
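The `estimateTokens` helper used above is assumed rather than defined; a rough chars/4 heuristic is common for English text and is good enough for a cost preview:

```typescript
interface Message {
  role: string;
  content: string;
}

// Rough heuristic: ~4 characters per token for English text.
// Fine for a cost preview; use a real tokenizer for billing-grade numbers.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}
```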
---

## Design Questions
### 1. Should Oracle query multiple models or just one?

**Current Oracle**: One model at a time
**Orch**: Multiple models in parallel

**Recommendation**: Support both

- `/oracle <prompt>` → single-model picker (quick second opinion)
- `/oracle-consensus <prompt>` → multi-select picker (true consensus)

Or:

- `/oracle` with Shift+Enter for multi-select
### 2. Should results auto-add to context or always prompt?

**Current Oracle**: Always prompts
**Orch**: No context, just output

**Recommendation**: Make it configurable

- Default: always prompt
- Setting: `oracle.autoAddToContext = true` to skip the prompt
- ESC = don't add (quick exit)
### 3. How to handle expensive models?

**Orch**: Requires the `--allow-expensive` flag

**Recommendation**: Show cost and prompt

- Model picker shows cost per model
- Selecting opus/r1 shows a warning: "This is expensive ($X per query). Continue?"
- Can be disabled in settings
### 4. Should we cache responses?

**Problem**: Querying the same prompt to the same model multiple times wastes money

**Recommendation**: Short-term cache

- Cache key: `hash(model + conversation_context + prompt)`
- TTL: 5 minutes
- Show an indicator: "(cached)" in results
- Option to force refresh
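The cache-key-plus-TTL idea can be sketched with a plain Map. The SHA-256 key mirrors the `hash(model + conversation_context + prompt)` scheme above; the 5-minute TTL is from this document, the rest is illustrative:

```typescript
import { createHash } from "node:crypto";

const TTL_MS = 5 * 60 * 1000; // 5-minute TTL per the recommendation above
const cache = new Map<string, { response: string; expires: number }>();

// Key over model + context + prompt, NUL-separated to avoid collisions
// between e.g. ("ab", "c") and ("a", "bc").
function cacheKey(model: string, context: string, prompt: string): string {
  return createHash("sha256").update(`${model}\0${context}\0${prompt}`).digest("hex");
}

function getCached(key: string, now = Date.now()): string | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (entry.expires <= now) { // expired: evict and report a miss
    cache.delete(key);
    return undefined;
  }
  return entry.response;
}

function putCached(key: string, response: string, now = Date.now()): void {
  cache.set(key, { response, expires: now + TTL_MS });
}
```

Force-refresh is then just skipping `getCached` and overwriting via `putCached`.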
### 5. How to visualize consensus?

**Options**:

1. List view (like orch) - each model's response sequentially
2. Side-by-side - split screen with responses in columns
3. Gauge - visual consensus meter (% support)
4. Diff view - highlight agreements/disagreements

**Recommendation**: Progressive disclosure

- Initial: gauge + vote counts
- Expand: list view with reasoning
- Advanced: side-by-side diff view
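The gauge from option (3) can be a one-line text bar over parsed verdicts; a sketch with illustrative names:

```typescript
type Verdict = "SUPPORT" | "OPPOSE" | "NEUTRAL";

// Render e.g. "SUPPORT ██░░ 50% (1/2)" from a list of parsed verdicts.
function renderGauge(verdicts: Verdict[], width = 10): string {
  const support = verdicts.filter(v => v === "SUPPORT").length;
  const pct = verdicts.length ? support / verdicts.length : 0;
  const filled = Math.round(pct * width);
  const bar = "█".repeat(filled) + "░".repeat(width - filled);
  return `SUPPORT ${bar} ${Math.round(pct * 100)}% (${support}/${verdicts.length})`;
}
```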
---

## Next Steps
1. **Prototype Oracle extension** (today)
   - Port from shitty-extensions
   - Test with flash/gemini
   - Verify conversation context sharing

2. **Design consensus UI** (tomorrow)
   - Sketch the multi-model result layout
   - Decide on vote visualization
   - Mock up the add-to-context flow

3. **Implement model picker** (day 3)
   - Multi-select support
   - Quick keys (1-9 for single, checkboxes for multi)
   - Show cost/capabilities
   - Filter by authenticated models

4. **Build comparison view** (day 4-5)
   - Parallel query execution
   - Progress indicators
   - Side-by-side results
   - Diff highlighting

5. **Add orch tool wrapper** (day 6)
   - Register the tool for agent use
   - Map parameters to CLI args
   - Parse vote output

6. **Integration testing** (day 7)
   - Test with real conversations
   - Verify context sharing works
   - Check cost estimates
   - Test with slow models (timeout handling)

---
## Success Metrics

**Must Have**:

- [ ] `/oracle` command works with conversation context
- [ ] Model picker shows authenticated models only
- [ ] Results display with add-to-context option
- [ ] Multi-model query in parallel
- [ ] Vote parsing (SUPPORT/OPPOSE/NEUTRAL)
- [ ] Cost estimation before query

**Nice to Have**:

- [ ] Side-by-side comparison view
- [ ] Diff highlighting for disagreements
- [ ] Response caching (5 min TTL)
- [ ] Model recommendations based on query
- [ ] Export consensus to file/issue
- [ ] Serial strategies (refine, debate)

**Stretch Goals**:

- [ ] Synthesis mode with custom prompts
- [ ] Confidence scoring
- [ ] Conversation branching
- [ ] Historical consensus tracking
- [ ] Model capability filtering (vision/reasoning/coding)

---
## References

- [orch CLI](https://github.com/yourusername/orch) - Current implementation
- [shitty-extensions/oracle.ts](https://github.com/hjanuschka/shitty-extensions/blob/main/extensions/oracle.ts)
- [pi-mono extension docs](https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/extensions.md)
- [pi-mono TUI docs](https://github.com/badlogic/pi-mono/blob/main/packages/tui/README.md)