docs: add pi extension ecosystem and synod research
Research conducted 2026-01-22:
- pi-extension-ecosystem-research.md: 56 GitHub projects, 52 official examples
- pi-ui-ecosystem-research.md: TUI patterns, components, overlays
- multi-model-consensus-analysis.md: gap analysis leading to /synod design

Parent: 8728746491
Commit: bffa966e76

docs/research/multi-model-consensus-analysis.md (new file, 701 lines)

# Multi-Model Consensus: Current State & Pi Integration Analysis

**Date**: 2026-01-22
**Purpose**: Analyze what we have in orch vs what pi needs for multi-model consensus

---

## What We Have: Orch CLI

### Core Capabilities

**Commands**:
1. `orch consensus` - Parallel multi-model queries with vote/brainstorm/critique/open modes
2. `orch chat` - Single-model conversation with session management
3. `orch models` - List/resolve 423 available models
4. `orch sessions` - Manage conversation history

**Key Features**:

**Model Selection**:
- 423 models across providers (OpenAI, Anthropic, Google, DeepSeek, Qwen, Perplexity, etc.)
- Aliases: `flash`, `gemini`, `gpt`, `claude`, `sonnet`, `opus`, `haiku`, `deepseek`, `r1`, `qwen`
- Stance modifiers: `gpt:for`, `claude:against`, `gemini:neutral`
- Cost awareness: `--allow-expensive` for opus/r1
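
The `alias:stance` syntax and the expensive-model gate are simple enough to mirror in a pi extension. A minimal sketch, assuming specs look exactly like the examples above and that only `opus` and `r1` are gated; `parseModelSpec`, `checkCost`, and `EXPENSIVE` are hypothetical names, not orch's implementation:

```typescript
type Stance = "for" | "against" | "neutral";

interface ModelSpec {
  alias: string;
  stance?: Stance; // undefined when no ":stance" suffix is given
  expensive: boolean;
}

// Assumption: the aliases orch gates behind --allow-expensive (per the list above).
const EXPENSIVE = new Set(["opus", "r1"]);
const STANCES: readonly string[] = ["for", "against", "neutral"];

// Parse "alias" or "alias:stance", rejecting unknown stances and extra colons.
function parseModelSpec(spec: string): ModelSpec {
  const [alias, stance, ...rest] = spec.trim().split(":");
  if (!alias || rest.length > 0) {
    throw new Error(`Invalid model spec: "${spec}"`);
  }
  if (stance !== undefined && !STANCES.includes(stance)) {
    throw new Error(`Unknown stance "${stance}" in "${spec}"`);
  }
  return { alias, stance: stance as Stance | undefined, expensive: EXPENSIVE.has(alias) };
}

// Refuse expensive models unless the caller opted in, like --allow-expensive.
function checkCost(specs: ModelSpec[], allowExpensive: boolean): void {
  const blocked = specs.filter((s) => s.expensive && !allowExpensive);
  if (blocked.length > 0) {
    const names = blocked.map((s) => s.alias).join(", ");
    throw new Error(`Expensive models require --allow-expensive: ${names}`);
  }
}
```

Keeping the stance optional (rather than defaulting to `neutral`) avoids guessing how orch treats a bare alias.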

**Modes**:
- `vote` - Support/Oppose/Neutral verdict with reasoning
- `brainstorm` - Generate ideas without judgment
- `critique` - Find flaws and weaknesses
- `open` - Freeform responses

**Context**:
- File inclusion: `--file PATH` (multiple allowed)
- Stdin piping: `cat code.py | orch consensus "..."`
- Session continuity: `--session ID` for chat mode
- Web search: `--websearch` (Gemini only)

**Execution**:
- Parallel by default, `--serial` for sequential
- Serial strategies: neutral, refine, debate, brainstorm
- Synthesis: `--synthesize MODEL` to aggregate responses
- Timeout control: `--timeout SECS`

**Output**:
- Structured vote results with verdict counts
- Reasoning for each model
- Color-coded output (SUPPORT/OPPOSE/NEUTRAL)
- Session IDs for continuation
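
Reproducing the verdict counts inside pi only needs a small tally step once each response has been reduced to one verdict. A sketch under that assumption; `tallyVotes`, `formatSummary`, and the `leader` field are illustrative names, and orch's actual output format is not reproduced:

```typescript
type Verdict = "SUPPORT" | "OPPOSE" | "NEUTRAL";

interface ModelVote {
  model: string;
  verdict: Verdict;
  reasoning: string;
}

interface VoteSummary {
  counts: Record<Verdict, number>;
  leader: Verdict | "TIE"; // plurality winner, or TIE when the top count is shared
}

function tallyVotes(votes: ModelVote[]): VoteSummary {
  const counts: Record<Verdict, number> = { SUPPORT: 0, OPPOSE: 0, NEUTRAL: 0 };
  for (const v of votes) counts[v.verdict]++;
  // Sort verdicts by count, highest first, and compare the top two.
  const ranked = (Object.entries(counts) as [Verdict, number][]).sort((a, b) => b[1] - a[1]);
  const leader = ranked[0][1] === ranked[1][1] ? "TIE" : ranked[0][0];
  return { counts, leader };
}

// Plain-text line; a TUI would color each verdict instead.
function formatSummary(s: VoteSummary): string {
  const { SUPPORT, OPPOSE, NEUTRAL } = s.counts;
  return `SUPPORT ${SUPPORT} | OPPOSE ${OPPOSE} | NEUTRAL ${NEUTRAL} -> ${s.leader}`;
}
```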

### Current Skill Integration

**Location**: `~/.codex/skills/orch/`

**What it provides**:
- Documentation of orch capabilities
- Usage patterns (second opinion, architecture decision, code review, devil's advocate, etc.)
- Model selection guidance
- Conversational patterns (session-based multi-turn, cross-model dialogue, iterative refinement)
- Combined patterns (explore then validate)

**What it does NOT provide**:
- Direct agent tool invocation (agent must shell out to `orch`)
- UI integration (no pickers, no inline results)
- Conversation context sharing (agent's conversation ≠ orch's conversation)
- Interactive model selection
- Add-to-context workflow

---

## What Pi Oracle Extension Provides

### From shitty-extensions/oracle.ts

**UI Features**:
- Interactive model picker overlay
- Quick keys (1-9) for fast selection
- Shows which models are authenticated/available
- Excludes current model from picker
- Formatted result display with scrolling

**Context Sharing**:
- **Inherits full conversation context** - Oracle sees the entire pi conversation
- Sends conversation history to queried model
- No need to re-explain context

**Workflow**:
1. User types `/oracle <prompt>`
2. Model picker appears
3. Select model with arrow keys or number
4. Oracle queries model with **full conversation context + prompt**
5. Result displays in scrollable overlay
6. **"Add to context?" prompt** - YES/NO choice
7. If YES, the oracle response is appended to the conversation

**Model Awareness**:
- Only shows models with valid API keys
- Filters out current model
- Groups by provider (OpenAI, Google, Anthropic, OpenAI Codex)

**Input Options**:
- Direct: `/oracle -m gpt-4o <prompt>` (skips picker)
- Files: `/oracle -f file.ts <prompt>` (includes file content)

**Implementation Details**:
- Uses the `complete()` API from pi's `@mariozechner/pi-ai`
- Serializes conversation with `serializeConversation()`
- Converts to LLM format with `convertToLlm()`
- Custom TUI component for result display
- `BorderedLoader` during query

---

## Gap Analysis

### What Orch Has That Oracle Doesn't

1. **Multiple simultaneous queries** - Oracle queries one model at a time
2. **Structured voting** - Support/Oppose/Neutral verdicts with counts
3. **Multiple modes** - vote/brainstorm/critique/open (Oracle is always "open")
4. **Stance modifiers** - `:for`/`:against`/`:neutral` bias (devil's advocate)
5. **Serial strategies** - refine, debate, brainstorm sequences
6. **Synthesis** - Aggregate multiple responses into summary
7. **Session management** - Persistent conversation threads
8. **423 models** - Far more models than Oracle's ~18
9. **Cost awareness** - Explicit `--allow-expensive` gate
10. **Web search** - Integrated search for Gemini/Perplexity
11. **CLI flexibility** - File piping, stdin, session export

### What Oracle Has That Orch Doesn't

1. **Conversation context inheritance** - Oracle sees full pi conversation automatically
2. **Interactive UI** - Model picker, scrollable results, keyboard navigation
3. **Add-to-context workflow** - Explicit YES/NO to inject response
4. **Current model exclusion** - Automatically filters out active model
5. **Native pi integration** - No subprocess, uses pi's AI API directly
6. **Quick keys** - 1-9 for instant model selection
7. **Authenticated model filtering** - Only shows models with valid keys
8. **Inline result display** - Formatted overlay with scrolling

### What Neither Has (Opportunities)

1. **Side-by-side comparison** - Show multiple model responses in split view
2. **Vote visualization** - Bar chart or consensus gauge
3. **Response diff** - Highlight disagreements between models
4. **Model capability awareness** - Filter by vision/reasoning/coding/etc.
5. **Cost preview** - Show estimated cost before querying
6. **Cached responses** - Don't re-query same prompt to same model
7. **Response export** - Save consensus to file/issue
8. **Model recommendations** - Suggest models based on query type
9. **Confidence scoring** - Gauge certainty in responses
10. **Conversation branching** - Fork conversation with different models

---

## Pi Integration Options

### Option 1: Wrap Orch CLI as Tool

**Approach**: Register `orch` as a pi tool, shell out to CLI

**Pros**:
- Minimal code, reuses existing orch
- All orch features available (423 models, voting, synthesis, etc.)
- Already works with current skill

**Cons**:
- No conversation context sharing (pi's conversation ≠ orch's input)
- No interactive UI (no model picker, no add-to-context)
- Subprocess overhead
- Output parsing required
- Can't leverage pi's AI API

**Implementation**:
```typescript
import { Type } from "@sinclair/typebox";
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";

export default function (pi: ExtensionAPI) {
  pi.registerTool({
    name: "orch_consensus",
    description: "Query multiple AI models for consensus on a question",
    parameters: Type.Object({
      prompt: Type.String({ description: "Question to ask" }),
      models: Type.Array(Type.String(), { description: "Model aliases (flash, gemini, gpt, claude, etc.)" }),
      mode: Type.Optional(Type.Enum({ vote: "vote", brainstorm: "brainstorm", critique: "critique", open: "open" })),
      files: Type.Optional(Type.Array(Type.String(), { description: "Paths to include as context" })),
    }),
    async execute(toolCallId, params, onUpdate, ctx, signal) {
      // Build the orch argv: positional prompt and models, then optional flags
      const args = ["consensus", params.prompt, ...params.models];
      if (params.mode) args.push("--mode", params.mode);
      if (params.files) params.files.forEach((f) => args.push("--file", f));

      const result = await pi.exec("orch", args);
      return { content: [{ type: "text", text: result.stdout }] };
    },
  });
}
```

**Context issue**: The tool itself would have to serialize the conversation and hand it to orch as a file:
```typescript
// Inside execute(), the tool would have to do this:
const context = serializeConversation(ctx.sessionManager.getBranch());
const contextFile = writeToTempFile(context);
args.push("--file", contextFile);
```
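
`writeToTempFile` is not defined in the snippet. A minimal Node.js sketch, assuming the serialized conversation is a plain string; unlike the snippet, this hypothetical version returns a cleanup function alongside the path so the temp directory can be removed after `orch` exits:

```typescript
import { mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write the conversation into a fresh, private temp directory. A unique directory
// per call avoids collisions between concurrent tool calls on a fixed path.
function writeToTempFile(context: string): { path: string; cleanup: () => void } {
  const dir = mkdtempSync(join(tmpdir(), "pi-orch-"));
  const path = join(dir, "conversation.md");
  writeFileSync(path, context, { encoding: "utf8", mode: 0o600 });
  return { path, cleanup: () => rmSync(dir, { recursive: true, force: true }) };
}
```

In `execute()`, pass `path` to `--file` and call `cleanup()` in a `finally` block once `pi.exec` resolves.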

---

### Option 2: Oracle-Style Extension with Orch Models

**Approach**: Port Oracle's UI/UX but use orch's model registry

**Pros**:
- Best UX: interactive picker, add-to-context, full conversation sharing
- Native pi integration, no subprocess
- Can query multiple models and show side-by-side
- Direct access to pi's AI API

**Cons**:
- Doesn't leverage orch's advanced features (voting, synthesis, serial strategies)
- Duplicate model registry (though could import from orch config)
- More code to maintain
- Loses orch's CLI flexibility (piping, session export, etc.)

**Implementation**:
```typescript
pi.registerCommand("consensus", {
  description: "Get consensus from multiple models",
  handler: async (args, ctx) => {
    // 1. Show model picker (multi-select)
    const models = await ctx.ui.custom(
      (tui, theme, kb, done) => new ModelPickerComponent(theme, done, { multiSelect: true })
    );
    if (!models || models.length === 0) return; // picker cancelled or nothing selected

    // 2. Serialize conversation context
    const conversationHistory = serializeConversation(ctx.sessionManager.getBranch());

    // 3. Query models in parallel
    const promises = models.map((m) =>
      complete(m.model, [
        ...conversationHistory.map(convertToLlm),
        { role: "user", content: args }
      ], m.apiKey)
    );

    // 4. Show results in comparison view
    // allSettled keeps the successful responses even if one model errors or times out
    const results = await Promise.allSettled(promises);
    await ctx.ui.custom(
      (tui, theme, kb, done) => new ConsensusResultComponent(results, theme, done)
    );

    // 5. Add to context?
    const shouldAdd = await ctx.ui.confirm("Add responses to conversation context?");
    if (shouldAdd) {
      // Append all responses or synthesized summary
      ctx.sessionManager.appendMessage({
        role: "assistant",
        content: formatConsensus(results)
      });
    }
  }
});
```

**Features to implement**:
- Multi-select model picker (checkboxes)
- Parallel query with progress indicators
- Side-by-side result display with scrolling
- Voting mode: parse "SUPPORT/OPPOSE/NEUTRAL" from responses
- Add-to-context with synthesis option
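
The parallel-query-with-progress item can be prototyped without any pi APIs. A dependency-free sketch: `queryFn` stands in for whatever call the extension actually makes (for example a wrapper around pi-ai's `complete()`), while `queryAll`, `withTimeout`, `QueryOutcome`, and the status values are hypothetical. Each model settles independently, so one slow or failing model never blocks the others:

```typescript
type QueryStatus = "pending" | "done" | "failed" | "timeout";

interface QueryOutcome {
  model: string;
  status: QueryStatus;
  text?: string;
  error?: string;
}

const TIMEOUT_MESSAGE = "query timed out";

// Settle with the query's result, or reject after `ms` if it has not finished.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(TIMEOUT_MESSAGE)), ms);
    p.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Query all models concurrently, reporting each status change so a TUI can redraw.
async function queryAll(
  models: string[],
  queryFn: (model: string) => Promise<string>,
  onProgress: (model: string, status: QueryStatus) => void,
  timeoutMs = 60_000,
): Promise<QueryOutcome[]> {
  return Promise.all(
    models.map(async (model): Promise<QueryOutcome> => {
      onProgress(model, "pending");
      try {
        const text = await withTimeout(queryFn(model), timeoutMs);
        onProgress(model, "done");
        return { model, status: "done", text };
      } catch (err) {
        const timedOut = err instanceof Error && err.message === TIMEOUT_MESSAGE;
        const status: QueryStatus = timedOut ? "timeout" : "failed";
        onProgress(model, status);
        return { model, status, error: String(err) };
      }
    }),
  );
}
```

A timeout here only stops waiting; cancelling the underlying request would additionally require passing an `AbortSignal` through to `queryFn`.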

---

### Option 3: Hybrid Approach

**Approach**: Keep orch CLI for advanced use, add Oracle-style extension for quick queries

**Pros**:
- Best of both worlds
- Agent can use tool for programmatic access
- User can use `/oracle` for interactive queries
- Orch handles complex scenarios (serial strategies, synthesis)
- Oracle handles quick second opinions

**Cons**:
- Two parallel systems to maintain
- Potential confusion about which to use

**Implementation**:

**Tool (for agent)**:
```typescript
pi.registerTool({
  name: "orch_consensus",
  // ... as in Option 1, shells out to orch CLI
});
```

**Command (for user)**:
```typescript
pi.registerCommand("oracle", {
  description: "Get second opinion from another model",
  // ... as in Option 2, native UI integration
});
```

**Usage patterns**:
- User types `/oracle <prompt>` → interactive picker, add-to-context flow
- Agent calls `orch_consensus()` → structured vote results in tool output
- Agent suggests: "I can get consensus from multiple models using orch_consensus if you'd like"
- User can also run `orch` directly in shell for advanced features

---

### Option 4: Enhanced Oracle with Orch Backend

**Approach**: Oracle UI that calls orch CLI under the hood

**Pros**:
- Leverage orch's features through a polished UI
- Single source of truth (orch)
- Can expose orch modes/options in UI

**Cons**:
- Subprocess overhead
- Hard to share conversation context (orch doesn't expect serialized conversations)
- Impedance mismatch: pi works with message lists, orch with a single prompt

**Implementation challenges**:
```typescript
import * as fs from "node:fs";

// How to pass conversation context to orch?
// Orch expects a prompt, not a conversation history

// Option A: Serialize entire conversation to temp file
// (a fixed path like this races if two queries run at once)
const contextFile = "/tmp/pi-conversation.txt";
fs.writeFileSync(contextFile, formatConversation(history));
await pi.exec("orch", ["consensus", prompt, ...models, "--file", contextFile]);

// Option B: Inject context into prompt
const augmentedPrompt = `
Given this conversation:
${formatConversation(history)}

Answer this question: ${prompt}
`;
await pi.exec("orch", ["consensus", augmentedPrompt, ...models]);
```

Both are awkward because orch's input model doesn't match pi's conversation model.
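
Both options depend on a `formatConversation()` helper that the snippet leaves undefined. A minimal sketch, assuming a simplified `{ role, content }` message shape rather than pi's actual message types, with an illustrative character cap so Option B's prompt stays bounded; `ChatMessage`, `maxChars`, and `buildAugmentedPrompt` are hypothetical:

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Render messages as a plain-text transcript with role headings, keeping only the
// most recent `maxChars` characters so the orch prompt stays bounded.
function formatConversation(history: ChatMessage[], maxChars = 20_000): string {
  const transcript = history
    .map((m) => `## ${m.role.toUpperCase()}\n${m.content.trim()}`)
    .join("\n\n");
  return transcript.length <= maxChars
    ? transcript
    : "[earlier conversation truncated]\n\n" + transcript.slice(-maxChars);
}

// Option B: fold the transcript into a single prompt string for `orch consensus`.
function buildAugmentedPrompt(history: ChatMessage[], question: string): string {
  return `Given this conversation:\n\n${formatConversation(history)}\n\nAnswer this question: ${question}`;
}
```

Truncating from the front keeps the most recent turns, which are usually the ones the question refers to.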

---

## Recommendation

### Short Term: Option 3 (Hybrid)

**Rationale**:
1. **Keep orch CLI** for its strengths:
   - 423 models (far more than Oracle's ~18)
   - Voting/synthesis/serial strategies
   - CLI flexibility (piping, sessions, export)
   - Already works, well-tested

2. **Add Oracle-style extension** for its strengths:
   - Interactive UI (model picker, results display)
   - Conversation context sharing
   - Add-to-context workflow
   - Quick keys, better UX

3. **Clear division of labor**:
   - `/oracle` → quick second opinion, inherits conversation, nice UI
   - `orch_consensus` tool → agent programmatic access, structured voting
   - `orch` CLI → advanced features (synthesis, serial strategies, sessions)

### Long Term: Option 2 (Native Integration) + Orch as Fallback

**Rationale**:
Eventually, we want:
1. Native pi tool with full UI integration
2. Access to orch's model registry (import from config)
3. Voting, synthesis, comparison built into UI
4. Conversation context sharing by default

But keep `orch` CLI for:
- Session management
- Export/archival
- Scripting/automation
- Features not yet in pi extension

---

## Implementation Plan

### Phase 1: Oracle Extension (Week 1)

**Goal**: Interactive second opinion with conversation context

**Tasks**:
1. Port Oracle extension from shitty-extensions
2. Add model aliases from orch config
3. Implement model picker with multi-select
4. Conversation context serialization
5. Add-to-context workflow
6. Test with flash/gemini/gpt/claude

**Deliverable**: `/oracle` command for quick second opinions

### Phase 2: Orch Tool Wrapper (Week 2)

**Goal**: Agent can invoke orch programmatically

**Tasks**:
1. Register `orch_consensus` tool
2. Map tool parameters to orch CLI args
3. Serialize conversation context to temp file
4. Parse orch output (vote results)
5. Format for agent consumption

**Deliverable**: Agent can call orch for structured consensus

### Phase 3: Enhanced Oracle UI (Weeks 3-4)

**Goal**: Side-by-side comparison and voting

**Tasks**:
1. Multi-model query in parallel
2. Split-pane result display
3. Vote parsing (SUPPORT/OPPOSE/NEUTRAL)
4. Consensus gauge visualization
5. Diff highlighting (show disagreements)
6. Cost preview before query

**Deliverable**: Rich consensus UI with voting

### Phase 4: Advanced Features (Month 2)

**Goal**: Match orch's advanced features

**Tasks**:
1. Synthesis mode (aggregate responses)
2. Serial strategies (refine, debate)
3. Stance modifiers (`:for`/`:against`)
4. Response caching (don't re-query)
5. Model recommendations based on query
6. Export to file/issue

**Deliverable**: Feature parity with orch CLI

---

## Technical Details

### Model Registry Sharing

**Current state**: Orch has 423 models in Python config

**Options**:
1. **Import orch config** - Parse orch's model registry
2. **Duplicate registry** - Maintain separate TypeScript registry
3. **Query orch** - Call `orch models` and parse output

**Recommendation**: Start with (3), migrate to (1) later

```typescript
async function getOrchModels(): Promise<ModelAlias[]> {
  const { stdout } = await pi.exec("orch", ["models"]);
  return parseOrchModels(stdout);
}
```
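
Shelling out every time the picker opens would repeat the subprocess cost, so option (3) benefits from memoization. A sketch that caches the in-flight promise; the `exec` parameter matches the shape of the `pi.exec` call above, and `parse` is where the still-undefined `parseOrchModels` would plug in. `makeModelListLoader` and `ExecFn` are hypothetical names:

```typescript
type ExecFn = (cmd: string, args: string[]) => Promise<{ stdout: string }>;

// Return a loader that runs `orch models` at most once. Concurrent callers share
// the same in-flight promise; a failed run is evicted so a later call can retry.
function makeModelListLoader<T>(exec: ExecFn, parse: (stdout: string) => T[]): () => Promise<T[]> {
  let cached: Promise<T[]> | undefined;
  return () => {
    if (cached) return cached;
    const attempt = exec("orch", ["models"]).then((r) => parse(r.stdout));
    cached = attempt;
    attempt.catch(() => {
      if (cached === attempt) cached = undefined;
    });
    return attempt;
  };
}
```

Wiring it up might look like `const getOrchModels = makeModelListLoader((cmd, args) => pi.exec(cmd, args), parseOrchModels);`.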

### Conversation Context Serialization

**Challenge**: Pi's conversation format ≠ standard chat format

**Solution**: Use pi's built-in `serializeConversation()` and `convertToLlm()`

```typescript
import { serializeConversation, convertToLlm } from "@mariozechner/pi-coding-agent";

const history = ctx.sessionManager.getBranch();
const serialized = serializeConversation(history);
const llmMessages = serialized.map(convertToLlm);

// Now compatible with any model's chat API
const response = await complete(model, llmMessages, apiKey);
```

### Add-to-Context Workflow

**UI Flow**:
1. Show consensus results
2. Prompt: "Add responses to conversation context?"
3. Options:
   - YES - Add all responses (verbose)
   - SUMMARY - Add synthesized summary (concise)
   - NO - Don't add

**Implementation**:
```typescript
const ADD_ALL = "Yes, add all responses";
const ADD_SUMMARY = "Yes, add synthesized summary";

// select() resolves to the chosen option string (undefined if dismissed), not an index
const choice = await ctx.ui.select("Add to context?", [
  ADD_ALL,
  ADD_SUMMARY,
  "No, keep separate"
]);

if (choice === ADD_ALL) {
  // Append all model responses
  for (const result of results) {
    ctx.sessionManager.appendMessage({
      role: "assistant",
      content: `[${result.modelName}]: ${result.response}`
    });
  }
} else if (choice === ADD_SUMMARY) {
  // Synthesize and append
  const summary = await synthesize(results, "gemini");
  ctx.sessionManager.appendMessage({
    role: "assistant",
    content: `[Consensus]: ${summary}`
  });
}
```

### Vote Parsing

**Challenge**: Extract SUPPORT/OPPOSE/NEUTRAL from freeform responses

**Strategies**:
1. **Prompt engineering** - Ask models to start response with verdict
2. **Regex matching** - Parse structured output
3. **Secondary query** - Ask "classify this response as SUPPORT/OPPOSE/NEUTRAL"

**Recommendation**: (1) with (3) as fallback

```typescript
const votePrompt = `${originalPrompt}

Respond with your verdict first: SUPPORT, OPPOSE, or NEUTRAL
Then explain your reasoning.`;

const response = await complete(model, [...history, { role: "user", content: votePrompt }]);

// Allow leading whitespace or markdown emphasis (e.g. "**SUPPORT**") before the verdict
const match = response.match(/^[\s*_]*(SUPPORT|OPPOSE|NEUTRAL)\b/i);
// null (not "NEUTRAL") when no verdict is found, so the strategy (3) fallback can run
const verdict = match ? match[1].toUpperCase() : null;
```
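
Strategies (2) and (3) combine naturally: parse first, and spend a classification query only when parsing fails. A self-contained sketch; `classify` stands in for a call to a cheap model and, like `parseVerdict` and `resolveVerdict`, is a hypothetical name. The regex tolerates markdown decoration and a `Verdict:` prefix on the first non-empty line, which is an assumption about how models tend to phrase verdicts:

```typescript
type Verdict = "SUPPORT" | "OPPOSE" | "NEUTRAL";

// Strategy (2): look for the verdict on the first non-empty line only, allowing
// markdown emphasis and an optional "Verdict:" prefix ("**Verdict:** OPPOSE").
const VERDICT_RE = /^[\s*_#>]*(?:verdict\s*:\s*)?[\s*_]*(SUPPORT|OPPOSE|NEUTRAL)\b/i;

function parseVerdict(text: string): Verdict | null {
  const firstLine = text.split("\n").find((line) => line.trim() !== "") ?? "";
  const m = firstLine.match(VERDICT_RE);
  return m ? (m[1].toUpperCase() as Verdict) : null;
}

// Strategy (3) as fallback: ask a classifier model only when parsing fails.
async function resolveVerdict(
  text: string,
  classify: (prompt: string) => Promise<string>,
): Promise<Verdict | null> {
  const direct = parseVerdict(text);
  if (direct !== null) return direct;
  const answer = await classify(
    `Classify this response as SUPPORT, OPPOSE, or NEUTRAL. Reply with one word.\n\n${text}`,
  );
  return parseVerdict(answer);
}
```

The trailing `\b` keeps words like "Supportive" from being read as a verdict, so such responses fall through to the classifier.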

### Cost Estimation

**Orch approach**: Uses pricing data in model registry

**Implementation**:
```typescript
interface ModelInfo {
  id: string;
  name: string;
  inputCostPer1M: number;   // USD per 1M input tokens
  outputCostPer1M: number;  // USD per 1M output tokens
}

function estimateCost(prompt: string, history: Message[], models: ModelInfo[]): number {
  const inputTokens = estimateTokens([...history, { role: "user", content: prompt }]);
  const outputTokens = 1000; // Assumed response length per model; actual output varies

  return models.reduce((total, m) => {
    const inputCost = (inputTokens / 1_000_000) * m.inputCostPer1M;
    const outputCost = (outputTokens / 1_000_000) * m.outputCostPer1M;
    return total + inputCost + outputCost;
  }, 0);
}

// Show before querying
const cost = estimateCost(prompt, history, selectedModels);
const confirmed = await ctx.ui.confirm(`Estimated cost: $${cost.toFixed(3)}. Continue?`);
```
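
`estimateTokens` is left undefined above. For a pre-query preview a character-based heuristic is usually close enough; this sketch assumes roughly 4 characters per token plus a small per-message overhead, both rule-of-thumb constants rather than any provider's tokenizer:

```typescript
interface Message {
  role: string;
  content: string;
}

// Rule-of-thumb constants: ~4 characters per token for English text, plus a few
// tokens per message for role markers. Real tokenizers differ by model.
const CHARS_PER_TOKEN = 4;
const PER_MESSAGE_OVERHEAD = 4;

function estimateTokens(messages: Message[]): number {
  return messages.reduce(
    (sum, m) => sum + Math.ceil(m.content.length / CHARS_PER_TOKEN) + PER_MESSAGE_OVERHEAD,
    0,
  );
}
```

As a worked example with a hypothetical model priced at $3 per 1M input tokens and $15 per 1M output tokens, a 40,000-token history plus the fixed 1,000-token output estimate gives 0.04 × $3 + 0.001 × $15 = $0.135 per model.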

---

## Design Questions

### 1. Should Oracle query multiple models or just one?

**Current Oracle**: One model at a time
**Orch**: Multiple models in parallel

**Recommendation**: Support both
- `/oracle <prompt>` → single model picker (quick second opinion)
- `/oracle-consensus <prompt>` → multi-select picker (true consensus)

Or:
- `/oracle` with Shift+Enter for multi-select

### 2. Should results auto-add to context or always prompt?

**Current Oracle**: Always prompts
**Orch**: No context, just output

**Recommendation**: Make it configurable
- Default: always prompt
- Setting: `oracle.autoAddToContext = true` to skip prompt
- ESC = don't add (quick exit)

### 3. How to handle expensive models?

**Orch**: Requires `--allow-expensive` flag

**Recommendation**: Show cost and prompt
- Model picker shows cost per model
- Selecting opus/r1 shows warning: "This is expensive ($X per query). Continue?"
- Warning can be disabled in settings

### 4. Should we cache responses?

**Problem**: Sending the same prompt to the same model repeatedly wastes money

**Recommendation**: Short-term cache
- Cache key: `hash(model + conversation_context + prompt)`
- TTL: 5 minutes
- Show indicator: "(cached)" in results
- Option to force refresh
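
A sketch of that cache using Node's built-in `crypto` module. One refinement over plain concatenation: the key hashes a JSON-encoded tuple, so `("ab", "c")` and `("a", "bc")` cannot collide. The class name and the injectable clock (used to make expiry testable) are illustrative assumptions:

```typescript
import { createHash } from "node:crypto";

interface CacheEntry {
  response: string;
  expiresAt: number;
}

class ResponseCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 5 * 60_000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // sha256 over the JSON-encoded (model, context, prompt) tuple.
  private key(model: string, context: string, prompt: string): string {
    return createHash("sha256").update(JSON.stringify([model, context, prompt])).digest("hex");
  }

  get(model: string, context: string, prompt: string): string | undefined {
    const k = this.key(model, context, prompt);
    const hit = this.entries.get(k);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(k); // expired: evict so the caller re-queries
      return undefined;
    }
    return hit.response;
  }

  set(model: string, context: string, prompt: string, response: string): void {
    this.entries.set(this.key(model, context, prompt), { response, expiresAt: this.now() + this.ttlMs });
  }

  // "Force refresh": drop one entry before re-querying.
  invalidate(model: string, context: string, prompt: string): void {
    this.entries.delete(this.key(model, context, prompt));
  }
}
```

A `get()` hit is what would drive the "(cached)" indicator in the results view.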

### 5. How to visualize consensus?

**Options**:
1. List view (like orch) - each model's response sequentially
2. Side-by-side - split screen with responses in columns
3. Gauge - visual consensus meter (% support)
4. Diff view - highlight agreements/disagreements

**Recommendation**: Progressive disclosure
- Initial: Gauge + vote counts
- Expand: List view with reasoning
- Advanced: Side-by-side diff view
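
The initial gauge view needs nothing more than vote counts. A text-only sketch for a single TUI line; the glyph choices, default width, and `renderGauge` name are assumptions, and colors are left to the theme:

```typescript
interface VoteCounts {
  support: number;
  oppose: number;
  neutral: number;
}

// Fixed-width gauge: "█" = support, "░" = oppose, "·" = neutral, followed by the
// support percentage and raw counts (support/oppose/neutral).
function renderGauge(c: VoteCounts, width = 20): string {
  const total = c.support + c.oppose + c.neutral;
  if (total === 0) return `[${" ".repeat(width)}] no votes`;
  const s = Math.round((c.support / total) * width);
  // Clamp so rounding never pushes the bar past `width`.
  const o = Math.min(width - s, Math.round((c.oppose / total) * width));
  const n = width - s - o;
  const pct = Math.round((c.support / total) * 100);
  return `[${"█".repeat(s)}${"░".repeat(o)}${"·".repeat(n)}] ${pct}% support (${c.support}/${c.oppose}/${c.neutral})`;
}
```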

---

## Next Steps

1. **Prototype Oracle extension** (today)
   - Port from shitty-extensions
   - Test with flash/gemini
   - Verify conversation context sharing

2. **Design consensus UI** (tomorrow)
   - Sketch multi-model result layout
   - Decide on vote visualization
   - Mock up add-to-context flow

3. **Implement model picker** (day 3)
   - Multi-select support
   - Quick keys (1-9 for single, checkboxes for multi)
   - Show cost/capabilities
   - Filter by authenticated models

4. **Build comparison view** (days 4-5)
   - Parallel query execution
   - Progress indicators
   - Side-by-side results
   - Diff highlighting

5. **Add orch tool wrapper** (day 6)
   - Register tool for agent use
   - Map parameters to CLI args
   - Parse vote output

6. **Integration testing** (day 7)
   - Test with real conversations
   - Verify context sharing works
   - Check cost estimates
   - Test with slow models (timeout handling)

---

## Success Metrics

**Must Have**:
- [ ] `/oracle` command works with conversation context
- [ ] Model picker shows authenticated models only
- [ ] Results display with add-to-context option
- [ ] Multi-model query in parallel
- [ ] Vote parsing (SUPPORT/OPPOSE/NEUTRAL)
- [ ] Cost estimation before query

**Nice to Have**:
- [ ] Side-by-side comparison view
- [ ] Diff highlighting for disagreements
- [ ] Response caching (5-minute TTL)
- [ ] Model recommendations based on query
- [ ] Export consensus to file/issue
- [ ] Serial strategies (refine, debate)

**Stretch Goals**:
- [ ] Synthesis mode with custom prompts
- [ ] Confidence scoring
- [ ] Conversation branching
- [ ] Historical consensus tracking
- [ ] Model capability filtering (vision/reasoning/coding)

---

## References

- [orch CLI](https://github.com/yourusername/orch) - Current implementation
- [shitty-extensions/oracle.ts](https://github.com/hjanuschka/shitty-extensions/blob/main/extensions/oracle.ts)
- [pi-mono extension docs](https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/extensions.md)
- [pi-mono TUI docs](https://github.com/badlogic/pi-mono/blob/main/packages/tui/README.md)

docs/research/pi-extension-ecosystem-research.md (new file, 510 lines)

# Pi Coding Agent Extension Ecosystem Research

**Date**: 2026-01-22
**Purpose**: Survey the pi-coding-agent extension landscape for ideas to incorporate into dotfiles

## Overview

Pi has a vibrant extension ecosystem with ~56 GitHub projects and 52 official examples. Extensions range from practical productivity tools to creative experiments.

## Notable Community Extensions

### 🌟 High-Value Extensions

#### 1. pi-interactive-shell (⭐ 56)
**Author**: nicobailon
**Use Case**: Run interactive CLIs (vim, psql, htop) in an observable overlay

**Key Features**:
- PTY emulation, no tmux dependency
- User can watch agent work, take over anytime
- Hands-free mode for long-running processes (dev servers)
- Auto-exit on quiet for single-task delegations
- Session management with query/kill

**Interesting Patterns**:
- `interactive_shell({ command: 'vim config.yaml' })`
- Token-efficient approach: agent spawns subprocess, user observes
- Rate-limited status queries (60s)
- Timeout mode for TUI apps that don't exit

**Steal-worthy**:
- Observable subprocess pattern for Nix builds
- Session management with named IDs
- Auto-exit detection for fire-and-forget tasks

---

#### 2. pi-mcp-adapter (⭐ 16)
**Author**: nicobailon
**Use Case**: Use MCP servers without burning the context window

**Key Innovation**: Addresses the MCP-verbosity critique from Mario Zechner (pi's author)
- Single proxy tool (~200 tokens) instead of hundreds
- On-demand tool discovery: `mcp({ search: "screenshot" })`
- Then call: `mcp({ tool: "...", args: '...' })`

**Pattern**:
```typescript
mcp({ search: "query" })                  // discover tools
mcp({ tool: "name", args: jsonString })   // invoke
```
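
The search-then-invoke flow can be illustrated without MCP transport details. A sketch of a single proxy entry point over an in-memory tool list; `ToolDef` and `makeProxy` are hypothetical, and this is not pi-mcp-adapter's implementation. `search` returns only names and one-line descriptions, which is what keeps the token cost low:

```typescript
interface ToolDef {
  name: string;
  description: string;
  run: (args: Record<string, unknown>) => Promise<string>;
}

// One entry point: { search } lists matching tools as "name: description" lines;
// { tool, args } invokes a tool by name with JSON-encoded arguments.
function makeProxy(tools: ToolDef[]) {
  return async (req: { search?: string; tool?: string; args?: string }): Promise<string> => {
    if (req.search !== undefined) {
      const q = req.search.toLowerCase();
      const hits = tools.filter((t) => `${t.name} ${t.description}`.toLowerCase().includes(q));
      return hits.length ? hits.map((t) => `${t.name}: ${t.description}`).join("\n") : "No matching tools";
    }
    if (req.tool !== undefined) {
      const t = tools.find((x) => x.name === req.tool);
      if (!t) return `Unknown tool "${req.tool}"; use { search } first`;
      return t.run(req.args ? JSON.parse(req.args) : {});
    }
    return "Pass either { search } or { tool, args }";
  };
}
```

Full tool schemas never enter the context until the agent has picked a tool, which is the token-budget idea worth borrowing.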

**Steal-worthy**:
- Lazy tool loading pattern
- Search-then-invoke flow
- Token budget consciousness

---

#### 3. shitty-extensions (⭐ 25)
**Author**: hjanuschka
**Collection**: 10+ extensions + 2 skills

**Standout Extensions**:

**oracle.ts** - Second opinions from other models
- Inherits conversation context
- Model picker UI with quick keys
- "Add to context?" after response
- Excludes current model from picker

**memory-mode.ts** - Save instructions to AGENTS.md
- Location selector: local/project/global
- AI-assisted integration (smart merge)
- Preview before save

**plan-mode.ts** - Claude Code-style read-only exploration
- Toggle with `/plan` or Shift+P
- Safe code exploration without mutations

**handoff.ts** - Transfer context to new sessions
- Generate context-aware prompt for fresh session

**usage-bar.ts** - AI provider usage statistics
- Multi-provider support (Claude, Copilot, Gemini, Codex, Kiro, z.ai)
- Status polling with outage detection
- Reset countdowns, visual progress bars

**speedreading.ts** - RSVP speed reader (Spritz-style)
- ORP (Optimal Recognition Point) highlighting
- Adaptive timing for longer words
- Big ASCII art font mode
- Speed control, seek, progress tracking

**loop.ts** (by mitsuhiko) - Conditional loops
- Loop until breakout condition (tests pass, custom, self-decided)
- Status widget with turn count
- Compaction-safe state preservation

**Steal-worthy**:
- Second-opinion pattern (oracle), a starting point for multi-model consensus
- Smart AGENTS.md integration (memory-mode)
- Provider usage tracking (usage-bar)
- Loop-until-done pattern (loop.ts)

---

#### 4. pi-review-loop (⭐ 11)
**Author**: nicobailon
**Use Case**: Automated code review loop until clean

**Pattern**:
```
/review-start
→ agent reviews, finds bugs, fixes
→ auto-prompt for another review
→ loop until "No issues found"
→ auto-exit
```

**Features**:
- Smart exit detection (won't be fooled by "Fixed 3 issues. No further issues found.")
- Auto-trigger on phrases like "implement the plan"
- Configurable max iterations (default 7)
- Prompt templates: `/double-check`, `/double-check-plan`
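
The exit check is the subtle part: a pass that says "No further issues found" after fixing something still changed code, so it needs another review. A sketch of that rule plus the capped loop; the regexes are illustrative guesses at phrasing, `runPass` stands in for one agent review turn, and none of this is pi-review-loop's actual code:

```typescript
// Illustrative phrase patterns; a real extension would tune these to its prompts.
const CLEAN_RE = /\bno (?:further |more |other )?issues (?:were )?found\b/i;
const FIXED_RE = /\b(?:fixed|resolved|corrected)\b/i;

// Clean only if the pass reports no issues AND made no fixes in the same response.
function isCleanPass(response: string): boolean {
  return CLEAN_RE.test(response) && !FIXED_RE.test(response);
}

// Run review passes until one is clean or the iteration cap (default 7) is reached.
async function reviewLoop(
  runPass: (iteration: number) => Promise<string>,
  maxIterations = 7,
): Promise<{ clean: boolean; iterations: number }> {
  for (let i = 1; i <= maxIterations; i++) {
    const response = await runPass(i);
    if (isCleanPass(response)) return { clean: true, iterations: i };
  }
  return { clean: false, iterations: maxIterations };
}
```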

**Steal-worthy**:
- "Keep going until clean" automation
- Smart multi-pass detection (catches different issues each time)
- Pre/post implementation workflow

---

#### 5. pi-powerline-footer (⭐ 7)
**Author**: nicobailon
**Inspiration**: oh-my-pi

**Features**:
- Welcome overlay with gradient logo
- Rounded box design in editor border
- Live thinking level indicator (rainbow shimmer for high/xhigh)
- Git integration with async fetching, 1s cache TTL
- Context awareness (color warnings at 70%/90%)
- Token intelligence: compact counts such as 1.2k and 45M
- Nerd Font auto-detection with ASCII fallback
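
Two of these features reduce to small pure functions. A sketch of compact token formatting and the 70%/90% context thresholds; the function names and rounding rules are assumptions, not pi-powerline-footer's code:

```typescript
// Compact counts: one decimal below 10 of a unit ("1.2k"), whole numbers above ("45M").
function formatTokens(n: number): string {
  const units: [number, string][] = [[1e9, "B"], [1e6, "M"], [1e3, "k"]];
  for (const [size, suffix] of units) {
    if (n >= size) {
      const v = n / size;
      const text = v < 10 ? v.toFixed(1).replace(/\.0$/, "") : Math.round(v).toString();
      return text + suffix;
    }
  }
  return String(n);
}

type ContextLevel = "ok" | "warn" | "critical";

// Warn at 70% of the context window, critical at 90%. For integer token counts,
// comparing scaled integers avoids floating-point error right at the thresholds.
function contextLevel(usedTokens: number, windowTokens: number): ContextLevel {
  if (usedTokens * 100 >= windowTokens * 90) return "critical";
  if (usedTokens * 100 >= windowTokens * 70) return "warn";
  return "ok";
}
```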

**Presets**: default, minimal, compact, full, nerd, ascii

**Segments**: model, thinking, path, git, subagents, tokens, cost, context, time, session, hostname, cache

**Steal-worthy**:
- Nerd Font detection pattern
- Async git status caching
- Preset system for different contexts
- Thinking level visualization

---

#### 6. pi-model-switch (⭐ 6)
**Author**: nicobailon
**Use Case**: Agent can switch models autonomously

**Features**:
- Alias configuration: `{ "cheap": "google/gemini-2.5-flash", "coding": "anthropic/claude-opus-4-5" }`
- Fallback chains: `"budget": ["openai/gpt-5-mini", "google/gemini-2.5-flash"]`
- Natural language: "switch to a cheaper model", "use Claude for this"
- Tool: `switch_model({ action: "list|search|switch", search: "term" })`

**Steal-worthy**:
- Alias system for model shortcuts
- Fallback chain pattern
- Agent-driven model selection
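
The alias and fallback-chain configuration shown above resolves in a few lines. This sketch assumes an alias maps either to one model id or to an ordered list. It also assumes availability is checked through a caller-supplied predicate. It is not pi-model-switch's code.

```typescript
// Alias config in the shape shown above: one model id, or an ordered fallback chain.
type AliasConfig = Record<string, string | string[]>;

const aliases: AliasConfig = {
  cheap: "google/gemini-2.5-flash",
  coding: "anthropic/claude-opus-4-5",
  budget: ["openai/gpt-5-mini", "google/gemini-2.5-flash"],
};

// Resolve an alias (or a literal "provider/model" id) to the first available model.
function resolveModel(
  nameOrAlias: string,
  config: AliasConfig,
  isAvailable: (modelId: string) => boolean,
): string | undefined {
  const target = Object.prototype.hasOwnProperty.call(config, nameOrAlias)
    ? config[nameOrAlias]
    : nameOrAlias; // not an alias: treat it as a model id
  const chain = Array.isArray(target) ? target : [target];
  return chain.find(isAvailable);
}
```

Keeping availability as a parameter lets the same resolver work with API-key checks, quota checks, or a static allow-list.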

---

#### 7. piception (⭐ 1)
**Author**: otahontas
**Inspiration**: Claudeception

**Use Case**: Meta-learning - save debugging discoveries as skills

**Workflow**:
1. Debug something complex
2. Say "save this as a skill"
3. Interactive wizard (edit name, description, content, location)
4. Skill loads automatically next time based on semantic matching

**Triggers**:
- Keywords: "save this as a skill", "extract a skill"
- Offers extraction at session end if significant debugging happened

**Steal-worthy**:
- Meta-learning loop pattern
- Skill extraction from conversation
- Semantic matching for auto-loading

---

## Official Examples (52 total)

### Practical Examples

**git-checkpoint.ts** - Git stash checkpoints at each turn
- On `/fork`, offers to restore the code state from that turn
- Tracks entry ID → stash ref mapping

**protected-paths.ts** - Block writes to sensitive files
- Intercepts `write` and `edit` tools
- Configurable protected path list
- Shows notification on block

**tools.ts** - Enable/disable tools interactively
- `/tools` command with UI selector
- Persists across reloads
- Respects branch navigation
- Settings list with enabled/disabled toggle

**modal-editor.ts** - Vim-like modal editing
- Normal/insert mode toggle
- hjkl navigation, vim keybindings
- Mode indicator in border

**auto-commit-on-exit.ts** - Auto-commit on session end

**dirty-repo-guard.ts** - Warn if starting with uncommitted changes

**file-trigger.ts** - Trigger actions on file events

**input-transform.ts** - Transform user input before sending

**trigger-compact.ts** - Auto-compact at thresholds

**custom-compaction.ts** - Custom compaction strategies

**confirm-destructive.ts** - Require confirmation for dangerous ops

**permission-gate.ts** - Permission system for tools

**tool-override.ts** - Override tool implementations

**truncated-tool.ts** - Truncate tool outputs

### UI/UX Examples

**custom-header.ts** - Custom header component

**custom-footer.ts** - Custom footer component

**status-line.ts** - Status line widget

**widget-placement.ts** - Control widget positioning

**rainbow-editor.ts** - Animated rainbow shimmer on the word "ultrathink"

**mac-system-theme.ts** - Follow macOS light/dark mode

### Interactive Examples

**doom-overlay/** - Full Doom game in overlay (!)
- WAD file finder
- Doom engine (WebAssembly)
- Custom keybindings

**snake.ts** - Snake game

**qna.ts** - Q&A framework

**questionnaire.ts** - Multi-question forms

**question.ts** - Single question prompts

**overlay-test.ts** - Overlay testing

### Communication Examples

**notify.ts** - System notifications

**ssh.ts** - SSH connection management

**send-user-message.ts** - Programmatic user messages

**shutdown-command.ts** - Shutdown handlers

### Development Examples

**chalk-logger.ts** - Colored logging

**model-status.ts** - Model availability status

**preset.ts** - Configuration presets

**summarize.ts** - Conversation summarization

**handoff.ts** - Context transfer

**pirate.ts** - Pirate speak translator (fun example)

**timed-confirm.ts** - Confirmation with timeout

**todo.ts** - TODO tracking

**claude-rules.ts** - Claude-specific rules integration

---

## Patterns Worth Stealing

### 1. Multi-Model Consensus
- oracle.ts: second opinions without switching contexts
- Model picker UI with inheritance
- "Add to context?" after response

### 2. Meta-Learning Loop
- piception: save discoveries as skills
- Semantic matching for auto-loading
- Interactive extraction wizard

### 3. Token Budget Consciousness
- pi-mcp-adapter: lazy tool discovery
- Search-then-invoke pattern
- Proxy tools instead of full schemas

### 4. Observable Subprocess Control
- pi-interactive-shell: watch agent work
- Session management (query/kill)
- Auto-exit on quiet

### 5. Smart Persistence
- tools.ts: branch-aware state
- git-checkpoint.ts: stash per turn
- Compaction-safe storage

### 6. Review Loops
- pi-review-loop: keep going until clean
- Smart exit detection
- Multi-pass catching different issues

### 7. Adaptive UI
- powerline-footer: Nerd Font detection
- Preset system for contexts
- Thinking level visualization
- Async git caching

### 8. Safety Guards
- protected-paths.ts: block dangerous writes
- dirty-repo-guard.ts: warn on uncommitted changes
- confirm-destructive.ts: require confirmation

### 9. Model Management
- pi-model-switch: agent-driven switching
- Alias system with fallbacks
- Natural language selection

### 10. Memory/Instruction Management
- memory-mode.ts: AI-assisted AGENTS.md merge
- Location selector (local/project/global)
- Preview before save

---

## Ideas for Dotfiles Integration

### High Priority

1. **Multi-agent consensus** - `/orch` equivalent as extension
   - Already have orch CLI, could wrap as tool
   - Modal picker UI for model selection
   - "Add to context?" option

2. **Nix build observer** - Interactive-shell pattern
   - Watch long Nix builds in overlay
   - Take over if needed
   - Auto-exit on completion

3. **Review loop integration** - Work with nix-review skill
   - `/nix-review-loop` command
   - Keep reviewing until no issues
   - Multi-lens passes

4. **Protected paths for NixOS** - Prevent accidental mutations
   - Block writes to `/secrets/*.yaml` (edit them with `sops` instead)
   - Block direct writes to `/nix/store`
   - Warn on `/etc/nixos` (use modules/)
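
These NixOS rules reduce to a path classifier. In an extension, it would run inside a `tool_call` handler for the `write` and `edit` tools, following the protected-paths.ts pattern. How a handler signals a block is not covered here, so only the decision logic is sketched. Because the note does not say where the secrets directory lives, the sketch matches `secrets/*.yaml` anywhere in the path. That choice is an assumption.

```typescript
// Hypothetical policy for the NixOS protected-paths idea; the rules mirror the list above.
type Verdict = { action: "allow" | "warn" | "block"; reason?: string };

function checkWritePath(path: string): Verdict {
  if (path === "/nix/store" || path.startsWith("/nix/store/")) {
    return { action: "block", reason: "the Nix store is read-only; change the Nix expression instead" };
  }
  // secrets/<name>.yaml at any depth (assumed layout)
  if (/(^|\/)secrets\/[^/]+\.yaml$/.test(path)) {
    return { action: "block", reason: "sops-encrypted secret; edit it with `sops` instead" };
  }
  if (path.startsWith("/etc/nixos/")) {
    return { action: "warn", reason: "prefer changing the config under modules/" };
  }
  return { action: "allow" };
}
```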

5. **Git checkpoint auto-restore** - Complements existing git hygiene
   - Track changes per turn
   - Offer restore on fork
   - Persist with session

### Medium Priority

6. **Beads integration** - Native issue tracking
   - `/beads` command for issue operations
   - Tool registration for agent-created issues
   - Smart linking to commits/files

7. **Model switcher with aliases**
   - `cheap: gemini-2.5-flash`
   - `expensive: claude-opus-4-5`
   - `nix: claude-sonnet-4-5` (good at Nix)
   - Agent decides based on task

8. **Usage tracking** - Anthropic, OpenAI, Gemini quotas
   - Footer widget with remaining tokens
   - Warning at 80% usage
   - Cost tracking per session

9. **Sops secret guard** - Prevent accidental leaks
   - Intercept tool calls whose arguments match secret patterns
   - Require confirmation before copying secrets
   - Never write secrets to non-sops files
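
The "secret patterns" check could start from a few well-known markers. These are the sops `ENC[AES256_GCM,...]` value prefix, age secret keys, and PEM/OpenSSH private-key headers. This sketch only scans string arguments of a tool call. The pattern list is an assumption to be extended with your own secret formats.

```typescript
// Hypothetical secret detectors for the sops guard idea; extend with your own formats.
const SECRET_PATTERNS: [string, RegExp][] = [
  ["sops-encrypted value", /ENC\[AES256_GCM,data:/],
  ["age secret key", /AGE-SECRET-KEY-1[0-9A-Z]+/],
  ["private key block", /-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----/],
];

// Return the names of the patterns found in a tool call's string arguments.
function findSecrets(args: Record<string, unknown>): string[] {
  const text = Object.values(args)
    .filter((v): v is string => typeof v === "string")
    .join("\n");
  return SECRET_PATTERNS.filter(([, re]) => re.test(text)).map(([name]) => name);
}
```

A non-empty result would be the trigger for a confirmation prompt, or an outright block when the target file is not sops-managed.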

10. **Skill extraction** - Piception pattern
    - Save debugging sessions as skills
    - Auto-populate `~/.pi/agent/skills/`
    - Semantic matching for future loads

### Low Priority

11. **Niri window capture integration** - Already have a skill
    - Tool registration for agent use
    - Screenshot before/after comparisons
    - Visual regression testing

12. **Powerline footer** - NixOS-specific widgets
    - Flake lock status (outdated inputs)
    - Rebuild needed indicator
    - System generation count

13. **Speed reader** - For long outputs
    - Nix build logs
    - Test results
    - Documentation

14. **Plan mode** - Safe exploration
    - Read-only for large refactors
    - Preview changes before applying
    - `/plan` toggle

---

## Architecture Notes

### Extension Hooks (from examples)

**Lifecycle**:
- `session_start` - initialization
- `session_end` - cleanup
- `agent_start` - before agent turn
- `agent_end` - after agent turn
- `turn_start` - before turn processing
- `turn_end` - after turn completion

**Interaction**:
- `tool_call` - intercept before execution (can block)
- `tool_result` - after execution (can modify)
- `user_message` - intercept user input
- `ai_message` - intercept AI output

**Session**:
- `session_before_fork` - before creating fork
- `session_fork` - after fork created
- `session_tree` - on tree navigation
- `session_compact` - during compaction

**UI**:
- `ctx.ui.notify()` - system notifications
- `ctx.ui.select()` - picker UI
- `ctx.ui.confirm()` - yes/no prompts
- `ctx.ui.custom()` - full custom components
- `ctx.ui.setEditorComponent()` - replace editor

**State**:
- `pi.appendEntry<T>(type, data)` - persist to session
- `ctx.sessionManager.getBranch()` - get current branch
- `ctx.sessionManager.getLeafEntry()` - get current entry
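
The "branch-aware state" pattern from tools.ts combines these calls. Each change is persisted with `pi.appendEntry`, and state is rebuilt by scanning the entries returned for the current branch. This sketch uses a hypothetical, simplified entry shape (`type`, `customType`, `data`) and a hypothetical `"tools-state"` tag. pi's real entry objects carry more fields and may name them differently.

```typescript
// Hypothetical, simplified session entry; only the fields this sketch needs.
interface Entry {
  type: string;        // e.g. "message" or "custom"
  customType?: string; // extension-defined tag given when the entry was appended
  data?: unknown;
}

interface ToolState {
  enabled: string[];
}

// Walk the branch (root -> leaf) and keep the latest snapshot this extension wrote.
// Only the current branch is scanned, so navigating to another branch restores
// whatever state was active there.
function restoreState(branch: Entry[], fallback: ToolState): ToolState {
  let state = fallback;
  for (const entry of branch) {
    if (entry.type === "custom" && entry.customType === "tools-state") {
      state = entry.data as ToolState;
    }
  }
  return state;
}
```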

**Tools**:
- `pi.registerTool()` - add new tools
- `pi.setActiveTools()` - filter available tools
- `pi.getAllTools()` - list all tools
- `pi.getActiveTools()` - list active tools

**Commands**:
- `pi.registerCommand(name, { description, handler })` - add `/command`

**Execution**:
- `pi.exec(cmd, args)` - run subprocess

---

## Next Steps

1. Review dotfiles' current extension setup (if any)
2. Prioritize extensions to implement
3. Start with git-checkpoint (simple, high value)
4. Add protected-paths for secrets
5. Build beads tool integration
6. Consider multi-agent consensus wrapper

---

## References

- [pi-mono GitHub](https://github.com/badlogic/pi-mono)
- [Pi extensions docs](https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/extensions.md)
- [shitty-extensions](https://github.com/hjanuschka/shitty-extensions)
- [nicobailon's extensions](https://github.com/nicobailon?tab=repositories&q=pi-)
- [Mario's "What if you don't need MCP"](https://mariozechner.at/posts/2025-11-02-what-if-you-dont-need-mcp/)

**File**: `docs/research/pi-ui-ecosystem-research.md` (new file, 849 lines)

# Pi Coding Agent UI/TUI Ecosystem Research

**Date**: 2026-01-22
**Purpose**: Survey pi-coding-agent UI/TUI patterns and components for dotfiles integration

## Official TUI Package (@mariozechner/pi-tui)

### Core Features

**Differential Rendering**:
- Three-strategy system (first render, width change, normal update)
- Synchronized output (DEC private mode 2026: `CSI ?2026h` ... `CSI ?2026l`) for flicker-free updates
- Only updates changed lines
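
The normal-update strategy compares the previous frame with the next one and rewrites only the rows that differ. The whole update is wrapped in the mode-2026 begin/end sequences, so the terminal paints it at once. This is a simplified sketch, not pi-tui's code. For brevity, it assumes the rendered area starts at screen row 1 and uses absolute cursor moves. The first render and width changes can reuse it by passing an empty `prev`, which forces every row to be redrawn.

```typescript
// Sketch of line-level differential rendering (not pi-tui's implementation).
const SYNC_START = "\x1b[?2026h"; // begin synchronized update (DEC mode 2026)
const SYNC_END = "\x1b[?2026l";   // end synchronized update

function diffRender(prev: string[], next: string[]): string {
  let out = SYNC_START;
  const rows = Math.max(prev.length, next.length);
  for (let row = 0; row < rows; row++) {
    if (prev[row] === next[row]) continue; // unchanged line: emit nothing
    // Move to (row + 1, column 1), clear the line, then write the new content
    // (an empty string clears rows that no longer exist)
    out += `\x1b[${row + 1};1H\x1b[2K${next[row] ?? ""}`;
  }
  return out + SYNC_END;
}
```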

**Component Architecture**:
```typescript
interface Component {
  render(width: number): string[]; // Must not exceed width!
  handleInput?(data: string): void; // Keyboard input
  invalidate?(): void; // Clear cached state
}
```
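
A minimal component shows how the three methods cooperate. `handleInput` mutates state and calls `invalidate`, and `render` caches its output per width. This hypothetical counter re-declares the interface locally and compares raw arrow-key escape sequences instead of using `matchesKey`, so it runs without pi-tui installed.

```typescript
// Local copy of the interface above so the sketch runs standalone.
interface Component {
  render(width: number): string[];
  handleInput?(data: string): void;
  invalidate?(): void;
}

// Hypothetical counter: Up/Down change the value; render output is cached
// until invalidate() is called or the width changes.
class Counter implements Component {
  private value = 0;
  private cache?: { width: number; lines: string[] };

  handleInput(data: string): void {
    if (data === "\x1b[A") this.value++;      // Up arrow
    else if (data === "\x1b[B") this.value--; // Down arrow
    else return;
    this.invalidate();
  }

  invalidate(): void {
    this.cache = undefined;
  }

  render(width: number): string[] {
    if (this.cache?.width === width) return this.cache.lines;
    const line = `Count: ${this.value}`.slice(0, width); // never exceed width (ASCII-only text)
    this.cache = { width, lines: [line] };
    return this.cache.lines;
  }
}
```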

**Focusable Interface** (IME Support):
```typescript
interface Focusable {
  focused: boolean; // Set by TUI when focus changes
}
```
- Emit `CURSOR_MARKER` right before the fake cursor
- TUI positions the hardware cursor at the marker
- Enables IME candidate windows (CJK input)
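
On the TUI side, the marker mechanism comes down to finding the marker in the rendered lines, stripping it, and recording its visible column. This is a sketch under stated assumptions. The marker string below is a hypothetical stand-in for pi-tui's exported `CURSOR_MARKER`, and column counting ignores wide (CJK/emoji) characters.

```typescript
// Hypothetical marker value; pi-tui exports its own CURSOR_MARKER constant.
const CURSOR_MARKER = "\u0000CURSOR\u0000";
const ANSI = /\x1b\[[0-9;?]*[A-Za-z]/g;

// Find the marker, strip it, and report where the hardware cursor should go
// (0-based row, visible column; wide characters are not handled here).
function extractCursor(lines: string[]): { lines: string[]; cursor?: { row: number; col: number } } {
  let cursor: { row: number; col: number } | undefined;
  const cleaned = lines.map((line, row) => {
    const idx = line.indexOf(CURSOR_MARKER);
    if (idx === -1) return line;
    // Column = visible length of everything before the marker (ANSI codes take no space)
    const col = line.slice(0, idx).replace(ANSI, "").length;
    cursor ??= { row, col };
    return line.slice(0, idx) + line.slice(idx + CURSOR_MARKER.length);
  });
  return { lines: cleaned, cursor };
}
```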

**Overlay System**:
```typescript
// Option catalog: each value shows one accepted form; alternatives are in the comments
const handle = tui.showOverlay(component, {
  width: "80%",           // or an absolute column count, e.g. 60
  maxHeight: "50%",       // or a row count, e.g. 20
  anchor: "center",       // or "top-left", "bottom-right", ... (see anchor values below)
  offsetX: 2, offsetY: -1,
  row: "25%", col: "50%", // also accept absolute values, e.g. row: 5, col: 10
  margin: 2,              // or { top, right, bottom, left }
  visible: (termWidth, termHeight) => termWidth >= 100 // hide on narrow terminals
});

handle.hide();
handle.setHidden(true); // Temporarily hide
handle.isHidden();
```

**Anchor values**: center, top-left, top-right, bottom-left, bottom-right, top-center, bottom-center, left-center, right-center
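
Sizes that are either absolute or a percentage, together with the nine anchors, reduce to two small calculations. This sketch ignores offsets, margins, and `maxHeight`. It assumes percentages round down and absolute sizes are clamped to the terminal. It is not pi-tui's layout code.

```typescript
type SizeValue = number | `${number}%`;

// Resolve an absolute or percentage size against a terminal dimension.
function resolveSize(value: SizeValue, total: number): number {
  if (typeof value === "number") return Math.min(value, total);
  return Math.floor((total * parseFloat(value)) / 100);
}

type Anchor =
  | "center" | "top-left" | "top-right" | "bottom-left" | "bottom-right"
  | "top-center" | "bottom-center" | "left-center" | "right-center";

// Top-left corner (0-based) of a w x h box anchored inside a cols x rows terminal.
function anchorPosition(anchor: Anchor, w: number, h: number, cols: number, rows: number) {
  const vert = anchor.includes("top") ? "start" : anchor.includes("bottom") ? "end" : "center";
  const horiz = anchor.includes("left") ? "start" : anchor.includes("right") ? "end" : "center";
  const place = (mode: string, size: number, total: number) =>
    mode === "start" ? 0 : mode === "end" ? total - size : Math.floor((total - size) / 2);
  return { row: place(vert, h, rows), col: place(horiz, w, cols) };
}
```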

### Built-in Components

**Layout**:
- `Container` - Groups children
- `Box` - Container with padding + background
- `Spacer` - Empty lines

**Text**:
- `Text` - Multi-line with word wrap
- `TruncatedText` - Single line with truncation
- `Markdown` - Full markdown rendering with syntax highlighting

**Input**:
- `Input` - Single-line text input with scrolling
- `Editor` - Multi-line editor with autocomplete, paste handling, vertical scrolling

**Selection**:
- `SelectList` - Interactive picker with keyboard nav
- `SettingsList` - Settings panel with value cycling + submenus

**Feedback**:
- `Loader` - Animated spinner
- `CancellableLoader` - Loader with Escape + AbortSignal

**Media**:
- `Image` - Inline images (Kitty/iTerm2 protocol, fallback to placeholder)

### Key Detection

```typescript
import { matchesKey, Key } from "@mariozechner/pi-tui";

// Inside a component's handleInput(); submit/cancel/moveUp/command are your own handlers
function handleInput(data: string): void {
  if (matchesKey(data, Key.ctrl("c"))) process.exit(0);
  if (matchesKey(data, Key.enter)) submit();
  if (matchesKey(data, Key.escape)) cancel();
  if (matchesKey(data, Key.up)) moveUp();
  if (matchesKey(data, Key.ctrlShift("p"))) command();
}
```

**Key helpers**:
- Basic: `Key.enter`, `Key.escape`, `Key.tab`, `Key.space`, `Key.backspace`, `Key.delete`, `Key.home`, `Key.end`
- Arrows: `Key.up`, `Key.down`, `Key.left`, `Key.right`
- Modifiers: `Key.ctrl("c")`, `Key.shift("tab")`, `Key.alt("left")`, `Key.ctrlShift("p")`
- String format: `"enter"`, `"ctrl+c"`, `"shift+tab"`, `"ctrl+shift+p"`

### Utilities

```typescript
import { visibleWidth, truncateToWidth, wrapTextWithAnsi } from "@mariozechner/pi-tui";

// Visible width (ignoring ANSI)
const w = visibleWidth("\x1b[31mHello\x1b[0m"); // 5

// Truncate with ellipsis (preserves ANSI)
const t = truncateToWidth("Hello World", 8); // "Hello..."
const t2 = truncateToWidth("Hello World", 8, ""); // "Hello Wo"

// Wrap text (preserves ANSI across lines)
const lines = wrapTextWithAnsi("Long line...", 20);
```
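
To see why these utilities exist, here is a simplified re-implementation of the first two. Escape sequences count as zero width and are kept intact during truncation. These `*Simple` functions are illustrative stand-ins, not pi-tui's code. They count code points and do not handle double-width CJK or emoji, which the real library has to handle.

```typescript
// Matches CSI escape sequences such as SGR color codes (simplified).
const ANSI_RE = /\x1b\[[0-9;?]*[A-Za-z]/g;

function visibleWidthSimple(s: string): number {
  return [...s.replace(ANSI_RE, "")].length; // code points; wide chars not handled
}

// Truncate to `width` visible columns, keeping escape codes and appending `ellipsis`.
function truncateSimple(s: string, width: number, ellipsis = "..."): string {
  if (visibleWidthSimple(s) <= width) return s;
  const budget = Math.max(0, width - visibleWidthSimple(ellipsis));
  let out = "";
  let used = 0;
  // split() with a capture group keeps the escape sequences at odd indices
  s.split(/(\x1b\[[0-9;?]*[A-Za-z])/).forEach((part, i) => {
    if (i % 2 === 1) {
      out += part; // zero-width escape: always keep (e.g. trailing resets)
      return;
    }
    for (const ch of part) {
      if (used >= budget) break;
      out += ch;
      used++;
    }
  });
  return out + ellipsis;
}
```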

### Autocomplete

```typescript
import { CombinedAutocompleteProvider } from "@mariozechner/pi-tui";

const provider = new CombinedAutocompleteProvider(
  [
    { name: "help", description: "Show help" },
    { name: "clear", description: "Clear screen" },
  ],
  process.cwd() // base path for file completion
);

editor.setAutocompleteProvider(provider); // editor: an existing Editor instance
// Type "/" for slash commands
// Press Tab for file paths (~/, ./, ../, @)
```

---

## UI Extension Examples

### Header/Footer Customization

#### custom-header.ts

Replaces built-in header with custom component (pi mascot ASCII art).

**Pattern**:
```typescript
import { truncateToWidth } from "@mariozechner/pi-tui";

// mascotLines, subtitle: precomputed (theme-colored) ASCII-art strings, elided here
pi.on("session_start", async (_event, ctx) => {
  ctx.ui.setHeader((_tui, theme) => ({
    render(width: number): string[] {
      // Components must never return lines wider than `width`
      return [...mascotLines, subtitle].map((line) => truncateToWidth(line, width));
    },
    invalidate() {}
  }));
});

pi.registerCommand("builtin-header", {
  description: "Restore the built-in header",
  handler: async (_args, ctx) => {
    ctx.ui.setHeader(undefined); // Restore built-in
  }
});
```

**Steal-worthy**:
- ASCII art rendering
- Dynamic theme-aware coloring
- Toggle command to restore defaults

#### custom-footer.ts

Custom footer with token stats + git branch.

**Pattern**:
```typescript
import { truncateToWidth, visibleWidth } from "@mariozechner/pi-tui";

ctx.ui.setFooter((tui, theme, footerData) => {
  const unsub = footerData.onBranchChange(() => tui.requestRender());

  return {
    dispose: unsub,
    render(width: number): string[] {
      // input, output, cost, model: session token/cost stats and model name (computation elided)
      const branch = footerData.getGitBranch(); // Not otherwise accessible!
      const branchStr = branch ? ` (${branch})` : "";
      const left = theme.fg("dim", `↑${input} ↓${output} $${cost}`);
      const right = theme.fg("dim", `${model}${branchStr}`);
      // Clamp at 0: " ".repeat() throws a RangeError on negative counts in narrow terminals
      const pad = " ".repeat(Math.max(0, width - visibleWidth(left) - visibleWidth(right)));
      return [truncateToWidth(left + pad + right, width)];
    }
  };
});
```

**Key APIs**:
- `footerData.getGitBranch()` - Current branch (not in ctx)
- `footerData.getExtensionStatuses()` - Status texts from `ctx.ui.setStatus()`
- `footerData.onBranchChange(callback)` - Subscribe to branch changes

**Steal-worthy**:
- Git integration pattern
- Token/cost tracking
- Left/right alignment with padding

---

### Editor Customization

#### modal-editor.ts

Vim-like modal editing.

**Pattern**:
```typescript
import { CustomEditor, type ExtensionAPI } from "@mariozechner/pi-coding-agent";
import { matchesKey, truncateToWidth } from "@mariozechner/pi-tui";

// Normal-mode keys mapped to escape sequences the underlying editor already understands
const NORMAL_KEYS: Record<string, string | null> = {
  h: "\x1b[D", j: "\x1b[B", k: "\x1b[A", l: "\x1b[C", // arrow keys
  "0": "\x01", $: "\x05", x: "\x1b[3~",               // Ctrl+A (line start), Ctrl+E (line end), Delete
  i: null, a: null                                     // mode switches, handled below
};

class ModalEditor extends CustomEditor {
  private mode: "normal" | "insert" = "insert";

  handleInput(data: string): void {
    if (matchesKey(data, "escape")) {
      if (this.mode === "insert") {
        this.mode = "normal";
      } else {
        super.handleInput(data); // Abort agent
      }
      return;
    }

    if (this.mode === "insert") {
      super.handleInput(data);
      return;
    }

    if (data in NORMAL_KEYS) {
      const seq = NORMAL_KEYS[data];
      if (data === "i") {
        this.mode = "insert";
      } else if (data === "a") {
        this.mode = "insert";
        super.handleInput("\x1b[C"); // Move right first (append after cursor)
      } else if (seq) {
        super.handleInput(seq);
      }
    } else if (data.startsWith("\x1b") || data.charCodeAt(0) < 32) {
      // Unmapped control keys and escape sequences (Enter, arrows) pass through;
      // unmapped printable keys are ignored in normal mode
      super.handleInput(data);
    }
  }

  render(width: number): string[] {
    const lines = super.render(width);
    const label = this.mode === "normal" ? " NORMAL " : " INSERT ";
    // Add mode indicator to bottom border (no ellipsis inside the border)
    const last = lines.length - 1;
    lines[last] = truncateToWidth(lines[last], width - label.length, "") + label;
    return lines;
  }
}

export default function (pi: ExtensionAPI) {
  pi.on("session_start", (_event, ctx) => {
    ctx.ui.setEditorComponent((tui, theme, kb) =>
      new ModalEditor(tui, theme, kb)
    );
  });
}
```

**Steal-worthy**:
- Modal editing pattern
- Custom key mapping layer
- Mode indicator in border
- Pass-through to super for unmapped keys

#### rainbow-editor.ts

Animated rainbow "ultrathink" effect.

**Pattern**:
```typescript
import { CustomEditor } from "@mariozechner/pi-coding-agent";

const RESET = "\x1b[0m";

class RainbowEditor extends CustomEditor {
  private animationTimer?: ReturnType<typeof setInterval>;
  private frame = 0;

  private startAnimation(): void {
    if (this.animationTimer) return; // Already running: don't stack intervals on every keystroke
    this.animationTimer = setInterval(() => {
      this.frame++;
      this.tui.requestRender();
    }, 60);
  }

  private stopAnimation(): void {
    clearInterval(this.animationTimer);
    this.animationTimer = undefined;
  }

  handleInput(data: string): void {
    super.handleInput(data);
    if (/ultrathink/i.test(this.getText())) {
      this.startAnimation();
    } else {
      this.stopAnimation();
    }
  }

  render(width: number): string[] {
    // 20-frame cycle: the shine sweeps across the first 10 characters, then pauses
    const cycle = this.frame % 20;
    const shinePos = cycle < 10 ? cycle : -1;

    return super.render(width).map(line =>
      line.replace(/ultrathink/gi, m => colorize(m, shinePos))
    );
  }
}

const COLORS: [number, number, number][] = [
  [233, 137, 115], [228, 186, 103], [141, 192, 122],
  // ...more colors elided in the original
];

// Hypothetical helper (not shown in the original): mix an RGB color toward
// white by `factor` (0..1) and emit a 24-bit foreground escape
function brighten([r, g, b]: [number, number, number], factor: number): string {
  const mix = (c: number) => Math.round(c + (255 - c) * factor);
  return `\x1b[38;2;${mix(r)};${mix(g)};${mix(b)}m`;
}

function colorize(text: string, shinePos: number): string {
  return [...text].map((c, i) => {
    const baseColor = COLORS[i % COLORS.length];
    // shinePos < 0 means "paused": no character is highlighted
    const dist = shinePos < 0 ? Infinity : Math.abs(i - shinePos);
    const factor = dist === 0 ? 0.7 : dist === 1 ? 0.35 : 0;
    return `${brighten(baseColor, factor)}${c}`;
  }).join("") + RESET;
}
```

**Steal-worthy**:
- Animation timing with setInterval
- Frame-based shine cycling
- RGB brightening for shimmer effect
- Text replacement in rendered output

---

### Widget Management

#### widget-placement.ts

Control widget positioning.

**Pattern**:
```typescript
import type { ExtensionAPI, ExtensionContext } from "@mariozechner/pi-coding-agent";

const applyWidgets = (ctx: ExtensionContext) => {
  if (!ctx.hasUI) return; // No TUI available

  ctx.ui.setWidget("widget-above", ["Above editor widget"]);

  ctx.ui.setWidget("widget-below",
    ["Below editor widget"],
    { placement: "belowEditor" }
  );
};

export default function (pi: ExtensionAPI) {
  pi.on("session_start", (_event, ctx) => applyWidgets(ctx));
  pi.on("session_switch", (_event, ctx) => applyWidgets(ctx));
}
```

**API**:
- `ctx.ui.setWidget(id, lines, { placement?: "aboveEditor" | "belowEditor" })`
- Default: aboveEditor
- Re-applied on `session_switch` in this example, so the widgets stay in place across session switches

**Steal-worthy**:
- Placement control pattern
- Multi-event registration (start + switch)

---

### Overlay Patterns

#### overlay-test.ts

Comprehensive overlay testing with inline inputs.

**Features**:
- Inline text inputs within menu items
- Edge case tests (wide chars, styled text, emoji)
- Focusable interface for IME support
- Border rendering with box drawing chars

**Pattern**:
```typescript
pi.registerCommand("overlay-test", {
  handler: async (_args, ctx) => {
    const result = await ctx.ui.custom<Result>(
      (tui, theme, kb, done) => new OverlayTestComponent(theme, done),
      { overlay: true }
    );
    if (result) ctx.ui.notify(result.action, "info");
  }
});

class OverlayTestComponent implements Focusable {
  readonly width = 70;
  focused = false; // Set by TUI
  // Elided in the original: this.items, this.selected, and the layout helpers
  // innerW, row(), prefix, label, isSelected, before/cursorChar/after

  constructor(private theme: Theme, private done: (result?: Result) => void) {}

  handleInput(data: string): void {
    if (matchesKey(data, "escape")) {
      this.done(undefined);
      return;
    }

    const current = this.items[this.selected];

    if (matchesKey(data, "return")) {
      this.done({ action: current.label, query: current.text });
    } else if (current.hasInput) {
      // Handle text input for inline field
      if (matchesKey(data, "backspace")) { /* ... */ }
      else if (data.charCodeAt(0) >= 32) {
        current.text = current.text.slice(0, current.cursor)
          + data
          + current.text.slice(current.cursor);
        current.cursor += data.length; // pasted input can be longer than one character
      }
    }
  }

  render(width: number): string[] {
    const theme = this.theme;
    const lines: string[] = [];
    lines.push(theme.fg("border", `╭${"─".repeat(innerW)}╮`));
    lines.push(row(` ${theme.fg("accent", "🧪 Overlay Test")}`));

    for (const item of this.items) {
      if (item.hasInput) {
        let inputDisplay = item.text;
        if (isSelected) {
          // Emit the marker only when focused, so the TUI puts the hardware cursor here;
          // \x1b[7m / \x1b[27m draw the fake cursor in reverse video
          const marker = this.focused ? CURSOR_MARKER : "";
          inputDisplay = `${before}${marker}\x1b[7m${cursorChar}\x1b[27m${after}`;
        }
        lines.push(row(`${prefix}${label} ${inputDisplay}`));
      } else {
        lines.push(row(prefix + label));
      }
    }

    lines.push(theme.fg("border", `╰${"─".repeat(innerW)}╯`));
    return lines;
  }
}
```

**Steal-worthy**:
- Inline input fields in menus
- IME support with CURSOR_MARKER
- Box drawing character borders
- Edge case testing (wide chars, emoji, styled text)

#### doom-overlay

Full DOOM game in overlay (35 FPS).

**Features**:
- WebAssembly game engine
- Half-block character rendering (▀) with 24-bit color
- 90% width, 80% max height, centered
- Maintains a 3.2:1 aspect ratio in character cells (Doom's 320×200 frame becomes 320×100 cells, since each half-block cell holds two pixel rows)

**Pattern**:
```typescript
const handle = tui.showOverlay(doomComponent, {
  width: "90%",
  maxHeight: "80%",
  anchor: "center"
});

// Render loop at 35 FPS (Doom's native tic rate)
setInterval(() => {
  // Get frame from WASM
  const frame = doomEngine.getFrame();
  // Convert to half-blocks with fg/bg colors; doomComponent's render() returns these lines (wiring elided)
  const lines = renderHalfBlocks(frame);
  doomComponent.invalidate();
  tui.requestRender();
}, 1000 / 35);
```
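
The half-block technique packs two vertical pixels into one character cell. `▀` paints the upper half in the foreground color, and the cell background shows through as the lower half. This sketch of a possible `renderHalfBlocks` assumes the frame arrives as rows of RGB triples. The real example's frame format from the WASM engine is not shown in the source.

```typescript
type RGB = [number, number, number];
const BLACK: RGB = [0, 0, 0];

// Two pixel rows per text row: fg = top pixel (drawn by "▀"), bg = bottom pixel.
function renderHalfBlocks(pixels: RGB[][]): string[] {
  const lines: string[] = [];
  for (let y = 0; y < pixels.length; y += 2) {
    let line = "";
    for (let x = 0; x < pixels[y].length; x++) {
      const [tr, tg, tb] = pixels[y][x];
      const [br, bg, bb] = pixels[y + 1]?.[x] ?? BLACK; // odd height: pad with black
      line += `\x1b[38;2;${tr};${tg};${tb}m\x1b[48;2;${br};${bg};${bb}m▀`;
    }
    lines.push(line + "\x1b[0m"); // reset so colors don't bleed into the next line
  }
  return lines;
}
```

Emitting both escapes for every cell keeps the sketch simple. A real renderer would skip a color escape when it matches the previous cell to cut output size.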

**Steal-worthy**:
- Percentage-based sizing
- Real-time rendering in overlay
- Half-block technique for pixel rendering
- WebAssembly integration

---

### Theme Management

#### mac-system-theme.ts

Auto-sync theme with macOS appearance.

**Pattern**:
```typescript
import { exec } from "node:child_process";
import { promisify } from "node:util";
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";

const execAsync = promisify(exec);

async function isDarkMode(): Promise<boolean> {
  const { stdout } = await execAsync(
    'osascript -e "tell application \\"System Events\\" to tell appearance preferences to return dark mode"'
  );
  return stdout.trim() === "true";
}

export default function (pi: ExtensionAPI) {
  let intervalId: ReturnType<typeof setInterval> | null = null;

  pi.on("session_start", async (_event, ctx) => {
    let currentTheme = (await isDarkMode()) ? "dark" : "light";
    ctx.ui.setTheme(currentTheme);

    if (intervalId) clearInterval(intervalId); // don't stack pollers if session_start fires again
    // Poll every 2s for appearance changes
    intervalId = setInterval(async () => {
      const newTheme = (await isDarkMode()) ? "dark" : "light";
      if (newTheme !== currentTheme) {
        currentTheme = newTheme;
        ctx.ui.setTheme(currentTheme);
      }
    }, 2000);
  });

  pi.on("session_shutdown", () => {
    if (intervalId) {
      clearInterval(intervalId);
      intervalId = null;
    }
  });
}
```

**Steal-worthy**:
- System appearance detection (macOS AppleScript)
- Polling pattern for external state
- Theme switching API
- Cleanup on shutdown

---

## Community UI Extensions

### pi-powerline-footer (⭐ 7)

Powerline-style status bar with welcome overlay.

**Features**:
- Branded splash screen (gradient logo, stats, keybindings)
- Rounded box design in editor border
- Live thinking level indicator (rainbow shimmer for high/xhigh)
- Async git status (1s cache TTL, invalidates on file writes)
- Context warnings (70% yellow, 90% red)
- Token intelligence (1.2k, 45M formatting)
- Nerd Font auto-detection (iTerm, WezTerm, Kitty, Ghostty, Alacritty)
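
The git caching behavior (1s TTL, invalidated on file writes) is a small stale-while-revalidate cache. The footer reads the last value synchronously on every render, and a refresh runs in the background at most once per TTL. This is a hypothetical sketch, not the extension's code. The loader would wrap something like `git status --porcelain`, and the clock is injectable for testing.

```typescript
// Hypothetical TTL cache: get() never blocks; stale values trigger one background refresh.
class TtlCache<T> {
  private value?: T;
  private fetchedAt = -Infinity;
  private inFlight?: Promise<void>;

  constructor(
    private readonly loader: () => Promise<T>,
    private readonly ttlMs = 1000,
    private readonly now: () => number = Date.now,
  ) {}

  // Return the last known value; start a refresh if it is stale and none is running.
  get(): T | undefined {
    if (this.now() - this.fetchedAt >= this.ttlMs && !this.inFlight) {
      this.inFlight = this.loader()
        .then((v) => { this.value = v; this.fetchedAt = this.now(); })
        .catch(() => { this.fetchedAt = this.now(); }) // keep the stale value, retry after the TTL
        .finally(() => { this.inFlight = undefined; });
    }
    return this.value;
  }

  // Force the next get() to refresh, e.g. after a write/edit tool call.
  invalidate(): void {
    this.fetchedAt = -Infinity;
  }
}
```

The footer would call `get()` inside `render()` and request another render once the refresh settles.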

**Presets**:
- `default` - Model, thinking, path, git, context, tokens, cost
- `minimal` - Path, git, context
- `compact` - Model, git, cost, context
- `full` - Everything (hostname, time, abbreviated path)
- `nerd` - Maximum detail for Nerd Fonts
- `ascii` - Safe for any terminal
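
A preset is essentially a named list of segment ids that a renderer joins with a separator, skipping segments that have nothing to show. This sketch encodes only the three presets whose segment lists are spelled out above. The renderer and the `" | "` default separator are hypothetical.

```typescript
// Preset -> segment ids, copied from the descriptions above (full/nerd/ascii omitted).
const PRESETS: Record<string, string[]> = {
  default: ["model", "thinking", "path", "git", "context", "tokens", "cost"],
  minimal: ["path", "git", "context"],
  compact: ["model", "git", "cost", "context"],
};

// Render a preset: each segment is a function returning text, or undefined to hide itself.
function renderPreset(
  preset: string,
  segments: Record<string, () => string | undefined>,
  separator = " | ",
): string {
  const names = PRESETS[preset] ?? PRESETS.default; // unknown preset -> default
  return names
    .map((name) => segments[name]?.())
    .filter((text): text is string => !!text)
    .join(separator);
}
```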

**Segments**: pi, model, thinking, path, git, subagents, token_in, token_out, token_total, cost, context_pct, context_total, time_spent, time, session, hostname, cache_read, cache_write

**Separators**: powerline, powerline-thin, slash, pipe, dot, chevron, star, block, none, ascii

**Path modes**:
- `basename` - Just directory name
- `abbreviated` - Full path with home abbreviated, length limit
- `full` - Complete path with home abbreviated
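
The three path modes differ only in how much of the working directory survives. This sketch passes `home` explicitly, which makes it easy to test. The 30-column default limit and the leading `…` truncation style for `abbreviated` are assumptions, not the extension's documented behavior.

```typescript
type PathMode = "basename" | "abbreviated" | "full";

// Format the working directory for the footer.
function formatPath(cwd: string, home: string, mode: PathMode, maxLength = 30): string {
  if (mode === "basename") return cwd.split("/").filter(Boolean).pop() ?? "/";
  // Replace the home directory prefix with "~"
  const withTilde =
    cwd === home || cwd.startsWith(home + "/") ? "~" + cwd.slice(home.length) : cwd;
  if (mode === "full" || withTilde.length <= maxLength) return withTilde;
  // abbreviated: keep the tail of the path and mark the cut with "…"
  return "…" + withTilde.slice(withTilde.length - (maxLength - 1));
}
```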
|
||||
|
||||
**Thinking level display**:
|
||||
- off: gray
|
||||
- minimal: purple-gray
|
||||
- low: blue
|
||||
- medium: teal
|
||||
- high: 🌈 rainbow
|
||||
- xhigh: 🌈 rainbow
|
||||
|
||||
**Steal-worthy**:
|
||||
- Welcome overlay pattern
|
||||
- Nerd Font detection
|
||||
- Git caching strategy
|
||||
- Preset system
|
||||
- Segment composability
|
||||
- Thinking level visualization
|
||||
|
||||
---
|
||||
|
||||
## Patterns Worth Stealing
|
||||
|
||||
### 1. Custom Editor Extensions
|
||||
|
||||
**Modal Editing**:
|
||||
- Layer vim-like modes on top of editor
|
||||
- Map keys to escape sequences
|
||||
- Mode indicator in border
|
||||
- Pass-through for unmapped keys
|
||||
|
||||
**Animated Effects**:
|
||||
- setInterval-based animation
|
||||
- Frame counter for cycling
|
||||
- Pattern matching in render()
|
||||
- RGB color manipulation
|
||||
|
||||
### 2. Header/Footer Customization
|
||||
|
||||
**Custom Header**:
|
||||
- ASCII art rendering
|
||||
- Theme-aware coloring
|
||||
- Toggle command for defaults
|
||||
|
||||
**Custom Footer**:
|
||||
- Git branch integration
|
||||
- Token/cost tracking
|
||||
- Left/right alignment
|
||||
- Dynamic status updates
|
||||
|
||||
### 3. Overlay Patterns
|
||||
|
||||
**Inline Input Menus**:
|
||||
- Focusable interface for IME
|
||||
- CURSOR_MARKER for cursor positioning
|
||||
- Box drawing borders
|
||||
- Edge case handling
|
||||
|
||||
**Game/Animation Overlays**:
|
||||
- Percentage-based sizing
|
||||
- Real-time rendering loops
|
||||
- Half-block pixel technique
|
||||
|
||||
### 4. Widget Management
|
||||
|
||||
**Placement Control**:
|
||||
- aboveEditor vs belowEditor
|
||||
- Multi-event registration
|
||||
- Persistent across switches
|
||||
|
||||
### 5. Theme Integration
|
||||
|
||||
**System Sync**:
|
||||
- OS appearance detection
|
||||
- Polling for external state
|
||||
- Theme switching API
|
||||
- Cleanup handlers
|
||||
|
||||

### 6. Powerline Pattern

**Segment Composability**:
- Modular segment system
- Preset configurations
- Separator styles
- Font detection

**Smart Caching**:
- TTL-based git status
- Invalidate on file events
- Async fetching
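
TTL-based caching with event invalidation and async fetching can be sketched generically. Reads within the TTL reuse the cached value, concurrent reads share one in-flight fetch, and a file-change hook calls `invalidate()`. `TtlCache` is hypothetical. A git fetcher might run `git status --porcelain` through a child process; that part is left out here.

```typescript
// Hypothetical TTL cache for slow lookups such as git status.
class TtlCache<T> {
  private value: T | undefined;
  private fetchedAt = -Infinity;
  private inFlight: Promise<T> | undefined;
  private readonly fetcher: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(fetcher: () => Promise<T>, ttlMs: number, now: () => number = Date.now) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Fresh value from the cache, otherwise one shared async fetch for all concurrent callers.
  async get(): Promise<T> {
    if (this.value !== undefined && this.now() - this.fetchedAt < this.ttlMs) {
      return this.value;
    }
    let pending = this.inFlight;
    if (!pending) {
      pending = this.fetcher()
        .then((v) => {
          this.value = v;
          this.fetchedAt = this.now();
          return v;
        })
        .finally(() => {
          this.inFlight = undefined;
        });
      this.inFlight = pending;
    }
    return pending;
  }

  // Non-blocking read for render(): may be stale, or undefined before the first fetch.
  peek(): T | undefined {
    return this.value;
  }

  // Hook this to file events; a fetch that is already running is not cancelled.
  invalidate(): void {
    this.fetchedAt = -Infinity;
  }
}
```

A footer would read `peek()` inside `render()` and start `get()` in the background, so drawing never waits on git.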

**Progressive Enhancement**:
- Nerd Font detection
- ASCII fallbacks
- Responsive visibility
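
Segment composability, presets, separators, and progressive enhancement fit together in one small function. Each segment renders to text, or to `undefined` when hidden. A preset is an ordered list of segment ids. The separator depends on whether Nerd Font glyphs are available. Low-priority segments are dropped until the line fits. The segment ids, the priority scheme, and the fallback characters are assumptions for illustration, not pi-powerline-footer's actual configuration. Nerd Font detection itself is reduced to a boolean input.

```typescript
// Hypothetical segment model; ids, priorities and glyph choices are illustrative.
interface SegmentContext {
  model?: string;
  branch?: string;
  tokens?: number;
}

interface Segment {
  id: string;
  priority: number; // lower priority is hidden first when space runs out
  render: (ctx: SegmentContext, nerdFont: boolean) => string | undefined;
}

const SEGMENTS: Segment[] = [
  { id: "model", priority: 3, render: (c) => c.model },
  // U+E0A0 is the Powerline branch glyph; plain text when Nerd Fonts are unavailable.
  { id: "git", priority: 2, render: (c, nf) => (c.branch ? `${nf ? "\ue0a0 " : ""}${c.branch}` : undefined) },
  { id: "tokens", priority: 1, render: (c) => (c.tokens === undefined ? undefined : `${c.tokens} tok`) },
];

// Presets are ordered lists of segment ids.
const PRESETS: Record<string, string[]> = {
  minimal: ["model"],
  default: ["model", "git", "tokens"],
};

function composeLine(preset: string, ctx: SegmentContext, width: number, nerdFont: boolean): string {
  const separator = nerdFont ? " \ue0b1 " : " | "; // Powerline thin arrow vs ASCII fallback
  const parts: { priority: number; text: string }[] = [];
  for (const id of PRESETS[preset] ?? PRESETS.default) {
    const segment = SEGMENTS.find((s) => s.id === id);
    const text = segment?.render(ctx, nerdFont);
    if (segment && text) parts.push({ priority: segment.priority, text });
  }
  const joined = () => parts.map((p) => p.text).join(separator);
  // Responsive visibility: hide the lowest-priority segment until the line fits.
  while (parts.length > 1 && Array.from(joined()).length > width) {
    const lowest = Math.min(...parts.map((p) => p.priority));
    parts.splice(parts.findIndex((p) => p.priority === lowest), 1);
  }
  return joined(); // a single oversized segment would still need truncation
}
```

Because hiding is driven by priority rather than position, a preset can list segments in display order while the fallback logic decides what survives on narrow terminals.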

---

## Ideas for Dotfiles Integration

### High Priority

1. **NixOS-aware footer** - Extend the powerline pattern
   - Segments: flake-lock-age, rebuild-needed, generation-count, last-build-status
   - Git branch with dirty indicator
   - Nix eval cost (tokens used for config generation)
   - Auto-compact indicator

2. **Nix build overlay** - Long-running build visualization
   - Show build progress in an overlay
   - Stream the build log with auto-scroll
   - Color-coded output (errors red, warnings yellow)
   - Escape to background, status in widget

3. **Beads issue selector** - Overlay with inline filtering
   - Show issues with priority/status
   - Filter by label, search
   - Inline preview of issue description
   - Quick actions (update status, add comment)

4. **Multi-model consensus UI** - Extend the oracle pattern
   - Model picker with Nix-aware descriptions
   - Show model capabilities (nix, general, vision)
   - Side-by-side response comparison
   - Vote/merge UI
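
For the flake-lock-age segment, `flake.lock` already records a `lastModified` Unix timestamp under `nodes.<input>.locked` for most input types, so the age can be computed without running Nix. The sketch below takes the file contents as a string and reports the age of the oldest input in whole days. The function name and the choice of "oldest input" as the metric are assumptions.

```typescript
// Only the fields this sketch reads; real flake.lock files contain more.
interface FlakeLock {
  nodes: Record<string, { locked?: { lastModified?: number } }>;
}

// Age in whole days of the stalest locked input, or undefined if no input has a timestamp.
function oldestInputAgeDays(lockJson: string, nowMs: number = Date.now()): number | undefined {
  const lock = JSON.parse(lockJson) as FlakeLock;
  const stamps = Object.values(lock.nodes)
    .map((node) => node.locked?.lastModified)
    .filter((t): t is number => typeof t === "number");
  if (stamps.length === 0) return undefined;
  const oldestSeconds = Math.min(...stamps); // lastModified is in Unix seconds
  return Math.floor((nowMs / 1000 - oldestSeconds) / 86400);
}
```

Reading `flake.lock` from disk and choosing the age thresholds for color-coding are left to the caller.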

### Medium Priority

5. **Sops secret editor** - Protected inline editing
   - Overlay for secret selection
   - Inline decryption/editing
   - Re-encrypt on save
   - Never shown in the main editor

6. **Niri window grid** - Visual window picker
   - ASCII art grid of workspaces
   - Window thumbnails (if the terminal supports images)
   - Keyboard navigation
   - Launch window in context

7. **Git checkpoint visualizer** - Tree view overlay
   - Show checkpoint stash refs
   - Visual diff preview
   - One-key restore
   - Fork visualization

8. **Plan mode indicator** - Visual read-only state
   - Header banner when in plan mode
   - Different border color
   - Disable write/edit tools
   - Clear toggle status

### Low Priority

9. **Skill extraction wizard** - Piception pattern
   - Detect debugging sessions
   - Offer extraction at session end
   - Interactive editor for skill content
   - Auto-populate metadata

10. **Usage quota widget** - Above-editor status
    - Anthropic 5-hour/weekly limit countdown
    - OpenAI rate limits
    - Gemini quota
    - Color-coded warnings

11. **Rainbow ultrathink** - Fun effect
    - Shimmer animation for thinking states
    - Configurable trigger words
    - Gradient colors

12. **ASCII art loader** - NixOS theme
    - Snowflake logo animation
    - Nix build status messages
    - Progress bar for long operations

---

## Architecture Notes

### UI Extension Hooks

**Lifecycle**:
- `session_start` - Set up UI components
- `session_shutdown` - Clean up timers and other resources

**UI Customization**:
- `ctx.ui.setHeader(factory)` - Replace header
- `ctx.ui.setFooter(factory)` - Replace footer
- `ctx.ui.setEditorComponent(factory)` - Replace editor
- `ctx.ui.setWidget(id, lines, { placement })` - Add widget
- `ctx.ui.setTheme(name)` - Change theme

**UI Interactions**:
- `ctx.ui.notify(message, level)` - Show notification
- `ctx.ui.select(prompt, options)` - Picker dialog
- `ctx.ui.confirm(prompt)` - Yes/no dialog
- `ctx.ui.custom(factory, { overlay })` - Custom component

**Footer Data** (only available inside `setFooter`):
- `footerData.getGitBranch()` - Current branch
- `footerData.getExtensionStatuses()` - Status texts
- `footerData.onBranchChange(callback)` - Subscribe to changes
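
The lifecycle rule (build UI in `session_start`, release it in `session_shutdown`) matters most for timers, which otherwise outlive the session. The sketch below assumes only the shapes listed above: an `on(event, handler)` registration and `ctx.ui.setWidget(id, lines, { placement })`. The `ExtensionLike`, `CtxLike`, and `UiLike` interfaces are hypothetical stand-ins for pi's real types. Clearing a widget by passing `undefined` is also an assumption.

```typescript
// Hypothetical minimal shapes; pi's real extension types are richer.
interface UiLike {
  setWidget(
    id: string,
    lines: string[] | undefined,
    options?: { placement?: "aboveEditor" | "belowEditor" },
  ): void;
}
interface CtxLike {
  ui: UiLike;
}
type Handler = (event: unknown, ctx: CtxLike) => void;
interface ExtensionLike {
  on(event: "session_start" | "session_shutdown", handler: Handler): void;
}

// A clock widget whose interval is created on start and always cleared on shutdown.
function clockWidget(pi: ExtensionLike, now: () => Date = () => new Date()): void {
  let timer: ReturnType<typeof setInterval> | undefined;

  pi.on("session_start", (_event, ctx) => {
    const draw = () =>
      ctx.ui.setWidget("clock", [now().toISOString().slice(11, 19)], { placement: "belowEditor" });
    draw();
    clearInterval(timer); // guard against a second session_start
    timer = setInterval(draw, 1000);
  });

  pi.on("session_shutdown", (_event, ctx) => {
    clearInterval(timer); // otherwise the interval outlives the session
    timer = undefined;
    ctx.ui.setWidget("clock", undefined);
  });
}
```

Because the handlers only touch the injected `ctx`, the pattern can be unit-tested with a fake registry and context, without a running TUI.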

### Component Best Practices

**Line Width Constraint**:
- Each line MUST NOT exceed the `width` parameter
- Use `truncateToWidth()` to ensure compliance
- The TUI errors on overflow

**ANSI Handling**:
- `visibleWidth()` ignores ANSI codes
- `truncateToWidth()` preserves ANSI codes
- `wrapTextWithAnsi()` maintains styling across wraps
- The TUI appends an SGR reset and an OSC 8 reset to each line

**Caching**:
- Cache rendered output when possible
- Invalidate on state changes
- Check that the cached width matches the current width
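
The width, ANSI, and caching rules fit together as follows. Truncation copies escape codes without counting them and appends a reset when it cuts. Rendered output is reused only while both the state version and the width match. `truncateAnsi` is a simplified stand-in for pi-tui's `truncateToWidth()` (it handles SGR sequences only and assumes one column per code point). `CachedList` is a hypothetical shape, not pi's component interface.

```typescript
// Matches one SGR sequence (e.g. "\x1b[31m") starting exactly at lastIndex.
const SGR_STICKY = /\x1b\[[0-9;]*m/y;

// Simplified stand-in for pi-tui's truncateToWidth(): SGR-only, one column per code point.
function truncateAnsi(line: string, width: number): string {
  let out = "";
  let cols = 0;
  let i = 0;
  while (i < line.length) {
    SGR_STICKY.lastIndex = i;
    const esc = SGR_STICKY.exec(line);
    if (esc) {
      out += esc[0]; // escape codes are copied but take no columns
      i += esc[0].length;
      continue;
    }
    if (cols === width) return out + "\x1b[0m"; // cut here and close any open styling
    const ch = String.fromCodePoint(line.codePointAt(i)!);
    out += ch;
    cols += 1;
    i += ch.length;
  }
  return out;
}

// Hypothetical list component that caches its rendered lines per (state version, width).
class CachedList {
  private items: string[] = [];
  private version = 0;
  private cache: { version: number; width: number; lines: string[] } | undefined;

  setItems(items: string[]): void {
    this.items = items;
    this.version++; // any state change invalidates the cache
  }

  render(width: number): string[] {
    const cached = this.cache;
    if (cached && cached.version === this.version && cached.width === width) {
      return cached.lines;
    }
    const lines = this.items.map((item) => truncateAnsi(item, width));
    this.cache = { version: this.version, width, lines };
    return lines;
  }
}
```

A version counter that increments on every state change keeps the invalidation rule explicit and cheap to check on each `render(width)` call.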

**IME Support**:
- Implement the `Focusable` interface
- Set the `focused` property
- Emit `CURSOR_MARKER` before the fake cursor
- Container components must propagate focus

### Overlay Positioning

**Resolution Order**:
1. `minWidth` floor, applied after width calculation
2. Position: absolute > percentage > anchor
3. `margin` clamps to terminal bounds
4. `visible` callback controls rendering

**Sizing**:
- Numbers = absolute columns/rows
- Strings = percentages (`"50%"`, `"80%"`)
- `maxHeight`, `maxWidth` limits
- `minWidth` floor

**Positioning**:
- `anchor` + `offsetX`/`offsetY` (simple)
- `row`/`col` percentages (responsive)
- Absolute `row`/`col` (precise)
- `margin` for edge padding
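
The sizing rules reduce to a small resolution function. This sketch follows one reading of the resolution order above: resolve the number or percentage, cap at `maxWidth`, apply the `minWidth` floor, then clamp to the terminal width minus `margin` on each side. pi-tui's exact precedence may differ. `resolveOverlayWidth`, `SizeSpec`, and `WidthOptions` are hypothetical names.

```typescript
// Numbers are absolute columns; "NN%" strings are a percentage of the terminal.
type SizeSpec = number | `${number}%`;

interface WidthOptions {
  width: SizeSpec;
  minWidth?: number;
  maxWidth?: SizeSpec;
  margin?: number; // columns kept free on each side (assumed)
}

function resolveSize(spec: SizeSpec, total: number): number {
  return typeof spec === "number" ? spec : Math.floor((parseFloat(spec) / 100) * total);
}

function resolveOverlayWidth(opts: WidthOptions, termWidth: number): number {
  let w = resolveSize(opts.width, termWidth);
  if (opts.maxWidth !== undefined) w = Math.min(w, resolveSize(opts.maxWidth, termWidth));
  if (opts.minWidth !== undefined) w = Math.max(w, opts.minWidth); // floor after calculation
  const available = termWidth - 2 * (opts.margin ?? 0);
  return Math.max(0, Math.min(w, available)); // margin clamps to terminal bounds
}
```

In this ordering the terminal clamp wins over `minWidth`, so a small terminal can still shrink an overlay below its floor.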

### Key Detection Patterns

**Kitty Protocol Support**:
- Use the `Key` helper for autocomplete
- String literals also work
- Handles Shift, Ctrl, Alt modifiers
- Gracefully degrades on non-Kitty terminals

**Common Patterns**:
```typescript
import { Key, matchesKey } from "@mariozechner/pi-tui";

// The component's own callbacks (names are illustrative).
interface KeyActions {
  moveUp(): void;
  moveDown(): void;
  submit(): void;
  cancel(): void;
  abort(): void;
  back(): void;
  command(): void;
}

function handleInput(data: string, actions: KeyActions): void {
  // Navigation
  if (matchesKey(data, Key.up)) actions.moveUp();
  else if (matchesKey(data, Key.down)) actions.moveDown();
  // Submission
  else if (matchesKey(data, Key.enter)) actions.submit();
  else if (matchesKey(data, Key.escape)) actions.cancel();
  // Modifiers
  else if (matchesKey(data, Key.ctrl("c"))) actions.abort();
  else if (matchesKey(data, Key.shift("tab"))) actions.back();
  else if (matchesKey(data, Key.ctrlShift("p"))) actions.command();
}
```

---

## Next Steps

1. Implement NixOS-aware footer extension
2. Create Nix build overlay for long operations
3. Add beads issue selector overlay
4. Prototype multi-model consensus UI
5. Build git checkpoint visualizer
6. Add plan mode visual indicator

---

## References

- [pi-mono TUI package](https://github.com/badlogic/pi-mono/tree/main/packages/tui)
- [pi-mono extension examples](https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent/examples/extensions)
- [pi-powerline-footer](https://github.com/nicobailon/pi-powerline-footer)
- [@mariozechner/pi-tui README](https://github.com/badlogic/pi-mono/blob/main/packages/tui/README.md)
202
docs/work/2026-01-22-ralph-iteration-counter-bug.md
Normal file
@@ -0,0 +1,202 @@
# Bug Report: Ralph Loop Iteration Counter Not Incrementing

**Date**: 2026-01-22
**Repo**: dotfiles (using the skills flake's ralph-wiggum extension)
**Extension**: `~/.pi/agent/extensions/ralph-wiggum/index.ts`

## Summary

The Ralph loop iteration counter stays stuck at 1 even when the agent completes work and calls `ralph_done`. The iteration prompt shows "Iteration 1/50" for the entire session and never advances.

## Observed Behavior

1. Started a ralph loop with the `ralph_start` tool
2. Completed 7 categories of review work (35 lens passes)
3. Called `ralph_done` multiple times after completing work
4. Each `ralph_done` call returned: `"Pending messages already queued. Skipping ralph_done."`
5. The iteration counter never incremented past 1
6. Work completed successfully, but the loop showed "Iteration 1/50" the entire time
7. The final completion banner showed "1 iterations" despite ~7 logical iterations of work

## Root Cause Analysis

In the `ralph_done` tool's `execute` function (line ~460):

```typescript
async execute(_toolCallId, _params, _onUpdate, ctx) {
  if (!currentLoop) {
    return { content: [{ type: "text", text: "No active Ralph loop." }], details: {} };
  }

  const state = loadState(ctx, currentLoop);
  if (!state || state.status !== "active") {
    return { content: [{ type: "text", text: "Ralph loop is not active." }], details: {} };
  }

  // THIS IS THE PROBLEM
  if (ctx.hasPendingMessages()) {
    return {
      content: [{ type: "text", text: "Pending messages already queued. Skipping ralph_done." }],
      details: {},
    };
  }

  // Iteration only increments AFTER the pending messages check
  state.iteration++;
  // ...
}
```

Judging from the observed behavior, `ctx.hasPendingMessages()` appears to return `true` when:
- Other tool calls are batched with `ralph_done`
- Follow-up messages are queued from previous operations
- Any async operations have pending responses

**In practice**, this guard triggered on every `ralph_done` call during normal agent operation, apparently because:
1. The agent makes multiple tool calls (reads files, runs commands, files issues)
2. The agent then calls `ralph_done`
3. Earlier tool responses count as "pending messages"
4. The guard triggers and the iteration is skipped

## Impact

- **User confusion**: Progress appears stuck at iteration 1
- **No reflection checkpoints**: `reflectEvery` never triggers, since the iteration never advances
- **Incorrect completion stats**: The final banner shows the wrong iteration count
- **Work document diverges**: The agent's actual progress doesn't match Ralph's iteration state

## Reproduction Steps

1. Start a ralph loop:
   ```
   /ralph start test-loop --items-per-iteration 5
   ```

2. Have the agent do ANY work involving multiple tool calls:
   ```
   - Read a few files
   - Run some bash commands
   - Call ralph_done
   ```

3. Observe: `ralph_done` returns "Pending messages already queued"

4. Check the state file:
   ```bash
   jq .iteration .ralph/test-loop.state.json
   # Always returns 1
   ```

## Proposed Fixes

### Option A: Remove the guard entirely

The guard appears intended to prevent duplicate iteration messages, but it is too aggressive:

```typescript
// Remove this block entirely
if (ctx.hasPendingMessages()) {
  return { ... };
}
```

**Risk**: Might cause duplicate prompts if the agent calls `ralph_done` multiple times.

### Option B: Increment iteration regardless, only skip prompt delivery

```typescript
// Always increment
state.iteration++;
saveState(ctx, state);
updateUI(ctx);

// Only skip the PROMPT delivery if there are pending messages
if (ctx.hasPendingMessages()) {
  return {
    content: [{ type: "text", text: `Iteration ${state.iteration} recorded. Prompt deferred due to pending messages.` }],
    details: {},
  };
}

// Continue with prompt delivery...
```

**Benefit**: The counter stays accurate even if the prompt is deferred.
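
To make the difference concrete, here is a self-contained simulation of the two orderings. It uses a fake context whose `hasPendingMessages()` always returns `true`, which is what the observed session suggests. `LoopState`, `FakeCtx`, and both handler functions are hypothetical simplifications of the extension's code: there is no persistence or UI, and the "started" message stands in for prompt delivery.

```typescript
interface LoopState {
  iteration: number;
  status: "active" | "completed";
}

interface FakeCtx {
  hasPendingMessages(): boolean;
}

// Current ordering: the pending-messages guard returns before the increment.
function ralphDoneCurrent(state: LoopState, ctx: FakeCtx): string {
  if (state.status !== "active") return "Ralph loop is not active.";
  if (ctx.hasPendingMessages()) return "Pending messages already queued. Skipping ralph_done.";
  state.iteration++;
  return `Iteration ${state.iteration} started.`; // stand-in for prompt delivery
}

// Option B ordering: record progress first, defer only the prompt.
function ralphDoneOptionB(state: LoopState, ctx: FakeCtx): string {
  if (state.status !== "active") return "Ralph loop is not active.";
  state.iteration++;
  if (ctx.hasPendingMessages()) {
    return `Iteration ${state.iteration} recorded. Prompt deferred due to pending messages.`;
  }
  return `Iteration ${state.iteration} started.`; // stand-in for prompt delivery
}

const busy: FakeCtx = { hasPendingMessages: () => true };
const current: LoopState = { iteration: 1, status: "active" };
const optionB: LoopState = { iteration: 1, status: "active" };
for (let call = 0; call < 7; call++) {
  ralphDoneCurrent(current, busy);
  ralphDoneOptionB(optionB, busy);
}
console.log(current.iteration, optionB.iteration); // 1 8
```

Seven calls leave the current ordering at 1 and move Option B to 8. That lines up with the ~7-8 iterations expected from the work actually completed.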

### Option C: Check for pending USER messages only

If the context can tell pending user messages apart from other queued messages (`hasPendingUserMessages` is a hypothetical method, hence the optional call):

```typescript
if (ctx.hasPendingUserMessages?.()) { // More specific check
  return { ... };
}
```

**Benefit**: Tool responses wouldn't block iteration.

### Option D: Use a flag to prevent re-entry

```typescript
// At module level
let ralphDoneInProgress = false;

// In execute
if (ralphDoneInProgress) {
  return { content: [{ type: "text", text: "ralph_done already in progress." }], details: {} };
}
ralphDoneInProgress = true;
try {
  // ... do the work
} finally {
  ralphDoneInProgress = false;
}
```

**Benefit**: Prevents actual re-entry without blocking on unrelated pending messages.

## Recommended Fix

**Option B** seems safest:
- The iteration counter always reflects actual progress
- The UI stays accurate
- Prompt delivery can be deferred without losing state
- Backwards compatible

## Additional Context

### State file after "completion" (iteration stuck at 1):

```json
{
  "name": "nix-modules-review",
  "taskFile": ".ralph/nix-modules-review.md",
  "iteration": 1,
  "maxIterations": 50,
  "itemsPerIteration": 5,
  "reflectEvery": 0,
  "active": false,
  "status": "completed",
  "startedAt": "2026-01-22T22:49:53.055Z",
  "completedAt": "2026-01-22T22:55:10.628Z"
}
```

### Actual work completed:
- 7 module categories reviewed
- 5 lenses per category = 35 review passes
- 14 issues filed in beads
- Epic created and closed

The iteration should have been ~7-8, not 1.

## Questions for Investigation

1. What exactly does `ctx.hasPendingMessages()` check? Is it documented in pi's ExtensionAPI?
2. Is this guard necessary for correctness, or just a precaution?
3. Are there other extensions using similar patterns that work correctly?
4. Should `ralph_done` be designed to be called as the ONLY tool in a response (documented behavior)?

## Workaround (Current)

The agent can manually copy the completed work doc to `.ralph/` and output `<promise>COMPLETE</promise>` to trigger completion detection via the `agent_end` event handler, bypassing `ralph_done` entirely. This is what happened in the observed session.