diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl index 7856894..b8ccf02 100644 --- a/.beads/issues.jsonl +++ b/.beads/issues.jsonl @@ -65,7 +65,7 @@ {"id":"skills-5vg","title":"spec-review: Add context/assumptions step to prompts","description":"Reviews can become speculative without establishing context first.\n\nAdd to prompts:\n- List assumptions being made\n- Distinguish: missing from doc vs implied vs out of scope\n- Ask clarifying questions if critical context missing","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-15T00:23:25.681448596-08:00","updated_at":"2025-12-15T14:06:15.415750911-08:00","closed_at":"2025-12-15T14:06:15.415750911-08:00"} {"id":"skills-5x2o","title":"Extract msToUnix helper for repeated div 1000","description":"[SMELL] LOW state.nim - 'div 1000' for ms to seconds conversion repeated 8 times. Add helper proc msToUnix(ms: int64): int64 in types.nim.","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-10T19:49:52.505245039-08:00","created_by":"dan","updated_at":"2026-01-10T20:32:28.362386563-08:00","closed_at":"2026-01-10T20:32:28.362386563-08:00","close_reason":"Created utils.nim with common helpers"} {"id":"skills-5xkg","title":"Document Intent/Approach/Work workflow","description":"Write user-facing documentation for structured beads.\n\n## Deliverable\n- How-to guide\n- Template reference\n- Examples at different scales\n\n## Sections\n- Why structure? 
(vs just doing the thing)\n- The three phases: Intent / Approach / Work\n- Full template vs minimal template\n- When to use each\n- Examples: small fix, medium feature, large epic\n- Integration with bd commands","status":"closed","priority":3,"issue_type":"task","owner":"dan@delpad","created_at":"2026-01-18T08:13:59.050133558-08:00","created_by":"dan","updated_at":"2026-01-18T20:20:47.00512145-08:00","closed_at":"2026-01-18T20:20:47.00512145-08:00","close_reason":"Docs written to docs/intent-approach-work.md","dependencies":[{"issue_id":"skills-5xkg","depends_on_id":"skills-oh8m","type":"blocks","created_at":"2026-01-18T08:14:32.866401069-08:00","created_by":"dan"},{"issue_id":"skills-5xkg","depends_on_id":"skills-ankb","type":"blocks","created_at":"2026-01-18T08:14:45.264952521-08:00","created_by":"dan"},{"issue_id":"skills-5xkg","depends_on_id":"skills-sx8u","type":"blocks","created_at":"2026-01-18T08:14:45.375561869-08:00","created_by":"dan"},{"issue_id":"skills-5xkg","depends_on_id":"skills-4ecn","type":"blocks","created_at":"2026-01-18T08:26:55.34104244-08:00","created_by":"dan"}]} -{"id":"skills-5ycq","title":"Implement /synod multi-model consensus extension for pi","description":"## Overview\n\nImplement a `/synod` command for pi-coding-agent that provides multi-model consensus with conversation context inheritance and interactive UI.\n\n## Background\n\nResearch conducted 2026-01-22 analyzing:\n- Orch CLI capabilities (423 models, voting, synthesis, serial strategies)\n- Pi Oracle extension from shitty-extensions (conversation context, add-to-context flow)\n- Gap between the two approaches\n\n## Core Concept\n\n`/synod` = Assembly of AI models convened to deliberate. 
One command covering both:\n- **Parallel voting** (conclave-style): Independent opinions, tally votes\n- **Serial discussion** (council-style): Models build on each other's responses\n\n## Usage Design\n\n```bash\n/synod \"Should we use Rust?\" flash gemini claude # Parallel vote (default)\n/synod \"Should we use Rust?\" --debate # Serial discussion\n/synod \"Should we use Rust?\" --brainstorm # Generative mode\n/synod \"Should we use Rust?\" --vote # Explicit parallel vote\n```\n\n## Key Features Required\n\n### Must Have\n- [ ] Multi-select model picker with quick keys\n- [ ] Conversation context inheritance (serialize pi conversation to models)\n- [ ] Parallel query execution with progress indicators\n- [ ] Vote parsing (SUPPORT/OPPOSE/NEUTRAL) from responses\n- [ ] Results display with scrolling\n- [ ] Add-to-context workflow (YES/SUMMARY/NO)\n- [ ] Cost estimation before query\n\n### Nice to Have\n- [ ] Side-by-side comparison view\n- [ ] Diff highlighting for disagreements\n- [ ] Response caching (5min TTL)\n- [ ] Model recommendations based on query type\n- [ ] Synthesis mode (aggregate responses)\n- [ ] Serial strategies (refine, debate)\n\n## Architecture Decision: Hybrid Approach\n\n1. **Keep orch CLI** for advanced features (423 models, synthesis, sessions, serial strategies)\n2. **Add /synod extension** for interactive queries with conversation context\n3. 
**Register orch_consensus tool** for agent programmatic access\n\n### Why Hybrid?\n- Orch: No conversation context sharing, no interactive UI\n- Oracle: Only one model at a time, no voting\n- Synod: Best of both - context inheritance + multi-model + voting + UI\n\n## Technical Implementation\n\n### Conversation Context Serialization\n```typescript\nimport { serializeConversation, convertToLlm } from \"@mariozechner/pi-coding-agent\";\n\nconst history = ctx.sessionManager.getBranch();\nconst serialized = serializeConversation(history);\nconst llmMessages = serialized.map(convertToLlm);\n```\n\n### Model Registry\nStart by querying orch: `orch models` and parse output.\nLater: import orch's config directly.\n\n### Vote Parsing\nPrompt engineering approach:\n```\nRespond with your verdict first: SUPPORT, OPPOSE, or NEUTRAL\nThen explain your reasoning.\n```\nParse with regex, fallback to secondary classification query.\n\n### Add-to-Context Options\n1. YES - Add all model responses verbatim\n2. SUMMARY - Synthesize and add summary only\n3. NO - Don't add to conversation\n\n## UI Patterns (from research)\n\n### Model Picker\n- Multi-select with checkboxes\n- Quick keys 1-9 for fast selection\n- Show cost per model\n- Filter by authenticated models only\n- Exclude current model\n\n### Results Display\n- Progressive disclosure: Gauge → List → Side-by-side\n- Vote counts: SUPPORT: 2, OPPOSE: 1, NEUTRAL: 0\n- Scrollable reasoning for each model\n- Box drawing character borders\n\n### Key Detection\n```typescript\nimport { matchesKey, Key } from \"@mariozechner/pi-tui\";\nif (matchesKey(data, Key.enter)) submit();\nif (matchesKey(data, Key.escape)) cancel();\n```\n\n## Implementation Plan\n\n### Phase 1: Basic /synod (Week 1)\n1. Port Oracle extension structure\n2. Add model aliases from orch\n3. Multi-select model picker\n4. Parallel query execution\n5. Basic results display\n6. Add-to-context workflow\n\n### Phase 2: Voting \u0026 Comparison (Week 2)\n1. 
Vote parsing from responses\n2. Consensus gauge visualization\n3. Side-by-side comparison view\n4. Cost preview before query\n\n### Phase 3: Advanced Features (Week 3-4)\n1. Serial strategies (--debate, --refine)\n2. Synthesis mode\n3. Response caching\n4. orch_consensus tool wrapper for agent\n\n## Research References\n\nFull research documents:\n- /tmp/pi-extension-ecosystem-research.md (14KB)\n- /tmp/pi-ui-ecosystem-research.md (22KB)\n- /tmp/multi-model-consensus-analysis.md (22KB)\n\nKey sources:\n- shitty-extensions/oracle.ts - UI patterns, context serialization\n- pi-mono/packages/tui - Component architecture\n- pi-mono/examples/extensions - Official patterns\n- nicobailon/pi-* - Community extensions\n\n## Design Questions Resolved\n\n1. **Single vs Multi model?** Support both via modes\n2. **Auto-add to context?** Always prompt (configurable)\n3. **Expensive models?** Show cost warning, require confirmation\n4. **Caching?** 5min TTL with hash(model+context+prompt)\n5. **Visualization?** Progressive disclosure (gauge → list → diff)","status":"open","priority":2,"issue_type":"feature","owner":"dan@delpad","created_at":"2026-01-22T22:35:32.203497461-08:00","created_by":"dan","updated_at":"2026-01-22T22:35:32.203497461-08:00","labels":["multi-model","pi-extension","synod"]} +{"id":"skills-5ycq","title":"Implement /synod multi-model consensus extension for pi","description":"## Overview\n\nImplement a `/synod` command for pi-coding-agent that provides multi-model consensus with conversation context inheritance and interactive UI.\n\n## Background\n\nResearch conducted 2026-01-22 analyzing:\n- Orch CLI capabilities (423 models, voting, synthesis, serial strategies)\n- Pi Oracle extension from shitty-extensions (conversation context, add-to-context flow)\n- Gap between the two approaches\n\n## Core Concept\n\n`/synod` = Assembly of AI models convened to deliberate. 
One command covering both:\n- **Parallel voting** (conclave-style): Independent opinions, tally votes\n- **Serial discussion** (council-style): Models build on each other's responses\n\n## Usage Design\n\n```bash\n/synod \"Should we use Rust?\" flash gemini claude # Parallel vote (default)\n/synod \"Should we use Rust?\" --debate # Serial discussion\n/synod \"Should we use Rust?\" --brainstorm # Generative mode\n/synod \"Should we use Rust?\" --vote # Explicit parallel vote\n```\n\n## Key Features Required\n\n### Must Have\n- [ ] Multi-select model picker with quick keys\n- [ ] Conversation context inheritance (serialize pi conversation to models)\n- [ ] Parallel query execution with progress indicators\n- [ ] Vote parsing (SUPPORT/OPPOSE/NEUTRAL) from responses\n- [ ] Results display with scrolling\n- [ ] Add-to-context workflow (YES/SUMMARY/NO)\n- [ ] Cost estimation before query\n\n### Nice to Have\n- [ ] Side-by-side comparison view\n- [ ] Diff highlighting for disagreements\n- [ ] Response caching (5min TTL)\n- [ ] Model recommendations based on query type\n- [ ] Synthesis mode (aggregate responses)\n- [ ] Serial strategies (refine, debate)\n\n## Architecture Decision: Hybrid Approach\n\n1. **Keep orch CLI** for advanced features (423 models, synthesis, sessions, serial strategies)\n2. **Add /synod extension** for interactive queries with conversation context\n3. 
**Register orch_consensus tool** for agent programmatic access\n\n### Why Hybrid?\n- Orch: No conversation context sharing, no interactive UI\n- Oracle: Only one model at a time, no voting\n- Synod: Best of both - context inheritance + multi-model + voting + UI\n\n## Technical Implementation\n\n### Conversation Context Serialization\n```typescript\nimport { serializeConversation, convertToLlm } from \"@mariozechner/pi-coding-agent\";\n\nconst history = ctx.sessionManager.getBranch();\nconst serialized = serializeConversation(history);\nconst llmMessages = serialized.map(convertToLlm);\n```\n\n### Model Registry\nStart by querying orch: `orch models` and parse output.\nLater: import orch's config directly.\n\n### Vote Parsing\nPrompt engineering approach:\n```\nRespond with your verdict first: SUPPORT, OPPOSE, or NEUTRAL\nThen explain your reasoning.\n```\nParse with regex, fallback to secondary classification query.\n\n### Add-to-Context Options\n1. YES - Add all model responses verbatim\n2. SUMMARY - Synthesize and add summary only\n3. NO - Don't add to conversation\n\n## UI Patterns (from research)\n\n### Model Picker\n- Multi-select with checkboxes\n- Quick keys 1-9 for fast selection\n- Show cost per model\n- Filter by authenticated models only\n- Exclude current model\n\n### Results Display\n- Progressive disclosure: Gauge → List → Side-by-side\n- Vote counts: SUPPORT: 2, OPPOSE: 1, NEUTRAL: 0\n- Scrollable reasoning for each model\n- Box drawing character borders\n\n### Key Detection\n```typescript\nimport { matchesKey, Key } from \"@mariozechner/pi-tui\";\nif (matchesKey(data, Key.enter)) submit();\nif (matchesKey(data, Key.escape)) cancel();\n```\n\n## Implementation Plan\n\n### Phase 1: Basic /synod (Week 1)\n1. Port Oracle extension structure\n2. Add model aliases from orch\n3. Multi-select model picker\n4. Parallel query execution\n5. Basic results display\n6. Add-to-context workflow\n\n### Phase 2: Voting \u0026 Comparison (Week 2)\n1. 
Vote parsing from responses\n2. Consensus gauge visualization\n3. Side-by-side comparison view\n4. Cost preview before query\n\n### Phase 3: Advanced Features (Week 3-4)\n1. Serial strategies (--debate, --refine)\n2. Synthesis mode\n3. Response caching\n4. orch_consensus tool wrapper for agent\n\n## Research References\n\nFull research documents:\n- /tmp/pi-extension-ecosystem-research.md (14KB)\n- /tmp/pi-ui-ecosystem-research.md (22KB)\n- /tmp/multi-model-consensus-analysis.md (22KB)\n\nKey sources:\n- shitty-extensions/oracle.ts - UI patterns, context serialization\n- pi-mono/packages/tui - Component architecture\n- pi-mono/examples/extensions - Official patterns\n- nicobailon/pi-* - Community extensions\n\n## Design Questions Resolved\n\n1. **Single vs Multi model?** Support both via modes\n2. **Auto-add to context?** Always prompt (configurable)\n3. **Expensive models?** Show cost warning, require confirmation\n4. **Caching?** 5min TTL with hash(model+context+prompt)\n5. **Visualization?** Progressive disclosure (gauge → list → diff)","status":"open","priority":2,"issue_type":"feature","owner":"dan@delpad","created_at":"2026-01-22T22:35:32.203497461-08:00","created_by":"dan","updated_at":"2026-01-22T22:35:32.203497461-08:00","labels":["multi-model","pi-extension","synod"],"comments":[{"id":23,"issue_id":"skills-5ycq","author":"dan","text":"## Research Findings (2026-01-25)\n\n### Key Open Source Tools Discovered\n\n#### 1. **llm-council** (jersobh/consensus)\n- Langchain-compatible framework for deliberative decision-making\n- **Voting modes**: majority, ranked-choice, weighted confidence, veto\n- **Multi-round reasoning** with peer feedback\n- **Parallel execution** support\n- Self-correction mechanisms\n\n#### 2. 
**Oracle Extension** (hjanuschka/shitty-extensions)\n- **Existing pi-agent extension** for second opinions\n- Single model at a time (not multi-model consensus)\n- Key patterns to adopt:\n - `serializeConversation()` + `convertToLlm()` for context inheritance\n - Model picker with quick keys (1-9)\n - Add-to-context workflow (YES/NO)\n - `BorderedLoader` for async operations\n - `ctx.ui.custom()` for full TUI components\n\n#### 3. **Routing/Orchestration Tools**\n| Tool | Type | Key Feature |\n|------|------|-------------|\n| **RouteLLM** | OSS | Cost-based routing, 85% savings |\n| **LiteLLM** | OSS | Unified API, Python SDK |\n| **Portkey** | Commercial | Conditional routing, observability |\n| **NotDiamond** | Commercial | Predictive model selection |\n\n#### 4. **Consensus Patterns**\n- **Voting/Council**: Independent votes, tally results\n- **Debate**: Models critique each other iteratively\n- **Mixture of Agents (MoA)**: Layered proposer→aggregator\n- **Self-Refine**: Single model iterates with self-feedback\n- **LLM-as-Judge**: One model evaluates others\n\n### Design Considerations\n\n#### Already Well-Aligned\n- Voting mechanism (SUPPORT/OPPOSE/NEUTRAL) ✓\n- Parallel query execution ✓\n- Model aliases via orch CLI ✓\n- Cost preview ✓\n\n#### Consider Adding\n1. **Voting modes** beyond majority: ranked-choice, weighted, veto\n2. **Confidence scores** - low confidence triggers more models\n3. **Model recommendations** based on query type\n4. **MoA-style aggregation** - synthesize insights, not just tally\n5. **Disagreement highlighting** - often the most valuable signal\n\n### Open Questions\n1. Debate mode: synchronous (wait all) vs streaming (show as respond)?\n2. How to handle ties in voting?\n3. Should disagreement be surfaced prominently?\n4. Oracle extends single model → synod extends to N models. Reuse oracle UI patterns?\n5. 
Integration: wrap orch CLI vs native pi-ai calls?\n\n### Reference Implementations\n- `hjanuschka/shitty-extensions/oracle.ts` - Context serialization, model picker UI\n- `jersobh/consensus` - Voting strategies, multi-round deliberation\n- `qualisero/awesome-pi-agent` - Extension ecosystem overview\n- pi-mono `subagent/index.ts` - Parallel execution, streaming results\n\n### Next Steps\n- [ ] Spike: Port oracle.ts patterns to multi-model\n- [ ] Evaluate: orch CLI wrapper vs native implementation\n- [ ] Design: Voting mode UX (how to select majority vs ranked-choice)\n- [ ] Prototype: Disagreement visualization","created_at":"2026-01-25T07:27:54Z"},{"id":24,"issue_id":"skills-5ycq","author":"dan","text":"## Ecosystem Context\n\n### awesome-pi-agent Highlights\nKey extensions relevant to synod:\n- **oracle** - Second opinion from alt models (single model, context inheritance)\n- **handoff** - Transfer context to new sessions\n- **memory-mode** - Save instructions to AGENTS.md\n- **subagent** - Delegate to specialized agents (parallel, chain modes)\n\n### pi-mono Patterns to Leverage\nFrom official examples:\n- `questionnaire.ts` - Tab-based multi-select, option navigation\n- `subagent/index.ts` - Parallel execution, progress streaming, usage stats\n- `ctx.ui.custom()` - Full TUI components with keyboard input\n\n### Potential Architecture\n\n```\n/synod \"question\" [models...]\n\n┌─────────────────────────────────────────┐\n│ 🔮 Synod - Multi-Model Consensus │\n├─────────────────────────────────────────┤\n│ Q: Should we use Rust for this service? 
│\n├─────────────────────────────────────────┤\n│ Models: flash, gemini, claude (3) │\n│ Estimated cost: ~$0.02 │\n├─────────────────────────────────────────┤\n│ [Query All] [Edit Models] [Cancel] │\n└─────────────────────────────────────────┘\n\n ↓ parallel queries ↓\n\n┌─────────────────────────────────────────┐\n│ 🗳️ Results (2 SUPPORT, 1 OPPOSE) │\n├─────────────────────────────────────────┤\n│ ✓ flash: SUPPORT │\n│ \"Rust's safety guarantees...\" │\n│ ✓ gemini: SUPPORT │\n│ \"Memory safety without GC...\" │\n│ ✗ claude: OPPOSE │\n│ \"Team expertise in Go...\" ← DISSENT │\n├─────────────────────────────────────────┤\n│ Add to context? [All] [Summary] [None] │\n└─────────────────────────────────────────┘\n```\n\n### Key Differentiator from Oracle\n- Oracle: 1 model, 1 opinion, add to context\n- Synod: N models, vote tally, highlight disagreement, synthesize","created_at":"2026-01-25T07:28:08Z"},{"id":25,"issue_id":"skills-5ycq","author":"dan","text":"## Technical Insights: Confidence \u0026 Disagreement\n\n### Confidence Calibration\nResearch shows LLM confidence often doesn't match actual accuracy. Key approaches:\n- **Multicalibration** - Calibrate across data groupings correlated with correctness\n- **LENS (Learning Ensemble Confidence from Neural States)** - Analyze internal representations\n- **Self-consistency ensembles** - Aggregate confidence across runs\n\n### Disagreement as Signal\nDisagreement between models is valuable information:\n- High disagreement → uncertain/nuanced topic → surface to user\n- Low disagreement → high confidence consensus\n- Systematic bias detection → models from same family may share blind spots\n\n### Implementation Ideas\n1. **Confidence-weighted voting** - Models report confidence, weight votes accordingly\n2. **Disagreement highlighting** - When models disagree, show reasoning side-by-side\n3. **Family diversity** - Recommend models from different providers (OpenAI + Anthropic + Google)\n4. 
**Tie-breaker escalation** - On tie, optionally query additional model\n\n### Cost vs Quality Tradeoffs\n- Start with cheap models (flash, haiku) for initial vote\n- Escalate to expensive models (opus, gpt-4) only on low confidence/disagreement\n- Cache identical queries across sessions (5min TTL)\n\n### Risk: Overconfidence\nLLMs as judges tend toward overconfidence. Mitigations:\n- Always show reasoning, not just vote\n- Highlight when all models agree (groupthink risk)\n- Option to query \"devil's advocate\" model explicitly","created_at":"2026-01-25T07:28:40Z"},{"id":26,"issue_id":"skills-5ycq","author":"dan","text":"## Design Decisions (2026-01-25 discussion)\n\n### 1. Implementation Strategy: Native pi-ai calls\n\n**Decision:** Write native pi extension using pi-ai directly, not wrapping orch/llm CLIs.\n\n**Rationale:**\n- pi already has unified model registry (`ctx.modelRegistry`)\n- Oracle extension shows the pattern: `complete()` from `@mariozechner/pi-ai`\n- Avoids subprocess overhead and parsing CLI output\n- Full access to streaming, usage stats, abort signals\n- Can leverage pi's API key management\n\n**Trade-off:** Lose orch's 424 model aliases, but pi's registry is sufficient for common models. Can add aliases later.\n\n### 2. Results Delivery: All at once (parallel)\n\n**Decision:** Query in parallel, show results when all complete.\n\n**Rationale:**\n- Simpler UX - one moment of decision, not drip-feed\n- Agent (us) can see full picture before commenting\n- Avoids cognitive load of \"wait, there's more coming\"\n- Streaming progress indicators show activity during wait\n\n**Alternative considered:** Show results as they arrive, agent comments incrementally. Rejected because:\n- Creates pressure to react before full context\n- Harder to synthesize/compare\n- More complex state management\n\n**Exception:** Debate mode (--debate) is inherently serial - models respond to each other.\n\n### 3. 
Tie Handling: Explicit acknowledgment\n\n**Decision:** Say \"It's a tie\" and show the split.\n\n**Implementation:**\n```\n🗳️ Results: TIE (1 SUPPORT, 1 OPPOSE, 1 NEUTRAL)\n\nNo clear consensus. The models are split:\n- flash: SUPPORT - \"Performance benefits...\"\n- claude: OPPOSE - \"Complexity cost...\" \n- gemini: NEUTRAL - \"Depends on team...\"\n\nConsider: Query additional model? Reframe question?\n```\n\n**Rationale:**\n- Ties ARE the answer sometimes - the question is genuinely contested\n- Forcing a winner hides valuable signal\n- User/agent can decide next action (add model, rephrase, accept ambiguity)\n\n### 4. Summarization: Agent synthesizes\n\n**Decision:** Agent (the one calling /synod) summarizes results, not the tool.\n\n**Rationale:**\n- Agent has full conversation context\n- Agent can weigh results against prior discussion\n- Tool returns structured data, agent interprets\n- Keeps tool simple and composable\n\n**Add-to-context options:**\n1. **All** - Raw responses verbatim (agent can summarize in next turn)\n2. **Summary** - Tool generates brief summary (backup if agent doesn't want to)\n3. 
**None** - Don't pollute context\n\n**Summary prompt (for option 2):**\n```\nSynthesize these model responses into 2-3 sentences:\n- Note the consensus (if any)\n- Highlight key disagreements\n- Don't pick a winner, present the landscape\n```","created_at":"2026-01-25T07:31:58Z"},{"id":27,"issue_id":"skills-5ycq","author":"dan","text":"### Addendum: Streaming vs Batch UX\n\nWhile we wait for all results before presenting, the **progress UI should stream**:\n\n```\n🔮 Synod - Querying 3 models...\n\n ✓ flash (0.3s) \n ⏳ gemini (1.2s...)\n ⏳ claude (0.8s...)\n```\n\nThis gives:\n- Feedback that something is happening\n- Sense of which models are fast/slow\n- Ability to abort if taking too long\n\nWhen all complete, transition to results view.\n\n### Alternative: \"Reveal as ready\" mode (future)\n\nCould add `--stream` flag for power users who want to see results as they arrive:\n```\n/synod --stream \"question\" flash gemini claude\n```\n\nBut default is batch for cleaner UX.","created_at":"2026-01-25T07:32:06Z"},{"id":28,"issue_id":"skills-5ycq","author":"dan","text":"## llm Plugin Ecosystem Research\n\n### Available Plugins (no consensus/voting built-in)\n\nCurrently installed:\n- `llm-anthropic` - Claude models\n- `llm-openrouter` - OpenRouter gateway (250+ models)\n- `llm-gemini` - Google Gemini\n\nFull directory (50+ plugins): https://llm.datasette.io/en/stable/plugins/directory.html\n\n**Notable: No native ensemble/voting plugin in core llm.**\n\n### Key Discovery: llm-consortium\n\n**GitHub:** irthomasthomas/llm-consortium\n**PyPI:** `llm install llm-consortium`\n\nInspired by Karpathy's insight:\n\u003e \"Your best performance will come from just asking all the models, and then getting them to come to a consensus.\"\n\n**Features:**\n- Multi-model orchestration in parallel\n- Iterative refinement until confidence threshold met\n- Arbiter model synthesizes responses\n- Configurable confidence thresholds (default 0.8)\n- Instance counts per model (`gpt-4o:2` = 
2 instances)\n- Conversation continuation support\n- SQLite logging\n\n**Usage:**\n```bash\nllm consortium \"Your complex query\" \\\n -m o3-mini:1 \\\n -m gpt-4o:2 \\\n -m gemini-2:3 \\\n --arbiter gemini-2 \\\n --confidence-threshold 0.9 \\\n --max-iterations 4\n```\n\n**Programmatic:**\n```python\nfrom llm_consortium import create_consortium\n\norchestrator = create_consortium(\n models=[\"o3-mini:1\", \"gpt-4o:2\"],\n confidence_threshold=0.9,\n arbiter=\"gemini-2\"\n)\nresult = orchestrator.orchestrate(\"Your prompt\")\n```\n\n### Other Relevant PyPI Packages\n\n| Package | Description |\n|---------|-------------|\n| `llm-consensus` | Langchain-compatible, voting strategies |\n| `multi-llm-consensus` | Moderator-based consensus |\n| `llm-multi` (0.1.0) | Basic multi-model prompting |\n| `nons` | Majority voting, ensemble decisions |\n| `agorai` | Social choice theory aggregation |\n\n### Implications for /synod\n\n**Option A: Wrap llm-consortium**\n- Pros: Battle-tested, iterative refinement, arbiter synthesis\n- Cons: Subprocess overhead, no pi context inheritance\n\n**Option B: Port llm-consortium patterns to pi-ai**\n- Pros: Native integration, context inheritance, streaming\n- Cons: More implementation work\n\n**Option C: Hybrid - use llm-consortium for synthesis logic**\n- Query models via pi-ai (context inheritance)\n- Use llm-consortium's arbiter prompt pattern for synthesis\n- Best of both worlds?\n\n### llm-consortium Architecture Worth Adopting\n\n1. **Arbiter model** - Dedicated model to synthesize/evaluate\n2. **Confidence threshold** - Iterate until confident enough\n3. **Instance counts** - Run same model multiple times for diversity\n4. **Iterative refinement** - Not just one-shot consensus","created_at":"2026-01-25T07:34:02Z"}]} {"id":"skills-69sz","title":"Fix P1 security bugs (genOid, HeartbeatThread)","description":"Two critical security/safety issues:\n\n1. 
genOid() - skills-0wk\n - Currently uses rand(25) without randomize()\n - IDs are predictable/deterministic\n - Fix: Use std/sysrand for crypto-safe randomness, or call randomize() at startup\n\n2. HeartbeatThread - skills-bk7x \n - Uses manual alloc0/dealloc\n - Risk of memory leak if startup fails, use-after-free if caller holds reference\n - Fix: Use 'ref HeartbeatThread' with GC management\n\nParent: skills-g2wa","status":"closed","priority":1,"issue_type":"task","created_at":"2026-01-10T20:18:49.759721333-08:00","created_by":"dan","updated_at":"2026-01-10T20:24:36.613555221-08:00","closed_at":"2026-01-10T20:24:36.613555221-08:00","close_reason":"Both P1 security bugs fixed: genOid uses sysrand, HeartbeatThread uses ref type"} {"id":"skills-6ae","title":"Create ui-query skill for AT-SPI integration","description":"Create a skill that provides programmatic UI tree access via AT-SPI.\n\n## Context\nAT-SPI is now enabled in dotfiles (services.gnome.at-spi2-core + QT_LINUX_ACCESSIBILITY_ALWAYS_ON).\nThis complements niri-window-capture (visual) with semantic UI data.\n\n## Capabilities\n- Read text from GTK/Qt widgets directly (no OCR)\n- Find UI elements by role (button, text-field, menu)\n- Query element states (focused, enabled, checked)\n- Get element positions for potential input simulation\n- Navigate parent/child relationships\n\n## Suggested structure\nskills/ui-query/\n├── SKILL.md\n├── scripts/\n│ ├── list-windows.py # Windows with AT-SPI info\n│ ├── get-text.py # Extract text from window/element\n│ ├── find-element.py # Find by role/name\n│ └── query-state.py # Element states\n└── README.md\n\n## Notes\n- Start simple: list windows, get text\n- pyatspi available via python3Packages.pyatspi\n- Use accerciser (now installed) to explore the 
tree","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-29T15:37:55.592793763-05:00","created_by":"dan","updated_at":"2026-01-15T14:19:42.092890404-08:00","closed_at":"2026-01-15T14:19:42.092890404-08:00","close_reason":"Complete: list-windows, get-text, find-element, query-state all implemented","comments":[{"id":17,"issue_id":"skills-6ae","author":"dan","text":"Initial implementation: list-windows.py working. Shows apps, windows, geometry, states. Remaining: get-text.py, find-element.py, query-state.py","created_at":"2026-01-15T19:57:15Z"}]} {"id":"skills-6e3","title":"Searchable Claude Code conversation history","description":"## Context\nClaude Code persists full conversations in `~/.claude/projects/\u003cproject\u003e/\u003cuuid\u003e.jsonl`. This is complete but not searchable - can't easily find \"that session where we solved X\".\n\n## Goal\nMake conversation history searchable without requiring manual worklogs.\n\n## Approach\n\n### Index structure\n```\n~/.claude/projects/\u003cproject\u003e/\n \u003cuuid\u003e.jsonl # raw conversation (existing)\n index.jsonl # session metadata + summaries (new)\n```\n\n### Index entry format\n```json\n{\n \"uuid\": \"f9a4c161-...\",\n \"date\": \"2025-12-17\",\n \"project\": \"/home/dan/proj/skills\",\n \"summary\": \"Explored Wayland desktop automation, AT-SPI investigation, vision model benchmark\",\n \"keywords\": [\"wayland\", \"niri\", \"at-spi\", \"automation\", \"seeing-problem\"],\n \"commits\": [\"906f2bc\", \"0b97155\"],\n \"duration_minutes\": 90,\n \"message_count\": 409\n}\n```\n\n### Features needed\n1. **Index builder** - Parse JSONL, extract/generate summary + keywords\n2. **Search CLI** - `claude-search \"AT-SPI wayland\"` → matching sessions\n3. 
**Auto-index hook** - Update index on session end or compaction\n\n## Questions\n- Generate summaries via AI or extract heuristically?\n- Index per-project or global?\n- How to handle very long sessions (multiple topics)?\n\n## Value\n- Find past solutions without remembering dates\n- Model reflection: include relevant past sessions in context\n- Replace manual worklogs with auto-generated metadata","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-17T15:56:50.913766392-08:00","updated_at":"2025-12-29T18:35:56.530154004-05:00","closed_at":"2025-12-29T18:35:56.530154004-05:00","close_reason":"Prototype complete: bin/claude-search indexes 122 sessions, searches by keyword. Future: auto-index hook, full-text search, keyword extraction."} diff --git a/docs/approach/2026-01-24-session-hygiene.md b/docs/approach/2026-01-24-session-hygiene.md new file mode 100644 index 0000000..67b9c05 --- /dev/null +++ b/docs/approach/2026-01-24-session-hygiene.md @@ -0,0 +1,158 @@ +# Approach: Session Hygiene Extension + +## Strategy + +**Core philosophy**: Ambient awareness, not active management. + +The extension provides a persistent footer widget showing git state. The user glances at it when they want to. A `/commit` command offers a guided flow with auto-drafted messages when they're ready to commit. No nudges, no prompts, no interruptions. + +**Key Decisions**: + +1. **Widget vs Status**: Widget (multi-character, always visible) vs setStatus (footer slot, subtle) + → **Widget** — needs to be glanceable without hunting for it + +2. **Polling vs Events**: Poll git status periodically vs hook into tool_result events + → **Hook tool_result** — only re-check after bash/write/edit tools that might change files. Avoids polling overhead. + +3. **Grouping strategy**: No grouping vs LLM-driven grouping + → **LLM-driven grouping** — LLM sees changed files + session context, proposes logical groups with conventional commit messages. Always runs, even for 1-3 files. 
+ +4. **Confirmation flow**: Always confirm vs LLM discretion + → **LLM discretion** — LLM decides when to ask questions (ambiguous grouping, orphan files) vs proceed. User already invoked `/commit`, so trust the intent. + +5. **Orphan files**: Auto-bucket into "misc" vs ask + → **Ask** — if a file doesn't fit any logical group, LLM should ask user where it belongs. + +6. **Staging**: Auto-stage all vs let user stage manually + → **Auto-stage all (`git add -A`)** — matches "just commit everything" simplicity. User can unstage manually before `/commit` if needed. + +## Architecture + +### New Components + +``` +~/.pi/agent/extensions/session-hygiene/ +├── index.ts # Extension entry point +└── git.ts # Git helpers (status, commit, etc.) +``` + +### Extension Structure + +```typescript +// index.ts +export default function(pi: ExtensionAPI) { + // State + let dirtyCount = 0; + + // 1. Widget: show dirty count in footer + pi.on("session_start", updateWidget); + pi.on("tool_result", maybeUpdateWidget); // Only after bash/write/edit + + // 2. Command: /commit + pi.registerCommand("commit", { handler: commitFlow }); +} +``` + +### Data Flow + +``` +[tool_result event] + │ + ▼ + is bash/write/edit? + │ yes + ▼ + git status --porcelain + │ + ▼ + count changed files + │ + ▼ + ctx.ui.setWidget("hygiene", ["● 14 files"]) +``` + +``` +[/commit command] + │ + ▼ + git status --porcelain → list of changed files + │ + ▼ + extract session context: + - recent messages (user prompts, assistant responses) + - file touchpoints (which files were read/written/edited when) + │ + ▼ + LLM prompt: + "Here are the changed files and session context. + Group into logical commits. For each group: + - list files + - conventional commit message + If a file doesn't fit, ask the user. + If grouping is ambiguous, ask. + Otherwise, proceed and execute commits." 
+ │ + ▼ + LLM executes commits via tool calls (git add <files>, git commit -m "...") + │ + ▼ + update widget (now shows 0 or remaining) +``` + +### Commit Tool + +The `/commit` command injects context and lets the LLM drive. It needs a `git_commit` tool: + +```typescript +pi.registerTool({ + name: "git_commit", + description: "Stage specific files and commit with a message", + parameters: Type.Object({ + files: Type.Array(Type.String(), { description: "Files to stage (relative paths)" }), + message: Type.String({ description: "Commit message (conventional format)" }), + }), + async execute(toolCallId, params, onUpdate, ctx, signal) { + // git add + // git commit -m + // return success/failure + }, +}); +``` + +This lets the LLM make multiple commits in sequence, asking questions in between if needed. + +## Risks + +### Known Unknowns + +- **Widget placement**: `setWidget` defaults to above editor. Need to verify `belowEditor` placement looks right for a small status indicator. +- **LLM latency**: Drafting commit message adds a few seconds. Acceptable? Could show "Drafting..." in UI. +- **Model availability**: Need a model for commit message drafting. What if user doesn't have API key for it? + +### Failure Modes + +- **Not a git repo**: `git status` returns non-zero. Extension silently does nothing (no widget, `/commit` shows error). +- **Detached HEAD / merge conflict**: Unusual git states. `/commit` should detect and warn rather than corrupt state. +- **Empty commit**: All changes already staged and committed. `/commit` should detect "nothing to commit" and notify. + +### Blast Radius + +- **Minimal**: Extension only reads git state and runs `git add -A` + `git commit`. No force pushes, no rebase, no destructive operations. +- **Worst case**: User commits something they didn't want to. Recoverable via `git reset HEAD~1`.
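The status-counting step in the data flow above could be sketched as a pure helper plus a thin exec wrapper. This is a sketch under assumptions, not the final `git.ts`: the `node:child_process` usage and the `countDirty`/`dirtyFileCount` names are illustrative; only `git status --porcelain` itself comes from the flow above.

```typescript
// git.ts sketch — assumes a Node runtime; helper names are illustrative.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Pure: count non-empty lines of `git status --porcelain` output.
export function countDirty(porcelain: string): number {
  return porcelain.split("\n").filter((line) => line.trim() !== "").length;
}

// Effectful: -1 signals "not a repo / git failed", so the widget can hide
// (covers the "Not a git repo" failure mode above).
export async function dirtyFileCount(cwd: string): Promise<number> {
  try {
    const { stdout } = await run("git", ["status", "--porcelain"], { cwd });
    return countDirty(stdout);
  } catch {
    return -1;
  }
}
```

Keeping the line counting pure makes the widget logic testable without a repo fixture.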
+ +## Phases + +### Phase 1: Widget + git_commit Tool +- Footer widget showing dirty file count +- Updates after file-mutating tools +- `git_commit` tool registered (LLM can use it anytime) + +### Phase 2: /commit Command +- `/commit` command injects context and triggers LLM-driven grouping +- LLM proposes groups, asks questions if uncertain, executes commits +- Widget updates as commits land + +### Phase 3: Polish (Future) +- Stash support (`/stash`) +- Undo last commit (`/uncommit`) +- Integration with worklog skill (prompt to commit after worklog) diff --git a/docs/intent/2026-01-24-session-hygiene.md b/docs/intent/2026-01-24-session-hygiene.md new file mode 100644 index 0000000..4a221b8 --- /dev/null +++ b/docs/intent/2026-01-24-session-hygiene.md @@ -0,0 +1,38 @@ +# Intent: Session Hygiene Extension + +## Motivation + +Working in pi across long sessions, it's easy to lose track of what's changed. You finish a session with dozens of uncommitted files, unclear what goes with what, and the commit history becomes a mess of grab-bag commits. The problem isn't catastrophic — nothing is lost — but it erodes organization over time. + +## Need + +Ambient awareness of git state while working, so commits happen naturally at good moments rather than as panicked cleanup at session end. + +## Use-Cases + +- **Mid-session glance**: You're deep in a refactor, glance at the footer, see "14 files" — mental note that there's a chunk of work building up. You might commit now, or keep going. Either way, you're aware. + +- **Natural stopping point**: You finish a logical unit of work. The footer reminds you there's uncommitted work. You run `/commit`, get a suggested message based on what we discussed, and commit cleanly. + +- **Session end**: You're about to close pi. Footer shows dirty state. You either commit, stash, or consciously leave it — but you're not surprised by 48 files later. 
+ +## Success Criteria + +- Footer widget shows uncommitted file count for current repo at all times +- `/commit` command triggers guided flow with auto-drafted commit message from conversation context +- User never feels nagged, blocked, or guilty — just informed +- Commits end up logical and well-messaged because awareness came early + +## Constraints + +- Single repo only (the one we're in) +- Must work as a pi extension (TypeScript, pi extension API) +- No external dependencies beyond git + +## Anti-Goals + +- **No auto-commit**: Never commit without explicit user action +- **No blocking prompts**: Never interrupt flow with modal dialogs +- **No guilt mechanics**: No "you should commit" nudges, red warnings, or escalating alerts +- **No multi-repo tracking**: Don't watch repos outside current working directory +- **No push**: This is about local commits only diff --git a/docs/work/2026-01-24-session-hygiene.md b/docs/work/2026-01-24-session-hygiene.md new file mode 100644 index 0000000..d902de5 --- /dev/null +++ b/docs/work/2026-01-24-session-hygiene.md @@ -0,0 +1,49 @@ +# Work: Session Hygiene Extension + +## Intent +Link to: [docs/intent/2026-01-24-session-hygiene.md](../intent/2026-01-24-session-hygiene.md) + +## Approach +Link to: [docs/approach/2026-01-24-session-hygiene.md](../approach/2026-01-24-session-hygiene.md) + +## Checklist + +### Phase 1: Widget + git_commit Tool + +- [x] **W001**: Create extension directory structure + - Verification: `ls ~/.pi/agent/extensions/session-hygiene/index.ts` + +- [x] **W002**: Implement git status helper + - Verification: `pi -e ~/.pi/agent/extensions/session-hygiene -p "test" 2>&1 | head -5` (no syntax errors) + +- [ ] **W003**: Implement footer widget showing dirty file count + - Verification: Start pi in a dirty repo, observe widget shows file count + +- [ ] **W004**: Hook tool_result to update widget after bash/write/edit + - Verification: In pi, write a file, observe widget count increases + +- [ ] **W005**: 
Implement git_commit tool (stage files + commit) + - Verification: `pi -p "Use git_commit to commit README.md with message 'test: verify tool'"` in test repo + +### Phase 2: /commit Command + +- [ ] **W006**: Implement session context extraction (recent messages, file touchpoints) + - Verification: `/commit` in pi shows context being gathered (log or notify) + +- [ ] **W007**: Implement /commit command that injects context and triggers LLM + - Verification: `/commit` in dirty repo triggers LLM response with grouping proposal + +- [ ] **W008**: Verify full flow: /commit → LLM groups → git_commit calls → widget updates + - Verification: End-to-end test in a repo with 5+ changed files across different paths + +## Verification Evidence + +- (2026-01-24 23:xx) W001: `ls ~/.pi/agent/extensions/session-hygiene/index.ts` → exists +- (2026-01-24 23:xx) W002: jiti load fails on missing module (expected) — syntax valid + +## Notes + +- Extension location: `~/.pi/agent/extensions/session-hygiene/` +- Will use `belowEditor` widget placement — need to verify it looks right +- For /commit context injection, use `pi.sendUserMessage()` or `before_agent_start` message injection +- Model for grouping: use whatever model is currently active (no separate API key needed)
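The context-injection note above could be sketched as a pure prompt builder, testable without the pi runtime. The function name and the porcelain parsing are assumptions; the prompt wording mirrors the LLM prompt in the approach doc.

```typescript
// Sketch: build the /commit injection message from `git status --porcelain`
// output. Name and parsing are illustrative, not the final extension API.
export function buildCommitPrompt(porcelain: string): string {
  const files = porcelain
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => line.slice(3)); // drop the two status columns + space
  return [
    "Here are the changed files and session context.",
    ...files.map((f) => `- ${f}`),
    "",
    "Group into logical commits; call git_commit once per group with a",
    "conventional commit message. If a file fits no group, or grouping is",
    "ambiguous, ask the user. Otherwise, proceed and execute the commits.",
  ].join("\n");
}
```

The result would then be handed to `pi.sendUserMessage()` (or `before_agent_start` injection) per the note above.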