From f8db8771ea193636f23d742fec04eda8f55c8277 Mon Sep 17 00:00:00 2001
From: dan
Date: Wed, 24 Dec 2025 01:30:03 -0500
Subject: [PATCH] orch skill: sync with CLI v0.1.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Update model aliases (gpt-5.2, claude-opus-4.5, etc.)
- Add new models: deepseek, r1, qwen, glm, sonar
- Document --synthesize, --websearch, --serial flags
- Document stdin piping, orch models, orch sessions
- Add --allow-expensive usage guidance

šŸ¤– Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 .beads/issues.jsonl  |   1 +
 skills/orch/SKILL.md | 104 +++++++++++++++++++++++++++++++++----------
 2 files changed, 81 insertions(+), 24 deletions(-)

diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl
index 9f36d36..e8024d2 100644
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@@ -1,3 +1,4 @@
+{"id":"skills-0nl","title":"Update orch skill to match CLI v0.1.0","description":"The orch skill (in ~/.claude/skills/orch/) is out of sync with the orch CLI.\n\nNeeded updates:\n- Fix model aliases: gpt-5 → gpt-5.2, claude-opus-4.1 → claude-opus-4.5\n- Add new aliases: deepseek, r1, qwen, qwen-fast, glm, sonar, sonar-pro\n- Document --synthesize flag for response aggregation\n- Document stdin piping support\n- Document orch models command\n- Document orch sessions command\n- Add --websearch, --serial, --allow-expensive options\n\nReference: ~/proj/orch/README.md and src/orch/models_registry.py","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T21:11:46.294285184-05:00","updated_at":"2025-12-24T01:29:54.408882125-05:00","closed_at":"2025-12-24T01:29:54.408882125-05:00","close_reason":"Updated skill with current CLI features and model aliases"}
 {"id":"skills-0og","title":"spec-review: Define output capture and audit trail","description":"Reviews happen in terminal then disappear. No audit trail, no diffable history.\n\nAdd:\n- Guidance to tee output to review file (e.g., specs/{branch}/review.md)\n- Standard location for gate check results\n- Template for recording decisions and rationale","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-15T00:23:23.705164812-08:00","updated_at":"2025-12-15T13:02:32.313084337-08:00","closed_at":"2025-12-15T13:02:32.313084337-08:00"}
 {"id":"skills-1ig","title":"Brainstorm agent-friendly doc conventions","description":"# Agent-Friendly Doc Conventions - Hybrid Architecture\n\n## FINAL ARCHITECTURE: Vale + LLM Hybrid\n\n### Insight\n\u003e \"Good old deterministic testing (dumb robots) is the best way to keep in check LLMs (smart robots) at volume.\"\n\n### Split by Tool\n\n| Category | Rubrics | Tool |\n|----------|---------|------|\n| Vale-only | Format Integrity, Deterministic Instructions, Terminology Strictness, Token Efficiency | Fast, deterministic, CI-friendly |\n| Vale + LLM | Semantic Headings, Configuration Precision, Security Boundaries | Vale flags, LLM suggests fixes |\n| LLM-only | Contextual Independence, Code Executability, Execution Verification | Semantic understanding required |\n\n### Pipeline\n\n```\nā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”\n│ Stage 1: Vale (deterministic, fast, free) │\n│ - Runs in CI on every commit │\n│ - Catches 40% of issues instantly │\n│ - No LLM cost for clean docs │\nā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜\n │ only if Vale passes\n ā–¼\nā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”\n│ Stage 2: LLM Triage (cheap model) │\n│ - Evaluates 3 semantic rubrics │\n│ - Identifies which need patches │\nā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜\n │ only if issues found\n ā–¼\nā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”\n│ Stage 3: LLM Specialists (capable model) │\n│ - One agent per failed rubric │\n│ - Generates patches │\nā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜\n```\n\n### Why This Works\n- Vale is battle-tested, fast, CI-native\n- LLM only fires when needed (adaptive cost)\n- Deterministic rules catch predictable issues\n- LLM handles semantic/contextual issues\n\n---\n\n## Vale Rules Needed\n\n### Format Integrity\n- Existence: code blocks without language tags\n- Regex for unclosed fences\n\n### Deterministic Instructions \n- Existence: hedging words (\"might\", \"may want to\", \"consider\", \"you could\")\n\n### Terminology Strictness\n- Consistency: flag term variations\n\n### Token Efficiency\n- Existence: filler phrases (\"In this section we will...\", \"As you may know...\")\n\n### Semantic Headings (partial)\n- Existence: banned headings (\"Overview\", \"Introduction\", \"Getting Started\")\n\n### Configuration Precision (partial)\n- Existence: vague versions (\"Python 3.x\", \"recent version\")\n\n### Security Boundaries (partial)\n- Existence: hardcoded API key patterns\n\n---\n\n## NEXT STEPS\n\n1. Create Vale style for doc-review rubrics\n2. Test Vale on sample docs\n3. Design LLM prompts for semantic rubrics only\n4. Wire into orch or standalone","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-04T14:02:04.898026177-08:00","updated_at":"2025-12-04T16:43:53.0608948-08:00","closed_at":"2025-12-04T16:43:53.0608948-08:00"}
 {"id":"skills-1n3","title":"Set up agent skills for Gemini CLI","description":"The AI agent skills (worklog, web-search, etc.) configured in .skills are not currently working when using the Gemini CLI. \\n\\nObserved behavior:\\n- 'worklog' command not found even after 'direnv reload'.\\n- .envrc sources ~/proj/skills/bin/use-skills.sh, but skills are not accessible in the Gemini agent session.\\n\\nNeed to:\\n1. Investigate how Gemini CLI loads its environment compared to Claude Code.\\n2. Update 'use-skills.sh' or direnv configuration to support Gemini CLI.\\n3. Ensure skill symlinks/binaries are correctly in the PATH for Gemini.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-22T17:39:28.106296919-05:00","updated_at":"2025-12-22T17:39:28.106296919-05:00"}
diff --git a/skills/orch/SKILL.md b/skills/orch/SKILL.md
index 7f729f7..a499ff3 100644
--- a/skills/orch/SKILL.md
+++ b/skills/orch/SKILL.md
@@ -39,25 +39,37 @@ orch consensus "PROMPT" MODEL1 MODEL2 [MODEL3...]
 ```

 **Model Aliases** (use these):
-- `flash` → gemini-2.5-flash-preview (fast, cheap)
-- `gemini` → gemini-3-pro-preview (strong reasoning)
-- `qwen` → qwen3-8b (fast, cheap)
-- `deepseek` → deepseek-v3 (balanced)
-- `r1` → deepseek-r1 (strongest reasoning)
-- `gpt` / `gpt5` → gpt-5.1 (strong reasoning)
-- `gpt4` → gpt-4o (legacy)
+
+| Alias | Model | Notes |
+|-------|-------|-------|
+| `flash` | gemini-3-flash-preview | Fast, free |
+| `gemini` | gemini-3-pro-preview | Strong reasoning, free |
+| `gpt` / `gpt5` | gpt-5.2 | Strong reasoning |
+| `gpt4` | gpt-4o | Legacy |
+| `claude` / `sonnet` | claude-sonnet-4.5 | Balanced (via OpenRouter) |
+| `haiku` | claude-haiku-4.5 | Fast, cheap |
+| `opus` | claude-opus-4.5 | Strongest, expensive |
+| `deepseek` | deepseek-v3.2 | Good value |
+| `r1` | deepseek-r1-0528 | Reasoning model, expensive |
+| `qwen` | qwen3-235b-a22b | Good value |
+| `qwen-fast` | qwen3-8b | Very fast/cheap |
+| `glm` | glm-4.7 | Reasoning capable |
+| `sonar` | perplexity/sonar | Web search built-in |
+| `sonar-pro` | perplexity/sonar-pro | Better web search |
+
+Use `orch models` to see all available models with pricing and status.

 ## Model Selection

-**Quick sanity check**: Use `flash qwen` for fast, cheap validation. Good for "am I missing something obvious?" checks.
+**Quick sanity check**: Use `flash qwen-fast` for fast, cheap validation. Good for "am I missing something obvious?" checks.

 **Standard consensus**: Use `flash gemini deepseek` for balanced perspectives across providers. Default for most decisions.

-**Deep analysis**: Include `r1` or `gpt` when stakes are high or reasoning is complex. These models think longer but cost more.
+**Deep analysis**: Include `r1` or `gpt` when stakes are high or reasoning is complex. These models think longer but cost more. Use `--allow-expensive` for r1/opus.

-**Diverse viewpoints**: Mix providers (Google + DeepSeek + OpenAI) rather than multiple models from one provider. Different training leads to genuinely different perspectives.
+**Diverse viewpoints**: Mix providers (Google + DeepSeek + OpenAI + Anthropic) rather than multiple models from one provider. Different training leads to genuinely different perspectives.

-**Cost-conscious**: `flash` and `qwen` are 10-20x cheaper than premium models. Start cheap, escalate if needed.
+**Cost-conscious**: `flash` and `qwen-fast` are 10-100x cheaper than premium models. Start cheap, escalate if needed.

 **Options**:
 - `--mode vote` (default) - Models give Support/Oppose/Neutral verdict
@@ -65,13 +77,23 @@ orch consensus "PROMPT" MODEL1 MODEL2 [MODEL3...]
 - `--mode critique` - Find flaws and weaknesses
 - `--mode open` - Freeform responses, no structured output
 - `--temperature 0.1` - Lower = more focused (default 0.1)
-- `--file PATH` - Include file as context
-- `--enhance` - Use AI to improve prompt before querying
+- `--file PATH` - Include file as context (can use multiple times)
+- `--websearch` - Enable web search (Gemini models only)
+- `--serial` - Run models in sequence instead of parallel
+- `--strategy` - Serial strategy: neutral (default), refine, debate, brainstorm
+- `--synthesize MODEL` - Aggregate all responses into summary using MODEL
+- `--allow-expensive` - Allow expensive/slow models (opus, r1)
+- `--timeout SECS` - Timeout per model (default 300)

 **Stances** (devil's advocate): Append `:for`, `:against`, or `:neutral` to bias a model's perspective:
 ```bash
-orch consensus "Should we rewrite in Rust?" gpt5:for deepseek:against gemini:neutral
+orch consensus "Should we rewrite in Rust?" gpt:for claude:against gemini:neutral
+```
+
+**Stdin piping**:
+```bash
+cat code.py | orch consensus "Is this implementation correct?" flash gemini
 ```

 ### orch chat
@@ -81,6 +103,31 @@ Single-model conversation (when you don't need consensus):
 ```bash
 orch chat "MESSAGE" --model gemini
 ```
+Options:
+- `--model MODEL` - Model to use (default: gemini)
+- `--session ID` - Continue a session
+- `--file PATH` - Attach file
+- `--websearch` / `--no-websearch` - Toggle search (default: on)
+- `--allow-expensive` - Allow expensive models
+
+### orch models
+
+List and inspect available models:
+```bash
+orch models          # List all models with status
+orch models resolve  # Show details for specific alias
+```
+
+### orch sessions
+
+Manage conversation sessions:
+```bash
+orch sessions list       # List all sessions
+orch sessions show       # Show session details
+orch sessions clean 7d   # Delete sessions older than 7 days
+orch sessions export     # Export session as JSON
+```
+
 ## Usage Patterns

 ### Quick Second Opinion
@@ -92,7 +139,7 @@ orch consensus "I think we should use SQLite for this because [reasons]. Is this sound?" flash gemini
 ```

 ### Architecture Decision
 When facing a tradeoff:
 ```bash
-orch consensus "Microservices vs monolith for a 3-person team building an e-commerce site?" flash gemini deepseek --mode vote
+orch consensus "Microservices vs monolith for a 3-person team building an e-commerce site?" flash gemini gpt --mode vote
 ```

 ### Code Review
@@ -104,7 +151,7 @@ orch consensus "Is this error handling approach correct and complete?" flash gemi
 ### Devil's Advocate
 Get opposing viewpoints deliberately:
 ```bash
-orch consensus "Should we adopt Kubernetes?" gpt5:for deepseek:against flash:neutral
+orch consensus "Should we adopt Kubernetes?" gpt:for claude:against flash:neutral
 ```

 ### Brainstorm
@@ -119,6 +166,18 @@ Find weaknesses before presenting:
 orch consensus "What are the flaws in this API design?" flash gemini --file api-spec.yaml --mode critique
 ```
+
+### Synthesize Responses
+Get a unified summary from multiple perspectives:
+```bash
+orch consensus "Evaluate this architecture" flash gemini gpt --synthesize gemini
+```
+
+### Use Reasoning Models
+For complex analysis requiring deep thinking:
+```bash
+orch consensus "Analyze the security implications" r1 gemini --allow-expensive
+```

 ## Output Format

 Vote mode returns structured verdicts:
@@ -128,13 +187,13 @@
 │ SUPPORT: 2  OPPOSE: 1  NEUTRAL: 0 │
 ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜

-[flash] gemini-2.5-flash - SUPPORT
+[flash] gemini-3-flash-preview - SUPPORT
 Reasoning: ...

 [gemini] gemini-3-pro-preview - SUPPORT
 Reasoning: ...

-[deepseek] deepseek-v3 - OPPOSE
+[claude] claude-sonnet-4.5 - OPPOSE
 Reasoning: ...
 ```
@@ -142,15 +201,12 @@ Reasoning: ...

 1. **Use for genuine uncertainty** - Don't use orch for trivial decisions or to avoid thinking
 2. **Provide context** - Better prompts get better consensus; use `--file` when relevant
-3. **Choose models wisely** - flash/qwen for quick checks, r1/gpt for complex reasoning
+3. **Choose models wisely** - flash/qwen-fast for quick checks, r1/opus for complex reasoning
 4. **Consider stances** - Devil's advocate is powerful for stress-testing ideas
 5. **Parse the reasoning** - The verdict matters less than understanding the reasoning
+6. **Mind the cost** - opus and r1 require `--allow-expensive`; use cheaper models for iteration

 ## Requirements

 - `orch` CLI installed (via home-manager or system packages)
-- API keys configured (OPENROUTER_KEY, GOOGLE_API_KEY, OPENAI_API_KEY)
-
-## Examples
-
-See `examples/` directory for sample outputs from different consensus modes.
+- API keys configured: GEMINI_API_KEY, OPENAI_API_KEY, OPENROUTER_KEY
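
A quick runnable illustration of how the stance syntax documented in this patch composes: stances are plain `MODEL:STANCE` positional arguments, so they can be assembled programmatically. The `build_consensus` helper below is hypothetical (not part of the orch CLI); it only prints the command line it would run, without calling `orch`.

```shell
# Hypothetical helper, not part of the orch CLI: compose an `orch consensus`
# invocation from a prompt plus MODEL:STANCE specs (e.g. gpt:for, claude:against).
build_consensus() {
  prompt="$1"; shift
  printf 'orch consensus "%s"' "$prompt"
  for spec in "$@"; do
    printf ' %s' "$spec"   # each spec is appended as a positional argument
  done
  printf '\n'
}

# Prints the devil's-advocate command shown in the SKILL.md hunk above:
build_consensus "Should we adopt Kubernetes?" gpt:for claude:against flash:neutral
```

Because the output is just a command string, the same pattern works for piping a file in via stdin, e.g. `cat code.py | $(build_consensus ...)` style composition in scripts.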