From f8db8771ea193636f23d742fec04eda8f55c8277 Mon Sep 17 00:00:00 2001
From: dan
Date: Wed, 24 Dec 2025 01:30:03 -0500
Subject: [PATCH] orch skill: sync with CLI v0.1.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Update model aliases (gpt-5.2, claude-opus-4.5, etc.)
- Add new models: deepseek, r1, qwen, glm, sonar
- Document --synthesize, --websearch, --serial flags
- Document stdin piping, orch models, orch sessions
- Add --allow-expensive usage guidance

šŸ¤– Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 .beads/issues.jsonl  |   1 +
 skills/orch/SKILL.md | 104 +++++++++++++++++++++++++++++++++----------
 2 files changed, 81 insertions(+), 24 deletions(-)

diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl
index 9f36d36..e8024d2 100644
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
@@ -1,3 +1,4 @@
+{"id":"skills-0nl","title":"Update orch skill to match CLI v0.1.0","description":"The orch skill (in ~/.claude/skills/orch/) is out of sync with the orch CLI.\n\nNeeded updates:\n- Fix model aliases: gpt-5 → gpt-5.2, claude-opus-4.1 → claude-opus-4.5\n- Add new aliases: deepseek, r1, qwen, qwen-fast, glm, sonar, sonar-pro\n- Document --synthesize flag for response aggregation\n- Document stdin piping support\n- Document orch models command\n- Document orch sessions command\n- Add --websearch, --serial, --allow-expensive options\n\nReference: ~/proj/orch/README.md and src/orch/models_registry.py","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T21:11:46.294285184-05:00","updated_at":"2025-12-24T01:29:54.408882125-05:00","closed_at":"2025-12-24T01:29:54.408882125-05:00","close_reason":"Updated skill with current CLI features and model aliases"}
 {"id":"skills-0og","title":"spec-review: Define output capture and audit trail","description":"Reviews happen in terminal then disappear. No audit trail, no diffable history.\n\nAdd:\n- Guidance to tee output to review file (e.g., specs/{branch}/review.md)\n- Standard location for gate check results\n- Template for recording decisions and rationale","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-15T00:23:23.705164812-08:00","updated_at":"2025-12-15T13:02:32.313084337-08:00","closed_at":"2025-12-15T13:02:32.313084337-08:00"}
 {"id":"skills-1ig","title":"Brainstorm agent-friendly doc conventions","description":"# Agent-Friendly Doc Conventions - Hybrid Architecture\n\n## FINAL ARCHITECTURE: Vale + LLM Hybrid\n\n### Insight\n\u003e \"Good old deterministic testing (dumb robots) is the best way to keep in check LLMs (smart robots) at volume.\"\n\n### Split by Tool\n\n| Category | Rubrics | Tool |\n|----------|---------|------|\n| Vale-only | Format Integrity, Deterministic Instructions, Terminology Strictness, Token Efficiency | Fast, deterministic, CI-friendly |\n| Vale + LLM | Semantic Headings, Configuration Precision, Security Boundaries | Vale flags, LLM suggests fixes |\n| LLM-only | Contextual Independence, Code Executability, Execution Verification | Semantic understanding required |\n\n### Pipeline\n\n```\nā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”\n│ Stage 1: Vale (deterministic, fast, free) │\n│ - Runs in CI on every commit │\n│ - Catches 40% of issues instantly │\n│ - No LLM cost for clean docs │\nā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜\n │ only if Vale passes\n ā–¼\nā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”\n│ Stage 2: LLM Triage (cheap model) │\n│ - Evaluates 3 semantic rubrics │\n│ - Identifies which need patches │\nā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜\n │ only if issues found\n ā–¼\nā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”\n│ Stage 3: LLM Specialists (capable model) │\n│ - One agent per failed rubric │\n│ - Generates patches │\nā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜\n```\n\n### Why This Works\n- Vale is battle-tested, fast, CI-native\n- LLM only fires when needed (adaptive cost)\n- Deterministic rules catch predictable issues\n- LLM handles semantic/contextual issues\n\n---\n\n## Vale Rules Needed\n\n### Format Integrity\n- Existence: code blocks without language tags\n- Regex for unclosed fences\n\n### Deterministic Instructions \n- Existence: hedging words (\"might\", \"may want to\", \"consider\", \"you could\")\n\n### Terminology Strictness\n- Consistency: flag term variations\n\n### Token Efficiency\n- Existence: filler phrases (\"In this section we will...\", \"As you may know...\")\n\n### Semantic Headings (partial)\n- Existence: banned headings (\"Overview\", \"Introduction\", \"Getting Started\")\n\n### Configuration Precision (partial)\n- Existence: vague versions (\"Python 3.x\", \"recent version\")\n\n### Security Boundaries (partial)\n- Existence: hardcoded API key patterns\n\n---\n\n## NEXT STEPS\n\n1. Create Vale style for doc-review rubrics\n2. Test Vale on sample docs\n3. Design LLM prompts for semantic rubrics only\n4. Wire into orch or standalone","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-04T14:02:04.898026177-08:00","updated_at":"2025-12-04T16:43:53.0608948-08:00","closed_at":"2025-12-04T16:43:53.0608948-08:00"}
 {"id":"skills-1n3","title":"Set up agent skills for Gemini CLI","description":"The AI agent skills (worklog, web-search, etc.) configured in .skills are not currently working when using the Gemini CLI. \\n\\nObserved behavior:\\n- 'worklog' command not found even after 'direnv reload'.\\n- .envrc sources ~/proj/skills/bin/use-skills.sh, but skills are not accessible in the Gemini agent session.\\n\\nNeed to:\\n1. Investigate how Gemini CLI loads its environment compared to Claude Code.\\n2. Update 'use-skills.sh' or direnv configuration to support Gemini CLI.\\n3. Ensure skill symlinks/binaries are correctly in the PATH for Gemini.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-22T17:39:28.106296919-05:00","updated_at":"2025-12-22T17:39:28.106296919-05:00"}
diff --git a/skills/orch/SKILL.md b/skills/orch/SKILL.md
index 7f729f7..a499ff3 100644
--- a/skills/orch/SKILL.md
+++ b/skills/orch/SKILL.md
@@ -39,25 +39,37 @@ orch consensus "PROMPT" MODEL1 MODEL2 [MODEL3...]
 ```

 **Model Aliases** (use these):
-- `flash` → gemini-2.5-flash-preview (fast, cheap)
-- `gemini` → gemini-3-pro-preview (strong reasoning)
-- `qwen` → qwen3-8b (fast, cheap)
-- `deepseek` → deepseek-v3 (balanced)
-- `r1` → deepseek-r1 (strongest reasoning)
-- `gpt` / `gpt5` → gpt-5.1 (strong reasoning)
-- `gpt4` → gpt-4o (legacy)
+
+| Alias | Model | Notes |
+|-------|-------|-------|
+| `flash` | gemini-3-flash-preview | Fast, free |
+| `gemini` | gemini-3-pro-preview | Strong reasoning, free |
+| `gpt` / `gpt5` | gpt-5.2 | Strong reasoning |
+| `gpt4` | gpt-4o | Legacy |
+| `claude` / `sonnet` | claude-sonnet-4.5 | Balanced (via OpenRouter) |
+| `haiku` | claude-haiku-4.5 | Fast, cheap |
+| `opus` | claude-opus-4.5 | Strongest, expensive |
+| `deepseek` | deepseek-v3.2 | Good value |
+| `r1` | deepseek-r1-0528 | Reasoning model, expensive |
+| `qwen` | qwen3-235b-a22b | Good value |
+| `qwen-fast` | qwen3-8b | Very fast/cheap |
+| `glm` | glm-4.7 | Reasoning capable |
+| `sonar` | perplexity/sonar | Web search built-in |
+| `sonar-pro` | perplexity/sonar-pro | Better web search |
+
+Use `orch models` to see all available models with pricing and status.

 ## Model Selection

-**Quick sanity check**: Use `flash qwen` for fast, cheap validation. Good for "am I missing something obvious?" checks.
+**Quick sanity check**: Use `flash qwen-fast` for fast, cheap validation. Good for "am I missing something obvious?" checks.

 **Standard consensus**: Use `flash gemini deepseek` for balanced perspectives across providers. Default for most decisions.

-**Deep analysis**: Include `r1` or `gpt` when stakes are high or reasoning is complex. These models think longer but cost more.
+**Deep analysis**: Include `r1` or `gpt` when stakes are high or reasoning is complex. These models think longer but cost more. Use `--allow-expensive` for r1/opus.

-**Diverse viewpoints**: Mix providers (Google + DeepSeek + OpenAI) rather than multiple models from one provider. Different training leads to genuinely different perspectives.
+**Diverse viewpoints**: Mix providers (Google + DeepSeek + OpenAI + Anthropic) rather than multiple models from one provider. Different training leads to genuinely different perspectives.

-**Cost-conscious**: `flash` and `qwen` are 10-20x cheaper than premium models. Start cheap, escalate if needed.
+**Cost-conscious**: `flash` and `qwen-fast` are 10-100x cheaper than premium models. Start cheap, escalate if needed.

 **Options**:
 - `--mode vote` (default) - Models give Support/Oppose/Neutral verdict
@@ -65,13 +77,23 @@ orch consensus "PROMPT" MODEL1 MODEL2 [MODEL3...]
 - `--mode critique` - Find flaws and weaknesses
 - `--mode open` - Freeform responses, no structured output
 - `--temperature 0.1` - Lower = more focused (default 0.1)
-- `--file PATH` - Include file as context
-- `--enhance` - Use AI to improve prompt before querying
+- `--file PATH` - Include file as context (can use multiple times)
+- `--websearch` - Enable web search (Gemini models only)
+- `--serial` - Run models in sequence instead of parallel
+- `--strategy` - Serial strategy: neutral (default), refine, debate, brainstorm
+- `--synthesize MODEL` - Aggregate all responses into summary using MODEL
+- `--allow-expensive` - Allow expensive/slow models (opus, r1)
+- `--timeout SECS` - Timeout per model (default 300)

 **Stances** (devil's advocate): Append `:for`, `:against`, or `:neutral` to bias a model's perspective:
 ```bash
-orch consensus "Should we rewrite in Rust?" gpt5:for deepseek:against gemini:neutral
+orch consensus "Should we rewrite in Rust?" gpt:for claude:against gemini:neutral
+```
+
+**Stdin piping**:
+```bash
+cat code.py | orch consensus "Is this implementation correct?" flash gemini
 ```

 ### orch chat
@@ -81,6 +103,31 @@ Single-model conversation (when you don't need consensus):
 ```bash
 orch chat "MESSAGE" --model gemini
 ```
+Options:
+- `--model MODEL` - Model to use (default: gemini)
+- `--session ID` - Continue a session
+- `--file PATH` - Attach file
+- `--websearch` / `--no-websearch` - Toggle search (default: on)
+- `--allow-expensive` - Allow expensive models
+
+### orch models
+
+List and inspect available models:
+```bash
+orch models          # List all models with status
+orch models resolve  # Show details for specific alias
+```
+
+### orch sessions
+
+Manage conversation sessions:
+```bash
+orch sessions list       # List all sessions
+orch sessions show       # Show session details
+orch sessions clean 7d   # Delete sessions older than 7 days
+orch sessions export     # Export session as JSON
+```
+
 ## Usage Patterns

 ### Quick Second Opinion
@@ -92,7 +139,7 @@ orch consensus "I think we should use SQLite for this because [reasons]. Is this sound?" flash gemini
 ```

 ### Architecture Decision
 When facing a tradeoff:
 ```bash
-orch consensus "Microservices vs monolith for a 3-person team building an e-commerce site?" flash gemini deepseek --mode vote
+orch consensus "Microservices vs monolith for a 3-person team building an e-commerce site?" flash gemini gpt --mode vote
 ```

 ### Code Review
@@ -104,7 +151,7 @@ orch consensus "Is this error handling approach correct and complete?" flash gemi
 ### Devil's Advocate
 Get opposing viewpoints deliberately:
 ```bash
-orch consensus "Should we adopt Kubernetes?" gpt5:for deepseek:against flash:neutral
+orch consensus "Should we adopt Kubernetes?" gpt:for claude:against flash:neutral
 ```

 ### Brainstorm
@@ -119,6 +166,18 @@ Find weaknesses before presenting:
 orch consensus "What are the flaws in this API design?" flash gemini --file api-spec.yaml --mode critique
 ```
+
+### Synthesize Responses
+Get a unified summary from multiple perspectives:
+```bash
+orch consensus "Evaluate this architecture" flash gemini gpt --synthesize gemini
+```
+
+### Use Reasoning Models
+For complex analysis requiring deep thinking:
+```bash
+orch consensus "Analyze the security implications" r1 gemini --allow-expensive
+```

 ## Output Format

 Vote mode returns structured verdicts:
@@ -128,13 +187,13 @@
 │ SUPPORT: 2  OPPOSE: 1  NEUTRAL: 0 │
 ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜

-[flash] gemini-2.5-flash - SUPPORT
+[flash] gemini-3-flash-preview - SUPPORT
 Reasoning: ...

 [gemini] gemini-3-pro-preview - SUPPORT
 Reasoning: ...

-[deepseek] deepseek-v3 - OPPOSE
+[claude] claude-sonnet-4.5 - OPPOSE
 Reasoning: ...
 ```
@@ -142,15 +201,12 @@ Reasoning: ...

 1. **Use for genuine uncertainty** - Don't use orch for trivial decisions or to avoid thinking
 2. **Provide context** - Better prompts get better consensus; use `--file` when relevant
-3. **Choose models wisely** - flash/qwen for quick checks, r1/gpt for complex reasoning
+3. **Choose models wisely** - flash/qwen-fast for quick checks, r1/opus for complex reasoning
 4. **Consider stances** - Devil's advocate is powerful for stress-testing ideas
 5. **Parse the reasoning** - The verdict matters less than understanding the reasoning
+6. **Mind the cost** - opus and r1 require `--allow-expensive`; use cheaper models for iteration

 ## Requirements

 - `orch` CLI installed (via home-manager or system packages)
-- API keys configured (OPENROUTER_KEY, GOOGLE_API_KEY, OPENAI_API_KEY)
-
-## Examples
-
-See `examples/` directory for sample outputs from different consensus modes.
+- API keys configured: GEMINI_API_KEY, OPENAI_API_KEY, OPENROUTER_KEY
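
A quick runnable illustration of how the stance syntax documented in this patch composes: stances are plain `MODEL:STANCE` positional arguments, so they can be assembled programmatically. The `build_consensus` helper below is hypothetical (not part of the orch CLI); it only prints the command line it would run, without calling `orch`.

```shell
# Hypothetical helper, not part of the orch CLI: compose an `orch consensus`
# invocation from a prompt plus MODEL:STANCE specs (e.g. gpt:for, claude:against).
build_consensus() {
  prompt="$1"; shift
  printf 'orch consensus "%s"' "$prompt"
  for spec in "$@"; do
    printf ' %s' "$spec"   # each spec is appended as a positional argument
  done
  printf '\n'
}

# Prints the devil's-advocate command shown in the SKILL.md hunk above:
build_consensus "Should we adopt Kubernetes?" gpt:for claude:against flash:neutral
```

Because the output is just a command string, the same pattern works for piping a file in via stdin, e.g. `cat code.py | $(build_consensus ...)` style composition in scripts.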