orch skill: sync with CLI v0.1.0

- Update model aliases (gpt-5.2, claude-opus-4.5, etc.)
- Add new models: deepseek, r1, qwen, glm, sonar
- Document --synthesize, --websearch, --serial flags
- Document stdin piping, orch models, orch sessions
- Add --allow-expensive usage guidance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
dan 2025-12-24 01:30:03 -05:00
parent c1f644e6a6
commit f8db8771ea
2 changed files with 81 additions and 24 deletions


{"id":"skills-0nl","title":"Update orch skill to match CLI v0.1.0","description":"The orch skill (in ~/.claude/skills/orch/) is out of sync with the orch CLI.\n\nNeeded updates:\n- Fix model aliases: gpt-5 → gpt-5.2, claude-opus-4.1 → claude-opus-4.5\n- Add new aliases: deepseek, r1, qwen, qwen-fast, glm, sonar, sonar-pro\n- Document --synthesize flag for response aggregation\n- Document stdin piping support\n- Document orch models command\n- Document orch sessions command\n- Add --websearch, --serial, --allow-expensive options\n\nReference: ~/proj/orch/README.md and src/orch/models_registry.py","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-23T21:11:46.294285184-05:00","updated_at":"2025-12-24T01:29:54.408882125-05:00","closed_at":"2025-12-24T01:29:54.408882125-05:00","close_reason":"Updated skill with current CLI features and model aliases"}
{"id":"skills-0og","title":"spec-review: Define output capture and audit trail","description":"Reviews happen in terminal then disappear. No audit trail, no diffable history.\n\nAdd:\n- Guidance to tee output to review file (e.g., specs/{branch}/review.md)\n- Standard location for gate check results\n- Template for recording decisions and rationale","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-15T00:23:23.705164812-08:00","updated_at":"2025-12-15T13:02:32.313084337-08:00","closed_at":"2025-12-15T13:02:32.313084337-08:00"}
{"id":"skills-1ig","title":"Brainstorm agent-friendly doc conventions","description":"# Agent-Friendly Doc Conventions - Hybrid Architecture\n\n## FINAL ARCHITECTURE: Vale + LLM Hybrid\n\n### Insight\n\u003e \"Good old deterministic testing (dumb robots) is the best way to keep in check LLMs (smart robots) at volume.\"\n\n### Split by Tool\n\n| Category | Rubrics | Tool |\n|----------|---------|------|\n| Vale-only | Format Integrity, Deterministic Instructions, Terminology Strictness, Token Efficiency | Fast, deterministic, CI-friendly |\n| Vale + LLM | Semantic Headings, Configuration Precision, Security Boundaries | Vale flags, LLM suggests fixes |\n| LLM-only | Contextual Independence, Code Executability, Execution Verification | Semantic understanding required |\n\n### Pipeline\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│ Stage 1: Vale (deterministic, fast, free) │\n│ - Runs in CI on every commit │\n│ - Catches 40% of issues instantly │\n│ - No LLM cost for clean docs │\n└─────────────────────┬───────────────────────────────────────┘\n │ only if Vale passes\n ▼\n┌─────────────────────────────────────────────────────────────┐\n│ Stage 2: LLM Triage (cheap model) │\n│ - Evaluates 3 semantic rubrics │\n│ - Identifies which need patches │\n└─────────────────────┬───────────────────────────────────────┘\n │ only if issues found\n ▼\n┌─────────────────────────────────────────────────────────────┐\n│ Stage 3: LLM Specialists (capable model) │\n│ - One agent per failed rubric │\n│ - Generates patches │\n└─────────────────────────────────────────────────────────────┘\n```\n\n### Why This Works\n- Vale is battle-tested, fast, CI-native\n- LLM only fires when needed (adaptive cost)\n- Deterministic rules catch predictable issues\n- LLM handles semantic/contextual issues\n\n---\n\n## Vale Rules Needed\n\n### Format Integrity\n- Existence: code blocks without language tags\n- Regex for unclosed fences\n\n### Deterministic Instructions \n- 
Existence: hedging words (\"might\", \"may want to\", \"consider\", \"you could\")\n\n### Terminology Strictness\n- Consistency: flag term variations\n\n### Token Efficiency\n- Existence: filler phrases (\"In this section we will...\", \"As you may know...\")\n\n### Semantic Headings (partial)\n- Existence: banned headings (\"Overview\", \"Introduction\", \"Getting Started\")\n\n### Configuration Precision (partial)\n- Existence: vague versions (\"Python 3.x\", \"recent version\")\n\n### Security Boundaries (partial)\n- Existence: hardcoded API key patterns\n\n---\n\n## NEXT STEPS\n\n1. Create Vale style for doc-review rubrics\n2. Test Vale on sample docs\n3. Design LLM prompts for semantic rubrics only\n4. Wire into orch or standalone","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-04T14:02:04.898026177-08:00","updated_at":"2025-12-04T16:43:53.0608948-08:00","closed_at":"2025-12-04T16:43:53.0608948-08:00"}
{"id":"skills-1n3","title":"Set up agent skills for Gemini CLI","description":"The AI agent skills (worklog, web-search, etc.) configured in .skills are not currently working when using the Gemini CLI. \\n\\nObserved behavior:\\n- 'worklog' command not found even after 'direnv reload'.\\n- .envrc sources ~/proj/skills/bin/use-skills.sh, but skills are not accessible in the Gemini agent session.\\n\\nNeed to:\\n1. Investigate how Gemini CLI loads its environment compared to Claude Code.\\n2. Update 'use-skills.sh' or direnv configuration to support Gemini CLI.\\n3. Ensure skill symlinks/binaries are correctly in the PATH for Gemini.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-22T17:39:28.106296919-05:00","updated_at":"2025-12-22T17:39:28.106296919-05:00"}


```bash
orch consensus "PROMPT" MODEL1 MODEL2 [MODEL3...]
```
**Model Aliases** (use these):
| Alias | Model | Notes |
|-------|-------|-------|
| `flash` | gemini-3-flash-preview | Fast, free |
| `gemini` | gemini-3-pro-preview | Strong reasoning, free |
| `gpt` / `gpt5` | gpt-5.2 | Strong reasoning |
| `gpt4` | gpt-4o | Legacy |
| `claude` / `sonnet` | claude-sonnet-4.5 | Balanced (via OpenRouter) |
| `haiku` | claude-haiku-4.5 | Fast, cheap |
| `opus` | claude-opus-4.5 | Strongest, expensive |
| `deepseek` | deepseek-v3.2 | Good value |
| `r1` | deepseek-r1-0528 | Reasoning model, expensive |
| `qwen` | qwen3-235b-a22b | Good value |
| `qwen-fast` | qwen3-8b | Very fast/cheap |
| `glm` | glm-4.7 | Reasoning capable |
| `sonar` | perplexity/sonar | Web search built-in |
| `sonar-pro` | perplexity/sonar-pro | Better web search |
Use `orch models` to see all available models with pricing and status.
## Model Selection
**Quick sanity check**: Use `flash qwen-fast` for fast, cheap validation. Good for "am I missing something obvious?" checks.
**Standard consensus**: Use `flash gemini deepseek` for balanced perspectives across providers. Default for most decisions.
**Deep analysis**: Include `r1` or `gpt` when stakes are high or reasoning is complex. These models think longer but cost more. Use `--allow-expensive` for r1/opus.
**Diverse viewpoints**: Mix providers (Google + DeepSeek + OpenAI + Anthropic) rather than multiple models from one provider. Different training leads to genuinely different perspectives.
**Cost-conscious**: `flash` and `qwen-fast` are 10-100x cheaper than premium models. Start cheap, escalate if needed.
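A minimal start-cheap-then-escalate sketch using the aliases and flags documented above (the prompt is illustrative):

```shell
# Cheap first pass: two fast models for an "am I missing something obvious?" check
orch consensus "Is it safe to cache per-user results in this handler?" flash qwen-fast

# Escalate to reasoning models only if the cheap pass disagrees or stakes are high
orch consensus "Is it safe to cache per-user results in this handler?" gemini r1 --allow-expensive
```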
**Options**:
- `--mode vote` (default) - Models give Support/Oppose/Neutral verdict
- `--mode critique` - Find flaws and weaknesses
- `--mode open` - Freeform responses, no structured output
- `--temperature 0.1` - Lower = more focused (default 0.1)
- `--enhance` - Use AI to improve prompt before querying
- `--file PATH` - Include file as context (can use multiple times)
- `--websearch` - Enable web search (Gemini models only)
- `--serial` - Run models in sequence instead of parallel
- `--strategy` - Serial strategy: neutral (default), refine, debate, brainstorm
- `--synthesize MODEL` - Aggregate all responses into summary using MODEL
- `--allow-expensive` - Allow expensive/slow models (opus, r1)
- `--timeout SECS` - Timeout per model (default 300)
**Stances** (devil's advocate):
Append `:for`, `:against`, or `:neutral` to bias a model's perspective:
```bash
orch consensus "Should we rewrite in Rust?" gpt:for claude:against gemini:neutral
```
**Stdin piping**:
```bash
cat code.py | orch consensus "Is this implementation correct?" flash gemini
```
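Stdin piping works with any command that produces text, so one plausible pattern is reviewing a diff directly (assuming a git repository; the prompt is illustrative):

```shell
# Pipe the latest change to two models for a quick regression check
git diff HEAD~1 | orch consensus "Does this change introduce regressions?" flash gemini
```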
### orch chat
Single-model conversation (when you don't need consensus):
```bash
orch chat "MESSAGE" --model gemini
```
Options:
- `--model MODEL` - Model to use (default: gemini)
- `--session ID` - Continue a session
- `--file PATH` - Attach file
- `--websearch` / `--no-websearch` - Toggle search (default: on)
- `--allow-expensive` - Allow expensive models
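A sketch of continuing a conversation with `--session`, assuming the session ID can be recovered with `orch sessions list` (SESSION_ID is a placeholder):

```shell
orch chat "Compare SQLite and Postgres for an embedded analytics service" --model gemini
# Later: pick up the same thread instead of starting fresh
orch chat "Which handles concurrent writers better?" --session SESSION_ID
```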
### orch models
List and inspect available models:
```bash
orch models # List all models with status
orch models resolve <alias> # Show details for specific alias
```
### orch sessions
Manage conversation sessions:
```bash
orch sessions list # List all sessions
orch sessions show <id> # Show session details
orch sessions clean 7d # Delete sessions older than 7 days
orch sessions export <id> # Export session as JSON
```
## Usage Patterns
### Quick Second Opinion
### Architecture Decision
When facing a tradeoff:
```bash
orch consensus "Microservices vs monolith for a 3-person team building an e-commerce site?" flash gemini gpt --mode vote
```
### Code Review
### Devil's Advocate
Get opposing viewpoints deliberately:
```bash
orch consensus "Should we adopt Kubernetes?" gpt:for claude:against flash:neutral
```
### Brainstorm
Find weaknesses before presenting:
```bash
orch consensus "What are the flaws in this API design?" flash gemini --file api-spec.yaml --mode critique
```
### Synthesize Responses
Get a unified summary from multiple perspectives:
```bash
orch consensus "Evaluate this architecture" flash gemini gpt --synthesize gemini
```
### Use Reasoning Models
For complex analysis requiring deep thinking:
```bash
orch consensus "Analyze the security implications" r1 gemini --allow-expensive
```
## Output Format
Vote mode returns structured verdicts:
```
│ SUPPORT: 2 OPPOSE: 1 NEUTRAL: 0 │
└─────────────────────────────────────────────────────────────┘
[flash] gemini-3-flash-preview - SUPPORT
Reasoning: ...
[gemini] gemini-3-pro-preview - SUPPORT
Reasoning: ...
[claude] claude-sonnet-4.5 - OPPOSE
Reasoning: ...
```
1. **Use for genuine uncertainty** - Don't use orch for trivial decisions or to avoid thinking
2. **Provide context** - Better prompts get better consensus; use `--file` when relevant
3. **Choose models wisely** - flash/qwen-fast for quick checks, r1/opus for complex reasoning
4. **Consider stances** - Devil's advocate is powerful for stress-testing ideas
5. **Parse the reasoning** - The verdict matters less than understanding the reasoning
6. **Mind the cost** - opus and r1 require `--allow-expensive`; use cheaper models for iteration
## Requirements
- `orch` CLI installed (via home-manager or system packages)
- API keys configured: GEMINI_API_KEY, OPENAI_API_KEY, OPENROUTER_KEY