skills/docs/adr/001-skills-molecules-integration.md
dan c7c6bbf796 docs: park ADR-001 skills-molecules integration
Current simpler approach working well:
- Skills as standalone entrypoints
- Agent judgment sufficient for invocation
- Molecules not actively used

Revisit when complex orchestration is needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 23:27:41 -05:00


# ADR-001: Skills and Molecules Integration
## Status
Parked (2025-12-28)

**Rationale:** The current, simpler approach is working well:
- Skills as standalone entrypoints (not molecule steps)
- Agent judgment from description/SKILL.md sufficient for invocation
- Molecules/protos not actively used for workflow orchestration

Revisit when:
- Complex multi-agent orchestration becomes needed
- Steve Yegge's orchestration work provides new patterns
- Programmatic skill invocation has clear use cases
## Context
We have two complementary systems for agent-assisted work:
1. **Skills** (this repo): Procedural knowledge deployed via Nix/direnv. Skills define HOW to do things - scripts, prompts, and workflows that agents can invoke.
2. **Molecules** (beads 0.35+): Work tracking templates in beads. Molecules define WHAT work needs to be done - DAGs of issues that can be instantiated, tracked, and completed.

These systems evolved independently but have natural integration points. The question is: how should they connect?
### Current State
**Skills system:**
- Skills are directories under `~/.claude/skills/` (deployed via Nix)
- Each skill has a `SKILL.md` with frontmatter + prompt/instructions
- Skills are invoked by agents via `/skill-name` or automatically based on triggers
- No execution tracking beyond what the agent logs

**Molecules system (beads 0.35):**
- **Proto**: Template epic with `template` label, uses `{{var}}` placeholders
- **Mol**: Instantiated work from a proto (permanent, git-synced)
- **Wisp**: Ephemeral mol for operational work (gitignored, `.beads-wisp/`)
- **Hook**: Agent's attachment point for assigned work
- **Pin**: Assign mol to agent's hook

Key molecule commands:
```
bd mol spawn <proto> # Create mol from proto
bd pour <proto> # Spawn persistent mol
bd wisp create <proto> # Spawn ephemeral mol
bd pin <mol> --for me # Assign to self
bd mol squash <id> # Compress mol → digest
bd mol distill <epic> # Extract proto from ad-hoc epic
```
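To make the `{{var}}` placeholder idea concrete, here is a minimal Python sketch of filling in a proto field at instantiation time. It is an illustration only: `render_placeholders` is a hypothetical helper, not part of beads, and this document does not specify how `bd` itself performs the substitution.

```python
import re

# Hypothetical helper, not part of beads: fills {{var}} placeholders in a
# proto field. Unknown variables are left untouched so gaps stay visible.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_placeholders(template: str, variables: dict[str, str]) -> str:
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Fall back to the original "{{name}}" text when no value is given.
        return variables.get(name, match.group(0))

    return _PLACEHOLDER.sub(substitute, template)


title = render_placeholders(
    "Generate worklog for {{session}}", {"session": "2025-12-23"}
)
print(title)
```

Leaving unknown placeholders in place is a design choice for this sketch; a stricter variant could raise an error instead.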
### Problem Statement
1. Skills have no execution history - we can't replay, debug, or learn from past runs
2. Molecules track work but don't know which skills were used to complete them
3. Successful ad-hoc work patterns can't be easily promoted to reusable skills
4. No connection between "what was done" (mol) and "how it was done" (skill)
## Decision
Link skills and molecules via three mechanisms:
### 1. Skill References in Molecules
Add a `skill:` field to molecule nodes that names the skill to use for that step:
```yaml
# In a proto template
- title: "Generate worklog for {{session}}"
  skill: worklog
  description: "Document the work session"
```
When an agent works on a mol step that has a `skill:` reference, it knows which skill to invoke.
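A rough sketch of that lookup, assuming the step has already been parsed into a plain dict and that skills live under `~/.claude/skills/<name>/SKILL.md` as described in Current State. `resolve_skill` is a hypothetical name used for illustration, not an existing API.

```python
from pathlib import Path
from typing import Optional

# Assumption taken from the Current State section: skills are directories
# under ~/.claude/skills/, each containing a SKILL.md.
SKILLS_ROOT = Path.home() / ".claude" / "skills"


def resolve_skill(step: dict, skills_root: Path = SKILLS_ROOT) -> Optional[Path]:
    """Return the SKILL.md path for a mol step, or None if it names no skill.

    Hypothetical helper: `step` is a parsed proto/mol node (a plain dict here).
    """
    name = step.get("skill")
    if not name:
        # Integration is opt-in: steps without `skill:` are done manually.
        return None
    return skills_root / name / "SKILL.md"


step = {"title": "Generate worklog for {{session}}", "skill": "worklog"}
print(resolve_skill(step))
```

The function only builds the path; whether the file exists, and how the agent then invokes the skill, is left open here.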
### 2. Wisp Execution Traces
Use wisps to capture skill execution traces. When a skill runs within a molecule context:
```yaml
# Wisp execution trace format
skill_ref: worklog
skill_version: "abc123"  # git SHA of skill
inputs:
  context: "session context..."
  env:
    PROJECT: "skills"
tool_calls:
  - cmd: "extract-metrics.sh"
    args: ["--session", "2025-12-23"]
    exit_code: 0
    duration_ms: 1234
checkpoints:
  - step: "metrics_extracted"
    summary: "Found 5 commits, 12 file changes"
    timestamp: "2025-12-23T19:30:00Z"
outputs:
  files_created:
    - "docs/worklogs/2025-12-23-session.org"
```
```
This enables:
- Replay: Re-run a skill with the same inputs
- Diff: Compare two executions of the same skill
- Debug: Understand what happened when something fails
- Regression testing: Detect when skill behavior changes
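
As one illustration of the "Diff" and "Regression testing" uses, the sketch below compares two traces of the same skill. The `Trace` and `ToolCall` classes mirror a subset of the trace fields above; the class names, `diff_traces`, and the second SHA (`def456`) are placeholders invented for this example, not a finalized format.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    cmd: str
    args: list[str]
    exit_code: int


@dataclass
class Trace:
    # Subset of the trace fields; names follow the YAML keys above.
    skill_ref: str
    skill_version: str
    tool_calls: list[ToolCall] = field(default_factory=list)


def diff_traces(a: Trace, b: Trace) -> list[str]:
    """Describe how execution `b` differs from execution `a` of one skill."""
    if a.skill_ref != b.skill_ref:
        raise ValueError("traces belong to different skills")
    changes = []
    if a.skill_version != b.skill_version:
        changes.append(f"skill_version: {a.skill_version} -> {b.skill_version}")
    if len(a.tool_calls) != len(b.tool_calls):
        changes.append(
            f"tool_calls: {len(a.tool_calls)} -> {len(b.tool_calls)} calls"
        )
    # Compare calls position by position up to the shorter trace.
    for i, (x, y) in enumerate(zip(a.tool_calls, b.tool_calls)):
        if (x.cmd, x.args) != (y.cmd, y.args):
            changes.append(f"tool_calls[{i}]: command or arguments changed")
        elif x.exit_code != y.exit_code:
            changes.append(
                f"tool_calls[{i}] exit_code: {x.exit_code} -> {y.exit_code}"
            )
    return changes


call_args = ["--session", "2025-12-23"]
old = Trace("worklog", "abc123", [ToolCall("extract-metrics.sh", call_args, 0)])
new = Trace("worklog", "def456", [ToolCall("extract-metrics.sh", call_args, 1)])
for change in diff_traces(old, new):
    print(change)
```

A non-empty result for identical inputs would be the signal that skill behavior changed between versions.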
### 3. Elevation Pipeline
When a molecule completes successfully, offer to "elevate" it to a skill:
```
bd mol squash <mol-id> # Compress execution history
bd elevate <mol-id> # Analyze and generate skill draft (proposed command)
```
The elevation pipeline:
1. Analyze squashed trace for generalizable patterns
2. Extract variable inputs (things that changed between runs)
3. Generate SKILL.md draft with:
   - Frontmatter from mol metadata
   - Steps derived from trace checkpoints
   - Scripts extracted from tool_calls
4. Human approval gate before deployment
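
Step 2 of the pipeline could be approached roughly as follows: given the recorded inputs of several runs, keep values that never changed as constants and treat the rest as candidate skill parameters. This is a sketch under assumptions; `split_inputs` and the flat `dict` representation of inputs are hypothetical, and the second session date is sample data.

```python
def split_inputs(runs: list[dict[str, str]]) -> tuple[dict[str, str], list[str]]:
    """Separate inputs that stayed constant across runs from those that varied.

    Returns (constants, variables): `constants` maps key -> shared value,
    `variables` lists keys whose value changed or was missing in some run.
    """
    if not runs:
        return {}, []
    keys = sorted(set().union(*(run.keys() for run in runs)))
    constants: dict[str, str] = {}
    variables: list[str] = []
    for key in keys:
        # A key missing from a run shows up as None and counts as variation.
        values = {run.get(key) for run in runs}
        if len(values) == 1:
            constants[key] = runs[0][key]
        else:
            variables.append(key)
    return constants, variables


runs = [
    {"PROJECT": "skills", "session": "2025-12-23"},
    {"PROJECT": "skills", "session": "2025-12-24"},
]
constants, variables = split_inputs(runs)
print(constants)
print(variables)
```

With a single run every input looks constant, which is one reason to wait for several executions before elevating.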
### Phase Transitions (Chemistry Metaphor)
```
Proto (solid) → pour → Mol (liquid) → squash → Digest (solid)
Wisp (vapor) ← create ← Proto
execute → Trace
elevate → Skill draft
```
- **Solid**: Static templates (protos, digests, skills)
- **Liquid**: Active work being tracked (mols)
- **Vapor**: Ephemeral execution (wisps, traces)
## Consequences
### Positive
- **Traceability**: Know exactly how work was completed
- **Reusability**: Successful patterns can be promoted to skills through the elevation pipeline, subject to human approval
- **Debugging**: Execution traces make failures understandable
- **Learning**: System improves as more work is tracked
### Negative
- **Overhead**: Capturing traces adds complexity
- **Storage**: Wisp traces need cleanup strategy
- **Coupling**: Skills and beads become interdependent
### Neutral
- Skills remain usable without molecules (standalone invocation)
- Molecules remain usable without skills (manual work)
- Integration is opt-in per-proto via `skill:` field
## Implementation Plan
1. **Phase 1** (this ADR): Document the design
2. **Phase 2**: Define wisp execution trace format (skills-jeb)
3. **Phase 3**: Prototype elevation pipeline (skills-3em)
4. **Phase 4**: Test on worklog skill (skills-rex)
## Anti-Patterns to Avoid
1. **Over-instrumentation**: Don't trace every shell command. Focus on meaningful checkpoints.
2. **Forced coupling**: Don't require molecules to use skills or vice versa.
3. **Premature elevation**: Don't auto-generate skills from single executions. Wait for patterns.
4. **Trace bloat**: Wisps are ephemeral for a reason. Squash or burn, don't accumulate.
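
The "premature elevation" guard could be expressed as a simple threshold check, sketched below. Everything here is an assumption: `should_suggest_elevation` is a hypothetical name, each run is reduced to its list of tool-call exit codes, and the default of 3 runs is an arbitrary placeholder since the minimum count is still an open question.

```python
def should_suggest_elevation(
    exit_codes_per_run: list[list[int]], min_runs: int = 3
) -> bool:
    """Suggest elevation only after enough fully successful runs.

    `exit_codes_per_run` holds one list of tool-call exit codes per run.
    `min_runs` is a placeholder default, not a decided value.
    """
    successful = [
        codes
        for codes in exit_codes_per_run
        # A run counts only if it made tool calls and all of them succeeded.
        if codes and all(code == 0 for code in codes)
    ]
    return len(successful) >= min_runs


print(should_suggest_elevation([[0, 0]]))
print(should_suggest_elevation([[0], [0, 0], [0], [1]]))
```

The check only decides whether to offer elevation; the human approval gate still applies afterwards.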
## Open Questions
1. How granular should skill_version be? Git SHA? Flake hash? Both?
2. Should traces capture stdout/stderr or just exit codes?
3. What's the minimum number of similar executions before suggesting elevation?
4. How do we handle skills that span multiple mol steps?
## References
- beads 0.35 molecule commands: `bd mol --help`, `bd wisp --help`, `bd pour --help`
- Skills repo: `~/proj/skills/`
- Existing skills: worklog, orch, niri-window-capture, spec-review, etc.