# ADR-001: Skills and Molecules Integration

## Status

Parked (2025-12-28)

**Rationale:** Current simpler approach is working well:
- Skills as standalone entrypoints (not molecule steps)
- Agent judgment from description/SKILL.md sufficient for invocation
- Molecules/protos not actively used for workflow orchestration

Revisit when:
- Complex multi-agent orchestration becomes needed
- Steve Yegge's orchestration work provides new patterns
- Programmatic skill invocation has clear use cases

## Context

We have two complementary systems for agent-assisted work:

1. **Skills** (this repo): Procedural knowledge deployed via Nix/direnv. Skills define HOW to do things: scripts, prompts, and workflows that agents can invoke.

2. **Molecules** (beads 0.35+): Work tracking templates in beads. Molecules define WHAT work needs to be done: DAGs of issues that can be instantiated, tracked, and completed.

These systems evolved independently but have natural integration points. The question is: how should they connect?

### Current State

**Skills system:**
- Skills are directories under `~/.claude/skills/` (deployed via Nix)
- Each skill has a `SKILL.md` with frontmatter + prompt/instructions
- Skills are invoked by agents via `/skill-name` or automatically based on triggers
- No execution tracking beyond what the agent logs

**Molecules system (beads 0.35):**
- **Proto**: Template epic with `template` label, uses `{{var}}` placeholders
- **Mol**: Instantiated work from a proto (permanent, git-synced)
- **Wisp**: Ephemeral mol for operational work (gitignored, `.beads-wisp/`)
- **Hook**: Agent's attachment point for assigned work
- **Pin**: Assign mol to agent's hook

Key molecule commands:
```
bd mol spawn <proto>      # Create mol from proto
bd pour <proto>           # Spawn persistent mol
bd wisp create <proto>    # Spawn ephemeral mol
bd pin <mol> --for me     # Assign to self
bd mol squash <id>        # Compress mol → digest
bd mol distill <epic>     # Extract proto from ad-hoc epic
```

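
To make the `{{var}}` placeholder mechanism in protos concrete, here is a minimal Python sketch of template substitution. It is an illustration only: `render_placeholders` is a hypothetical helper, not a beads API, and the way `bd` actually expands placeholders may differ.

```python
import re

# Hypothetical helper (not part of beads): fills {{var}} placeholders in proto text.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_placeholders(template: str, variables: dict) -> str:
    """Replace each {{name}} with variables[name]; fail loudly on a missing name."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


title = render_placeholders("Generate worklog for {{session}}", {"session": "2025-12-23"})
print(title)  # Generate worklog for 2025-12-23
```

Raising on a missing variable, rather than leaving the placeholder in place, is a design choice of this sketch: it keeps half-rendered titles out of tracked work.
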
### Problem Statement

1. Skills have no execution history - we can't replay, debug, or learn from past runs
2. Molecules track work but don't know which skills were used to complete them
3. Successful ad-hoc work patterns can't be easily promoted to reusable skills
4. No connection between "what was done" (mol) and "how it was done" (skill)

## Decision

Link skills and molecules via three mechanisms:

### 1. Skill References in Molecules

Add a `skill:` field to molecule nodes that references skills used during execution:

```yaml
# In a proto template
- title: "Generate worklog for {{session}}"
  skill: worklog
  description: "Document the work session"
```

When an agent works on a mol step that has a `skill:` reference, it knows which skill to invoke.

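
As a rough sketch of that lookup, the snippet below resolves a step's `skill:` field to the skill's `SKILL.md` under `~/.claude/skills/`. It assumes the step is available as a plain dict mirroring the proto YAML; `resolve_skill` and `DEFAULT_SKILLS_ROOT` are hypothetical names, not part of beads or the skills repo.

```python
from pathlib import Path
from typing import Optional

# Assumption: skills are deployed as <root>/<skill-name>/SKILL.md.
DEFAULT_SKILLS_ROOT = Path.home() / ".claude" / "skills"


def resolve_skill(step: dict, skills_root: Path = DEFAULT_SKILLS_ROOT) -> Optional[Path]:
    """Return the SKILL.md path for a step, or None when the step has no skill reference."""
    name = step.get("skill")
    if name is None:
        return None  # integration is opt-in; the step is handled without a skill
    skill_md = skills_root / name / "SKILL.md"
    if not skill_md.is_file():
        raise FileNotFoundError(f"step references unknown skill {name!r}: {skill_md}")
    return skill_md
```

Returning `None` for steps without a `skill:` field keeps molecules usable on their own, while a dangling reference is treated as an error so it surfaces before the agent starts work.
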
### 2. Wisp Execution Traces

Use wisps to capture skill execution traces. When a skill runs within a molecule context:

```yaml
# Wisp execution trace format
skill_ref: worklog
skill_version: "abc123"  # git SHA of skill
inputs:
  context: "session context..."
  env:
    PROJECT: "skills"
tool_calls:
  - cmd: "extract-metrics.sh"
    args: ["--session", "2025-12-23"]
    exit_code: 0
    duration_ms: 1234
checkpoints:
  - step: "metrics_extracted"
    summary: "Found 5 commits, 12 file changes"
    timestamp: "2025-12-23T19:30:00Z"
outputs:
  files_created:
    - "docs/worklogs/2025-12-23-session.org"
```

This enables:
- Replay: Re-run a skill with the same inputs
- Diff: Compare two executions of the same skill
- Debug: Understand what happened when something fails
- Regression testing: Detect when skill behavior changes

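
The Diff and Regression testing uses could look roughly like the sketch below. It assumes traces have already been loaded into plain dicts with the fields from the trace format; `diff_traces` and the second version string `def456` are hypothetical.

```python
def diff_traces(old: dict, new: dict) -> list:
    """List human-readable differences between two traces of the same skill."""
    changes = []
    if old["skill_version"] != new["skill_version"]:
        changes.append(f"skill_version: {old['skill_version']} -> {new['skill_version']}")

    # Simplification: if a command appears more than once, only its last exit code is kept.
    old_exits = {call["cmd"]: call["exit_code"] for call in old.get("tool_calls", [])}
    new_exits = {call["cmd"]: call["exit_code"] for call in new.get("tool_calls", [])}
    for cmd in sorted(old_exits.keys() | new_exits.keys()):
        before, after = old_exits.get(cmd), new_exits.get(cmd)
        if before != after:
            changes.append(f"{cmd}: exit_code {before} -> {after}")

    old_steps = [cp["step"] for cp in old.get("checkpoints", [])]
    new_steps = [cp["step"] for cp in new.get("checkpoints", [])]
    if old_steps != new_steps:
        changes.append(f"checkpoints: {old_steps} -> {new_steps}")
    return changes


baseline = {
    "skill_version": "abc123",
    "tool_calls": [{"cmd": "extract-metrics.sh", "exit_code": 0}],
    "checkpoints": [{"step": "metrics_extracted"}],
}
candidate = {
    "skill_version": "def456",  # hypothetical later revision of the skill
    "tool_calls": [{"cmd": "extract-metrics.sh", "exit_code": 1}],
    "checkpoints": [],
}
for line in diff_traces(baseline, candidate):
    print(line)
```

An empty result means the two runs agree on version, exit codes, and checkpoint order, which is the property a regression check would assert.
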
### 3. Elevation Pipeline

When a molecule completes successfully, offer to "elevate" it to a skill:

```
bd mol squash <mol-id>    # Compress execution history
bd elevate <mol-id>       # Analyze and generate skill draft
```

The elevation pipeline:
1. Analyze squashed trace for generalizable patterns
2. Extract variable inputs (things that changed between runs)
3. Generate SKILL.md draft with:
   - Frontmatter from mol metadata
   - Steps derived from trace checkpoints
   - Scripts extracted from tool_calls
4. Human approval gate before deployment

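
Step 2 of the pipeline (extracting variable inputs) can be sketched as a comparison of the `inputs` mapping across several traces. This is a minimal illustration under stated assumptions: traces are plain dicts, `variable_inputs` is a hypothetical function, and the `session` input and the dates are made-up sample data.

```python
def variable_inputs(traces: list) -> list:
    """Return input names whose values differ between runs (candidate skill parameters)."""
    if len(traces) < 2:
        return []  # a single run cannot show which inputs vary
    keys = sorted({key for trace in traces for key in trace["inputs"]})
    first = traces[0]["inputs"]
    return [
        key
        for key in keys
        if any(trace["inputs"].get(key) != first.get(key) for trace in traces[1:])
    ]


runs = [
    {"inputs": {"session": "2025-12-22", "env": {"PROJECT": "skills"}}},
    {"inputs": {"session": "2025-12-23", "env": {"PROJECT": "skills"}}},
]
print(variable_inputs(runs))  # ['session']
```

Inputs that stay constant across runs (here `env`) would be baked into the skill draft, while varying ones become parameters. Returning nothing for a single trace matches the intent of waiting for a pattern before elevating.
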
### Phase Transitions (Chemistry Metaphor)

```
Proto (solid) → pour → Mol (liquid) → squash → Digest (solid)
    ↓
Wisp (vapor) ← create ← Proto
    ↓
execute → Trace
    ↓
elevate → Skill draft
```

- **Solid**: Static templates (protos, digests, skills)
- **Liquid**: Active work being tracked (mols)
- **Vapor**: Ephemeral execution (wisps, traces)

## Consequences

### Positive

- **Traceability**: Know exactly how work was completed
- **Reusability**: Successful patterns can be promoted to skills through the elevation pipeline, subject to human approval
- **Debugging**: Execution traces make failures understandable
- **Learning**: The skill library improves as more tracked work is elevated

### Negative

- **Overhead**: Capturing traces adds complexity
- **Storage**: Wisp traces need a cleanup strategy
- **Coupling**: Skills and beads become interdependent

### Neutral

- Skills remain usable without molecules (standalone invocation)
- Molecules remain usable without skills (manual work)
- Integration is opt-in per-proto via `skill:` field

## Implementation Plan

1. **Phase 1** (this ADR): Document the design
2. **Phase 2**: Define wisp execution trace format (skills-jeb)
3. **Phase 3**: Prototype elevation pipeline (skills-3em)
4. **Phase 4**: Test on worklog skill (skills-rex)

## Anti-Patterns to Avoid

1. **Over-instrumentation**: Don't trace every shell command. Focus on meaningful checkpoints.
2. **Forced coupling**: Don't require molecules to use skills or vice versa.
3. **Premature elevation**: Don't auto-generate skills from single executions. Wait for patterns.
4. **Trace bloat**: Wisps are ephemeral for a reason. Squash or burn, don't accumulate.

## Open Questions

1. How granular should skill_version be? Git SHA? Flake hash? Both?
2. Should traces capture stdout/stderr or just exit codes?
3. What's the minimum number of similar executions before suggesting elevation?
4. How do we handle skills that span multiple mol steps?

## References

- beads 0.35 molecule commands: `bd mol --help`, `bd wisp --help`, `bd pour --help`
- Skills repo: `~/proj/skills/`
- Existing skills: worklog, orch, niri-window-capture, spec-review, etc.