ADR-001: Skills and Molecules Integration
Status
Parked (2025-12-28)
Rationale: Current simpler approach is working well:
- Skills as standalone entrypoints (not molecule steps)
- Agent judgment from description/SKILL.md sufficient for invocation
- Molecules/protos not actively used for workflow orchestration
Revisit when:
- Complex multi-agent orchestration becomes needed
- Steve Yegge's orchestration work provides new patterns
- Programmatic skill invocation has clear use cases
Context
We have two complementary systems for agent-assisted work:
- Skills (this repo): Procedural knowledge deployed via Nix/direnv. Skills define HOW to do things: scripts, prompts, and workflows that agents can invoke.
- Molecules (beads 0.35+): Work tracking templates in beads. Molecules define WHAT work needs to be done: DAGs of issues that can be instantiated, tracked, and completed.
These systems evolved independently but have natural integration points. The question is: how should they connect?
Current State
Skills system:
- Skills are directories under `~/.claude/skills/` (deployed via Nix)
- Each skill has a `SKILL.md` with frontmatter + prompt/instructions
- Skills are invoked by agents via `/skill-name` or automatically based on triggers
- No execution tracking beyond what the agent logs
Molecules system (beads 0.35):
- Proto: Template epic with the `template` label; uses `{{var}}` placeholders
- Mol: Instantiated work from a proto (permanent, git-synced)
- Wisp: Ephemeral mol for operational work (gitignored, `.beads-wisp/`)
- Hook: Agent's attachment point for assigned work
- Pin: Assign mol to agent's hook
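The following minimal Python sketch makes the `{{var}}` placeholder idea concrete by filling a proto field when a mol is instantiated. It makes two assumptions: placeholders are bare `{{name}}` tokens with word-character names, and an unbound variable should be an error. beads' actual substitution rules (whitespace inside braces, escaping, defaults) are not described in this ADR. `instantiate` is a hypothetical helper, not part of `bd`.

```python
import re

# Assumed placeholder syntax: {{name}}, where name is letters, digits, or underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def instantiate(template: str, variables: dict[str, str]) -> str:
    """Fill {{var}} placeholders in a proto field; raise if any variable is unbound."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(variables))
    if missing:
        raise ValueError(f"unbound placeholders: {', '.join(missing)}")
    # Replace each placeholder with the value bound to its name.
    return PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)
```

Failing on unbound names, rather than leaving `{{session}}` in the output, keeps a half-instantiated mol from being tracked as real work.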
Key molecule commands:
```bash
bd mol spawn <proto>     # Create mol from proto
bd pour <proto>          # Spawn persistent mol
bd wisp create <proto>   # Spawn ephemeral mol
bd pin <mol> --for me    # Assign to self
bd mol squash <id>       # Compress mol → digest
bd mol distill <epic>    # Extract proto from ad-hoc epic
```
Problem Statement
- Skills have no execution history - we can't replay, debug, or learn from past runs
- Molecules track work but don't know which skills were used to complete them
- Successful ad-hoc work patterns can't be easily promoted to reusable skills
- No connection between "what was done" (mol) and "how it was done" (skill)
Decision
Link skills and molecules via three mechanisms:
1. Skill References in Molecules
Add a `skill:` field to molecule nodes that references the skill used during execution:

```yaml
# In a proto template
- title: "Generate worklog for {{session}}"
  skill: worklog
  description: "Document the work session"
```

When an agent works on a mol step that has a `skill:` reference, it knows which skill to invoke.
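The following minimal sketch shows how an agent-side helper might resolve a step's `skill:` reference to the deployed skill. It assumes the `~/.claude/skills/<name>/SKILL.md` layout described under Current State and assumes steps are parsed into plain dicts. `skill_for_step` is hypothetical; neither beads nor the skills repo is assumed to provide it.

```python
from pathlib import Path
from typing import Any, Optional


def skill_for_step(step: dict[str, Any], skills_dir: Optional[Path] = None) -> Optional[Path]:
    """Return the SKILL.md path for a step's skill: reference, or None if it has none."""
    name = step.get("skill")
    if name is None:
        # No skill: field. Integration is opt-in, so this is a plain manual step.
        return None
    # Assumed deployment layout: ~/.claude/skills/<name>/SKILL.md
    root = skills_dir if skills_dir is not None else Path.home() / ".claude" / "skills"
    skill_md = root / name / "SKILL.md"
    if not skill_md.is_file():
        raise FileNotFoundError(f"step references unknown skill {name!r}: {skill_md}")
    return skill_md
```

Raising on a dangling reference surfaces a proto that names a skill which was never deployed. Silently falling back to manual work would hide it.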
2. Wisp Execution Traces
Use wisps to capture skill execution traces. When a skill runs within a molecule context:
```yaml
# Wisp execution trace format
skill_ref: worklog
skill_version: "abc123"  # git SHA of skill
inputs:
  context: "session context..."
  env:
    PROJECT: "skills"
tool_calls:
  - cmd: "extract-metrics.sh"
    args: ["--session", "2025-12-23"]
    exit_code: 0
    duration_ms: 1234
checkpoints:
  - step: "metrics_extracted"
    summary: "Found 5 commits, 12 file changes"
    timestamp: "2025-12-23T19:30:00Z"
outputs:
  files_created:
    - "docs/worklogs/2025-12-23-session.org"
```
This enables:
- Replay: Re-run a skill with the same inputs
- Diff: Compare two executions of the same skill
- Debug: Understand what happened when something fails
- Regression testing: Detect when skill behavior changes
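Two pieces of this can be sketched in Python: recording a `tool_calls` entry and diffing two traces of the same skill. The sketch makes three assumptions. Traces are loaded as plain dicts with the field names shown in the format above. `duration_ms` and `timestamp` are run-to-run noise and are ignored when diffing. Whether to keep stdout/stderr is still an open question, so the recorder captures them but does not store them. `record_tool_call`, `flatten`, `diff_traces`, and `VOLATILE_KEYS` are hypothetical names.

```python
import subprocess
import time
from typing import Any

# Leaf keys assumed to differ on every run; ignored when diffing.
VOLATILE_KEYS = {"duration_ms", "timestamp"}


def record_tool_call(cmd: str, args: list[str]) -> dict[str, Any]:
    """Run cmd with args and return a tool_calls entry shaped like the trace format."""
    start = time.monotonic()
    # Output is captured (so it does not leak into the session) but not stored.
    proc = subprocess.run([cmd, *args], capture_output=True, text=True)
    return {
        "cmd": cmd,
        "args": args,
        "exit_code": proc.returncode,
        "duration_ms": int((time.monotonic() - start) * 1000),
    }


def flatten(value: Any, path: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"dotted.path": leaf_value} pairs."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {path: value}
    out: dict[str, Any] = {}
    for key, child in items:
        out.update(flatten(child, f"{path}.{key}" if path else str(key)))
    return out


def diff_traces(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {path: (value_in_a, value_in_b)} for every non-volatile leaf that differs."""
    fa, fb = flatten(a), flatten(b)
    return {
        path: (fa.get(path), fb.get(path))
        for path in sorted(fa.keys() | fb.keys())
        if path.rsplit(".", 1)[-1] not in VOLATILE_KEYS and fa.get(path) != fb.get(path)
    }
```

Diffing on flattened leaf paths such as `tool_calls.0.exit_code` keeps the result readable for nested traces. Filtering volatile keys keeps timing jitter from masking real behavior changes, which matters for the regression-testing use.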
3. Elevation Pipeline
When a molecule completes successfully, offer to "elevate" it to a skill:
```bash
bd mol squash <mol-id>   # Compress execution history
bd elevate <mol-id>      # Analyze and generate skill draft
```
The elevation pipeline:
- Analyze squashed trace for generalizable patterns
- Extract variable inputs (things that changed between runs)
- Generate `SKILL.md` draft with:
  - Frontmatter from mol metadata
  - Steps derived from trace checkpoints
  - Scripts extracted from `tool_calls`
- Human approval gate before deployment
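The "extract variable inputs" step could look roughly like the following Python sketch. It compares the `inputs` section of several traces of the same skill and keeps only the keys whose values differ; those keys become candidate parameters for the `SKILL.md` draft. `variable_inputs` and its `min_runs=2` default are hypothetical. The real threshold is still one of the open questions in this ADR.

```python
from typing import Any


def _flatten(d: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"env": {"PROJECT": x}} -> {"env.PROJECT": x}."""
    out: dict[str, Any] = {}
    for key, value in d.items():
        if isinstance(value, dict):
            out.update(_flatten(value, f"{prefix}{key}."))
        else:
            out[f"{prefix}{key}"] = value
    return out


def variable_inputs(traces: list[dict[str, Any]], min_runs: int = 2) -> dict[str, list[Any]]:
    """Return {input_key: [value per run]} for inputs that changed between runs.

    Keys identical across every run are treated as constants of the skill;
    keys that vary become candidate parameters for the generated SKILL.md.
    """
    if len(traces) < min_runs:
        # Guard against premature elevation from too few executions.
        raise ValueError(f"need at least {min_runs} traces, got {len(traces)}")
    flat = [_flatten(t.get("inputs", {})) for t in traces]
    keys = sorted(set().union(*flat))
    varying: dict[str, list[Any]] = {}
    for key in keys:
        values = [f.get(key) for f in flat]  # None where a run lacked the key
        if any(v != values[0] for v in values[1:]):
            varying[key] = values
    return varying
```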
Phase Transitions (Chemistry Metaphor)
```
Proto (solid) → pour → Mol (liquid) → squash → Digest (solid)
↓
Wisp (vapor) ← create ← Proto
↓
execute → Trace
↓
elevate → Skill draft
```
- Solid: Static templates (protos, digests, skills)
- Liquid: Active work being tracked (mols)
- Vapor: Ephemeral execution (wisps, traces)
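The labelled arrows of the diagram can be read as a small lookup table. The following Python sketch does that. It is a hypothetical model for illustration only; `TRANSITIONS`, `PHASE`, and `transition` are not beads names. The elevation pipeline text runs `bd elevate` on a squashed mol, while the diagram draws `elevate` from a trace. This sketch follows the diagram.

```python
# Labelled arrows from the diagram: (current kind, operation) -> resulting kind.
TRANSITIONS: dict[tuple[str, str], str] = {
    ("proto", "pour"): "mol",
    ("mol", "squash"): "digest",
    ("proto", "create"): "wisp",
    ("wisp", "execute"): "trace",
    ("trace", "elevate"): "skill_draft",
}

# Phase of each kind, per the solid/liquid/vapor definitions.
# Treating a skill draft as solid (like a skill) is an assumption.
PHASE: dict[str, str] = {
    "proto": "solid",
    "digest": "solid",
    "skill_draft": "solid",
    "mol": "liquid",
    "wisp": "vapor",
    "trace": "vapor",
}


def transition(kind: str, op: str) -> str:
    """Return the kind produced by applying op to kind; raise if the diagram has no such arrow."""
    try:
        return TRANSITIONS[(kind, op)]
    except KeyError:
        raise ValueError(f"no '{op}' transition from {kind!r}") from None
```

An explicit table makes illegal moves, such as pouring a mol, fail loudly instead of producing an object in an undefined phase.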
Consequences
Positive
- Traceability: Know exactly how work was completed
- Reusability: Successful patterns can be promoted to skills, subject to human approval
- Debugging: Execution traces make failures understandable
- Learning: System improves as more work is tracked
Negative
- Overhead: Capturing traces adds complexity
- Storage: Wisp traces need cleanup strategy
- Coupling: Skills and beads become interdependent
Neutral
- Skills remain usable without molecules (standalone invocation)
- Molecules remain usable without skills (manual work)
- Integration is opt-in per proto via the `skill:` field
Implementation Plan
- Phase 1 (this ADR): Document the design
- Phase 2: Define wisp execution trace format (skills-jeb)
- Phase 3: Prototype elevation pipeline (skills-3em)
- Phase 4: Test on worklog skill (skills-rex)
Anti-Patterns to Avoid
- Over-instrumentation: Don't trace every shell command. Focus on meaningful checkpoints.
- Forced coupling: Don't require molecules to use skills or vice versa.
- Premature elevation: Don't auto-generate skills from single executions. Wait for patterns.
- Trace bloat: Wisps are ephemeral for a reason. Squash or burn, don't accumulate.
Open Questions
- How granular should skill_version be? Git SHA? Flake hash? Both?
- Should traces capture stdout/stderr or just exit codes?
- What's the minimum number of similar executions before suggesting elevation?
- How do we handle skills that span multiple mol steps?
References
- beads 0.35 molecule commands: `bd mol --help`, `bd wisp --help`, `bd pour --help`
- Skills repo: `~/proj/skills/`
- Existing skills: worklog, orch, niri-window-capture, spec-review, etc.