skills/docs/adr/001-skills-molecules-integration.md
dan c7c6bbf796 docs: park ADR-001 skills-molecules integration
2025-12-28 23:27:41 -05:00

ADR-001: Skills and Molecules Integration

Status

Parked (2025-12-28)

Rationale: The current, simpler approach is working well:

  • Skills as standalone entrypoints (not molecule steps)
  • Agent judgment from description/SKILL.md sufficient for invocation
  • Molecules/protos not actively used for workflow orchestration

Revisit when:

  • Complex multi-agent orchestration becomes needed
  • Steve Yegge's orchestration work provides new patterns
  • Programmatic skill invocation has clear use cases

Context

We have two complementary systems for agent-assisted work:

  1. Skills (this repo): Procedural knowledge deployed via Nix/direnv. Skills define HOW to do things - scripts, prompts, and workflows that agents can invoke.

  2. Molecules (beads 0.35+): Work tracking templates in beads. Molecules define WHAT work needs to be done - DAGs of issues that can be instantiated, tracked, and completed.

These systems evolved independently but have natural integration points. The question is: how should they connect?

Current State

Skills system:

  • Skills are directories under ~/.claude/skills/ (deployed via Nix)
  • Each skill has a SKILL.md with frontmatter + prompt/instructions
  • Skills are invoked by agents via /skill-name or automatically based on triggers
  • No execution tracking beyond what the agent logs
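Because invocation relies on the agent reading each skill's frontmatter, it can help to see what that lookup might involve. The following is a minimal sketch, assuming the frontmatter is a block delimited by `---` lines containing flat `key: value` pairs; the `name` field and the helper `read_frontmatter` are illustrative assumptions, not part of the skills repo.

```python
from __future__ import annotations

from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Return the top-level `key: value` pairs from a SKILL.md frontmatter block.

    Assumes flat scalar values between two `---` lines; nested YAML would
    need a real YAML parser instead of this line-based approach.
    """
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        # Skip indented (nested) lines and lines without a key.
        if sep and key.strip() and not line.startswith((" ", "\t")):
            meta[key.strip()] = value.strip().strip("\"'")
    return {}  # closing delimiter never found, so treat as no frontmatter
```

An agent-side index could call this once per skill directory and match on the `description` value.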

Molecules system (beads 0.35):

  • Proto: Template epic with template label, uses {{var}} placeholders
  • Mol: Instantiated work from a proto (permanent, git-synced)
  • Wisp: Ephemeral mol for operational work (gitignored, .beads-wisp/)
  • Hook: Agent's attachment point for assigned work
  • Pin: Assign mol to agent's hook

Key molecule commands:

bd mol spawn <proto>     # Create mol from proto
bd pour <proto>          # Spawn persistent mol
bd wisp create <proto>   # Spawn ephemeral mol
bd pin <mol> --for me    # Assign to self
bd mol squash <id>       # Compress mol → digest
bd mol distill <epic>    # Extract proto from ad-hoc epic

Problem Statement

  1. Skills have no execution history - we can't replay, debug, or learn from past runs
  2. Molecules track work but don't know which skills were used to complete them
  3. Successful ad-hoc work patterns can't be easily promoted to reusable skills
  4. No connection between "what was done" (mol) and "how it was done" (skill)

Decision

Link skills and molecules via three mechanisms:

1. Skill References in Molecules

Add a skill: field to molecule nodes that names the skill to use for that step:

# In a proto template
- title: "Generate worklog for {{session}}"
  skill: worklog
  description: "Document the work session"

When an agent works on a mol step that has a skill: reference, it knows which skill to invoke.
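The lookup from a step to its skill could be sketched as follows. This is an illustration only: it assumes a step is available as a plain dictionary with an optional `skill` key and that skills live under `~/.claude/skills/<name>/SKILL.md` as described in Current State. The function name `resolve_skill` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

SKILLS_ROOT = Path.home() / ".claude" / "skills"  # location named in this ADR


def resolve_skill(step: dict, skills_root: Path = SKILLS_ROOT) -> Path | None:
    """Map a step's optional `skill:` field to that skill's SKILL.md.

    Returns None when the step has no skill reference (manual work) and
    raises when the reference points at a skill that is not deployed.
    """
    name = step.get("skill")
    if name is None:
        return None
    skill_md = skills_root / name / "SKILL.md"
    if not skill_md.is_file():
        raise FileNotFoundError(f"step references unknown skill: {name!r}")
    return skill_md
```

Returning None for steps without a reference keeps the integration opt-in: molecules with no `skill:` field behave exactly as before.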

2. Wisp Execution Traces

Use wisps to capture skill execution traces. When a skill runs within a molecule context, the wisp records a trace such as:

# Wisp execution trace format
skill_ref: worklog
skill_version: "abc123"  # git SHA of skill
inputs:
  context: "session context..."
  env:
    PROJECT: "skills"
tool_calls:
  - cmd: "extract-metrics.sh"
    args: ["--session", "2025-12-23"]
    exit_code: 0
    duration_ms: 1234
checkpoints:
  - step: "metrics_extracted"
    summary: "Found 5 commits, 12 file changes"
    timestamp: "2025-12-23T19:30:00Z"
outputs:
  files_created:
    - "docs/worklogs/2025-12-23-session.org"

This enables:

  • Replay: Re-run a skill with the same inputs
  • Diff: Compare two executions of the same skill
  • Debug: Understand what happened when something fails
  • Regression testing: Detect when skill behavior changes
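As an illustration of the Diff and Regression testing uses, here is a minimal sketch that compares two traces already loaded as dictionaries in the format above. The function `diff_traces` is hypothetical, and the choice to ignore `duration_ms` and checkpoint timestamps as run-to-run noise is an assumption, not a decision made in this ADR.

```python
from __future__ import annotations


def diff_traces(old: dict, new: dict) -> list[str]:
    """Describe behavioral differences between two traces of the same skill.

    Timing fields (duration_ms, timestamps) are deliberately ignored because
    they vary between otherwise identical runs.
    """
    changes: list[str] = []
    if old.get("skill_version") != new.get("skill_version"):
        changes.append(
            f"skill_version: {old.get('skill_version')!r} -> {new.get('skill_version')!r}"
        )

    old_calls = old.get("tool_calls", [])
    new_calls = new.get("tool_calls", [])
    if len(old_calls) != len(new_calls):
        changes.append(f"tool_calls: {len(old_calls)} -> {len(new_calls)} calls")
    # Compare calls position by position over the shared prefix.
    for i, (a, b) in enumerate(zip(old_calls, new_calls)):
        for field in ("cmd", "args", "exit_code"):
            if a.get(field) != b.get(field):
                changes.append(
                    f"tool_calls[{i}].{field}: {a.get(field)!r} -> {b.get(field)!r}"
                )

    old_files = set(old.get("outputs", {}).get("files_created", []))
    new_files = set(new.get("outputs", {}).get("files_created", []))
    for path in sorted(old_files - new_files):
        changes.append(f"outputs.files_created: missing {path}")
    for path in sorted(new_files - old_files):
        changes.append(f"outputs.files_created: added {path}")
    return changes
```

An empty result means the two runs behaved the same under this comparison; a regression check could fail whenever the list is non-empty for identical inputs.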

3. Elevation Pipeline

When a molecule completes successfully, offer to "elevate" it to a skill:

bd mol squash <mol-id>     # Compress execution history
bd elevate <mol-id>        # Analyze and generate skill draft

The elevation pipeline:

  1. Analyze squashed trace for generalizable patterns
  2. Extract variable inputs (things that changed between runs)
  3. Generate SKILL.md draft with:
    • Frontmatter from mol metadata
    • Steps derived from trace checkpoints
    • Scripts extracted from tool_calls
  4. Human approval gate before deployment
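Step 2 of the pipeline (finding inputs that changed between runs) can be sketched as below. This is one possible approach, assuming each trace is a dictionary with an `inputs` mapping as in the trace format; the helper names `flatten` and `split_inputs` are hypothetical.

```python
from __future__ import annotations

_MISSING = object()  # marks an input key that a given run did not provide


def flatten(value: object, prefix: str = "") -> dict[str, object]:
    """Flatten nested dicts into dotted keys, e.g. {"env": {"A": 1}} -> {"env.A": 1}."""
    if not isinstance(value, dict):
        return {prefix: value}
    out: dict[str, object] = {}
    for key, child in value.items():
        out.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    return out


def split_inputs(runs: list[dict]) -> tuple[dict[str, object], list[str]]:
    """Separate inputs that were identical in every run from those that varied.

    Constants can be baked into the skill draft; varying keys become the
    skill's parameters.
    """
    flat = [flatten(run.get("inputs", {})) for run in runs]
    constants: dict[str, object] = {}
    variables: list[str] = []
    for key in sorted(set().union(*flat)):
        values = [f.get(key, _MISSING) for f in flat]
        if all(v == values[0] for v in values):
            constants[key] = values[0]
        else:
            variables.append(key)
    return constants, variables
```

A key that is absent from some runs is treated as varying, which errs on the side of exposing it as a parameter for the human reviewer to decide on.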

Phase Transitions (Chemistry Metaphor)

Proto (solid)     →  pour   →  Mol (liquid)     →  squash  →  Digest (solid)
                                    ↓
Wisp (vapor)      ←  create ←  Proto
                       ↓
                   execute  →  Trace
                       ↓
                   elevate  →  Skill draft

  • Solid: Static templates (protos, digests, skills)
  • Liquid: Active work being tracked (mols)
  • Vapor: Ephemeral execution (wisps, traces)

Consequences

Positive

  • Traceability: Know exactly how work was completed
  • Reusability: Successful patterns can be elevated into skills, subject to human approval
  • Debugging: Execution traces make failures understandable
  • Learning: Each tracked execution adds trace data that elevation can mine for reusable patterns

Negative

  • Overhead: Capturing traces adds complexity to every skill run
  • Storage: Wisp traces need a cleanup strategy
  • Coupling: Skills and beads become interdependent

Neutral

  • Skills remain usable without molecules (standalone invocation)
  • Molecules remain usable without skills (manual work)
  • Integration is opt-in per-proto via skill: field

Implementation Plan

  1. Phase 1 (this ADR): Document the design
  2. Phase 2: Define wisp execution trace format (skills-jeb)
  3. Phase 3: Prototype elevation pipeline (skills-3em)
  4. Phase 4: Test on worklog skill (skills-rex)

Anti-Patterns to Avoid

  1. Over-instrumentation: Don't trace every shell command. Focus on meaningful checkpoints.
  2. Forced coupling: Don't require molecules to use skills or vice versa.
  3. Premature elevation: Don't auto-generate skills from single executions. Wait for patterns.
  4. Trace bloat: Wisps are ephemeral for a reason. Squash or burn, don't accumulate.
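To make the "premature elevation" guard concrete, here is a hedged sketch of a gate that only proposes elevation once the same command sequence has succeeded several times. The function `elevation_candidate` is hypothetical, and the default `min_runs=3` is a placeholder: the actual threshold is an open question in this ADR.

```python
from __future__ import annotations

from collections import Counter


def elevation_candidate(traces: list[dict], min_runs: int = 3) -> tuple[str, ...] | None:
    """Return the command sequence worth elevating, or None if no pattern yet.

    Only fully successful traces count (every tool call exited 0). Traces
    are grouped by their ordered sequence of commands, and the most common
    sequence qualifies once it has been seen at least `min_runs` times.
    """
    shapes = Counter(
        tuple(call.get("cmd") for call in trace["tool_calls"])
        for trace in traces
        if trace.get("tool_calls")
        and all(call.get("exit_code") == 0 for call in trace["tool_calls"])
    )
    if not shapes:
        return None
    shape, count = shapes.most_common(1)[0]
    return shape if count >= min_runs else None
```

Grouping by command sequence is a deliberately crude similarity measure; a real pipeline would likely also compare checkpoints and outputs before suggesting a draft.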

Open Questions

  1. How granular should skill_version be? Git SHA? Flake hash? Both?
  2. Should traces capture stdout/stderr or just exit codes?
  3. What's the minimum number of similar executions before suggesting elevation?
  4. How do we handle skills that span multiple mol steps?

References

  • beads 0.35 molecule commands: bd mol --help, bd wisp --help, bd pour --help
  • Skills repo: ~/proj/skills/
  • Existing skills: worklog, orch, niri-window-capture, spec-review, etc.