skills/docs/adr/001-skills-molecules-integration.md

# ADR-001: Skills and Molecules Integration

## Status

Parked (2025-12-28)

**Rationale:** The current, simpler approach is working well:
- Skills are standalone entrypoints (not molecule steps)
- Agent judgment based on the skill description and `SKILL.md` is sufficient for invocation
- Molecules/protos are not actively used for workflow orchestration

Revisit when:
- Complex multi-agent orchestration becomes needed
- Steve Yegge's orchestration work provides new patterns
- Programmatic skill invocation has clear use cases

## Context

We have two complementary systems for agent-assisted work:

1. **Skills** (this repo): Procedural knowledge deployed via Nix/direnv. Skills define HOW to do things - scripts, prompts, and workflows that agents can invoke.

2. **Molecules** (beads 0.35+): Work tracking templates in beads. Molecules define WHAT work needs to be done - DAGs of issues that can be instantiated, tracked, and completed.

These systems evolved independently but have natural integration points. The question is: how should they connect?

### Current State

**Skills system:**
- Skills are directories under `~/.claude/skills/` (deployed via Nix)
- Each skill has a `SKILL.md` with frontmatter + prompt/instructions
- Skills are invoked by agents via `/skill-name` or automatically based on triggers
- No execution tracking beyond what the agent logs

**Molecules system (beads 0.35):**
- **Proto**: Template epic with `template` label, uses `{{var}}` placeholders
- **Mol**: Instantiated work from a proto (permanent, git-synced)
- **Wisp**: Ephemeral mol for operational work (gitignored, `.beads-wisp/`)
- **Hook**: Agent's attachment point for assigned work
- **Pin**: Assign mol to agent's hook
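
To make the `{{var}}` placeholder idea concrete, here is a minimal sketch of how a proto title could be rendered when a mol is instantiated. It is an illustration only and does not describe how beads implements substitution; the `render` function and the strict "fail on missing variable" behavior are assumptions.

```python
import re
from typing import Mapping

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: Mapping[str, str]) -> str:
    """Replace each {{name}} with variables[name]; raise KeyError on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return variables[name]

    return PLACEHOLDER.sub(substitute, template)


print(render("Generate worklog for {{session}}", {"session": "2025-12-23"}))
```

Failing loudly on a missing variable is a design choice for this sketch: a mol with an unrendered `{{session}}` in its title would be harder to spot later than an error at instantiation time.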

Key molecule commands:
```
bd mol spawn <proto>     # Create mol from proto
bd pour <proto>          # Spawn persistent mol
bd wisp create <proto>   # Spawn ephemeral mol
bd pin <mol> --for me    # Assign to self
bd mol squash <id>       # Compress mol → digest
bd mol distill <epic>    # Extract proto from ad-hoc epic
```

### Problem Statement

1. Skills have no execution history - we can't replay, debug, or learn from past runs
2. Molecules track work but don't know which skills were used to complete them
3. Successful ad-hoc work patterns can't be easily promoted to reusable skills
4. No connection between "what was done" (mol) and "how it was done" (skill)

## Decision

Link skills and molecules via three mechanisms:

### 1. Skill References in Molecules

Add a `skill:` field to molecule nodes that names the skill to use when executing that step:

```yaml
# In a proto template
- title: "Generate worklog for {{session}}"
  skill: worklog
  description: "Document the work session"
```

When an agent works on a mol step that has a `skill:` reference, it knows which skill to invoke.
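
As a rough sketch of that lookup, the snippet below resolves a step's `skill:` value to a `SKILL.md` path under the deployment directory described in "Current State". The `resolve_skill` helper and the representation of a step as a plain dict are hypothetical; nothing like this exists in beads or the skills repo today.

```python
from pathlib import Path
from typing import Optional

# Deployment location from the "Current State" section.
SKILLS_ROOT = Path.home() / ".claude" / "skills"


def resolve_skill(step: dict, skills_root: Path = SKILLS_ROOT) -> Optional[Path]:
    """Return the SKILL.md path for a step, or None if the step names no skill."""
    name = step.get("skill")
    if name is None:
        # Integration is opt-in: steps without `skill:` are manual work.
        return None
    skill_md = skills_root / name / "SKILL.md"
    if not skill_md.is_file():
        raise FileNotFoundError(f"step references unknown skill {name!r}: {skill_md}")
    return skill_md


# A step without a skill reference resolves to None.
print(resolve_skill({"title": "Manual review"}))
```

Raising on an unknown skill name, rather than silently falling back to manual work, keeps a typo in a proto from going unnoticed.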

### 2. Wisp Execution Traces

Use wisps to capture skill execution traces. When a skill runs within a molecule context, the wisp records a trace like this:

```yaml
# Wisp execution trace format
skill_ref: worklog
skill_version: "abc123"  # git SHA of skill
inputs:
  context: "session context..."
  env:
    PROJECT: "skills"
tool_calls:
  - cmd: "extract-metrics.sh"
    args: ["--session", "2025-12-23"]
    exit_code: 0
    duration_ms: 1234
checkpoints:
  - step: "metrics_extracted"
    summary: "Found 5 commits, 12 file changes"
    timestamp: "2025-12-23T19:30:00Z"
outputs:
  files_created:
    - "docs/worklogs/2025-12-23-session.org"
```

This enables:
- Replay: Re-run a skill with the same inputs
- Diff: Compare two executions of the same skill
- Debug: Understand what happened when something fails
- Regression testing: Detect when skill behavior changes
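
The "Diff" and "Regression testing" uses could look roughly like the sketch below, which compares two traces that have already been parsed into dicts with the field names from the format above. The `diff_traces` function and its summary keys are assumptions made for illustration. It ignores `duration_ms` and timestamps on purpose, since those vary between otherwise identical runs.

```python
def diff_traces(old: dict, new: dict) -> dict:
    """Summarize how two executions of the same skill differ."""
    if old["skill_ref"] != new["skill_ref"]:
        raise ValueError("traces belong to different skills")

    def calls(trace: dict) -> list:
        # Compare command, arguments, and exit code; timing is ignored.
        return [
            (c["cmd"], tuple(c.get("args", [])), c["exit_code"])
            for c in trace.get("tool_calls", [])
        ]

    def files(trace: dict) -> set:
        return set(trace.get("outputs", {}).get("files_created", []))

    return {
        "version_changed": old.get("skill_version") != new.get("skill_version"),
        "tool_calls_changed": calls(old) != calls(new),
        "files_added": sorted(files(new) - files(old)),
        "files_removed": sorted(files(old) - files(new)),
    }


call = {"cmd": "extract-metrics.sh", "args": ["--session", "2025-12-23"], "exit_code": 0}
run_a = {"skill_ref": "worklog", "skill_version": "abc123",
         "tool_calls": [dict(call, duration_ms=1234)],
         "outputs": {"files_created": ["a.org"]}}
run_b = {"skill_ref": "worklog", "skill_version": "def456",
         "tool_calls": [dict(call, duration_ms=987)],
         "outputs": {"files_created": ["a.org", "b.org"]}}

print(diff_traces(run_a, run_b))
```

A regression check could then flag any pair of runs where `version_changed` is true and `tool_calls_changed` or the file lists differ.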

### 3. Elevation Pipeline

When a molecule completes successfully, offer to "elevate" it to a skill:

```
bd mol squash <mol-id>     # Compress execution history
bd elevate <mol-id>        # Analyze and generate skill draft
```

The elevation pipeline:
1. Analyze squashed trace for generalizable patterns
2. Extract variable inputs (things that changed between runs)
3. Generate SKILL.md draft with:
   - Frontmatter from mol metadata
   - Steps derived from trace checkpoints
   - Scripts extracted from `tool_calls`
4. Human approval gate before deployment
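
Step 2 can be sketched as follows: given the `inputs` of several traces, separate the values that stayed constant from the keys that varied between runs. The `flatten` and `split_inputs` helpers, the dotted key paths, and the two-trace minimum are all assumptions for illustration; the real threshold is still an open question.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # marks a key that is absent from a given run


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {"env": {"PROJECT": 1}} -> {"env.PROJECT": 1}."""
    if not isinstance(value, dict):
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for key, child in value.items():
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(child, path))
    return flat


def split_inputs(traces: List[dict]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (constants, variables) across the `inputs` of several traces."""
    if len(traces) < 2:
        # A single execution cannot show what varies between runs.
        raise ValueError("need at least two traces to tell constants from variables")
    runs = [flatten(t.get("inputs", {})) for t in traces]
    constants: Dict[str, Any] = {}
    variables: List[str] = []
    for key in sorted(set().union(*runs)):
        values = [run.get(key, _MISSING) for run in runs]
        if all(v == values[0] for v in values):
            constants[key] = values[0]
        else:
            variables.append(key)
    return constants, variables


traces = [
    {"inputs": {"context": "session A", "env": {"PROJECT": "skills"}}},
    {"inputs": {"context": "session B", "env": {"PROJECT": "skills"}}},
]
constants, variables = split_inputs(traces)
print(constants, variables)
```

In a generated `SKILL.md` draft, the varying keys would become parameters, while the constants could be inlined or offered as defaults for the human reviewer to confirm.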

### Phase Transitions (Chemistry Metaphor)

```
Proto (solid)     →  pour   →  Mol (liquid)     →  squash  →  Digest (solid)
                                    ↓
Wisp (vapor)      ←  create ←  Proto
                       ↓
                   execute  →  Trace
                       ↓
                   elevate  →  Skill draft
```

- **Solid**: Static templates (protos, digests, skills)
- **Liquid**: Active work being tracked (mols)
- **Vapor**: Ephemeral execution (wisps, traces)

## Consequences

### Positive

- **Traceability**: Know exactly how work was completed
- **Reusability**: Successful patterns can be elevated to skill drafts with little manual effort, subject to human approval
- **Debugging**: Execution traces make failures understandable
- **Learning**: Each tracked execution gives the elevation pipeline more evidence of recurring patterns

### Negative

- **Overhead**: Capturing traces adds complexity
- **Storage**: Wisp traces need a cleanup strategy
- **Coupling**: Skills and beads become interdependent

### Neutral

- Skills remain usable without molecules (standalone invocation)
- Molecules remain usable without skills (manual work)
- Integration is opt-in per-proto via the `skill:` field

## Implementation Plan

1. **Phase 1** (this ADR): Document the design
2. **Phase 2**: Define wisp execution trace format (skills-jeb)
3. **Phase 3**: Prototype elevation pipeline (skills-3em)
4. **Phase 4**: Test on worklog skill (skills-rex)

## Anti-Patterns to Avoid

1. **Over-instrumentation**: Don't trace every shell command. Focus on meaningful checkpoints.
2. **Forced coupling**: Don't require molecules to use skills or vice versa.
3. **Premature elevation**: Don't auto-generate skills from single executions. Wait for patterns.
4. **Trace bloat**: Wisps are ephemeral for a reason. Squash or burn, don't accumulate.

## Open Questions

1. How granular should `skill_version` be? Git SHA? Flake hash? Both?
2. Should traces capture stdout/stderr or just exit codes?
3. What's the minimum number of similar executions before suggesting elevation?
4. How do we handle skills that span multiple mol steps?

## References

- beads 0.35 molecule commands: `bd mol --help`, `bd wisp --help`, `bd pour --help`
- Skills repo: `~/proj/skills/`
- Existing skills: worklog, orch, niri-window-capture, spec-review, etc.