
dan c7c6bbf796 docs: park ADR-001 skills-molecules integration

Current simpler approach working well:
- Skills as standalone entrypoints
- Agent judgment sufficient for invocation
- Molecules not actively used

Revisit when complex orchestration is needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-28 23:27:41 -05:00


ADR-001: Skills and Molecules Integration

Status

Parked (2025-12-28)

Rationale: Current simpler approach is working well:

Skills as standalone entrypoints (not molecule steps)
Agent judgment from description/SKILL.md sufficient for invocation
Molecules/protos not actively used for workflow orchestration

Revisit when:

Complex multi-agent orchestration becomes needed
Steve Yegge's orchestration work provides new patterns
Programmatic skill invocation has clear use cases

Context

We have two complementary systems for agent-assisted work:

Skills (this repo): Procedural knowledge deployed via Nix/direnv. Skills define HOW to do things - scripts, prompts, and workflows that agents can invoke.
Molecules (beads 0.35+): Work tracking templates in beads. Molecules define WHAT work needs to be done - DAGs of issues that can be instantiated, tracked, and completed.

These systems evolved independently but have natural integration points. The question is: how should they connect?

Current State

Skills system:

Skills are directories under ~/.claude/skills/ (deployed via Nix)
Each skill has a SKILL.md with frontmatter + prompt/instructions
Skills are invoked by agents via /skill-name or automatically based on triggers
No execution tracking beyond what the agent logs

Molecules system (beads 0.35):

Proto: Template epic with template label, uses {{var}} placeholders
Mol: Instantiated work from a proto (permanent, git-synced)
Wisp: Ephemeral mol for operational work (gitignored, .beads-wisp/)
Hook: Agent's attachment point for assigned work
Pin: Assign mol to agent's hook
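As an illustration of the {{var}} placeholder idea only: the sketch below fills a proto-style title from a variable map. The `instantiate` helper and its error behaviour are hypothetical and are not part of beads, which performs its own substitution when a proto is instantiated.

```python
import re

# Hypothetical helper, not a beads API: shows what {{var}} substitution
# in a proto title might look like.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def instantiate(template: str, variables: dict) -> str:
    """Replace each {{name}} in template; raise KeyError if a value is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


title = instantiate("Generate worklog for {{session}}", {"session": "2025-12-23"})
print(title)
```

Failing on a missing variable, rather than leaving the placeholder in place, is a design assumption: it keeps half-instantiated titles out of tracked work.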

Key molecule commands:

bd mol spawn <proto>     # Create mol from proto
bd pour <proto>          # Spawn persistent mol
bd wisp create <proto>   # Spawn ephemeral mol
bd pin <mol> --for me    # Assign to self
bd mol squash <id>       # Compress mol → digest
bd mol distill <epic>    # Extract proto from ad-hoc epic

Problem Statement

Skills have no execution history - we can't replay, debug, or learn from past runs
Molecules track work but don't know which skills were used to complete them
Successful ad-hoc work patterns can't be easily promoted to reusable skills
No connection between "what was done" (mol) and "how it was done" (skill)

Decision

Link skills and molecules via three mechanisms:

1. Skill References in Molecules

Add a skill: field to molecule nodes that names the skill to use for that step:

# In a proto template
- title: "Generate worklog for {{session}}"
  skill: worklog
  description: "Document the work session"

When an agent works on a mol step that has a skill: reference, it knows which skill to invoke.
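A minimal sketch of how a step's skill: reference could be resolved, assuming the step has already been parsed into a dict and that skills live in directories containing a SKILL.md, as described under Current State. The function name `skill_md_path` and the dict shape are assumptions for illustration.

```python
from pathlib import Path
from typing import Optional

# Default location taken from the Current State section.
DEFAULT_SKILLS_ROOT = Path.home() / ".claude" / "skills"


def skill_md_path(step: dict, skills_root: Path = DEFAULT_SKILLS_ROOT) -> Optional[Path]:
    """Return the SKILL.md path for a step's skill: field.

    Returns None when the step has no skill: field, since the
    integration is opt-in and such steps are done manually.
    """
    name = step.get("skill")
    if name is None:
        return None
    return skills_root / name / "SKILL.md"


step = {"title": "Generate worklog for 2025-12-23", "skill": "worklog"}
print(skill_md_path(step, Path("/tmp/skills")))
```

The function only builds the path; whether a missing skill directory should be an error or a fallback to manual work is left open here.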

2. Wisp Execution Traces

Use wisps to capture skill execution traces. When a skill runs within a molecule context, the wisp records a trace such as:

# Wisp execution trace format
skill_ref: worklog
skill_version: "abc123"  # git SHA of skill
inputs:
  context: "session context..."
  env:
    PROJECT: "skills"
tool_calls:
  - cmd: "extract-metrics.sh"
    args: ["--session", "2025-12-23"]
    exit_code: 0
    duration_ms: 1234
checkpoints:
  - step: "metrics_extracted"
    summary: "Found 5 commits, 12 file changes"
    timestamp: "2025-12-23T19:30:00Z"
outputs:
  files_created:
    - "docs/worklogs/2025-12-23-session.org"

This enables:

Replay: Re-run a skill with the same inputs
Diff: Compare two executions of the same skill
Debug: Understand what happened when something fails
Regression testing: Detect when skill behavior changes
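The Diff and Regression testing uses can be sketched as follows, assuming each trace has been loaded into a plain dict with the fields shown in the trace format above. Treating `duration_ms` and `timestamp` as noise is an assumption, and `diff_traces` is a hypothetical helper rather than an existing command.

```python
# Fields assumed to vary between otherwise identical runs.
VOLATILE_KEYS = {"duration_ms", "timestamp"}


def strip_volatile(value):
    """Recursively drop volatile keys so only meaningful differences remain."""
    if isinstance(value, dict):
        return {k: strip_volatile(v) for k, v in value.items() if k not in VOLATILE_KEYS}
    if isinstance(value, list):
        return [strip_volatile(v) for v in value]
    return value


def diff_traces(old: dict, new: dict) -> dict:
    """Return {section: (old_value, new_value)} for top-level sections that differ."""
    old_s, new_s = strip_volatile(old), strip_volatile(new)
    changed = {}
    for key in sorted(old_s.keys() | new_s.keys()):
        if old_s.get(key) != new_s.get(key):
            changed[key] = (old_s.get(key), new_s.get(key))
    return changed


baseline = {
    "skill_ref": "worklog",
    "tool_calls": [
        {"cmd": "extract-metrics.sh", "args": ["--session", "2025-12-23"],
         "exit_code": 0, "duration_ms": 1234},
    ],
}
rerun = {
    "skill_ref": "worklog",
    "tool_calls": [
        {"cmd": "extract-metrics.sh", "args": ["--session", "2025-12-23"],
         "exit_code": 1, "duration_ms": 980},
    ],
}

# Only tool_calls is reported: the exit code changed, the duration is ignored.
print(list(diff_traces(baseline, rerun)))
```

An empty result means the two runs match once timing noise is removed, which is the condition a regression check would assert.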

3. Elevation Pipeline

When a molecule completes successfully, offer to "elevate" it to a skill:

bd mol squash <mol-id>     # Compress execution history
bd elevate <mol-id>        # Analyze and generate skill draft

The elevation pipeline:

Analyze squashed trace for generalizable patterns
Extract variable inputs (things that changed between runs)
Generate SKILL.md draft with:
- Frontmatter from mol metadata
- Steps derived from trace checkpoints
- Scripts extracted from tool_calls
Human approval gate before deployment
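The first three pipeline steps could look roughly like this, again assuming traces are plain dicts in the format shown earlier. Both function names are hypothetical, and the draft layout (`name` and `description` frontmatter followed by steps and scripts) is an assumption about what a generated SKILL.md might contain; the human approval gate is deliberately outside the sketch.

```python
def variable_inputs(runs: list) -> list:
    """Names of inputs whose values differ across runs (parameter candidates)."""
    keys = set().union(*(run["inputs"].keys() for run in runs))
    return sorted(
        key for key in keys
        if len({repr(run["inputs"].get(key)) for run in runs}) > 1
    )


def draft_skill_md(name: str, description: str, trace: dict) -> str:
    """Render a SKILL.md draft: frontmatter, checkpoint steps, scripts used."""
    steps = [
        f"{i}. {cp['step']}: {cp['summary']}"
        for i, cp in enumerate(trace["checkpoints"], start=1)
    ]
    scripts = sorted({call["cmd"] for call in trace["tool_calls"]})
    lines = [
        "---", f"name: {name}", f"description: {description}", "---", "",
        "## Steps", *steps, "",
        "## Scripts", *(f"- {script}" for script in scripts),
    ]
    return "\n".join(lines) + "\n"


runs = [
    {"inputs": {"context": "session A", "env": {"PROJECT": "skills"}}},
    {"inputs": {"context": "session B", "env": {"PROJECT": "skills"}}},
]
trace = {
    "tool_calls": [{"cmd": "extract-metrics.sh", "exit_code": 0}],
    "checkpoints": [
        {"step": "metrics_extracted", "summary": "Found 5 commits, 12 file changes"},
    ],
}

print(variable_inputs(runs))  # only "context" changed between the two runs
print(draft_skill_md("worklog", "Document the work session", trace))
```

Comparing inputs across several runs, rather than reading a single trace, is what separates a parameter from a constant.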

Phase Transitions (Chemistry Metaphor)

Proto (solid)     →  pour   →  Mol (liquid)     →  squash  →  Digest (solid)
                                    ↓
Wisp (vapor)      ←  create ←  Proto
                       ↓
                   execute  →  Trace
                       ↓
                   elevate  →  Skill draft

Solid: Static templates (protos, digests, skills)
Liquid: Active work being tracked (mols)
Vapor: Ephemeral execution (wisps, traces)
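One reading of the diagram as a transition table is sketched below. The phase assignments come directly from the three definitions above; the exact transitions, in particular treating `elevate` as acting on a trace, are an interpretation of the diagram and not a confirmed design. State and function names are illustrative only.

```python
# Phase of each artifact, per the solid/liquid/vapor definitions.
PHASE = {
    "proto": "solid", "digest": "solid", "skill": "solid",
    "mol": "liquid",
    "wisp": "vapor", "trace": "vapor",
}

# (state, operation) -> next state; an interpretation of the diagram.
TRANSITIONS = {
    ("proto", "pour"): "mol",
    ("mol", "squash"): "digest",
    ("proto", "create"): "wisp",
    ("wisp", "execute"): "trace",
    ("trace", "elevate"): "skill",  # a draft, pending human approval
}


def apply(state: str, operation: str) -> str:
    """Return the next state, or raise ValueError for an undefined transition."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"cannot {operation} a {state}") from None


state = "proto"
for operation in ("create", "execute", "elevate"):
    state = apply(state, operation)
    print(f"{operation} -> {state} ({PHASE[state]})")
```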

Consequences

Positive

Traceability: Know exactly how work was completed
Reusability: Successful patterns can be promoted to skills through the elevation pipeline, subject to human approval
Debugging: Execution traces make failures understandable
Learning: Each tracked execution adds trace data from which recurring patterns can be identified and elevated

Negative

Overhead: Capturing traces adds complexity
Storage: Wisp traces need cleanup strategy
Coupling: Skills and beads become interdependent

Neutral

Skills remain usable without molecules (standalone invocation)
Molecules remain usable without skills (manual work)
Integration is opt-in per-proto via skill: field

Implementation Plan

Phase 1 (this ADR): Document the design
Phase 2: Define wisp execution trace format (skills-jeb)
Phase 3: Prototype elevation pipeline (skills-3em)
Phase 4: Test on worklog skill (skills-rex)

Anti-Patterns to Avoid

Over-instrumentation: Don't trace every shell command. Focus on meaningful checkpoints.
Forced coupling: Don't require molecules to use skills or vice versa.
Premature elevation: Don't auto-generate skills from single executions. Wait for patterns.
Trace bloat: Wisps are ephemeral for a reason. Squash or burn, don't accumulate.

Open Questions

How granular should skill_version be? Git SHA? Flake hash? Both?
Should traces capture stdout/stderr or just exit codes?
What's the minimum number of similar executions before suggesting elevation?
How do we handle skills that span multiple mol steps?

References

beads 0.35 molecule commands: bd mol --help, bd wisp --help, bd pour --help
Skills repo: ~/proj/skills/
Existing skills: worklog, orch, niri-window-capture, spec-review, etc.
