ADRs: add skill manifest, versioning, and trace security designs

- ADR-002: Skill manifest format with JSON Schema, path bases, preconditions
- ADR-003: Versioning with Nix store paths, lockfiles, interface contracts
- ADR-004: Trace security with HMAC redaction, entropy detection, trace modes

Refined based on orch consensus feedback from GPT and Gemini.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent afbfb6b05b, commit c1f644e6a6

docs/adr/001-skills-molecules-integration.md (new file, 178 lines)
# ADR-001: Skills and Molecules Integration

## Status

In Progress

## Context

We have two complementary systems for agent-assisted work:

1. **Skills** (this repo): Procedural knowledge deployed via Nix/direnv. Skills define HOW to do things - scripts, prompts, and workflows that agents can invoke.

2. **Molecules** (beads 0.35+): Work-tracking templates in beads. Molecules define WHAT work needs to be done - DAGs of issues that can be instantiated, tracked, and completed.

These systems evolved independently but have natural integration points. The question is: how should they connect?

### Current State

**Skills system:**
- Skills are directories under `~/.claude/skills/` (deployed via Nix)
- Each skill has a `SKILL.md` with frontmatter + prompt/instructions
- Skills are invoked by agents via `/skill-name` or automatically based on triggers
- No execution tracking beyond what the agent logs

**Molecules system (beads 0.35):**
- **Proto**: Template epic with `template` label, uses `{{var}}` placeholders
- **Mol**: Instantiated work from a proto (permanent, git-synced)
- **Wisp**: Ephemeral mol for operational work (gitignored, `.beads-wisp/`)
- **Hook**: Agent's attachment point for assigned work
- **Pin**: Assign a mol to an agent's hook

Key molecule commands:
```
bd mol spawn <proto>   # Create mol from proto
bd pour <proto>        # Spawn persistent mol
bd wisp create <proto> # Spawn ephemeral mol
bd pin <mol> --for me  # Assign to self
bd mol squash <id>     # Compress mol → digest
bd mol distill <epic>  # Extract proto from ad-hoc epic
```

### Problem Statement

1. Skills have no execution history - we can't replay, debug, or learn from past runs
2. Molecules track work but don't know which skills were used to complete them
3. Successful ad-hoc work patterns can't be easily promoted to reusable skills
4. No connection between "what was done" (mol) and "how it was done" (skill)

## Decision

Link skills and molecules via three mechanisms:

### 1. Skill References in Molecules

Add a `skill:` field to molecule nodes that references the skill used during execution:

```yaml
# In a proto template
- title: "Generate worklog for {{session}}"
  skill: worklog
  description: "Document the work session"
```

When an agent works on a mol step that has a `skill:` reference, it knows which skill to invoke.

### 2. Wisp Execution Traces

Use wisps to capture skill execution traces. When a skill runs within a molecule context:

```yaml
# Wisp execution trace format
skill_ref: worklog
skill_version: "abc123"  # git SHA of skill
inputs:
  context: "session context..."
env:
  PROJECT: "skills"
tool_calls:
  - cmd: "extract-metrics.sh"
    args: ["--session", "2025-12-23"]
    exit_code: 0
    duration_ms: 1234
checkpoints:
  - step: "metrics_extracted"
    summary: "Found 5 commits, 12 file changes"
    timestamp: "2025-12-23T19:30:00Z"
outputs:
  files_created:
    - "docs/worklogs/2025-12-23-session.org"
```

This enables:
- Replay: re-run a skill with the same inputs
- Diff: compare two executions of the same skill
- Debug: understand what happened when something fails
- Regression testing: detect when skill behavior changes

### 3. Elevation Pipeline

When a molecule completes successfully, offer to "elevate" it to a skill:

```
bd mol squash <mol-id> # Compress execution history
bd elevate <mol-id>    # Analyze and generate skill draft
```

The elevation pipeline:
1. Analyze the squashed trace for generalizable patterns
2. Extract variable inputs (things that changed between runs)
3. Generate a SKILL.md draft with:
   - Frontmatter from mol metadata
   - Steps derived from trace checkpoints
   - Scripts extracted from tool_calls
4. Human approval gate before deployment
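Step 3 could start very simply: walk the trace structure defined above and emit a skeleton. A hedged sketch - `draft_skill_md`, the `mol_meta` argument, and its `description` key are hypothetical, not an existing bd API:

```python
def draft_skill_md(trace: dict, mol_meta: dict) -> str:
    """Sketch: build a SKILL.md skeleton from a squashed wisp trace
    (ADR-001 trace format) plus mol metadata for the frontmatter."""
    lines = [
        "---",
        f"name: {trace['skill_ref']}",
        f"description: {mol_meta.get('description', 'TODO')}",
        "---",
        "",
        "## Steps",
    ]
    # Steps derived from trace checkpoints
    for i, cp in enumerate(trace.get("checkpoints", []), 1):
        lines.append(f"{i}. {cp['step']}: {cp['summary']}")
    # Scripts extracted from tool_calls
    cmds = sorted({tc["cmd"] for tc in trace.get("tool_calls", [])})
    if cmds:
        lines += ["", "## Scripts", *(f"- {c}" for c in cmds)]
    return "\n".join(lines)
```

A real implementation would generalize across multiple traces (step 2) rather than transcribing a single run, in line with the "premature elevation" anti-pattern below.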

### Phase Transitions (Chemistry Metaphor)

```
Proto (solid) → pour → Mol (liquid) → squash → Digest (solid)
      ↓
Wisp (vapor) ← create ← Proto
      ↓
  execute → Trace
      ↓
  elevate → Skill draft
```

- **Solid**: Static templates (protos, digests, skills)
- **Liquid**: Active work being tracked (mols)
- **Vapor**: Ephemeral execution (wisps, traces)

## Consequences

### Positive

- **Traceability**: Know exactly how work was completed
- **Reusability**: Successful patterns become skills automatically
- **Debugging**: Execution traces make failures understandable
- **Learning**: The system improves as more work is tracked

### Negative

- **Overhead**: Capturing traces adds complexity
- **Storage**: Wisp traces need a cleanup strategy
- **Coupling**: Skills and beads become interdependent

### Neutral

- Skills remain usable without molecules (standalone invocation)
- Molecules remain usable without skills (manual work)
- Integration is opt-in per proto via the `skill:` field

## Implementation Plan

1. **Phase 1** (this ADR): Document the design
2. **Phase 2**: Define the wisp execution trace format (skills-jeb)
3. **Phase 3**: Prototype the elevation pipeline (skills-3em)
4. **Phase 4**: Test on the worklog skill (skills-rex)

## Anti-Patterns to Avoid

1. **Over-instrumentation**: Don't trace every shell command. Focus on meaningful checkpoints.
2. **Forced coupling**: Don't require molecules to use skills or vice versa.
3. **Premature elevation**: Don't auto-generate skills from single executions. Wait for patterns.
4. **Trace bloat**: Wisps are ephemeral for a reason. Squash or burn; don't accumulate.

## Open Questions

1. How granular should `skill_version` be? Git SHA? Flake hash? Both?
2. Should traces capture stdout/stderr or just exit codes?
3. What's the minimum number of similar executions before suggesting elevation?
4. How do we handle skills that span multiple mol steps?

## References

- beads 0.35 molecule commands: `bd mol --help`, `bd wisp --help`, `bd pour --help`
- Skills repo: `~/proj/skills/`
- Existing skills: worklog, orch, niri-window-capture, spec-review, etc.

docs/adr/002-skill-manifest-format.md (new file, 312 lines)
# ADR-002: Skill Manifest Format

## Status

Draft (Revised)

## Context

Skills currently have minimal frontmatter (name, description). For molecule integration, we need skills to declare their interface so that:
- Beads can validate proto→skill bindings before spawning
- Agents know what inputs to provide
- Traces can record what was expected vs actual
- Errors are clear when requirements aren't met

## Decision

Extend SKILL.md frontmatter with optional manifest fields. All new fields are optional for backward compatibility.

### Proposed Schema

```yaml
---
manifest_version: "1.0"
name: worklog
description: Create org-mode worklogs documenting work sessions.

# Version info for reproducibility
version: 1.0.0

# What the skill needs to run
inputs:
  required:
    - name: session_date
      description: Date of the session (YYYY-MM-DD)
      schema:
        type: string
        pattern: "^\\d{4}-\\d{2}-\\d{2}$"
  optional:
    - name: topic
      description: Brief topic descriptor for filename
      schema:
        type: string
        default: "session"
    - name: output_dir
      description: Directory for worklog output (relative to repo root)
      schema:
        type: string
        default: "docs/worklogs"
    - name: api_key
      description: API key for external service
      sensitive: true
      schema:
        type: string

# Environment requirements
env:
  required: []
  optional:
    - name: PROJECT
      description: Project name for context
    - name: API_TOKEN
      description: Authentication token
      sensitive: true

# Tools/commands that must be available
preconditions:
  commands:
    - cmd: git
      min_version: "2.40"
    - cmd: date
  files:
    - path: scripts/extract-metrics.sh
      base: skill_root
      description: Metrics extraction script
    - path: templates/worklog-template.org
      base: skill_root
      description: Worklog template

# What the skill produces
outputs:
  files:
    - pattern: "{{output_dir}}/{{session_date}}-{{topic}}.org"
      base: repo_root
      description: The generated worklog file
  artifacts: []

# Execution characteristics (hints for scheduling, not enforcement)
execution:
  idempotent: false   # Safe to re-run?
  destructive: false  # Modifies existing files?
  network: false      # Requires network access?
  interactive: false  # Requires user input during execution?
  timeout: 30         # Maximum execution time in seconds

# For security classification
sensitive: false  # Handles sensitive data?
---
```

### Field Definitions

#### `manifest_version`
Schema version for the manifest format itself. Allows parsers to handle breaking changes.

- Current version: `"1.0"`
- Semantic versioning: major version changes indicate breaking schema changes

#### `inputs`
Declares what the skill needs from the caller. Uses a JSON Schema subset for type definitions.

- `required`: Must be provided or the skill fails
- `optional`: Has defaults, can be overridden
- Each input has: `name`, `description`, `schema` (JSON Schema), optional `sensitive` flag
- `sensitive: true`: Marks the input for redaction in traces

**JSON Schema subset supported:**
- `type`: string, number, integer, boolean, array, object
- `pattern`: regex for string validation
- `minimum`, `maximum`: numeric bounds
- `items`: array item schema
- `properties`: object property schemas
- `default`: default value
- `enum`: allowed values

#### `env`
Environment variables the skill reads.

- `required`: Skill fails if not set
- `optional`: Used if available, not fatal if missing
- `sensitive: true`: Mark for redaction

#### `preconditions`
What must exist before the skill can run.

- `commands`: CLI tools that must be in PATH
  - `cmd`: command name (required)
  - `min_version`: minimum version (optional, e.g., "2.40")
  - `max_version`: maximum version (optional)
- `files`: Files that must exist
  - `path`: relative path (required)
  - `base`: path base - `skill_root`, `repo_root`, or `cwd` (default: `skill_root`)
  - `description`: what this file is for (optional)

**Path Resolution:**
- `skill_root`: Directory containing SKILL.md
- `repo_root`: Git repository root
- `cwd`: Current working directory
- Absolute paths (starting with `/` or `~`) are forbidden

#### `outputs`
What the skill produces.

- `files`: File patterns with `{{var}}` substitution
  - `pattern`: file path pattern
  - `base`: path base (default: `repo_root`)
  - `description`: what this output is for
- `artifacts`: Other outputs (stdout, state changes, etc.)

#### `execution`
Characteristics for scheduling and safety. **These are hints, not enforcement.**

- `idempotent`: Can run multiple times safely
- `destructive`: Modifies or deletes existing data
- `network`: Requires internet access
- `interactive`: Needs human input during the run
- `timeout`: Maximum execution time in seconds

The runtime may use these hints to:
- Schedule non-destructive skills in parallel
- Warn before running destructive operations
- Set execution timeouts

These flags do NOT enforce sandboxing or isolation.

#### `sensitive`
Whether the skill handles sensitive data.

- `false` (default): Standard tracing, eligible for elevation
- `true`: Aggressive redaction, elevation blocked without review

### Validation Rules

When beads spawns a molecule with a `skill:` reference:

1. Check the skill exists
2. Verify `manifest_version` is supported
3. Validate all required inputs are mapped
4. Validate input schemas using JSON Schema
5. Check preconditions (commands with versions, files)
6. Check required env vars are set
7. Warn if optional inputs have no mapping
8. Apply redaction rules for `sensitive` fields
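Because only a JSON Schema subset is supported, step 4 could be hand-rolled rather than pulling in a full validator library; a minimal sketch covering `type`, `pattern`, bounds, and `enum`:

```python
import re

def validate_input(value, schema: dict) -> list[str]:
    """Validate one input value against the JSON Schema subset
    declared above. Returns a list of error strings (empty = valid)."""
    errors = []
    types = {"string": str, "number": (int, float), "integer": int,
             "boolean": bool, "array": list, "object": dict}
    t = schema.get("type")
    if t and not isinstance(value, types[t]):
        errors.append(f"expected {t}, got {type(value).__name__}")
        return errors  # skip further checks on a type mismatch
    if "pattern" in schema and isinstance(value, str):
        if not re.search(schema["pattern"], value):
            errors.append(f"{value!r} does not match {schema['pattern']!r}")
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"{value} < minimum {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append(f"{value} > maximum {schema['maximum']}")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{value!r} not in {schema['enum']}")
    return errors
```

(`items` and `properties` would recurse into this same function; omitted here for brevity.)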

### Backward Compatibility

- All new fields are optional
- Skills with only `name`/`description` continue to work
- Missing manifest = no validation (current behavior)
- Missing `manifest_version` = assume legacy format

## Consequences

### Positive

- Clear contracts between molecules and skills
- Better error messages (fail before execution, not during)
- JSON Schema provides a standard, well-understood type system
- Sensitive data handling built into the manifest
- Version constraints prevent runtime errors
- Explicit path resolution prevents portability issues

### Negative

- More frontmatter to maintain
- Risk of manifest drift from actual behavior
- The JSON Schema subset may not cover all validation needs

### Neutral

- Existing skills are unaffected until they opt in
- Execution flags are hints, not enforcement

## Examples

### Minimal (current, still valid)
```yaml
---
name: simple-skill
description: Does something simple
---
```

### With inputs only
```yaml
---
manifest_version: "1.0"
name: greeter
description: Greets the user
inputs:
  required:
    - name: username
      description: Name of the user to greet
      schema:
        type: string
---
```

### With sensitive data
```yaml
---
manifest_version: "1.0"
name: deploy
description: Deploy application to production
inputs:
  required:
    - name: deploy_key
      description: SSH deployment key
      sensitive: true
      schema:
        type: string
env:
  required:
    - name: DEPLOY_TOKEN
      description: Authentication token
      sensitive: true
execution:
  network: true
  destructive: true
  timeout: 300
sensitive: true
---
```

### With version constraints
```yaml
---
manifest_version: "1.0"
name: analyze-git
description: Analyze git repository history
preconditions:
  commands:
    - cmd: git
      min_version: "2.40"
    - cmd: python3
      min_version: "3.11"
  files:
    - path: scripts/analyze.py
      base: skill_root
inputs:
  optional:
    - name: since
      description: Analyze commits since this date
      schema:
        type: string
        pattern: "^\\d{4}-\\d{2}-\\d{2}$"
        default: "2025-01-01"
outputs:
  files:
    - pattern: "reports/git-analysis-{{since}}.md"
      base: repo_root
execution:
  idempotent: true
  timeout: 60
---
```

## Open Questions

1. Should `outputs` be validated after execution?
2. How do we handle skills that produce variable outputs?
3. Should we allow custom JSON Schema extensions?
4. Where does the skill version come from - a git tag, manual, or derived?

docs/adr/003-skill-versioning-strategy.md (new file, 309 lines)
# ADR-003: Skill Versioning Strategy

## Status

Draft (Revised)

## Context

Skills are deployed via Nix/direnv, which means:
- The "installed" version is a build artifact, not just source code
- A git SHA may not exist, or may not match the deployed content
- Skills can reference external scripts/binaries
- Protos and molecules need stable references

A single version identifier is insufficient. We need to answer:
1. How do we identify what version of a skill ran?
2. How do protos reference skills (pin vs float)?
3. How do we handle breaking changes?

## Decision

### Version Tuple

Every skill execution records a version tuple:

```yaml
skill_version:
  # Primary identity - Nix store path (immutable, content-addressed)
  nix_store_path: "/nix/store/abc123-worklog-1.0.0"

  # Source identity (where it came from)
  source_ref: "git+file:///home/dan/proj/skills#worklog"
  source_rev: "abc123def"  # git SHA, null if not in git

  # Content identity (what was actually deployed)
  content_hash: "sha256:789xyz..."  # hash of skill content per algorithm below

  # Semantic version from manifest (optional)
  version: "1.0.0"

  # Deployment metadata
  deployed_at: "2025-12-23T10:00:00Z"
```

#### Identity Selection by Context

| Context | Primary Identity | Rationale |
|---------|------------------|-----------|
| Nix-deployed skills | `nix_store_path` | Immutable, content-addressed by Nix |
| Development/local | `content_hash` | No Nix path available |
| Trace replay | `nix_store_path` or `content_hash` | Exact reproducibility |
| Proto pinning | `content_hash` or `version` | Portable across machines |

### Computing `content_hash`

Hash computation must be deterministic and portable:

```bash
#!/usr/bin/env bash
# skill-content-hash.sh <skill-dir>
# Deterministic, portable content hash for a skill directory.
set -euo pipefail

SKILL_DIR="${1:-.}"
SKILL_DIR="$(cd "$SKILL_DIR" && pwd)"
cd "$SKILL_DIR"

# Collect candidate files as relative paths with default exclusions,
# sorted deterministically (LC_ALL=C makes the order locale-independent).
files=$(find . -type f \
    ! -path './.git/*' \
    ! -path './.skillignore' \
    ! -name '*.pyc' \
    ! -name '.DS_Store' \
    ! -path '*/__pycache__/*' \
  | LC_ALL=C sort)

# Apply .skillignore if present: each non-comment line is a prefix
# (e.g. "docs/") or glob (e.g. "*.test.js") matched against the
# relative path.
if [[ -f .skillignore ]]; then
  while IFS= read -r pat; do
    [[ -z "$pat" || "$pat" == \#* ]] && continue
    files=$(printf '%s\n' "$files" | while IFS= read -r f; do
      case "${f#./}" in ($pat|$pat*) ;; (*) printf '%s\n' "$f" ;; esac
    done)
  done < .skillignore
fi

# Hash filename + content for each file, so renames change the hash.
printf '%s\n' "$files" | while IFS= read -r f; do
  [[ -z "$f" ]] && continue
  printf '%s\n' "$f"
  cat "$f"
done | sha256sum | cut -d' ' -f1
```

**Critical requirements:**
- Use relative paths (not absolute) for portability
- Include the filename in the hash stream (not just content)
- Sort files deterministically before hashing
- Exclude non-functional files via `.skillignore`

#### `.skillignore` Format

Skills can exclude files from the content hash (like `.gitignore`):

```
# .skillignore - files excluded from content_hash
README.md
CHANGELOG.md
docs/
tests/
*.test.js
```

This allows documentation changes without invalidating version pins.

### Proto Reference Modes

#### 1. Float (default, development)
```yaml
skill: worklog
```
Uses whatever version is currently deployed. Simple but unstable.

#### 2. Pin to content hash (CI/automation)
```yaml
skill:
  id: worklog
  content_hash: "sha256:789xyz..."
```
Fails if the deployed skill doesn't match. Most stable for automation.

#### 3. Pin to minimum version (published templates)
```yaml
skill:
  id: worklog
  min_version: "1.0.0"
```
Requires the skill manifest to declare a `version` field with semantic versioning.

### Lockfile Workflow

For reproducible proto execution, use `proto.lock`:

```yaml
# my-proto.lock
# Auto-generated - do not edit manually
# Regenerate with: bd proto lock my-proto

generated_at: "2025-12-23T10:00:00Z"
beads_version: "0.35.0"

skills:
  worklog:
    content_hash: "sha256:789xyz..."
    nix_store_path: "/nix/store/abc123-worklog-1.0.0"
    version: "1.0.0"
    source_rev: "abc123def"

  deploy:
    content_hash: "sha256:456abc..."
    nix_store_path: "/nix/store/def456-deploy-2.1.0"
    version: "2.1.0"
    source_rev: "def456ghi"
```

**Workflow:**
```bash
# Development: float freely
bd mol spawn my-proto

# CI/production: lock versions
bd proto lock my-proto         # Generate/update lockfile
bd mol spawn my-proto --locked # Fail if versions don't match lock
```

The lockfile should be committed to version control for reproducible builds.

### Breaking Change Handling

#### Interface Contracts

For semantic versioning to be meaningful, skills should declare their interface contract:

```yaml
# In SKILL.md manifest
interface:
  inputs:
    - session_date  # Required inputs are part of the contract
    - topic         # Optional inputs with defaults
  outputs:
    - pattern: "docs/worklogs/*.org"
  env:
    - PROJECT       # Required env vars
```

**Breaking changes** (bump major version):
- Renamed/removed required inputs
- Changed required input types
- Changed output patterns
- Added new required inputs without defaults
- Removed required env vars

**Non-breaking changes** (bump minor/patch):
- Added optional inputs with defaults
- Documentation changes
- Bug fixes
- Performance improvements

#### Version Validation

```bash
# When spawning a proto with a pinned skill
bd mol spawn my-proto --var x=1
# → Validates that the skill's content_hash or version matches the pin
# → Fails early on mismatch

# Check for breaking changes
bd skill check-compat worklog@1.0.0 worklog@2.0.0
# → Reports interface differences
```

### Path Sanitization

Traces should sanitize paths to avoid leaking local structure:

```yaml
# Before sanitization
skill_version:
  source_ref: "git+file:///home/dan/proj/skills#worklog"
  nix_store_path: "/nix/store/abc123-worklog-1.0.0"

# After sanitization (for sharing/elevation)
skill_version:
  source_ref: "git+file://LOCAL/proj/skills#worklog"
  nix_store_path: "/nix/store/abc123-worklog-1.0.0"  # Already safe
```

Sanitization patterns:
- `/home/<username>/` → `LOCAL/`
- `/Users/<username>/` → `LOCAL/`
- Nix store paths are already content-addressed and safe
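Both home-directory patterns reduce to one substitution; a sketch of what the trace writer might apply:

```python
import re

def sanitize_path(text: str) -> str:
    """Replace Linux/macOS home-directory prefixes with LOCAL/ so
    traces leak neither usernames nor local directory structure."""
    return re.sub(r"/(?:home|Users)/[^/]+/", "LOCAL/", text)
```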

### Recording in Traces

Wisp traces always record the full version tuple:

```yaml
execution:
  skill_version:
    nix_store_path: "/nix/store/abc123-worklog-1.0.0"
    source_ref: "git+file://LOCAL/proj/skills#worklog"  # Sanitized
    source_rev: "abc123def"
    content_hash: "sha256:789xyz..."
    version: "1.0.0"
```

This enables:
- Replay with the exact version
- Diff between executions
- Debugging "it worked before" issues
- Portable sharing (sanitized paths)

### Recommendations

| Use Case | Mode | Identity | Why |
|----------|------|----------|-----|
| Active development | Float | N/A | Iterate quickly |
| Local testing | Float or pin | `content_hash` | Reproducible locally |
| Shared proto | Pin + lock | `content_hash` | Portable across machines |
| Published template | Pin to version | `min_version` | Semantic compatibility |
| CI/automation | Locked | `content_hash` | Exact reproducibility |

## Consequences

### Positive

- Full traceability of what ran
- Reproducible executions via lockfile
- Clear failure on version mismatch
- Supports gradual adoption (float first, pin later)
- Portable hashing (relative paths)
- Interface contracts make SemVer meaningful

### Negative

- Content hash computation adds overhead
- Pinned protos need updates when skills change
- More fields to manage
- The lockfile is another file to maintain

### Neutral

- Float mode preserves current behavior
- The version tuple is metadata, not enforcement
- The Nix store path is available only in Nix-deployed environments

## Implementation Checklist

- [ ] Implement deterministic content hash script
- [ ] Add `.skillignore` support to hash computation
- [ ] Add `nix_store_path` capture for Nix-deployed skills
- [ ] Implement `bd proto lock` command
- [ ] Implement `bd mol spawn --locked` validation
- [ ] Add path sanitization to trace writer
- [ ] Add interface contract validation
- [ ] Implement `bd skill check-compat` command

## Open Questions

1. Should the lockfile include transitive dependencies (skills that call other skills)?
2. How do we handle skills that shell out to system binaries (git, curl)? Version those too?
3. Cache `content_hash`, or compute it on every invocation?
4. Should we support Nix flake references directly (e.g., `github:user/skills#worklog`)?

docs/adr/004-trace-security-redaction.md (new file, 394 lines)
# ADR-004: Trace Security and Redaction Policy

## Status

Draft (Revised)

## Context

Skill execution traces (wisps) capture:
- Environment variables
- Input arguments
- Tool calls with arguments
- File paths and contents
- Stdout/stderr

This data often contains secrets:
- API keys (AWS, GitHub, OpenAI)
- Tokens and passwords
- PII (usernames, emails, paths)
- Proprietary data

Wisps are gitignored but still risky:
- Local machine compromise
- Accidental sharing
- Squashing into public digests
- Elevation to published skills

## Decision

### Default-Deny Policy

Traces capture minimal information by default. Capturing sensitive data requires explicit opt-in.

### HMAC-Based Redaction

Instead of a plain `[REDACTED]`, use HMAC hashing to enable correlation without revealing values:

```yaml
# Format: [REDACTED:hmac:<first-8-chars-of-hmac>]
inputs:
  api_token: "[REDACTED:hmac:a1b2c3d4]"
  other_token: "[REDACTED:hmac:a1b2c3d4]"      # Same value = same hash
  different_token: "[REDACTED:hmac:e5f6g7h8]"  # Different value
```

**Benefits:**
- Can detect whether the same secret was used across executions
- Can correlate inputs without knowing their values
- The HMAC key is per-session and is not stored in the trace

**Implementation:**
```python
import hmac
import hashlib

def redact_sensitive(value: str, session_key: bytes) -> str:
    """Redact value with HMAC for correlation."""
    h = hmac.new(session_key, value.encode(), hashlib.sha256)
    return f"[REDACTED:hmac:{h.hexdigest()[:8]}]"
```
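The correlation property is easy to demonstrate. A usage sketch (the helper is restated so the snippet runs standalone; the session key here is generated ad hoc and would never be written into the trace):

```python
import hmac
import hashlib
import os

def redact_sensitive(value: str, session_key: bytes) -> str:
    """Redact value with HMAC for correlation (as above)."""
    h = hmac.new(session_key, value.encode(), hashlib.sha256)
    return f"[REDACTED:hmac:{h.hexdigest()[:8]}]"

key = os.urandom(32)  # fresh per-session key

a = redact_sensitive("sk-secret-123", key)
b = redact_sensitive("sk-secret-123", key)
c = redact_sensitive("sk-other-456", key)

assert a == b  # same secret correlates across fields and executions
assert a != c  # different secrets stay distinguishable
```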

### Environment Variables

**Default**: Capture only variables on the allowlist.

```yaml
trace_env_allowlist:
  - USER
  - HOME
  - PROJECT
  - PWD
  - SHELL
  - TERM
  - LANG
  - TZ
```

**Never capture** (hardcoded denylist with glob patterns):
```yaml
trace_env_denylist:
  - "*_KEY"
  - "*_SECRET"
  - "*_TOKEN"
  - "*_PASSWORD"
  - "*_CREDENTIAL*"
  - "AWS_*"
  - "GITHUB_TOKEN"
  - "OPENAI_API_KEY"
  - "ANTHROPIC_API_KEY"
```
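Combining the two lists is a few lines with `fnmatch`; a sketch, with the list contents copied from above (checking the denylist even for allowlisted names is belt-and-braces):

```python
from fnmatch import fnmatch

ALLOWLIST = ["USER", "HOME", "PROJECT", "PWD", "SHELL", "TERM", "LANG", "TZ"]
DENYLIST = ["*_KEY", "*_SECRET", "*_TOKEN", "*_PASSWORD", "*_CREDENTIAL*",
            "AWS_*", "GITHUB_TOKEN", "OPENAI_API_KEY", "ANTHROPIC_API_KEY"]

def capturable_env(env: dict[str, str]) -> dict[str, str]:
    """Keep only allowlisted vars; the denylist wins if a name
    somehow matches both lists."""
    return {
        k: v for k, v in env.items()
        if k in ALLOWLIST and not any(fnmatch(k, pat) for pat in DENYLIST)
    }
```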
|
||||
|
||||
### Input Arguments
|
||||
|
||||
**Default**: Inputs are NOT captured unless explicitly marked safe.
|
||||
|
||||
Skills must opt-in to capture inputs in manifest:
|
||||
|
||||
```yaml
|
||||
inputs:
|
||||
required:
|
||||
- name: api_token
|
||||
type: string
|
||||
sensitive: true # Default, will be HMAC-redacted
|
||||
- name: project_name
|
||||
type: string
|
||||
sensitive: false # Explicitly safe to capture
|
||||
```
|
||||
|
||||
**Rationale**: Safer to miss debugging data than leak secrets. Most inputs can be reconstructed from context.
|
||||
|
||||
Trace output:
|
||||
```yaml
|
||||
inputs:
|
||||
api_token: "[REDACTED:hmac:a1b2c3d4]"
|
||||
project: "my-project" # Only captured because sensitive: false
|
||||
```
|
||||
|
||||
### Tool Calls
|
||||
|
||||
**Default**: Capture command name and exit code. Parse arguments structurally.
|
||||
|
||||
#### Structured Argument Parsing
|
||||
|
||||
Instead of regex on raw strings, parse arguments properly:
|
||||
|
||||
```yaml
|
||||
tool_calls:
|
||||
- cmd: "curl"
|
||||
parsed_args:
|
||||
url: "https://api.example.com/endpoint"
|
||||
headers:
|
||||
- name: "Authorization"
|
||||
value: "[REDACTED:hmac:b2c3d4e5]"
|
||||
- name: "Content-Type"
|
||||
value: "application/json"
|
||||
method: "POST"
|
||||
exit_code: 0
|
||||
duration_ms: 1234
|
||||
```

For commands without a dedicated parser, fall back to pattern redaction:

```yaml
tool_calls:
  - cmd: "unknown-tool"
    raw_args: "--token [REDACTED:hmac:c3d4e5f6] --output file.txt"
    exit_code: 0
```

#### Known Command Parsers

Implement argument parsers for common commands:

- `curl`: Parse `-H`, `--header`, `-u`, `--user`, `-d`, `--data`
- `git`: Parse credentials in URLs, `-c` config values
- `aws`: Parse `--profile`, environment-based auth
- `docker`: Parse `-e`, `--env`, registry auth
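
As a sketch of what such a parser could look like, a minimal `curl` walker might split argv into structured fields and flag credential-bearing values for later redaction (illustrative only: real `curl` has many more flags, and unknown flags that take values are not handled here):

```python
SENSITIVE_CURL_FLAGS = {"-H", "--header", "-u", "--user", "-d", "--data"}

def parse_curl_args(argv: list[str]) -> dict:
    """Split a curl argv into structured fields; redaction happens later."""
    parsed = {"url": None, "headers": [], "sensitive": []}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-H", "--header") and i + 1 < len(argv):
            name, _, value = argv[i + 1].partition(":")
            parsed["headers"].append({"name": name.strip(),
                                      "value": value.strip()})
            i += 2
        elif arg in SENSITIVE_CURL_FLAGS and i + 1 < len(argv):
            # Candidates for HMAC redaction before the trace is stored.
            parsed["sensitive"].append((arg, argv[i + 1]))
            i += 2
        elif not arg.startswith("-"):
            parsed["url"] = arg
            i += 1
        else:
            i += 1  # unknown flag; a real parser needs curl's full option table
    return parsed
```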

#### Fallback Redaction Patterns

For unparsed commands, apply pattern matching:

```
Bearer [^\s]+        → Bearer [REDACTED:hmac:...]
token=[^\s&]+        → token=[REDACTED:hmac:...]
password=[^\s&]+     → password=[REDACTED:hmac:...]
-p [^\s]+            → -p [REDACTED:hmac:...]
--password[= ][^\s]+ → --password [REDACTED:hmac:...]
```
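
These substitutions can be applied in order with plain regexes. A sketch (a fixed `[REDACTED]` placeholder stands in for the HMAC tagging step here; a real implementation would pass the matched secret through the HMAC redactor):

```python
import re

# Pattern and replacement pairs, applied in order to the raw argument string.
FALLBACK_PATTERNS = [
    (re.compile(r"Bearer \S+"), "Bearer [REDACTED]"),
    (re.compile(r"token=[^\s&]+"), "token=[REDACTED]"),
    (re.compile(r"password=[^\s&]+"), "password=[REDACTED]"),
    (re.compile(r"-p \S+"), "-p [REDACTED]"),
    (re.compile(r"--password[= ]\S+"), "--password [REDACTED]"),
]

def redact_raw_args(raw: str) -> str:
    """Apply fallback redaction patterns to an unparsed argument string."""
    for pattern, replacement in FALLBACK_PATTERNS:
        raw = pattern.sub(replacement, raw)
    return raw
```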

### Entropy Detection

Catch secrets that slip through pattern matching:

```python
import math
from collections import Counter

def entropy(s: str) -> float:
    """Calculate the Shannon entropy of a string."""
    if not s:
        return 0.0
    counts = Counter(s)
    probs = [c / len(s) for c in counts.values()]
    return -sum(p * math.log2(p) for p in probs)

def looks_like_secret(value: str) -> bool:
    """Heuristic: high entropy plus sufficient length means probably a secret."""
    if len(value) < 16:
        return False
    return entropy(value) > 4.5  # Random strings are typically above 4.5
```

Apply entropy detection to:

- Unrecognized command arguments
- Environment variable values (before the allowlist check)
- Input values marked `sensitive: false` (as a safety check)

### Stdin

**Never capture stdin.** Sensitive data is often piped:

- Passwords via `echo $PASS | command`
- API responses containing tokens
- File contents containing secrets

```yaml
tool_calls:
  - cmd: "some-command"
    stdin: "[NOT_CAPTURED]"  # Always this value
    exit_code: 0
```

### File Contents

**Default**: Never capture file contents.

Only capture:

- File path (sanitized per ADR-003)
- File size
- Content hash (sha256)
- Action (created/modified/deleted)

```yaml
outputs:
  artifacts:
    - path: "docs/worklogs/2025-12-23.org"
      size: 2048
      sha256: "abc123..."
      action: created
      # content: NOT CAPTURED
```
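
Producing such a record requires reading the file to hash it, but nothing of the content reaches the trace. A sketch (the function name is illustrative; field names follow the example above):

```python
import hashlib
from pathlib import Path

def artifact_record(path: str, action: str) -> dict:
    """Describe a file by path, size, and content hash, without its contents."""
    p = Path(path)
    sha = hashlib.sha256(p.read_bytes()).hexdigest()
    return {
        "path": str(p),  # sanitize per ADR-003 before storing
        "size": p.stat().st_size,
        "sha256": sha,
        "action": action,
    }
```

The sha256 still lets reviewers detect whether the same artifact appears across runs, without exposing a single byte of it.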

### Stdout/Stderr

**Default**: Not captured.

**Opt-in**: A skill can enable capture, with automatic redaction:

```yaml
execution:
  capture_output: true
  output_max_lines: 100
```

Before storage, captured output is run through:

1. Pattern redaction
2. Entropy detection
3. HMAC replacement

### Trace Modes

Support different capture levels for different contexts:

| Mode | Use Case | Capture Level |
|------|----------|---------------|
| `local` | Debugging on your machine | More permissive, still redacts secrets |
| `export` | Sharing with others | Aggressive redaction, path sanitization |
| `elevation` | Promoting to a skill | Maximum redaction, human review required |

```yaml
# In trace metadata
trace:
  mode: local
  redaction_version: "1.0"
  session_key_id: "abc123"  # For HMAC correlation within a session
```

**Mode transitions:**

```bash
# Local trace (default)
bd wisp show <id>

# Export for sharing
bd wisp export <id> --mode=export > trace.yaml

# Prepare for elevation
bd wisp export <id> --mode=elevation > trace.yaml
# → Requires manual review before elevation proceeds
```
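
Internally, the three modes can map onto a small policy table that the redactor consults. A sketch under stated assumptions: the specific knobs shown here are illustrative, derived from the table above rather than a fixed spec.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TracePolicy:
    sanitize_paths: bool          # rewrite paths per ADR-003
    capture_allowlisted_env: bool # keep allowlisted env vars in the trace
    require_human_review: bool    # block automated elevation

# Capture strictness increases from local to export to elevation.
TRACE_MODES = {
    "local": TracePolicy(sanitize_paths=False,
                         capture_allowlisted_env=True,
                         require_human_review=False),
    "export": TracePolicy(sanitize_paths=True,
                          capture_allowlisted_env=True,
                          require_human_review=False),
    "elevation": TracePolicy(sanitize_paths=True,
                             capture_allowlisted_env=False,
                             require_human_review=True),
}
```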

### Classification Levels

Skills declare a classification in their manifest:

| Level | Description | Trace Policy |
|-------|-------------|--------------|
| `public` | Safe to share externally | Standard redaction, can elevate |
| `internal` | Normal internal use | Standard redaction, elevation requires review |
| `secret` | Contains sensitive data | Maximum redaction, elevation blocked |

```yaml
# In SKILL.md frontmatter
classification: internal
```

**Behavior by classification**:

- `public`: Standard tracing, eligible for elevation
- `internal`: Standard tracing, elevation requires `--force` and review
- `secret`:
  - All inputs treated as sensitive
  - Env vars: allowlist only
  - Tool args: maximum redaction plus entropy detection
  - Elevation: blocked entirely

### Elevation Gate

When elevating a molecule to a skill:

1. Check the skill's classification.
2. If `secret`: block with an error.
3. If `internal`: warn, require `--force`, and show a redaction summary.
4. If `public`: proceed with standard review.

```bash
$ bd elevate mol-123
Error: Molecule used skill with classification=secret
Cannot elevate without manual review

$ bd elevate mol-456
Warning: Molecule used internal skill.
Redacted fields: api_token, auth_header, 3 env vars
Review trace for sensitive data.
Use --force to proceed.

$ bd elevate mol-456 --force
Elevated to skill draft: skills/new-skill/
Please review before publishing.
```
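
The gate itself reduces to a classification switch mirroring the four steps above. A minimal sketch (function name and return strings are illustrative, not the `bd` CLI's actual interface):

```python
def check_elevation(classification: str, force: bool) -> str:
    """Decide whether a molecule may be elevated to a skill."""
    if classification == "secret":
        raise PermissionError(
            "Molecule used skill with classification=secret; "
            "cannot elevate without manual review")
    if classification == "internal" and not force:
        return "blocked: review redaction summary, then rerun with --force"
    return "proceed: standard review"
```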

### Configuration

Users can extend the allowlist/denylist in `.beads/config.yaml`:

```yaml
trace:
  mode: local  # default mode
  env_allowlist:
    - MY_SAFE_VAR
  env_denylist:
    - MY_SECRET_*
  redact_patterns:
    - "my-api-key-[a-z0-9]+"
  entropy_threshold: 4.5  # Adjust sensitivity
```
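
User config can extend, but never replace, the hardcoded lists: the built-in denylist stays in force regardless of what `.beads/config.yaml` adds. A sketch of the merge (loader and constant names are illustrative, and the built-in lists are abbreviated):

```python
BUILTIN_ALLOWLIST = ["USER", "HOME", "PROJECT", "PWD"]
BUILTIN_DENYLIST = ["*_KEY", "*_SECRET", "*_TOKEN", "*_PASSWORD"]

def effective_lists(user_cfg: dict) -> tuple[list[str], list[str]]:
    """Merge the user's trace config with built-ins; user entries only add."""
    trace = user_cfg.get("trace", {})
    allow = BUILTIN_ALLOWLIST + trace.get("env_allowlist", [])
    deny = BUILTIN_DENYLIST + trace.get("env_denylist", [])
    return allow, deny
```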

## Consequences

### Positive

- Secrets don't leak into traces by default
- HMAC enables correlation without revealing values
- Entropy detection catches novel secret patterns
- Structured parsing is more reliable than regex
- Clear mode separation for different contexts
- Defense in depth (patterns + entropy + opt-in)

### Negative

- Less data available for debugging (especially in export mode)
- HMAC adds computational overhead
- Entropy detection may produce false positives
- Structured parsers need per-command maintenance
- Configuration complexity

### Neutral

- Existing wisps are unaffected (the new policy applies going forward)
- The trade-off between utility and safety favors safety
- Local mode still provides reasonable debugging data

## Implementation Checklist

- [ ] Implement HMAC redaction with session keys
- [ ] Implement env var filtering with allowlist/denylist
- [ ] Add `sensitive` field support to the manifest parser (default true)
- [ ] Build structured argument parsers for curl, git, aws, docker
- [ ] Implement fallback pattern redaction
- [ ] Implement entropy detection
- [ ] Add stdin never-capture enforcement
- [ ] Implement trace modes (local/export/elevation)
- [ ] Add the classification field to the manifest
- [ ] Implement the elevation gate with a redaction summary
- [ ] Add config.yaml trace section support
- [ ] Document patterns, allowlists, and entropy thresholds

## Open Questions

1. Should HMAC keys be derivable from trace metadata for authorized replay?
2. How should secrets in multi-line values (JSON blobs, certificates) be handled?
3. Should we offer a "paranoid mode" that captures nothing but exit codes?
4. How should base64-encoded secrets be detected and handled?