ADRs: add skill manifest, versioning, and trace security designs

- ADR-002: Skill manifest format with JSON Schema, path bases, preconditions
- ADR-003: Versioning with Nix store paths, lockfiles, interface contracts
- ADR-004: Trace security with HMAC redaction, entropy detection, trace modes

Refined based on orch consensus feedback from GPT and Gemini.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# ADR-001: Skills and Molecules Integration
## Status
In Progress
## Context
We have two complementary systems for agent-assisted work:
1. **Skills** (this repo): Procedural knowledge deployed via Nix/direnv. Skills define HOW to do things - scripts, prompts, and workflows that agents can invoke.
2. **Molecules** (beads 0.35+): Work tracking templates in beads. Molecules define WHAT work needs to be done - DAGs of issues that can be instantiated, tracked, and completed.
These systems evolved independently but have natural integration points. The question is: how should they connect?
### Current State
**Skills system:**
- Skills are directories under `~/.claude/skills/` (deployed via Nix)
- Each skill has a `SKILL.md` with frontmatter + prompt/instructions
- Skills are invoked by agents via `/skill-name` or automatically based on triggers
- No execution tracking beyond what the agent logs
**Molecules system (beads 0.35):**
- **Proto**: Template epic with `template` label, uses `{{var}}` placeholders
- **Mol**: Instantiated work from a proto (permanent, git-synced)
- **Wisp**: Ephemeral mol for operational work (gitignored, `.beads-wisp/`)
- **Hook**: Agent's attachment point for assigned work
- **Pin**: Assign mol to agent's hook
Key molecule commands:
```
bd mol spawn <proto> # Create mol from proto
bd pour <proto> # Spawn persistent mol
bd wisp create <proto> # Spawn ephemeral mol
bd pin <mol> --for me # Assign to self
bd mol squash <id> # Compress mol → digest
bd mol distill <epic> # Extract proto from ad-hoc epic
```
### Problem Statement
1. Skills have no execution history - we can't replay, debug, or learn from past runs
2. Molecules track work but don't know which skills were used to complete them
3. Successful ad-hoc work patterns can't be easily promoted to reusable skills
4. No connection between "what was done" (mol) and "how it was done" (skill)
## Decision
Link skills and molecules via three mechanisms:
### 1. Skill References in Molecules
Add a `skill:` field to molecule nodes that references skills used during execution:
```yaml
# In a proto template
- title: "Generate worklog for {{session}}"
skill: worklog
description: "Document the work session"
```
When an agent works on a mol step that has a `skill:` reference, it knows which skill to invoke.
### 2. Wisp Execution Traces
Use wisps to capture skill execution traces. When a skill runs within a molecule context:
```yaml
# Wisp execution trace format
skill_ref: worklog
skill_version: "abc123" # git SHA of skill
inputs:
  context: "session context..."
env:
  PROJECT: "skills"
tool_calls:
  - cmd: "extract-metrics.sh"
    args: ["--session", "2025-12-23"]
    exit_code: 0
    duration_ms: 1234
checkpoints:
  - step: "metrics_extracted"
    summary: "Found 5 commits, 12 file changes"
    timestamp: "2025-12-23T19:30:00Z"
outputs:
  files_created:
    - "docs/worklogs/2025-12-23-session.org"
```
This enables:
- Replay: Re-run a skill with the same inputs
- Diff: Compare two executions of the same skill
- Debug: Understand what happened when something fails
- Regression testing: Detect when skill behavior changes
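As a sketch of the replay and regression-testing cases, assuming traces are stored as YAML and the recorded tool calls are safe to re-execute (`replay_trace` is illustrative, not an existing command):
```python
import os
import subprocess

import yaml  # assumes PyYAML is available


def replay_trace(trace_path: str) -> None:
    """Re-run the tool calls recorded in a wisp trace and flag regressions."""
    with open(trace_path) as f:
        trace = yaml.safe_load(f)

    env = {**os.environ, **trace.get("env", {})}
    for call in trace.get("tool_calls", []):
        result = subprocess.run([call["cmd"], *call.get("args", [])], env=env)
        # Compare against the recorded exit code to detect behavior changes
        if result.returncode != call.get("exit_code", 0):
            print(f"regression: {call['cmd']} exited {result.returncode}, "
                  f"expected {call.get('exit_code', 0)}")
```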
### 3. Elevation Pipeline
When a molecule completes successfully, offer to "elevate" it to a skill:
```
bd mol squash <mol-id> # Compress execution history
bd elevate <mol-id> # Analyze and generate skill draft
```
The elevation pipeline (step 2 is sketched after this list):
1. Analyze squashed trace for generalizable patterns
2. Extract variable inputs (things that changed between runs)
3. Generate SKILL.md draft with:
- Frontmatter from mol metadata
- Steps derived from trace checkpoints
- Scripts extracted from tool_calls
4. Human approval gate before deployment
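A minimal sketch of step 2, assuming squashed traces are available as dictionaries in the trace format above (`extract_variables` is a hypothetical helper):
```python
def extract_variables(traces: list[dict]) -> dict[str, list[str]]:
    """Step 2 of the pipeline: inputs whose values differ across runs are
    candidates for {{var}} placeholders in the generated SKILL.md draft."""
    observed: dict[str, set[str]] = {}
    for trace in traces:
        for name, value in trace.get("inputs", {}).items():
            observed.setdefault(name, set()).add(repr(value))
    # Inputs with a single observed value are constants that can be baked
    # into the draft; the rest become template variables.
    return {name: sorted(vals) for name, vals in observed.items()
            if len(vals) > 1}
```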
### Phase Transitions (Chemistry Metaphor)
```
Proto (solid) → pour → Mol (liquid) → squash → Digest (solid)
Proto → create → Wisp (vapor) → execute → Trace → elevate → Skill draft
```
- **Solid**: Static templates (protos, digests, skills)
- **Liquid**: Active work being tracked (mols)
- **Vapor**: Ephemeral execution (wisps, traces)
## Consequences
### Positive
- **Traceability**: Know exactly how work was completed
- **Reusability**: Successful patterns become skills automatically
- **Debugging**: Execution traces make failures understandable
- **Learning**: System improves as more work is tracked
### Negative
- **Overhead**: Capturing traces adds complexity
- **Storage**: Wisp traces need cleanup strategy
- **Coupling**: Skills and beads become interdependent
### Neutral
- Skills remain usable without molecules (standalone invocation)
- Molecules remain usable without skills (manual work)
- Integration is opt-in per-proto via `skill:` field
## Implementation Plan
1. **Phase 1** (this ADR): Document the design
2. **Phase 2**: Define wisp execution trace format (skills-jeb)
3. **Phase 3**: Prototype elevation pipeline (skills-3em)
4. **Phase 4**: Test on worklog skill (skills-rex)
## Anti-Patterns to Avoid
1. **Over-instrumentation**: Don't trace every shell command. Focus on meaningful checkpoints.
2. **Forced coupling**: Don't require molecules to use skills or vice versa.
3. **Premature elevation**: Don't auto-generate skills from single executions. Wait for patterns.
4. **Trace bloat**: Wisps are ephemeral for a reason. Squash or burn, don't accumulate.
## Open Questions
1. How granular should skill_version be? Git SHA? Flake hash? Both?
2. Should traces capture stdout/stderr or just exit codes?
3. What's the minimum number of similar executions before suggesting elevation?
4. How do we handle skills that span multiple mol steps?
## References
- beads 0.35 molecule commands: `bd mol --help`, `bd wisp --help`, `bd pour --help`
- Skills repo: `~/proj/skills/`
- Existing skills: worklog, orch, niri-window-capture, spec-review, etc.

# ADR-002: Skill Manifest Format
## Status
Draft (Revised)
## Context
Skills currently have minimal frontmatter (name, description). For molecule integration, we need skills to declare their interface so:
- Beads can validate proto→skill bindings before spawning
- Agents know what inputs to provide
- Traces can record what was expected vs actual
- Errors are clear when requirements aren't met
## Decision
Extend SKILL.md frontmatter with optional manifest fields. All new fields are optional for backward compatibility.
### Proposed Schema
```yaml
---
manifest_version: "1.0"
name: worklog
description: Create org-mode worklogs documenting work sessions.

# Version info for reproducibility
version: 1.0.0

# What the skill needs to run
inputs:
  required:
    - name: session_date
      description: Date of the session (YYYY-MM-DD)
      schema:
        type: string
        pattern: "^\\d{4}-\\d{2}-\\d{2}$"
  optional:
    - name: topic
      description: Brief topic descriptor for filename
      schema:
        type: string
        default: "session"
    - name: output_dir
      description: Directory for worklog output (relative to repo root)
      schema:
        type: string
        default: "docs/worklogs"
    - name: api_key
      description: API key for external service
      sensitive: true
      schema:
        type: string

# Environment requirements
env:
  required: []
  optional:
    - name: PROJECT
      description: Project name for context
    - name: API_TOKEN
      description: Authentication token
      sensitive: true

# Tools/commands that must be available
preconditions:
  commands:
    - cmd: git
      min_version: "2.40"
    - cmd: date
  files:
    - path: scripts/extract-metrics.sh
      base: skill_root
      description: Metrics extraction script
    - path: templates/worklog-template.org
      base: skill_root
      description: Worklog template

# What the skill produces
outputs:
  files:
    - pattern: "{{output_dir}}/{{session_date}}-{{topic}}.org"
      base: repo_root
      description: The generated worklog file
  artifacts: []

# Execution characteristics (hints for scheduling, not enforcement)
execution:
  idempotent: false # Safe to re-run?
  destructive: false # Modifies existing files?
  network: false # Requires network access?
  interactive: false # Requires user input during execution?
  timeout: 30 # Maximum execution time in seconds

# For security classification
sensitive: false # Handles sensitive data?
---
```
### Field Definitions
#### `manifest_version`
Schema version for the manifest format itself. Allows parsers to handle breaking changes.
- Current version: `"1.0"`
- Semantic versioning: major version changes indicate breaking schema changes
#### `inputs`
Declares what the skill needs from the caller. Uses JSON Schema subset for type definitions.
- `required`: Must be provided or skill fails
- `optional`: Has defaults, can be overridden
- Each input has: `name`, `description`, `schema` (JSON Schema), optional `sensitive` flag
- `sensitive: true`: Mark inputs for redaction in traces
**JSON Schema Subset Supported:**
- `type`: string, number, integer, boolean, array, object
- `pattern`: regex for string validation
- `minimum`, `maximum`: numeric bounds
- `items`: array item schema
- `properties`: object property schemas
- `default`: default value
- `enum`: allowed values
#### `env`
Environment variables the skill reads.
- `required`: Skill fails if not set
- `optional`: Used if available, not fatal if missing
- `sensitive: true`: Mark for redaction
#### `preconditions`
What must exist before the skill can run.
- `commands`: CLI tools that must be in PATH
- `cmd`: command name (required)
- `min_version`: minimum version (optional, e.g., "2.40")
- `max_version`: maximum version (optional)
- `files`: Files that must exist
- `path`: relative path (required)
- `base`: path base - `skill_root`, `repo_root`, or `cwd` (default: `skill_root`)
- `description`: what this file is for (optional)
**Path Resolution:**
- `skill_root`: Directory containing SKILL.md
- `repo_root`: Git repository root
- `cwd`: Current working directory
- Absolute paths (starting with `/` or `~`) are forbidden
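A sketch of the resolution rule (function and parameter names are illustrative):
```python
from pathlib import Path


def resolve_manifest_path(path: str, base: str,
                          skill_root: Path, repo_root: Path) -> Path:
    """Resolve a manifest-relative path against its declared base."""
    if path.startswith(("/", "~")):
        raise ValueError(f"absolute paths are forbidden in manifests: {path}")
    bases = {
        "skill_root": skill_root,  # directory containing SKILL.md
        "repo_root": repo_root,    # git repository root
        "cwd": Path.cwd(),         # current working directory
    }
    return (bases[base] / path).resolve()
```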
#### `outputs`
What the skill produces.
- `files`: File patterns with `{{var}}` substitution
- `pattern`: file path pattern
- `base`: path base (default: `repo_root`)
- `description`: what this output is for
- `artifacts`: Other outputs (stdout, state changes, etc.)
#### `execution`
Characteristics for scheduling and safety. **These are hints, not enforcement.**
- `idempotent`: Can run multiple times safely
- `destructive`: Modifies or deletes existing data
- `network`: Requires internet access
- `interactive`: Needs human input during run
- `timeout`: Maximum execution time in seconds
The runtime may use these hints to:
- Schedule non-destructive skills in parallel
- Warn before running destructive operations
- Set execution timeouts
These flags do NOT enforce sandboxing or isolation.
#### `sensitive`
Whether skill handles sensitive data.
- `false` (default): Standard tracing, eligible for elevation
- `true`: Aggressive redaction, elevation blocked without review
### Validation Rules
When beads spawns a molecule with a `skill:` reference (steps 3-4 are sketched after this list):
1. Check skill exists
2. Verify `manifest_version` is supported
3. Validate all required inputs are mapped
4. Validate input schemas using JSON Schema
5. Check preconditions (commands with versions, files)
6. Check required env vars are set
7. Warn if optional inputs have no mapping
8. Apply redaction rules for `sensitive` fields
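A sketch of steps 3-4 using the `jsonschema` package, assuming the manifest has been parsed into a dict; the error wording is illustrative:
```python
from jsonschema import ValidationError, validate  # assumes python-jsonschema


def validate_inputs(manifest: dict, provided: dict) -> list[str]:
    """Steps 3-4: required inputs are mapped and match their schemas."""
    errors = []
    inputs = manifest.get("inputs", {})
    for spec in inputs.get("required", []):
        name = spec["name"]
        if name not in provided:
            errors.append(f"missing required input: {name}")
            continue
        try:
            # An empty schema accepts any value, matching "no validation"
            validate(instance=provided[name], schema=spec.get("schema", {}))
        except ValidationError as e:
            errors.append(f"input {name!r}: {e.message}")
    for spec in inputs.get("optional", []):  # step 7: warn, don't fail
        if spec["name"] not in provided:
            errors.append(f"warning: optional input {spec['name']!r} not mapped")
    return errors
```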
### Backward Compatibility
- All new fields are optional
- Skills with only `name`/`description` continue to work
- Missing manifest = no validation (current behavior)
- Missing `manifest_version` = assume legacy format
## Consequences
### Positive
- Clear contracts between molecules and skills
- Better error messages (fail before execution, not during)
- JSON Schema provides standard, well-understood type system
- Sensitive data handling built into manifest
- Version constraints prevent runtime errors
- Explicit path resolution prevents portability issues
### Negative
- More frontmatter to maintain
- Risk of manifest drift from actual behavior
- JSON Schema subset may not cover all validation needs
### Neutral
- Existing skills unaffected until they opt-in
- Execution flags are hints, not enforcement
## Examples
### Minimal (current, still valid)
```yaml
---
name: simple-skill
description: Does something simple
---
```
### With inputs only
```yaml
---
manifest_version: "1.0"
name: greeter
description: Greets the user
inputs:
  required:
    - name: username
      description: Name of the user to greet
      schema:
        type: string
---
```
### With sensitive data
```yaml
---
manifest_version: "1.0"
name: deploy
description: Deploy application to production
inputs:
  required:
    - name: deploy_key
      description: SSH deployment key
      sensitive: true
      schema:
        type: string
env:
  required:
    - name: DEPLOY_TOKEN
      description: Authentication token
      sensitive: true
execution:
  network: true
  destructive: true
  timeout: 300
sensitive: true
---
```
### With version constraints
```yaml
---
manifest_version: "1.0"
name: analyze-git
description: Analyze git repository history
preconditions:
  commands:
    - cmd: git
      min_version: "2.40"
    - cmd: python3
      min_version: "3.11"
  files:
    - path: scripts/analyze.py
      base: skill_root
inputs:
  optional:
    - name: since
      description: Analyze commits since this date
      schema:
        type: string
        pattern: "^\\d{4}-\\d{2}-\\d{2}$"
        default: "2025-01-01"
outputs:
  files:
    - pattern: "reports/git-analysis-{{since}}.md"
      base: repo_root
execution:
  idempotent: true
  timeout: 60
---
```
## Open Questions
1. Should `outputs` be validated after execution?
2. How to handle skills that produce variable outputs?
3. Should we allow custom JSON Schema extensions?
4. Where does skill version come from - git tag, manual, or derived?

# ADR-003: Skill Versioning Strategy
## Status
Draft (Revised)
## Context
Skills are deployed via Nix/direnv, which means:
- The "installed" version is a build artifact, not just source code
- Git SHA may not exist or match deployed content
- Skills can reference external scripts/binaries
- Protos and molecules need stable references
A single version identifier is insufficient. We need to answer:
1. How do we identify what version of a skill ran?
2. How do protos reference skills (pin vs float)?
3. How do we handle breaking changes?
## Decision
### Version Tuple
Every skill execution records a version tuple:
```yaml
skill_version:
  # Primary identity - Nix store path (immutable, content-addressed)
  nix_store_path: "/nix/store/abc123-worklog-1.0.0"
  # Source identity (where it came from)
  source_ref: "git+file:///home/dan/proj/skills#worklog"
  source_rev: "abc123def" # git SHA, null if not in git
  # Content identity (what was actually deployed)
  content_hash: "sha256:789xyz..." # hash of skill content per algorithm below
  # Semantic version from manifest (optional)
  version: "1.0.0"
  # Deployment metadata
  deployed_at: "2025-12-23T10:00:00Z"
```
#### Identity Selection by Context
| Context | Primary Identity | Rationale |
|---------|------------------|-----------|
| Nix-deployed skills | `nix_store_path` | Immutable, content-addressed by Nix |
| Development/local | `content_hash` | No Nix path available |
| Trace replay | `nix_store_path` or `content_hash` | Exact reproducibility |
| Proto pinning | `content_hash` or `version` | Portable across machines |
### Computing `content_hash`
Hash computation must be deterministic and portable:
```bash
#!/usr/bin/env bash
# skill-content-hash.sh <skill-dir>
set -euo pipefail

SKILL_DIR="${1:-.}"
SKILL_DIR="$(cd "$SKILL_DIR" && pwd)"

(
  cd "$SKILL_DIR"
  # List files with relative paths, sort deterministically, then feed
  # filename + content into a single hash so renames change the result.
  find . -type f \
    ! -path './.git/*' \
    ! -path '*/__pycache__/*' \
    ! -name '.skillignore' \
    ! -name '*.pyc' \
    ! -name '.DS_Store' \
    -print0 |
    sort -z |
    while IFS= read -r -d '' file; do
      rel="${file#./}"
      # Skip files matching .skillignore glob patterns, if present
      if [[ -f .skillignore ]]; then
        while IFS= read -r pat; do
          [[ -z "$pat" || "$pat" == \#* ]] && continue
          [[ "$rel" == $pat || "$rel" == ${pat%/}/* ]] && continue 2
        done < .skillignore
      fi
      printf '%s\0' "$rel"
      cat "$file"
    done |
    sha256sum | cut -d' ' -f1
)
```
**Critical requirements:**
- Use relative paths (not absolute) for portability
- Include filename in hash stream (not just content)
- Sort files deterministically before hashing
- Exclude non-functional files via `.skillignore`
#### `.skillignore` Format
Skills can exclude files from content hash (like `.gitignore`):
```
# .skillignore - files excluded from content_hash
README.md
CHANGELOG.md
docs/
tests/
*.test.js
```
This allows documentation changes without invalidating version pins.
### Proto Reference Modes
#### 1. Float (default, development)
```yaml
skill: worklog
```
Uses whatever version is currently deployed. Simple but unstable.
#### 2. Pin to content hash (CI/automation)
```yaml
skill:
  id: worklog
  content_hash: "sha256:789xyz..."
```
Fails if deployed skill doesn't match. Most stable for automation.
#### 3. Pin to minimum version (published templates)
```yaml
skill:
  id: worklog
  min_version: "1.0.0"
```
Requires skill manifest to declare `version` field with semantic versioning.
### Lockfile Workflow
For reproducible proto execution, use `proto.lock`:
```yaml
# my-proto.lock
# Auto-generated - do not edit manually
# Regenerate with: bd proto lock my-proto
generated_at: "2025-12-23T10:00:00Z"
beads_version: "0.35.0"
skills:
  worklog:
    content_hash: "sha256:789xyz..."
    nix_store_path: "/nix/store/abc123-worklog-1.0.0"
    version: "1.0.0"
    source_rev: "abc123def"
  deploy:
    content_hash: "sha256:456abc..."
    nix_store_path: "/nix/store/def456-deploy-2.1.0"
    version: "2.1.0"
    source_rev: "def456ghi"
```
**Workflow:**
```bash
# Development: float freely
bd mol spawn my-proto
# CI/production: lock versions
bd proto lock my-proto # Generate/update lockfile
bd mol spawn my-proto --locked # Fail if versions don't match lock
```
Lockfile should be committed to version control for reproducible builds.
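A sketch of what the `--locked` check could look like, assuming each deployed skill's hash is computed with the script above (`check_lock` is hypothetical):
```python
import yaml  # assumes PyYAML is available


def check_lock(lock_path: str, deployed_hashes: dict[str, str]) -> list[str]:
    """Return mismatches between proto.lock and currently deployed skills.

    `deployed_hashes` maps skill name -> content_hash as computed by
    skill-content-hash.sh. Any mismatch should abort `bd mol spawn --locked`.
    """
    with open(lock_path) as f:
        lock = yaml.safe_load(f)
    mismatches = []
    for name, pinned in lock.get("skills", {}).items():
        actual = deployed_hashes.get(name)
        if actual != pinned["content_hash"]:
            mismatches.append(f"{name}: lock has {pinned['content_hash']}, "
                              f"deployed is {actual or 'missing'}")
    return mismatches
```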
### Breaking Change Handling
#### Interface Contracts
For semantic versioning to be meaningful, skills should declare their interface contract:
```yaml
# In SKILL.md manifest
interface:
  inputs:
    - session_date # Required inputs are part of contract
    - topic # Optional inputs with defaults
  outputs:
    - pattern: "docs/worklogs/*.org"
  env:
    - PROJECT # Required env vars
```
**Breaking changes** (bump major version):
- Renamed/removed required inputs
- Changed required input types
- Changed output patterns
- Added new required inputs without defaults
- Removed required env vars
**Non-breaking changes** (bump minor/patch):
- Added optional inputs with defaults
- Documentation changes
- Bug fixes
- Performance improvements
#### Version Validation
```bash
# When spawning a proto with pinned skill
bd mol spawn my-proto --var x=1
# → Validates skill content_hash or version matches pin
# → Fails early if mismatch
# Check for breaking changes
bd skill check-compat worklog@1.0.0 worklog@2.0.0
# → Reports interface differences
```
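A sketch of what `bd skill check-compat` might do given two parsed `interface:` blocks; it covers only a subset of the breaking-change rules above:
```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Compare two interface contracts; a non-empty result means the new
    version needs a major bump."""
    findings = []
    removed = set(old.get("inputs", [])) - set(new.get("inputs", []))
    if removed:
        findings.append(f"removed or renamed inputs: {sorted(removed)}")
    if old.get("outputs") != new.get("outputs"):
        findings.append("output patterns changed")
    removed_env = set(old.get("env", [])) - set(new.get("env", []))
    if removed_env:
        findings.append(f"removed required env vars: {sorted(removed_env)}")
    return findings
```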
### Path Sanitization
Traces should sanitize paths to avoid leaking local structure:
```yaml
# Before sanitization
skill_version:
  source_ref: "git+file:///home/dan/proj/skills#worklog"
  nix_store_path: "/nix/store/abc123-worklog-1.0.0"

# After sanitization (for sharing/elevation)
skill_version:
  source_ref: "git+file://LOCAL/skills#worklog"
  nix_store_path: "/nix/store/abc123-worklog-1.0.0" # Already safe
```
Sanitization patterns:
- `/home/<username>/` → `LOCAL/`
- `/Users/<username>/` → `LOCAL/`
- Nix store paths are already content-addressed and safe
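A sketch of the sanitizer implementing these patterns (the rule list is illustrative and may need extending):
```python
import re

# Home-directory prefixes to rewrite; Nix store paths pass through untouched.
SANITIZE_RULES = [
    (re.compile(r"/home/[^/]+/"), "LOCAL/"),
    (re.compile(r"/Users/[^/]+/"), "LOCAL/"),
]


def sanitize(value: str) -> str:
    """Rewrite local-path prefixes in a trace field before sharing."""
    for pattern, replacement in SANITIZE_RULES:
        value = pattern.sub(replacement, value)
    return value
```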
### Recording in Traces
Wisp traces always record the full version tuple:
```yaml
execution:
  skill_version:
    nix_store_path: "/nix/store/abc123-worklog-1.0.0"
    source_ref: "git+file://LOCAL/skills#worklog" # Sanitized
    source_rev: "abc123def"
    content_hash: "sha256:789xyz..."
    version: "1.0.0"
```
This enables:
- Replay with exact version
- Diff between executions
- Debugging "it worked before" issues
- Portable sharing (sanitized paths)
### Recommendations
| Use Case | Mode | Identity | Why |
|----------|------|----------|-----|
| Active development | Float | N/A | Iterate quickly |
| Local testing | Float or pin | `content_hash` | Reproducible locally |
| Shared proto | Pin + lock | `content_hash` | Portable across machines |
| Published template | Pin to version | `min_version` | Semantic compatibility |
| CI/automation | Locked | `content_hash` | Exact reproducibility |
## Consequences
### Positive
- Full traceability of what ran
- Reproducible executions via lockfile
- Clear failure when version mismatch
- Supports gradual adoption (float first, pin later)
- Portable hashing (relative paths)
- Interface contracts enable meaningful SemVer
### Negative
- Content hash computation adds overhead
- Pinned protos need updates when skills change
- More fields to manage
- Lockfile adds another file to maintain
### Neutral
- Float mode preserves current behavior
- Version tuple is metadata, not enforcement
- Nix store path available only in Nix-deployed environments
## Implementation Checklist
- [ ] Implement deterministic content hash script
- [ ] Add `.skillignore` support to hash computation
- [ ] Add `nix_store_path` capture for Nix-deployed skills
- [ ] Implement `bd proto lock` command
- [ ] Implement `bd mol spawn --locked` validation
- [ ] Add path sanitization to trace writer
- [ ] Add interface contract validation
- [ ] Implement `bd skill check-compat` command
## Open Questions
1. Should lockfile include transitive dependencies (skills that call other skills)?
2. How to handle skills that shell out to system binaries (git, curl)? Version those too?
3. Cache content_hash or compute on every invocation?
4. Should we support nix flake references directly? (e.g., `github:user/skills#worklog`)

# ADR-004: Trace Security and Redaction Policy
## Status
Draft (Revised)
## Context
Skill execution traces (wisps) capture:
- Environment variables
- Input arguments
- Tool calls with arguments
- File paths and contents
- Stdout/stderr
This data often contains secrets:
- API keys (AWS, GitHub, OpenAI)
- Tokens and passwords
- PII (usernames, emails, paths)
- Proprietary data
Wisps are gitignored but still risky:
- Local machine compromise
- Accidental sharing
- Squashing into public digests
- Elevation to published skills
## Decision
### Default-Deny Policy
Traces capture minimal information by default. Sensitive data requires explicit opt-in.
### HMAC-Based Redaction
Instead of plain `[REDACTED]`, use HMAC hashing to enable correlation without revealing values:
```yaml
# Format: [REDACTED:hmac:<first-8-chars-of-hmac>]
inputs:
  api_token: "[REDACTED:hmac:a1b2c3d4]"
  other_token: "[REDACTED:hmac:a1b2c3d4]" # Same value = same hash
  different_token: "[REDACTED:hmac:e5f60718]" # Different value
```
**Benefits:**
- Can detect if same secret was used across executions
- Can correlate inputs without knowing values
- HMAC key is per-session, not stored in trace
**Implementation:**
```python
import hmac
import hashlib
def redact_sensitive(value: str, session_key: bytes) -> str:
    """Redact value with HMAC for correlation."""
    h = hmac.new(session_key, value.encode(), hashlib.sha256)
    return f"[REDACTED:hmac:{h.hexdigest()[:8]}]"
```
### Environment Variables
**Default**: Only capture from allowlist.
```yaml
trace_env_allowlist:
  - USER
  - HOME
  - PROJECT
  - PWD
  - SHELL
  - TERM
  - LANG
  - TZ
```
**Never capture** (hardcoded denylist with glob patterns):
```yaml
trace_env_denylist:
- "*_KEY"
- "*_SECRET"
- "*_TOKEN"
- "*_PASSWORD"
- "*_CREDENTIAL*"
- "AWS_*"
- "GITHUB_TOKEN"
- "OPENAI_API_KEY"
- "ANTHROPIC_API_KEY"
```
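A sketch of the filtering logic; `fnmatchcase` handles the glob patterns, and the denylist is checked first so it always wins:
```python
from fnmatch import fnmatchcase


def filter_env(env: dict[str, str], allowlist: list[str],
               denylist: list[str]) -> dict[str, str]:
    """Capture only allowlisted vars; the hardcoded denylist always wins."""
    captured = {}
    for name, value in env.items():
        if any(fnmatchcase(name, pat) for pat in denylist):
            continue  # never captured, even if also allowlisted
        if name in allowlist:
            captured[name] = value
    return captured
```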
### Input Arguments
**Default**: Inputs are NOT captured unless explicitly marked safe.
Skills must opt-in to capture inputs in manifest:
```yaml
inputs:
  required:
    - name: api_token
      type: string
      sensitive: true # Default, will be HMAC-redacted
    - name: project_name
      type: string
      sensitive: false # Explicitly safe to capture
```
**Rationale**: Safer to miss debugging data than leak secrets. Most inputs can be reconstructed from context.
Trace output:
```yaml
inputs:
  api_token: "[REDACTED:hmac:a1b2c3d4]"
  project_name: "my-project" # Only captured because sensitive: false
```
### Tool Calls
**Default**: Capture command name and exit code. Parse arguments structurally.
#### Structured Argument Parsing
Instead of regex on raw strings, parse arguments properly:
```yaml
tool_calls:
  - cmd: "curl"
    parsed_args:
      url: "https://api.example.com/endpoint"
      headers:
        - name: "Authorization"
          value: "[REDACTED:hmac:b2c3d4e5]"
        - name: "Content-Type"
          value: "application/json"
      method: "POST"
    exit_code: 0
    duration_ms: 1234
```
For commands we don't have parsers for, fall back to pattern redaction:
```yaml
tool_calls:
  - cmd: "unknown-tool"
    raw_args: "--token [REDACTED:hmac:c3d4e5f6] --output file.txt"
    exit_code: 0
```
#### Known Command Parsers
Implement argument parsers for common commands:
- `curl`: Parse `-H`, `--header`, `-u`, `--user`, `-d`, `--data`
- `git`: Parse credentials in URLs, `-c` config values
- `aws`: Parse `--profile`, environment-based auth
- `docker`: Parse `-e`, `--env`, registry auth
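A sketch of a structured parser for `curl`, covering only `-H/--header` and `-u/--user`; `redact` is the HMAC function above, closed over the session key:
```python
def parse_curl_args(args: list[str], redact) -> dict:
    """Parse a curl invocation into structure, redacting auth values."""
    parsed: dict = {"headers": [], "url": None}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg in ("-H", "--header") and i + 1 < len(args):
            name, _, value = args[i + 1].partition(":")
            name, value = name.strip(), value.strip()
            if name.lower() in ("authorization", "proxy-authorization"):
                value = redact(value)
            parsed["headers"].append({"name": name, "value": value})
            i += 2
        elif arg in ("-u", "--user") and i + 1 < len(args):
            parsed["user"] = redact(args[i + 1])  # user:password is sensitive
            i += 2
        elif not arg.startswith("-"):
            parsed["url"] = arg
            i += 1
        else:
            i += 1
    return parsed
```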
#### Fallback Redaction Patterns
For unparsed commands, apply pattern matching:
```
Bearer [^\s]+ → Bearer [REDACTED:hmac:...]
token=[^\s&]+ → token=[REDACTED:hmac:...]
password=[^\s&]+ → password=[REDACTED:hmac:...]
-p [^\s]+ → -p [REDACTED:hmac:...]
--password[= ][^\s]+ → --password [REDACTED:hmac:...]
```
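The same fallback patterns as compiled regexes, applied to unparsed argument strings (a sketch; `redact` as above):
```python
import re

FALLBACK_PATTERNS = [
    re.compile(r"(Bearer )(\S+)"),
    re.compile(r"(token=)([^\s&]+)"),
    re.compile(r"(password=)([^\s&]+)"),
    re.compile(r"(-p )(\S+)"),
    re.compile(r"(--password[= ])(\S+)"),
]


def redact_raw_args(raw: str, redact) -> str:
    """Keep the flag prefix for readability, HMAC-redact the secret part."""
    def sub(m: re.Match) -> str:
        secret = m.group(2)
        if secret.startswith("[REDACTED"):  # already handled by earlier pattern
            return m.group(0)
        return m.group(1) + redact(secret)

    for pattern in FALLBACK_PATTERNS:
        raw = pattern.sub(sub, raw)
    return raw
```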
### Entropy Detection
Catch secrets that slip through pattern matching:
```python
import math
from collections import Counter
def entropy(s: str) -> float:
    """Calculate Shannon entropy of string."""
    if not s:
        return 0.0
    counts = Counter(s)
    probs = [c / len(s) for c in counts.values()]
    return -sum(p * math.log2(p) for p in probs)

def looks_like_secret(value: str) -> bool:
    """Heuristic: high entropy + sufficient length = probably secret."""
    if len(value) < 16:
        return False
    if entropy(value) > 4.5:  # Random strings typically > 4.5
        return True
    return False
```
Apply entropy detection to:
- Unrecognized command arguments
- Environment variable values (before allowlist check)
- Input values marked `sensitive: false` (as safety check)
### Stdin
**Never capture stdin.** Sensitive data is often piped:
- Passwords via `echo $PASS | command`
- API responses with tokens
- File contents with secrets
```yaml
tool_calls:
  - cmd: "some-command"
    stdin: "[NOT_CAPTURED]" # Always this value
    exit_code: 0
```
### File Contents
**Default**: Never capture file contents.
Only capture:
- File path (sanitized per ADR-003)
- File size
- Content hash (sha256)
- Action (created/modified/deleted)
```yaml
outputs:
  artifacts:
    - path: "docs/worklogs/2025-12-23.org"
      size: 2048
      sha256: "abc123..."
      action: created
      # content: NOT CAPTURED
```
### Stdout/Stderr
**Default**: Not captured.
**Opt-in**: Skill can enable with automatic redaction:
```yaml
execution:
  capture_output: true
  output_max_lines: 100
```
Before storage, output is run through:
1. Pattern redaction
2. Entropy detection
3. HMAC replacement
### Trace Modes
Support different capture levels for different contexts:
| Mode | Use Case | Capture Level |
|------|----------|---------------|
| `local` | Debugging on your machine | More permissive, still redacts secrets |
| `export` | Sharing with others | Aggressive redaction, path sanitization |
| `elevation` | Promoting to skill | Maximum redaction, human review required |
```yaml
# In trace metadata
trace:
  mode: local
  redaction_version: "1.0"
  session_key_id: "abc123" # For HMAC correlation within session
```
**Mode transitions:**
```bash
# Local trace (default)
bd wisp show <id>
# Export for sharing
bd wisp export <id> --mode=export > trace.yaml
# Prepare for elevation
bd wisp export <id> --mode=elevation > trace.yaml
# → Requires manual review before elevation proceeds
```
### Classification Levels
Skills declare classification in manifest:
| Level | Description | Trace Policy |
|-------|-------------|--------------|
| `public` | Safe to share externally | Standard redaction, can elevate |
| `internal` | Normal internal use | Standard redaction, elevation requires review |
| `secret` | Contains sensitive data | Maximum redaction, elevation blocked |
```yaml
# In SKILL.md frontmatter
classification: internal
```
**Behavior by classification**:
- `public`: Standard tracing, eligible for elevation
- `internal`: Standard tracing, elevation requires `--force` and review
- `secret`:
- All inputs treated as sensitive
- Env vars: Only allowlist
- Tool args: Maximum redaction + entropy detection
- Elevation: Blocked entirely
### Elevation Gate
When elevating a molecule to a skill:
1. Check skill classification
2. If `secret`: Block with error
3. If `internal`: Warn, require `--force`, show redaction summary
4. If `public`: Proceed with standard review
```bash
$ bd elevate mol-123
Error: Molecule used skill with classification=secret
Cannot elevate without manual review
$ bd elevate mol-456
Warning: Molecule used internal skill.
Redacted fields: api_token, auth_header, 3 env vars
Review trace for sensitive data.
Use --force to proceed.
$ bd elevate mol-456 --force
Elevated to skill draft: skills/new-skill/
Please review before publishing.
```
### Configuration
Users can extend allowlist/denylist in `.beads/config.yaml`:
```yaml
trace:
  mode: local # default mode
  env_allowlist:
    - MY_SAFE_VAR
  env_denylist:
    - "MY_SECRET_*"
  redact_patterns:
    - "my-api-key-[a-z0-9]+"
  entropy_threshold: 4.5 # Adjust sensitivity
```
## Consequences
### Positive
- Secrets don't leak into traces by default
- HMAC enables correlation without revealing values
- Entropy detection catches novel secret patterns
- Structured parsing more reliable than regex
- Clear mode separation for different contexts
- Defense in depth (patterns + entropy + opt-in)
### Negative
- Less data available for debugging (especially in export mode)
- HMAC adds computational overhead
- Entropy detection may have false positives
- Structured parsers need maintenance per command
- Configuration complexity
### Neutral
- Existing wisps unaffected (new policy applies going forward)
- Trade-off between utility and safety favors safety
- Local mode still provides reasonable debugging data
## Implementation Checklist
- [ ] Implement HMAC redaction with session keys
- [ ] Implement env var filtering with allowlist/denylist
- [ ] Add `sensitive` field support to manifest parser (default true)
- [ ] Build structured argument parsers for curl, git, aws, docker
- [ ] Implement fallback pattern redaction
- [ ] Implement entropy detection
- [ ] Add stdin never-capture enforcement
- [ ] Implement trace modes (local/export/elevation)
- [ ] Add classification field to manifest
- [ ] Implement elevation gate with redaction summary
- [ ] Add config.yaml trace section support
- [ ] Document patterns, allowlists, and entropy thresholds
## Open Questions
1. Should HMAC keys be derivable from trace metadata for authorized replay?
2. How to handle secrets in multi-line values (JSON blobs, certificates)?
3. Should we offer a "paranoid mode" that captures nothing but exit codes?
4. How to detect and handle base64-encoded secrets?