dan/skills

dan 90e72f1095 fix(use-skills): prevent stderr from corrupting symlink targets

Remove 2>&1 from nix build capture. When repo is dirty, nix emits
warnings to stderr which were being merged into $out and used as
symlink targets, creating broken symlinks like:

  orch -> warning: Git tree '...' is dirty\n/nix/store/...

Now stderr goes to terminal, only stdout (store path) captured.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-14 12:42:26 -08:00

7.2 KiB


use-skills.sh Symlink Corruption Bug Fix and Orch Invocation Analysis

Session Summary
- Date: 2025-12-14 (Sunday)
- Focus Area: Diagnosing and fixing skill loading failures in per-repo deployment
Accomplishments
Key Decisions
Problems & Solutions
Technical Details
Process and Workflow
- What Worked Well
- What Was Challenging
Learning and Insights
Context for Future Work
Raw Notes
Session Metrics

Session Summary

Date: 2025-12-14 (Sunday)

Focus Area: Diagnosing and fixing skill loading failures in per-repo deployment

Accomplishments

Analyzed talu project's skill configuration status
Diagnosed why orch skill failed to work in talu session (from user-provided transcript)
Root-caused the symlink corruption bug in use-skills.sh
Fixed the stderr capture issue that corrupted symlink targets
Verified fix by reloading skills in talu - symlinks now correct
Filed skills-fvx (P1 bug) for symlink corruption - closed with fix
Filed skills-d87 (P2 bug) for orch invocation mechanism
Filed dotfiles-3to (P2 task) for adding orch to home-manager
Ran orch consensus to evaluate orch deployment options
Pending: commit the use-skills.sh fix

Key Decisions

Decision 1: Fix stderr capture by removing 2>&1

Context: nix build emits warnings to stderr when repo is dirty
Options considered:
1. Remove 2>&1 entirely - let stderr go to terminal, capture only stdout
2. Filter out warning lines with grep
3. Redirect stderr to temp file, only show on failure
Rationale: Option 1 is simplest and correct - nix already shows errors on failure, no need to capture and re-echo
Impact: Symlinks now contain only the store path, not warning text
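
For comparison, option 3 (the alternative that was not chosen) could be sketched roughly as below. Both `fake_build` and `capture_quiet` are hypothetical names used only for illustration; `fake_build` stands in for the real nix build call, and the store path is made up.

```shell
# Stand-in build command (hypothetical): warns on stderr, prints a path on stdout.
fake_build() {
    echo "warning: Git tree is dirty" >&2
    echo "/nix/store/example-ai-skill-demo"
}

# Option 3 sketch: park stderr in a temp file and replay it only on failure.
capture_quiet() {
    errfile=$(mktemp) || return 1
    if out=$("$@" 2>"$errfile"); then
        rm -f "$errfile"
        printf '%s\n' "$out"        # success: emit stdout only
    else
        cat "$errfile" >&2          # failure: show what the command complained about
        rm -f "$errfile"
        return 1
    fi
}

path=$(capture_quiet fake_build)
echo "$path"
```

This keeps the captured value clean but hides warnings on success and adds temp-file handling, which is the extra complexity that option 1 avoids.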

Decision 2: orch belongs in home-manager, not bundled in skill

Context: orch skill provides documentation but CLI isn't in PATH
Options considered (via orch consensus with gemini + gpt):
1. Wrapper script in skill - self-contained but hardcoded path
2. Global install via home-manager - system tool approach
3. Per-project direnv PATH - repetitive, fragile for agents
4. (Gemini suggestion) Build CLI from source in skill package
Rationale: orch is a general-purpose system tool (like git, rg), not a project-specific dependency. System tools belong in home-manager.
Impact: Cross-repo coordination needed - skills repo documents, dotfiles repo installs

Decision 3: Cross-repo dependencies noted in description, not formal

Context: skills-d87 is blocked by dotfiles-3to, but bd dep doesn't support cross-repo
Options: Hard dep, soft/formal dep, or text note
Rationale: Text note is sufficient for human readers, no tooling benefit from more formal tracking
Impact: skills-d87 description mentions "Blocked by: dotfiles-3to"

Problems & Solutions

| Problem | Solution | Learning |
|---|---|---|
| Symlinks contained "warning: Git tree is dirty" in target path | Remove 2>&1 from nix build capture - let stderr go to terminal | Shell command substitution captures all stdout, including merged stderr |
| orch command not found when agent tried to use skill | Skill documents tool but doesn't provide it - need global install | Skills can be documentation-only for system tools |
| Can't create formal cross-repo dependency | Note in issue description | bd beads is per-repo; cross-repo tracking is manual |

Technical Details

Code Changes

Total files modified: 1 (bin/use-skills.sh)
Key files changed:
- bin/use-skills.sh - Removed 2>&1 from nix build command

The Bug

Original code:

out=$(nix build --print-out-paths --no-link "${SKILLS_REPO}#${skill}" 2>&1) || {
    echo "use_skill: failed to build ${skill}" >&2
    echo "$out" >&2
    return 1
}

When the repo is dirty, nix emits a warning to stderr. The 2>&1 merges stderr into stdout, so $out becomes:

warning: Git tree '/home/dan/proj/skills' is dirty
/nix/store/j952hgxixifscafb42vmw9vgdphi1djs-ai-skill-orch

This multi-line string, warning included, becomes the symlink target, so the link can never resolve.
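
The capture behaviour can be reproduced without nix. The following is a minimal sketch in which `fake_build` is a hypothetical stand-in for `nix build --print-out-paths`, and the warning text and store path are invented for the demo.

```shell
# Stand-in for the build command: warning on stderr, store path on stdout.
# fake_build and the path below are illustrative, not real nix output.
fake_build() {
    echo "warning: Git tree '/tmp/example' is dirty" >&2
    echo "/nix/store/example-ai-skill-demo"
}

# With 2>&1 inside the substitution, the warning is captured as well.
merged=$(fake_build 2>&1)

# Without it, only stdout is captured. stderr is discarded here purely to
# keep the demo quiet; the actual fix lets it reach the terminal.
clean=$(fake_build 2>/dev/null)

# Count lines in each captured value (tr strips wc's padding on some systems).
merged_lines=$(printf '%s\n' "$merged" | wc -l | tr -d ' ')
clean_lines=$(printf '%s\n' "$clean" | wc -l | tr -d ' ')

printf 'merged: %s line(s)\n' "$merged_lines"
printf 'clean:  %s line(s)\n' "$clean_lines"
```

The merged capture holds two lines and the clean capture holds one, which is the difference between a broken and a usable symlink target.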

The Fix

out=$(nix build --print-out-paths --no-link "${SKILLS_REPO}#${skill}") || {
    echo "use_skill: failed to build ${skill}" >&2
    return 1
}

Now stderr goes to terminal (where warnings belong), stdout captured cleanly.
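
As an optional extra guard, the captured value could be validated before it is used as a symlink target. This is only a sketch of that idea and is not part of the committed fix; `is_store_path` is a hypothetical helper, and the assumption that every valid result is a single line under /nix/store/ may not hold for every setup.

```shell
# Hypothetical helper (not in use-skills.sh): accept only a single-line
# value that starts with /nix/store/ and has something after the prefix.
nl='
'
is_store_path() {
    case "$1" in
        *"$nl"*)        return 1 ;;   # contains a newline: reject
        /nix/store/?*)  return 0 ;;   # looks like a store path: accept
        *)              return 1 ;;   # anything else: reject
    esac
}

good="/nix/store/example-ai-skill-demo"
bad="warning: Git tree is dirty
/nix/store/example-ai-skill-demo"

is_store_path "$good" && echo "good: accepted"
is_store_path "$bad"  || echo "bad: rejected"
```

A check like this would turn a silently corrupted symlink into an immediate, visible failure.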

Commands Used

# Verify talu's skill setup
cat ~/proj/talu/.envrc
cat ~/proj/talu/.skills
ls -la ~/proj/talu/.claude/skills/

# Diagnose broken symlinks
readlink -f ~/proj/talu/.claude/skills/orch  # showed "symlink broken"

# Test the fix
cd ~/proj/talu && rm -rf .claude/skills .opencode/skills
source ~/proj/skills/bin/use-skills.sh && load_skills_from_manifest
ls -la .claude/skills/  # now shows clean paths

# orch consensus for design decision
cd ~/proj/orch && uv run orch consensus "..." gemini gpt --mode vote

Architecture Notes

Skills system has two layers: skill packages (nix) and skill loading (direnv/bash)
Skills can be documentation-only (assume tool exists) or bundled (include tool)
System tools (git, rg, orch) should be globally installed, not per-skill
Per-repo skill deployment via .skills manifest + direnv

Process and Workflow

What Worked Well

User provided exact transcript of failure - made diagnosis quick
orch consensus gave useful opposing viewpoints on design decision
Cross-repo issue filing maintained traceability

What Was Challenging

Shell stderr/stdout behavior is easy to get wrong
Cross-repo dependencies have no formal tooling support

Learning and Insights

Technical Insights

$(cmd 2>&1) is dangerous when you only want stdout - stderr gets mixed in
nix build warnings go to stderr even on success
Symlinks happily accept multiline strings as targets (they just won't resolve)
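
The last point is easy to confirm in a scratch directory. A small sketch, using an invented target string and a temporary directory rather than the real skills path:

```shell
tmp=$(mktemp -d)

# A two-line target similar in shape to the corrupted one (contents invented).
target="warning: Git tree is dirty
/nix/store/example-ai-skill-demo"

# ln -s stores the target string verbatim; it does not check that it exists.
ln -s "$target" "$tmp/orch"

# readlink (without -f) prints the raw stored target, newline included.
stored=$(readlink "$tmp/orch")

# -L is true for any symlink; -e follows the link and fails if it dangles.
if [ -L "$tmp/orch" ] && [ ! -e "$tmp/orch" ]; then
    status="dangling"
else
    status="ok"
fi
echo "$status"

rm -rf "$tmp"
```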

Process Insights

When skill invocation fails, check: (1) symlink validity, (2) skill.md readability, (3) actual tool availability
orch consensus is useful for getting opposing viewpoints on design decisions
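
The three-step check above could be scripted along these lines. This is a rough sketch only: `check_skill` is a hypothetical function, the `skill.md` file name follows the wording in these notes, and the demo runs against a throwaway directory rather than a real skills checkout.

```shell
# Hypothetical diagnostic. $1 = path to the skill symlink, $2 = CLI it documents.
check_skill() {
    link=$1
    tool=$2
    if [ ! -e "$link" ]; then                         # (1) symlink must resolve
        echo "broken-symlink"; return 1
    fi
    if [ ! -r "$link/skill.md" ]; then                # (2) skill.md must be readable
        echo "unreadable-skill-md"; return 1
    fi
    if ! command -v "$tool" >/dev/null 2>&1; then     # (3) tool must be on PATH
        echo "tool-missing"; return 1
    fi
    echo "ok"
}

# Demo fixtures in a temp directory (all names invented).
tmp=$(mktemp -d)
mkdir "$tmp/store-skill"
echo "# demo skill" > "$tmp/store-skill/skill.md"
ln -s "$tmp/store-skill" "$tmp/good"
ln -s "$tmp/does-not-exist" "$tmp/broken"

r_ok=$(check_skill "$tmp/good" sh)
r_broken=$(check_skill "$tmp/broken" sh) || true
r_tool=$(check_skill "$tmp/good" no-such-tool-xyz) || true

printf '%s\n' "$r_ok" "$r_broken" "$r_tool"
rm -rf "$tmp"
```

Checking in this order mirrors the failure cascade: a broken symlink makes the later checks meaningless, so it is reported first.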

Architectural Insights

Distinction between "system tools" and "project tools" helps decide where to install
Skills documenting system tools don't need to bundle them - just assume they exist
Cross-repo coordination is a reality; text notes in descriptions are pragmatic

Context for Future Work

Open Questions

Should skills have a way to declare system tool dependencies?
Would a "skill doctor" tool help diagnose skill loading issues?

Next Steps

Commit use-skills.sh fix
dotfiles team implements dotfiles-3to (add orch to home-manager)
Then skills-d87 can be closed

Related Work

Previous: Per-Repo Skill Deployment Design
Cross-repo: dotfiles-3to (Add orch CLI to home-manager packages)

Raw Notes

The failure cascade: dirty repo → nix warning → stderr merged → symlink corrupted → skill.md unreadable → agent doesn't know invocation → bare command fails → hunt for workaround
User asked "how much do we want to mix nix and agentic dev tooling" - good architectural tension to keep in mind
Gemini suggested the "proper" fix (build tool in skill), GPT suggested pragmatic fix (global install) - ended up with pragmatic

Session Metrics

Commits made: 0 (fix uncommitted)
Files touched: 2 (bin/use-skills.sh, .beads/issues.jsonl)
Lines added/removed: +3/-2
Issues created: 2 (skills-fvx closed, skills-d87 open)
Cross-repo issues: 1 (dotfiles-3to)
