skills/docs/worklogs/2025-12-14-use-skills-symlink-corruption-fix.org
dan 90e72f1095 fix(use-skills): prevent stderr from corrupting symlink targets
Remove 2>&1 from nix build capture. When repo is dirty, nix emits
warnings to stderr which were being merged into $out and used as
symlink targets, creating broken symlinks like:

  orch -> warning: Git tree '...' is dirty\n/nix/store/...

Now stderr goes to terminal, only stdout (store path) captured.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 12:42:26 -08:00


use-skills.sh Symlink Corruption Bug Fix and Orch Invocation Analysis

Session Summary

Date: 2025-12-14 (Sunday)

Focus Area: Diagnosing and fixing skill loading failures in per-repo deployment

Accomplishments

  • Analyzed talu project's skill configuration status
  • Diagnosed why orch skill failed to work in talu session (from user-provided transcript)
  • Root-caused the symlink corruption bug in use-skills.sh
  • Fixed the stderr capture issue that corrupted symlink targets
  • Verified fix by reloading skills in talu - symlinks now correct
  • Filed skills-fvx (P1 bug) for symlink corruption - closed with fix
  • Filed skills-d87 (P2 bug) for orch invocation mechanism
  • Filed dotfiles-3to (P2 task) for adding orch to home-manager
  • Ran orch consensus to evaluate orch deployment options
  • Committing the use-skills.sh fix is still pending

Key Decisions

Decision 1: Fix stderr capture by removing 2>&1

  • Context: nix build emits warnings to stderr when repo is dirty
  • Options considered:

    1. Remove 2>&1 entirely - let stderr go to terminal, capture only stdout
    2. Filter out warning lines with grep
    3. Redirect stderr to temp file, only show on failure
  • Rationale: Option 1 is the simplest correct fix. nix already prints its errors to stderr on failure, so there is nothing to gain by capturing and re-echoing them
  • Impact: Symlinks now contain only the store path, not warning text
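For comparison, option 3 would look roughly like the sketch below. It is not what the fix does. `fake_nix_build` is a hypothetical stand-in for `nix build --print-out-paths --no-link ...` that lets the snippet run without nix: it writes a warning to stderr, writes a made-up store path to stdout, and takes its exit status from its argument. `build_quietly` is likewise a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `nix build --print-out-paths --no-link ...`:
# a warning on stderr, a made-up store path on stdout, exit status from $1 (default 0).
fake_nix_build() {
    echo "warning: Git tree '/tmp/repo' is dirty" >&2
    echo "/nix/store/00000000000000000000000000000000-ai-skill-example"
    return "${1:-0}"
}

# Option 3 (not adopted): park stderr in a temp file and replay it only on failure.
build_quietly() {
    local errfile out
    errfile=$(mktemp) || return 1
    # `out` is declared separately so the if tests the build's status, not local's.
    if out=$(fake_nix_build "${1:-0}" 2>"$errfile"); then
        rm -f "$errfile"
        printf '%s\n' "$out"
    else
        echo "use_skill: build failed" >&2
        cat "$errfile" >&2
        rm -f "$errfile"
        return 1
    fi
}
```

`path=$(build_quietly)` yields only the store path and prints nothing. `build_quietly 1` prints the failure line followed by the buffered warning. The trade-off shows up in the success case: warnings from successful builds are thrown away. Option 1 keeps them visible with less code.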

Decision 2: orch belongs in home-manager, not bundled in skill

  • Context: the orch skill provides documentation, but the orch CLI itself isn't on PATH
  • Options considered (via orch consensus with gemini + gpt):

    1. Wrapper script in skill - self-contained but hardcoded path
    2. Global install via home-manager - system tool approach
    3. Per-project direnv PATH - repetitive, fragile for agents
    4. (Gemini suggestion) Build CLI from source in skill package
  • Rationale: orch is a general-purpose system tool (like git, rg), not a project-specific dependency. System tools belong in home-manager.
  • Impact: Cross-repo coordination needed - skills repo documents, dotfiles repo installs

Decision 3: Cross-repo dependencies noted in the issue description, not tracked formally

  • Context: skills-d87 is blocked by dotfiles-3to, but bd dep doesn't support cross-repo
  • Options: Hard dep, soft/formal dep, or text note
  • Rationale: A text note is sufficient for human readers; more formal tracking would bring no tooling benefit
  • Impact: skills-d87 description mentions "Blocked by: dotfiles-3to"

Problems & Solutions

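| Problem                                                          | Solution                                                                | Learning                                                                         |
|------------------------------------------------------------------+-------------------------------------------------------------------------+----------------------------------------------------------------------------------|
| Symlinks contained "warning: Git tree is dirty" in the target path | Remove 2>&1 from the nix build capture; let stderr go to the terminal   | Shell command substitution captures all of stdout, including any stderr merged into it |
| orch command not found when the agent tried to use the skill     | The skill documents the tool but doesn't provide it; install it globally | Skills can be documentation-only for system tools                                |
| Can't create a formal cross-repo dependency                      | Note it in the issue description                                         | bd (beads) is per-repo; cross-repo tracking is manual                            |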

Technical Details

Code Changes

  • Total files modified: 1 (bin/use-skills.sh)
  • Key files changed:

    • bin/use-skills.sh - Removed 2>&1 from nix build command

The Bug

Original code:

out=$(nix build --print-out-paths --no-link "${SKILLS_REPO}#${skill}" 2>&1) || {
    echo "use_skill: failed to build ${skill}" >&2
    echo "$out" >&2
    return 1
}

When the repo is dirty, nix emits a warning on stderr. The 2>&1 merges stderr into stdout, so $out becomes:

warning: Git tree '/home/dan/proj/skills' is dirty
/nix/store/j952hgxixifscafb42vmw9vgdphi1djs-ai-skill-orch

This two-line string, warning included, becomes the symlink target, so the link points at a path that does not exist.
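The failure can be reproduced without nix. In this sketch, `fake_nix_build` is a hypothetical stand-in that mimics nix build on a dirty repo: it writes the warning to stderr, writes a made-up store path to stdout, and exits 0. Capturing it with 2>&1 and handing the result to `ln -s` creates a link whose target is the whole two-line string. ln does not complain, because a symlink target may contain any byte except NUL.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for nix build on a dirty repo:
# warning on stderr, store path on stdout, exit status 0.
fake_nix_build() {
    echo "warning: Git tree '/tmp/repo' is dirty" >&2
    echo "/nix/store/00000000000000000000000000000000-ai-skill-orch"
}

workdir=$(mktemp -d)
out=$(fake_nix_build 2>&1)        # buggy: stderr merged into the capture
ln -s "$out" "$workdir/orch"      # ln accepts the two-line string as a target

readlink "$workdir/orch"          # prints both lines: the warning, then the store path
if [ -e "$workdir/orch" ]; then
    echo "link resolves"
else
    echo "link is dangling"       # this branch runs
fi
```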

The Fix

out=$(nix build --print-out-paths --no-link "${SKILLS_REPO}#${skill}") || {
    echo "use_skill: failed to build ${skill}" >&2
    return 1
}

Now stderr goes to the terminal (where warnings belong), and only stdout, the store path, is captured.
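Below is a sketch of the fixed capture. It again uses a hypothetical `fake_nix_build` stub in place of nix so that it runs anywhere. It also adds an optional guard that is not part of the committed fix: the guard refuses to create the link unless the captured value is a single line starting with /nix/store/. That would catch this class of bug, or a build that prints several output paths, before a bad symlink lands on disk. `link_skill` is a hypothetical helper name, not a function in use-skills.sh.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for nix build on a dirty repo (warning on stderr, path on stdout).
fake_nix_build() {
    echo "warning: Git tree '/tmp/repo' is dirty" >&2
    echo "/nix/store/00000000000000000000000000000000-ai-skill-orch"
}

# Hypothetical helper: build, validate the captured path, then symlink it to $1.
link_skill() {
    local dest=$1 out
    # Fixed capture: no 2>&1, so the warning reaches the terminal and only stdout is kept.
    out=$(fake_nix_build) || {
        echo "use_skill: build failed" >&2
        return 1
    }
    # Optional guard: exactly one line, and it must look like a store path.
    case "$out" in
        *$'\n'*)      echo "use_skill: expected one store path, got several lines" >&2; return 1 ;;
        /nix/store/*) ;;
        *)            echo "use_skill: unexpected build output: $out" >&2; return 1 ;;
    esac
    ln -sfn "$out" "$dest"
}
```

Calling `link_skill "$dir/orch"` still shows the dirty-tree warning on the terminal, but the link target is only the store path. If the stub is swapped for one that prints the warning on stdout, the guard rejects the output and no link is created.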

Commands Used

# Verify talu's skill setup
cat ~/proj/talu/.envrc
cat ~/proj/talu/.skills
ls -la ~/proj/talu/.claude/skills/

# Diagnose broken symlinks
readlink -f ~/proj/talu/.claude/skills/orch  # showed "symlink broken"

# Test the fix
cd ~/proj/talu && rm -rf .claude/skills .opencode/skills
source ~/proj/skills/bin/use-skills.sh && load_skills_from_manifest
ls -la .claude/skills/  # now shows clean paths

# orch consensus for design decision
cd ~/proj/orch && uv run orch consensus "..." gemini gpt --mode vote

Architecture Notes

  • Skills system has two layers: skill packages (nix) and skill loading (direnv/bash)
  • Skills can be documentation-only (assume tool exists) or bundled (include tool)
  • System tools (git, rg, orch) should be globally installed, not per-skill
  • Per-repo skill deployment via .skills manifest + direnv

Process and Workflow

What Worked Well

  • User provided exact transcript of failure - made diagnosis quick
  • orch consensus gave useful opposing viewpoints on design decision
  • Cross-repo issue filing maintained traceability

What Was Challenging

  • Shell stderr/stdout behavior is easy to get wrong
  • Cross-repo dependencies have no formal tooling support

Learning and Insights

Technical Insights

  • $(cmd 2>&1) is dangerous when you only want stdout - stderr gets mixed in
  • nix build warnings go to stderr even on success
  • Symlinks happily accept multiline strings as targets (a target may contain any byte except NUL, so ln -s never complains); the link just won't resolve

Process Insights

  • When skill invocation fails, check: (1) symlink validity, (2) skill.md readability, (3) actual tool availability
  • orch consensus is useful for getting opposing viewpoints on design decisions
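The three-step check above can be scripted. Below is a minimal sketch as a shell function. It assumes each skill is a symlink under a skills directory such as .claude/skills/, and that the link's target directory contains a file named skill.md (the name is taken from the note above; adjust it if the packages use a different filename). It also assumes the tool a skill documents has the same name as the skill unless a third argument says otherwise. `check_skill` is hypothetical and is not part of use-skills.sh.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: check_skill <skills-dir> <skill> [tool]
# Runs the three checks in order and stops at the first failure.
check_skill() {
    local dir=$1 skill=$2 tool=${3:-$2}
    local link="$dir/$skill"

    # (1) Symlink validity: it must be a link, and its target must exist.
    if [ ! -L "$link" ]; then
        echo "$skill: not a symlink"; return 1
    fi
    if [ ! -e "$link" ]; then
        echo "$skill: dangling symlink -> $(readlink "$link")"; return 1
    fi

    # (2) skill.md readability (filename assumed).
    if [ ! -r "$link/skill.md" ]; then
        echo "$skill: skill.md missing or unreadable"; return 1
    fi

    # (3) Tool availability on PATH.
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$skill: '$tool' not found on PATH"; return 1
    fi

    echo "$skill: ok"
}
```

For example, `check_skill ~/proj/talu/.claude/skills orch` would likely have stopped at check (1) before the fix. After the fix, assuming the package ships a skill.md, it would likely stop at check (3) until dotfiles-3to puts orch on PATH.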

Architectural Insights

  • Distinction between "system tools" and "project tools" helps decide where to install
  • Skills documenting system tools don't need to bundle them - just assume they exist
  • Cross-repo coordination is a reality; text notes in descriptions are pragmatic

Context for Future Work

Open Questions

  • Should skills have a way to declare system tool dependencies?
  • Would a "skill doctor" tool help diagnose skill loading issues?

Next Steps

  • Commit use-skills.sh fix
  • dotfiles team implements dotfiles-3to (add orch to home-manager)
  • Then skills-d87 can be closed

Related Work

Raw Notes

  • The failure cascade: dirty repo → nix warning → stderr merged → symlink corrupted → skill.md unreadable → agent doesn't know invocation → bare command fails → hunt for workaround
  • User asked "how much do we want to mix nix and agentic dev tooling" - good architectural tension to keep in mind
  • Gemini suggested the "proper" fix (build the tool in the skill), GPT the pragmatic one (global install); we went with the pragmatic fix

Session Metrics

  • Commits made: 0 (fix uncommitted)
  • Files touched: 2 (bin/use-skills.sh, .beads/issues.jsonl)
  • Lines added/removed: +3/-2
  • Issues created: 2 (skills-fvx closed, skills-d87 open)
  • Cross-repo issues: 1 (dotfiles-3to)