Future Enhancement: Direct Screen Capture

Discovery

During implementation, we discovered that grim (the Wayland screenshot tool) can output directly to stdout:

grim - | file -
# Output: /dev/stdin: PNG image data, 174 x 174, 8-bit/color RGBA, non-interlaced

This opens up the possibility of skipping file-based screenshots entirely.
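
Two quick probes, runnable as-is, confirm that the stream is a complete PNG and show how large a full-screen capture actually is (a number that matters later when judging whether inline injection is practical):

# Confirm the stdout stream is a PNG, then measure its raw size
grim - | file -     # expect: /dev/stdin: PNG image data, ...
grim - | wc -c      # byte count of one full-screen capture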

Current Workflow

User action:

  1. Mod4+S → select region → space
  2. Screenshot saved to ~/Pictures/Screenshots/Screenshot-YYYY-MM-DD-HH-MM-SS.png
  3. Tell AI: "look at my screenshot"
  4. AI runs: ls -t ~/Pictures/Screenshots/*.png | head -1
  5. AI reads file and analyzes

Latency: 2-5 seconds (file I/O, directory scanning)
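
Step 4's ls -t one-liner works, but it errors out when the directory is empty or contains no PNGs. A slightly more defensive sketch of the same lookup (same directory convention as above):

#!/usr/bin/env bash
# Print the newest screenshot, or exit cleanly if there are none
shopt -s nullglob
files=(~/Pictures/Screenshots/*.png)
if (( ${#files[@]} == 0 )); then
    echo "No screenshots found" >&2
    exit 1
fi
ls -t "${files[@]}" | head -1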

Proposed Direct Capture Workflow

User action:

  1. Tell AI: "show me what's on my screen"
  2. AI runs: grim - | <inject into context>
  3. AI analyzes without file intermediary

Latency: <1 second (no file I/O)

Technical Questions (Unanswered)

Can AI read from stdin?

grim - | base64 | <how does AI ingest this?>

Unknown: Does OpenCode/Claude Code support image injection from stdin/base64?

Can AI read from clipboard?

grim - | wl-copy
# AI reads from clipboard with wl-paste?

Unknown: Does OpenCode/Claude Code have clipboard access?
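
One experiment that can be run today, independent of any AI tooling, is a clipboard round-trip with an explicit MIME type. If wl-paste hands back the same PNG bytes, the remaining question is only whether OpenCode/Claude Code can invoke wl-paste and ingest the result:

# Round-trip test: capture → clipboard → read back
grim - | wl-copy --type image/png       # offer the capture as image/png
wl-paste --list-types                   # image/png should appear here
wl-paste --type image/png | file -      # expect: /dev/stdin: PNG image data, ...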

Can we capture specific windows?

The niri compositor provides:

niri msg focused-window    # Get focused window info
niri msg windows           # List all windows
niri msg pick-window       # Mouse selection

grim supports regions:

grim -g "x,y widthxheight" -    # Capture specific region

Possibility:

  1. Get window geometry from niri
  2. Capture that specific region with grim
  3. Inject directly without saving
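
A rough sketch of those three steps. The exact shape of niri's JSON output is not confirmed here, so the jq filter and the geometry fields below are illustrative placeholders to be adjusted against real niri msg output:

#!/usr/bin/env bash
# Sketch: capture only the focused window (JSON field names are assumptions)
set -euo pipefail

# 1. Ask niri for the focused window as JSON
win_json=$(niri msg focused-window -j)

# 2. Build grim's "x,y WxH" geometry string from that JSON
#    (.x/.y/.width/.height are placeholders; inspect the real output first)
geometry=$(jq -r '"\(.x),\(.y) \(.width)x\(.height)"' <<<"$win_json")

# 3. Capture just that region and stream it to stdout, no file involved
grim -g "$geometry" -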

Implementation Options

Option A: Clipboard-Based (Easiest to Test)

#!/usr/bin/env bash
# skills/screenshot-capture/scripts/capture-screen.sh

# Capture entire screen to clipboard
grim - | wl-copy

# Tell AI it's in clipboard
echo "Screen captured to clipboard. Use wl-paste to read."

Pros:

  • Simple integration
  • Works with existing clipboard tools
  • No file cleanup needed

Cons:

  • Requires AI to support clipboard reading
  • Unclear if OpenCode/Claude Code can do this

Option B: Temp File (Current Approach)

#!/usr/bin/env bash
# What we currently do (implicitly)

TEMP_FILE="/tmp/screen-capture-$(date +%s).png"
grim "$TEMP_FILE"
echo "$TEMP_FILE"

# AI reads file, analyzes, could delete after

Pros:

  • Works with current AI image capabilities
  • Proven approach

Cons:

  • File I/O overhead
  • Temp file cleanup required
  • Not as elegant

Option C: Base64 Stdin (Most Direct)

#!/usr/bin/env bash
# Hypothetical direct injection

grim - | base64 | ai-inject-image --format png --encoding base64

Pros:

  • No files at all
  • Minimal latency
  • Clean architecture

Cons:

  • Requires AI tool support for stdin images
  • Completely unknown if possible
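
One thing that can be measured right now, regardless of tool support, is how large the injected payload would be. Base64 inflates the raw PNG by roughly a third, and that number matters for any context budget:

# Size of the would-be payload for a full screen vs. a small region
grim - | base64 -w0 | wc -c
grim -g "0,0 400x300" - | base64 -w0 | wc -c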

Next Steps to Validate

  1. Test clipboard reading:

    grim - | wl-copy
    # In OpenCode: "What's in the clipboard?"
    # Does it understand it's an image?
    
  2. Test temp file with auto-cleanup:

    TEMP=$(mktemp --suffix=.png)
    trap "rm -f $TEMP" EXIT
    grim "$TEMP"
    # AI analyzes
    # File auto-deleted on exit
    
  3. Research AI tool capabilities:

    • Check OpenCode documentation for image input methods
    • Check Claude Code documentation for image input methods
    • Test if base64-encoded images can be injected
  4. Test region capture:

    # Get focused window geometry (store it for the next step)
    GEOMETRY=$(niri msg focused-window -j | jq -r '.geometry')

    # Capture just that region
    grim -g "$GEOMETRY" -
    
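
The checks above could be rolled into a single throwaway script so the whole experiment takes a minute to run. Nothing here assumes anything about OpenCode/Claude Code itself; it only exercises the Wayland side:

#!/usr/bin/env bash
# One-shot validation of the Wayland side of the proposed workflow
set -u

echo "== Tooling =="
for tool in grim wl-copy wl-paste niri jq; do
    command -v "$tool" >/dev/null && echo "ok: $tool" || echo "MISSING: $tool"
done

echo "== 1. stdout capture =="
grim - | file -                     # expect PNG image data

echo "== 2. clipboard round-trip =="
grim - | wl-copy --type image/png
wl-paste --list-types               # expect image/png in the list

echo "== 3. temp file with auto-cleanup =="
TEMP=$(mktemp --suffix=.png)
trap 'rm -f "$TEMP"' EXIT
grim "$TEMP" && file "$TEMP"

echo "== 4. focused-window geometry (inspect the fields manually) =="
niri msg focused-window -j | jq .   # confirm which fields carry x/y/width/height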

User Experience Comparison

Current (File-Based)

User: "Look at my last screenshot"
AI: <finds file in ~/Pictures/Screenshots>
AI: <reads file>
AI: "I see a terminal window with..."
Time: 2-5 seconds

Proposed (Direct Capture)

User: "Show me what's on screen"
AI: <captures directly with grim>
AI: "I see a terminal window with..."
Time: <1 second

Advanced (Region Aware)

User: "What's in the focused window?"
AI: <gets geometry from niri>
AI: <captures that region only>
AI: "The focused window shows..."
Time: <1 second

Decision: Why We Didn't Implement This Now

  1. Unknown AI Capabilities: Don't know if OpenCode/Claude Code support non-file image input
  2. Unvalidated Workflow: Current file-based approach is proven to work
  3. User Request: User asked for "find my screenshots", not "capture my screen"
  4. YAGNI: Would be premature optimization without user feedback

The current implementation solves the stated problem. This enhancement becomes relevant only if users say:

  • "This is too slow"
  • "I want to capture what's on screen now, not find old files"
  • "Can you see my current window?"

Recommendation

Ship the file-based solution first (screenshot-latest skill).

After real usage, if users want:

  • Real-time screen capture → Investigate direct capture
  • Region selection → Integrate niri window geometry
  • Clipboard workflow → Test clipboard-based approach

Don't build it until users ask for it.

Technical Notes

grim capabilities verified:

  • Can output to stdout (grim -)
  • Outputs valid PNG format
  • Supports region capture (-g "x,y WxH")
  • Works with Wayland compositors (niri confirmed)

niri capabilities verified:

  • Can query window geometry (niri msg windows -j)
  • Can get focused window (niri msg focused-window)
  • Supports JSON output for parsing

Unknown capabilities:

  • Can OpenCode/Claude Code read from clipboard?
  • Can OpenCode/Claude Code accept base64 image data?
  • Can OpenCode/Claude Code accept stdin image data?
  • What's the actual latency difference in real usage?

References

  • man grim - Screenshot tool documentation
  • niri msg --help - Compositor IPC commands
  • man wl-clipboard - Wayland clipboard utilities

This document describes potential enhancements, not the current implementation. The current screenshot-latest skill uses a file-based approach intentionally.