Future Enhancement: Direct Screen Capture
Discovery
During implementation, we discovered that grim (the Wayland screenshot tool) can output directly to stdout:
grim - | file -
# Output: /dev/stdin: PNG image data, 174 x 174, 8-bit/color RGBA, non-interlaced
This opens up the possibility of skipping file-based screenshots entirely.
Current Workflow
User action:
- Mod4+S → select region → space
- Screenshot saved to ~/Pictures/Screenshots/Screenshot-YYYY-MM-DD-HH-MM-SS.png
- Tell AI: "look at my screenshot"
- AI runs: ls -t ~/Pictures/Screenshots/*.png | head -1
- AI reads file and analyzes
Latency: 2-5 seconds (estimated, not measured; file I/O, directory scanning)
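The lookup step above parses ls output, which can misbehave on unusual filenames. A minimal sketch of the same "newest PNG" lookup in plain bash follows. The function name latest_screenshot is hypothetical, and the default directory is the one named in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified .png in a directory.
# Uses the shell's -nt (newer than) test instead of parsing `ls -t` output.
latest_screenshot() {
  local dir=${1:-"$HOME/Pictures/Screenshots"}
  local newest="" f
  for f in "$dir"/*.png; do
    [ -e "$f" ] || continue            # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] || return 1         # no screenshots found
  printf '%s\n' "$newest"
}
```

Calling latest_screenshot with no argument searches ~/Pictures/Screenshots; a non-zero exit status means no PNG was found.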
Proposed Direct Capture Workflow
User action:
- Tell AI: "show me what's on my screen"
- AI runs: grim - | <inject into context>
- AI analyzes without file intermediary
Latency: <1 second (estimated, not measured; no file I/O)
Technical Questions (Unanswered)
Can AI read from stdin?
grim - | base64 | <how does AI ingest this?>
Unknown: Does OpenCode/Claude Code support image injection from stdin/base64?
Can AI read from clipboard?
grim - | wl-copy
# AI reads from clipboard with wl-paste?
Unknown: Does OpenCode/Claude Code have clipboard access?
Can we capture specific windows?
niri compositor provides:
niri msg focused-window # Get focused window info
niri msg windows # List all windows
niri msg pick-window # Mouse selection
grim supports regions:
grim -g "x,y widthxheight" - # Capture specific region
Possibility:
- Get window geometry from niri
- Capture that specific region with grim
- Inject directly without saving
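The geometry-to-region step could look roughly like the sketch below. It only covers formatting four numbers into the "x,y WxH" string that grim -g expects. How those numbers are extracted from niri's output is left open, since the exact JSON shape is not confirmed here. The helper name grim_geometry is hypothetical, and restricting all values to non-negative integers is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format x, y, width, height as a grim -g region string.
# Assumption: all four values are non-negative integers.
grim_geometry() {
  [ "$#" -eq 4 ] || return 1
  local v
  for v in "$@"; do
    case $v in
      ''|*[!0-9]*) return 1 ;;         # reject empty or non-numeric input
    esac
  done
  printf '%s,%s %sx%s\n' "$1" "$2" "$3" "$4"
}

# Usage sketch (requires grim; x/y/w/h come from wherever the geometry is obtained):
#   grim -g "$(grim_geometry "$x" "$y" "$w" "$h")" -
```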
Implementation Options
Option A: Clipboard-Based (Easiest to Test)
#!/usr/bin/env bash
# skills/screenshot-capture/scripts/capture-screen.sh
# Capture entire screen to clipboard
grim - | wl-copy
# Tell AI it's in clipboard
echo "Screen captured to clipboard. Use wl-paste to read."
Pros:
- Simple integration
- Works with existing clipboard tools
- No file cleanup needed
Cons:
- Requires AI to support clipboard reading
- Unclear if OpenCode/Claude Code can do this
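One part of Option A can be checked without any AI tool involved: whether the clipboard actually offers PNG data after the capture. The sketch below assumes wl-paste --list-types prints one MIME type per line; the helper name clipboard_offers_png is hypothetical, and it reads the type list from stdin so it can be exercised without a Wayland session.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a MIME type list on stdin contains image/png.
clipboard_offers_png() {
  grep -qx 'image/png'
}

# Usage sketch (requires wl-clipboard and a Wayland session):
#   grim - | wl-copy --type image/png
#   wl-paste --list-types | clipboard_offers_png && wl-paste --type image/png > /tmp/check.png
```

Passing --type image/png to wl-copy makes the offered type explicit instead of relying on content detection.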
Option B: Temp File (Current Approach)
#!/usr/bin/env bash
# What we currently do (implicitly)
TEMP_FILE="/tmp/screen-capture-$(date +%s).png"
grim "$TEMP_FILE"
echo "$TEMP_FILE"
# AI reads file, analyzes, could delete after
Pros:
- Works with current AI image capabilities
- Proven approach
Cons:
- File I/O overhead
- Temp file cleanup required
- Not as elegant
Option C: Base64 Stdin (Most Direct)
#!/usr/bin/env bash
# Hypothetical direct injection
grim - | base64 | ai-inject-image --format png --encoding base64
Pros:
- No files at all
- Minimal latency
- Clean architecture
Cons:
- Requires AI tool support for stdin images
- Completely unknown if possible
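Since ai-inject-image above is purely hypothetical, the only part of Option C that can be sketched today is the encoding side. The example below wraps PNG bytes from stdin in a base64 data URI, a common format for inline images. Whether OpenCode or Claude Code would accept such a string is an open question, and the function name png_data_uri is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read PNG bytes on stdin, print a base64 data URI.
# Newlines are stripped because base64 wraps its output by default.
png_data_uri() {
  printf 'data:image/png;base64,%s\n' "$(base64 | tr -d '\n')"
}

# Usage sketch (requires grim):
#   grim - | png_data_uri
```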
Next Steps to Validate
- Test clipboard reading:
grim - | wl-copy
# In OpenCode: "What's in the clipboard?"
# Does it understand it's an image?
- Test temp file with auto-cleanup:
TEMP=$(mktemp --suffix=.png)
trap 'rm -f "$TEMP"' EXIT
grim "$TEMP"
# AI analyzes
# File auto-deleted on exit
- Research AI tool capabilities:
  - Check OpenCode documentation for image input methods
  - Check Claude Code documentation for image input methods
  - Test if base64-encoded images can be injected
- Test region capture:
# Get focused window geometry
GEOMETRY=$(niri msg focused-window -j | jq -r '.geometry')
# Capture just that region
grim -g "$GEOMETRY" -
User Experience Comparison
Current (File-Based)
User: "Look at my last screenshot"
AI: <finds file in ~/Pictures/Screenshots>
AI: <reads file>
AI: "I see a terminal window with..."
Time: 2-5 seconds
Proposed (Direct Capture)
User: "Show me what's on screen"
AI: <captures directly with grim>
AI: "I see a terminal window with..."
Time: <1 second
Advanced (Region Aware)
User: "What's in the focused window?"
AI: <gets geometry from niri>
AI: <captures that region only>
AI: "The focused window shows..."
Time: <1 second
Decision: Why We Didn't Implement This Now
- Unknown AI Capabilities: Don't know if OpenCode/Claude Code support non-file image input
- Unvalidated Workflow: Current file-based approach is proven to work
- User Request: User asked for "find my screenshots", not "capture my screen"
- YAGNI: Would be premature optimization without user feedback
The current implementation solves the stated problem. This enhancement becomes worthwhile only if users say:
- "This is too slow"
- "I want to capture what's on screen now, not find old files"
- "Can you see my current window?"
Recommendation
Ship the file-based solution first (screenshot-latest skill).
After real usage, if users want:
- Real-time screen capture → Investigate direct capture
- Region selection → Integrate niri window geometry
- Clipboard workflow → Test clipboard-based approach
Don't build it until users ask for it.
Technical Notes
grim capabilities verified:
- ✅ Can output to stdout (grim -)
- ✅ Outputs valid PNG format
- ✅ Supports region capture (-g "x,y WxH")
- ✅ Works with Wayland compositors (niri confirmed)
niri capabilities verified:
- ✅ Can query window geometry (niri msg windows -j)
- ✅ Can get focused window (niri msg focused-window)
- ✅ Supports JSON output for parsing
Unknown capabilities:
- ❓ Can OpenCode/Claude Code read from clipboard?
- ❓ Can OpenCode/Claude Code accept base64 image data?
- ❓ Can OpenCode/Claude Code accept stdin image data?
- ❓ What's the actual latency difference in real usage?
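The latency question is the easiest of these to answer empirically. A rough sketch of a timing wrapper follows; elapsed_ms is a hypothetical name, and it assumes GNU date, which supports %N for nanoseconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and print its wall-clock time in milliseconds.
# Assumption: GNU date (nanosecond resolution via %N).
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Usage sketch (requires grim): compare file-based and stdout capture.
#   elapsed_ms grim /tmp/latency-test.png
#   elapsed_ms sh -c 'grim - | wc -c'
```

This times only the capture itself; the end-to-end figure also depends on how long the AI tool takes to load and analyze the image.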
References
- man grim - Screenshot tool documentation
- niri msg --help - Compositor IPC commands
- man wl-clipboard - Wayland clipboard utilities
This document describes potential enhancements, not the current implementation.
The current screenshot-latest skill intentionally uses a file-based approach.