# Future Enhancement: Direct Screen Capture

## Discovery

During implementation, we discovered that `grim` (the Wayland screenshot tool) can output directly to stdout:

```bash
grim - | file -
# Output: /dev/stdin: PNG image data, 174 x 174, 8-bit/color RGBA, non-interlaced
```

This opens up the possibility of **skipping file-based screenshots entirely**.

## Current Workflow

**User action**:
1. Mod4+S → select region → space
2. Screenshot saved to `~/Pictures/Screenshots/Screenshot-YYYY-MM-DD-HH-MM-SS.png`
3. Tell AI: "look at my screenshot"
4. AI runs: `ls -t ~/Pictures/Screenshots/*.png | head -1`
5. AI reads file and analyzes

**Latency**: 2-5 seconds (file I/O, directory scanning)

## Proposed Direct Capture Workflow

**User action**:
1. Tell AI: "show me what's on my screen"
2. AI runs: `grim - | `
3. AI analyzes without file intermediary

**Latency**: <1 second (no file I/O)

## Technical Questions (Unanswered)

### Can AI read from stdin?

```bash
grim - | base64 |
```

**Unknown**: Does OpenCode/Claude Code support image injection from stdin/base64?

### Can AI read from clipboard?

```bash
grim - | wl-copy
# AI reads from clipboard with wl-paste?
```

**Unknown**: Does OpenCode/Claude Code have clipboard access?

### Can we capture specific windows?

**niri compositor provides**:

```bash
niri msg focused-window  # Get focused window info
niri msg windows         # List all windows
niri msg pick-window     # Mouse selection
```

**grim supports regions**:

```bash
grim -g "x,y widthxheight" -  # Capture specific region
```

**Possibility**:
1. Get window geometry from niri
2. Capture that specific region with grim
3. Inject directly without saving

## Implementation Options

### Option A: Clipboard-Based (Easiest to Test)

```bash
#!/usr/bin/env bash
# skills/screenshot-capture/scripts/capture-screen.sh

# Capture entire screen to clipboard
grim - | wl-copy

# Tell AI it's in clipboard
echo "Screen captured to clipboard. Use wl-paste to read."
```

**Pros**:
- Simple integration
- Works with existing clipboard tools
- No file cleanup needed

**Cons**:
- Requires AI to support clipboard reading
- Unclear if OpenCode/Claude Code can do this

### Option B: Temp File (Current Approach)

```bash
#!/usr/bin/env bash
# What we currently do (implicitly)

TEMP_FILE="/tmp/screen-capture-$(date +%s).png"
grim "$TEMP_FILE"
echo "$TEMP_FILE"

# AI reads file, analyzes, could delete after
```

**Pros**:
- Works with current AI image capabilities
- Proven approach

**Cons**:
- File I/O overhead
- Temp file cleanup required
- Not as elegant

### Option C: Base64 Stdin (Most Direct)

```bash
#!/usr/bin/env bash
# Hypothetical direct injection

grim - | base64 | ai-inject-image --format png --encoding base64
```

**Pros**:
- No files at all
- Minimal latency
- Clean architecture

**Cons**:
- Requires AI tool support for stdin images
- Completely unknown if possible

## Next Steps to Validate

1. **Test clipboard reading**:
   ```bash
   grim - | wl-copy
   # In OpenCode: "What's in the clipboard?"
   # Does it understand it's an image?
   ```

2. **Test temp file with auto-cleanup**:
   ```bash
   TEMP=$(mktemp --suffix=.png)
   # Single quotes defer expansion until the trap fires; inner quotes keep the path intact
   trap 'rm -f "$TEMP"' EXIT
   grim "$TEMP"
   # AI analyzes
   # File auto-deleted on exit
   ```

3. **Research AI tool capabilities**:
   - Check OpenCode documentation for image input methods
   - Check Claude Code documentation for image input methods
   - Test if base64-encoded images can be injected

4. **Test region capture**:
   ```bash
   # Get focused window geometry
   # (to be tested: whether '.geometry' exists and is already in grim's "x,y WxH" form)
   GEOMETRY=$(niri msg focused-window -j | jq -r '.geometry')
   # Capture just that region
   grim -g "$GEOMETRY" -
   ```

## User Experience Comparison

### Current (File-Based)

```
User: "Look at my last screenshot"
AI:
AI:
AI: "I see a terminal window with..."
Time: 2-5 seconds
```

### Proposed (Direct Capture)

```
User: "Show me what's on screen"
AI:
AI: "I see a terminal window with..."
Time: <1 second
```

### Advanced (Region Aware)

```
User: "What's in the focused window?"
AI:
AI:
AI: "The focused window shows..."
Time: <1 second
```

## Decision: Why We Didn't Implement This Now

1. **Unknown AI Capabilities**: Don't know if OpenCode/Claude Code support non-file image input
2. **Unvalidated Workflow**: Current file-based approach is proven to work
3. **User Request**: User asked for "find my screenshots", not "capture my screen"
4. **YAGNI**: Would be premature optimization without user feedback

**Current implementation solves the stated problem.** This enhancement is for IF users say:
- "This is too slow"
- "I want to capture what's on screen now, not find old files"
- "Can you see my current window?"

## Recommendation

**Ship the file-based solution first** (`screenshot-latest` skill).

**After real usage**, if users want:
- Real-time screen capture → Investigate direct capture
- Region selection → Integrate niri window geometry
- Clipboard workflow → Test clipboard-based approach

**Don't build it until users ask for it.**

## Technical Notes

**grim capabilities verified**:
- ✅ Can output to stdout (`grim -`)
- ✅ Outputs valid PNG format
- ✅ Supports region capture (`-g "x,y WxH"`)
- ✅ Works with Wayland compositors (niri confirmed)

**niri capabilities verified**:
- ✅ Can query window geometry (`niri msg windows -j`)
- ✅ Can get focused window (`niri msg focused-window`)
- ✅ Supports JSON output for parsing

**Unknown capabilities**:
- ❓ Can OpenCode/Claude Code read from clipboard?
- ❓ Can OpenCode/Claude Code accept base64 image data?
- ❓ Can OpenCode/Claude Code accept stdin image data?
- ❓ What's the actual latency difference in real usage?

## References

- `man grim` - Screenshot tool documentation
- `niri msg --help` - Compositor IPC commands
- `man wl-clipboard` - Wayland clipboard utilities

---

*This document describes potential enhancements, not current implementation.*
*The current `screenshot-latest` skill uses file-based approach intentionally.*
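
## Appendix: Region Capture Sketch

The region-capture idea under "Can we capture specific windows?" combines two pieces: building grim's `-g "x,y WxH"` argument and writing the capture to a temp file that is removed on failure. The following is a minimal sketch of that combination, not a tested part of the skill. The function names `format_geometry` and `capture_region` are hypothetical, and how the four numbers are extracted from niri's JSON output is deliberately left out because this document does not establish the field names.

```bash
#!/usr/bin/env bash
# Sketch only: format_geometry and capture_region are hypothetical helper names,
# not part of grim or niri.

# Print four integers as grim's -g argument: "x,y WxH".
# x and y may be negative; width and height must be positive.
format_geometry() {
    local x=$1 y=$2 w=$3 h=$4 v
    for v in "$x" "$y"; do
        [[ $v =~ ^-?[0-9]+$ ]] || return 1
    done
    for v in "$w" "$h"; do
        [[ $v =~ ^[1-9][0-9]*$ ]] || return 1
    done
    printf '%s,%s %sx%s\n' "$x" "$y" "$w" "$h"
}

# Capture the given region to a temp PNG and print its path.
# The temp file is removed if grim fails.
capture_region() {
    local geometry out
    geometry=$(format_geometry "$@") || { echo "invalid geometry" >&2; return 1; }
    out=$(mktemp --suffix=.png) || return 1
    grim -g "$geometry" "$out" || { rm -f "$out"; return 1; }
    printf '%s\n' "$out"
}
```

With these definitions sourced, `capture_region 100 200 640 480` would print the path of a PNG covering that region. The caller is responsible for deleting the file once the AI has read it. Validating the numbers before calling grim keeps a malformed geometry from turning into a confusing grim error.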