Specification Reset: Screenshot Analysis
The Over-Engineering Problem
Lines of specification: 635 lines across 3 documents
Lines of code needed: ~10-20 lines of bash
What happened: Classic solution-first thinking instead of problem-first thinking.
The Actual Problem Statement
User workflow:
- Press Mod4+S → select region → space
- Screenshot saved to ~/Pictures/Screenshots/
- Want AI to see it immediately
- Don't want to type ~/Pictures/Screenshots/filename.png every time
User says: "Look at my last screenshot"
AI needs: The image file
Reality Check
One-Line Solution
ls -t ~/Pictures/Screenshots/*.{png,jpg,jpeg} 2>/dev/null | head -1
This already works. Finding the latest file is solved.
What We Actually Need
Option 1: Helper script (if file-based is fine)
#!/usr/bin/env bash
# skills/screenshot-latest
ls -t ~/Pictures/Screenshots/*.{png,jpg,jpeg} 2>/dev/null | head -1
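If the helper ever needs to say so when the directory is empty, a slightly more defensive variant might look like the sketch below. The function name `latest_screenshot` and the `SCREENSHOT_DIR` override are illustrative assumptions, not part of the original helper; the lookup is still the same `ls -t | head` idea.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the one-line helper.
# latest_screenshot and SCREENSHOT_DIR are illustrative names (assumptions).
latest_screenshot() {
  # Directory: first argument, else $SCREENSHOT_DIR, else the default location.
  local dir="${1:-${SCREENSHOT_DIR:-$HOME/Pictures/Screenshots}}"
  local newest
  # ls -t sorts newest first; errors from unmatched patterns are discarded.
  newest=$(ls -t "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no screenshots found in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

Calling `latest_screenshot` with no arguments prints the newest image path, or returns a non-zero status with a message on stderr when nothing matches.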
Option 2: Direct capture (if we want to skip files)
#!/usr/bin/env bash
# skills/screenshot-capture
grim - | wl-copy # Capture to clipboard
# or
out="/tmp/screen-$(date +%s).png" # Compute the path once so the echoed name matches the file written
grim - > "$out" && echo "$out"
Questions We Should Have Asked First
- Is finding files the real problem?
  - No. Finding the latest file is trivial (ls -t | head -1)
- Do we need the file at all?
  - Can we capture directly from compositor? YES (grim -)
  - Can AI read from clipboard? UNKNOWN
  - Can AI read from stdin? UNKNOWN
- What's the actual pain point?
  - Typing paths? → Solved with 1-line helper
  - File management? → Not mentioned by user
  - Latency? → Not mentioned by user
  - Memory usage? → Files are already on disk
- What does "skill" mean in this context?
  - Is it a bash script? (seems like it)
  - Is it an OpenCode integration? (unclear)
  - Is it a prompt template? (maybe?)
The Specification Trap
We wrote:
- 11 functional requirements
- 3 user stories with acceptance scenarios
- 5 success criteria
- 82 implementation tasks
- 4 bash scripts with full test coverage
We should have written:
- "Find latest screenshot: ls -t ~/Pictures/Screenshots/*.png | head -1"
- "Test: Create temp dir, touch files, verify script returns newest"
- "Done"
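The test described in that three-line spec can be sketched in a few lines of bash. Here `find_latest` is a stand-in for the one-liner and `run_latest_test` is a hypothetical name; neither comes from the existing scripts. Explicit timestamps are used so the result does not depend on how quickly the files are created.

```shell
#!/usr/bin/env bash
# Sketch of the test: create temp dir, touch files, verify newest is returned.
# find_latest stands in for the one-liner; both function names are assumptions.
find_latest() {
  ls -t "$1"/*.png 2>/dev/null | head -n 1
}

run_latest_test() {
  local dir result
  dir=$(mktemp -d)
  # Explicit mtimes avoid relying on sub-second ordering of touch calls.
  touch -t 202401010000 "$dir/first.png"
  touch -t 202401020000 "$dir/second.png"
  touch -t 202401030000 "$dir/third.png"
  result=$(find_latest "$dir")
  rm -rf "$dir"
  if [ "$result" = "$dir/third.png" ]; then
    echo "PASS"
  else
    echo "FAIL: got '$result'"
    return 1
  fi
}

run_latest_test
```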
Root Cause Analysis
Failure mode: Applied enterprise feature specification process to a 10-line script
Why it happened:
- Used /speckit.specify tool without calibrating scope
- Answered spec template questions instead of questioning the template
- Generated tasks from requirements instead of questioning requirements
- Focused on "how to do it properly" instead of "should we even do this"
Path Forward
Immediate Actions
- Clarify skill definition
  - What is a "skill" in the OpenCode/Claude Code context?
  - Is it a bash script, prompt template, or integration?
- Test direct capture
  - Can grim - | base64 be piped to AI?
  - Can AI read from clipboard via wl-paste?
  - What's the actual integration mechanism?
- Verify actual user workflow
  - Does user want file-finding or screen-capture?
  - Is this about past screenshots or current screen?
  - Is this about "show AI what I see" or "find old screenshots"?
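Before testing direct capture, it may help to confirm which of the tools named above are installed at all. The sketch below only checks presence on PATH; it does not answer whether the AI can read from the clipboard or stdin, which remains an open question. `probe_tools` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Report which capture-related tools are on PATH.
# probe_tools is an illustrative name, not an existing script.
probe_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: available"
    else
      echo "$tool: missing"
    fi
  done
}

probe_tools grim wl-copy wl-paste base64
```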
Decision Tree
Do we need to capture NEW screens?
├─ YES → Use `grim -` for direct capture
│   └─ Can AI read from clipboard/stdin?
│       ├─ YES → Skip files entirely
│       └─ NO → Capture to /tmp/screenshot.png
└─ NO → Finding existing files
    └─ One-liner: ls -t ~/Pictures/Screenshots/*.png | head -1
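The decision tree fits in one small function. This is a sketch under stated assumptions: `screenshot_for_ai` and the `AI_READS_CLIPBOARD` flag are hypothetical names, and the flag should only be set once the clipboard question has actually been answered. The capture branches need a running Wayland compositor with `grim` and `wl-copy` installed.

```shell
#!/usr/bin/env bash
# Sketch of the decision tree. All names here are hypothetical.
# $1: "new" to capture the current screen; anything else finds an existing file.
# $2: screenshot directory (defaults to ~/Pictures/Screenshots).
# AI_READS_CLIPBOARD=1 is an assumed flag for "AI can read from the clipboard".
screenshot_for_ai() {
  local mode="${1:-latest}"
  local dir="${2:-$HOME/Pictures/Screenshots}"
  if [ "$mode" = "new" ]; then
    if [ "${AI_READS_CLIPBOARD:-0}" = "1" ]; then
      # YES branch: skip files entirely.
      grim - | wl-copy
    else
      # NO branch: capture to a fixed temp path and print it.
      grim - > /tmp/screenshot.png && echo /tmp/screenshot.png
    fi
  else
    # Existing files: the one-liner.
    ls -t "$dir"/*.png 2>/dev/null | head -n 1
  fi
}
```

For example, `screenshot_for_ai latest` prints the newest existing PNG, while `screenshot_for_ai new` captures the current screen.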
Recommendations
Stop:
- Implementing the 82-task plan
- Building config file system
- Creating time-based filtering
- Writing comprehensive test suites
Start:
- Write 10-line proof-of-concept
- Test with actual AI workflow
- Observe what breaks
- Fix that one thing
- Ship it
Success criteria: User types "show me the screen" and sees analysis in <2 seconds.
Not success criteria:
- ✗ Handles 1000+ files efficiently
- ✗ Supports Nth screenshot lookup
- ✗ Configurable via JSON
- ✗ Has 80% test coverage
- ✗ Follows enterprise best practices
Next Session Goal
Single question to answer: What's the simplest thing that makes "look at my last screenshot" work?
Acceptance: User says "that works, thanks"
Not acceptance: Comprehensive framework for screenshot management with plugin architecture