Screenshot Analysis Feature: Over-Engineering Discovery and Wayland Capture Research

Session Summary
Accomplishments
Key Decisions
Problems & Solutions
Technical Details
Process and Workflow
Learning and Insights
Context for Future Work
Raw Notes
Session Metrics

Session Summary

Date: 2025-11-08 (Day 2 of screenshot-analysis feature)

Focus Area: Screenshot analysis skill implementation - discovered massive over-engineering, pivoted to minimal implementation and Wayland direct capture research

Accomplishments

Identified severe over-engineering in specification (635 lines of planning for 22 lines of code)
Built minimal viable screenshot-latest skill (185 lines total including docs)
Tested and verified find-latest.sh script works correctly
Researched Wayland screencopy protocol capabilities with grim
Discovered niri overview mode enables capturing inactive workspace windows
Verified AI can read PNG images directly from temp files
Created comprehensive analysis documents (RESET.md, COMPARISON.md, RESOLUTION.md)
Documented future enhancement path for direct screen capture
Deploy skill to ~/.claude/skills/ (pending user testing)
Test skill in actual AI workflow (pending deployment)

Key Decisions

Decision 1: Abort 82-task specification, ship minimal implementation

Context: Previous session generated 635 lines of specification with 82 implementation tasks for what turned out to be a 22-line bash script
Options considered:
1. Continue with comprehensive specification approach (4 scripts, full test coverage, config system)
2. Build minimal version first, validate with users, enhance if needed
3. Abandon feature entirely as over-engineered
Rationale: One-liner test `ls -t ~/Pictures/Screenshots/*.png | head -1` proved the core functionality already works. User requested "don't make me type paths" - minimal solution solves exactly that.
Impact: Reduced implementation from estimated 200 lines of code + tests to 22 lines of working bash + 83 lines of documentation. Saves ~2-3 hours of implementation time.

Decision 2: Use file-based approach instead of direct capture for MVP

Context: Discovered `grim -` can output PNG to stdout, enabling clipboard or direct injection workflows
Options considered:
1. File-based: `ls -t ~/Pictures/Screenshots/*.png | head -1` (proven to work)
2. Clipboard-based: `grim - | wl-copy` then AI reads from clipboard (unknown if AI supports)
3. Direct injection: `grim - | base64 | <inject to AI>` (unknown if possible)
4. Temp file capture: `grim /tmp/screen.png` (works but adds file I/O)
Rationale: File-based approach is proven, solves stated user problem, no unknown dependencies. Direct capture requires AI integration research that blocks MVP.
Impact: Can ship working solution immediately. Direct capture documented as future enhancement if users request lower latency or real-time capture.

Decision 3: Document over-engineering lessons rather than hide the mistake

Context: Spent 115 minutes on specification vs 22 minutes on implementation (5.2x waste)
Options considered:
1. Delete spec files and pretend they never happened
2. Keep spec files but don't document the failure
3. Create detailed analysis documents showing what went wrong and why
Rationale: This is valuable learning about when to specify vs when to code first. Future features can reference this decision framework.
Impact: Created RESET.md, COMPARISON.md, RESOLUTION.md documenting the over-engineering trap and how to avoid it. These become reference material for future scope decisions.

Decision 4: Investigate Wayland capture limitations vs compositor capabilities

Context: User asked if inactive workspace windows can be captured - unclear if limitation is "not rendered" vs "security restriction"
Options considered:
1. Accept that Wayland can't capture inactive workspaces
2. Research compositor-specific capabilities (niri overview mode)
3. Look for alternative protocols or tools
Rationale: Understanding the actual limitation determines what's possible. If compositor renders it for overview, we can capture it.
Impact: Discovered niri overview mode DOES render inactive workspace windows, making multi-workspace capture possible via brief overview toggle. Opens up new use cases like "find window with error message across all workspaces".

Problems & Solutions

| Problem | Solution | Learning |
|---|---|---|
| 635 lines of specification for 22 lines of code: massive scope creep | Tested the one-liner first: `ls -t ~/Pictures/Screenshots/*.png \| head -1` works perfectly. Shipped a minimal implementation. | Always validate a problem with the simplest solution before writing comprehensive specs. For obvious problems (file finding), code IS the specification. |
| Spec template drove over-engineering: filling in every section created unnecessary requirements | Created a "complexity gate" recommendation: ask "can you solve this with a one-liner?" before running `/speckit.specify` | Spec tools are powerful but dangerous for simple problems. Template-driven development can create work that doesn't need to exist. |
| Unclear whether the Wayland screencopy limitation is about rendering or security | Researched the protocol and tested niri overview mode. Overview renders ALL workspace windows, enabling capture via `niri msg action toggle-overview && grim && niri msg action toggle-overview` | The Wayland limitation is "not rendered", not "security blocked". A compositor design choice (keeping thumbnail buffers) determines what's capturable. |
| Unknown whether the AI can read images from the clipboard or stdin | Tested with a temp file: `grim /tmp/test.png`, then the Read tool successfully loaded and displayed the image | The AI (OpenCode/Claude) CAN read PNG files directly. The file-based approach works, so there is no need to research clipboard/stdin for the MVP. |
| Overview mode toggle causes ~450ms visible flicker | Measured timing and checked animation config. The flicker is inherent to rendering the overview for capture. | Invisible capture requires one of: 1) compositor thumbnail buffers (not in niri), 2) metadata only (no visuals), or 3) accepting a brief flicker. Under Wayland's render-to-capture model, what isn't rendered can't be captured. |

Technical Details

Code Changes

Total files created: 9 (4 implementation, 5 analysis)
Key files created:
- `skills/screenshot-latest/SKILL.md` - Agent instructions for finding latest screenshot (83 lines)
- `skills/screenshot-latest/scripts/find-latest.sh` - Bash script to find most recent screenshot (22 lines)
- `skills/screenshot-latest/README.md` - User documentation
- `skills/screenshot-latest/examples/example-output.txt` - Example output
- `specs/001-screenshot-analysis/RESET.md` - Over-engineering analysis
- `specs/001-screenshot-analysis/COMPARISON.md` - Spec vs implementation reality check (1400 lines)
- `specs/001-screenshot-analysis/RESOLUTION.md` - Feature closure document
- `specs/001-screenshot-analysis/FUTURE-ENHANCEMENT.md` - Direct capture research
- `AGENTS.md` - Auto-generated agent context file
Spec files archived but not deleted:
- `specs/001-screenshot-analysis/spec.md` (165 lines - over-specified)
- `specs/001-screenshot-analysis/plan.md` (139 lines - premature)
- `specs/001-screenshot-analysis/tasks.md` (331 lines - 82 unnecessary tasks)

Commands Used

Finding latest screenshot (the core solution):
```bash
ls -t ~/Pictures/Screenshots/*.{png,jpg,jpeg} 2>/dev/null | head -1
```
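The one-liner above is the whole idea. As a hedged illustration of how it might grow into a reusable helper (the actual 22-line `find-latest.sh` is not reproduced in this worklog, so `find_latest_screenshot` is a hypothetical name), this sketch adds an optional directory argument and an explicit error when nothing matches:

```bash
# Hypothetical sketch of a find-latest helper; not the real find-latest.sh.
# Prints the newest .png/.jpg/.jpeg in DIR (default ~/Pictures/Screenshots).
find_latest_screenshot() {
  local dir="${1:-$HOME/Pictures/Screenshots}" f
  set --  # reuse this function's positional parameters as the match list
  for f in "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg; do
    # Unmatched globs stay literal (e.g. ".../*.jpg"), so keep real files only.
    [ -f "$f" ] && set -- "$@" "$f"
  done
  if [ "$#" -eq 0 ]; then
    echo "find_latest_screenshot: no screenshots in $dir" >&2
    return 1
  fi
  # ls -t sorts by modification time, newest first.
  ls -t -- "$@" | head -n 1
}
```

Example: `find_latest_screenshot ~/Pictures/Screenshots`. Like the one-liner, it relies on `ls -t | head`, so a filename containing a newline would confuse it; that seems an acceptable trade-off for typical screenshot names.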

Testing grim stdout capability:
```bash
grim -g "0,0 100x100" - | file -
```
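`file -` recognizes PNG data by its 8-byte signature (`89 50 4E 47 0D 0A 1A 0A`). A dependency-free equivalent may be handy in scripts where `file` is not installed; `is_png_stream` is a hypothetical name used only for this sketch:

```bash
# Hypothetical helper: succeed if stdin starts with the 8-byte PNG signature.
is_png_stream() {
  local sig
  # Read only the first 8 bytes and render them as lowercase hex pairs.
  sig=$(head -c 8 | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}
```

Usage: `grim -g "0,0 100x100" - | is_png_stream && echo "grim wrote PNG to stdout"`.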

Testing grim to base64 pipeline:
```bash
grim -g "0,0 100x100" - | base64 | head -c 80
```
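For the untested base64 route, one plausible shape for handing raw bytes to an agent is an RFC 2397 `data:` URI; whether OpenCode or Claude Code accepts such input remains an open question. `png_to_data_uri` is a hypothetical helper, sketched under that assumption:

```bash
# Hypothetical helper: read image bytes on stdin and print an RFC 2397 data URI.
# Strips base64's line wrapping with tr instead of relying on GNU-only -w0.
png_to_data_uri() {
  printf 'data:image/png;base64,'
  base64 | tr -d '\n'
  printf '\n'
}
```

Usage: `grim -g "0,0 100x100" - | png_to_data_uri | head -c 80`.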

Capturing during niri overview mode:
```bash
niri msg action toggle-overview
sleep 0.1
grim /tmp/overview-test.png
niri msg action toggle-overview
```
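Run as separate commands, that sequence leaves the user stuck in overview mode if `grim` fails midway. A hedged sketch that always toggles back; `capture_overview` and its arguments are hypothetical, and the default 0.1 s delay simply mirrors the session's `sleep 0.1` and may need tuning:

```bash
# Hypothetical wrapper: open niri's overview, capture, and always close it again.
# $1 = output path, $2 = seconds to wait for the overview to render.
capture_overview() {
  local out="${1:-/tmp/overview-test.png}" settle="${2:-0.1}" status=0
  niri msg action toggle-overview || return 1
  sleep "$settle"
  grim "$out" || status=$?
  # Restore the normal view even when grim failed.
  niri msg action toggle-overview
  [ "$status" -eq 0 ] && printf '%s\n' "$out"
  return "$status"
}
```

The visible flicker is unchanged; the wrapper only guarantees the compositor ends up where it started.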

Getting window metadata from niri:
```bash
niri msg --json windows | jq -r '.[] | "\(.id) - \(.title) - Workspace: \(.workspace_id)"'
```

Architecture Notes

Skills structure (validated):

Each skill is a directory under `skills/`
`SKILL.md` with YAML frontmatter contains agent instructions
Optional `scripts/` directory for helper scripts
Optional `templates/` and `examples/` directories
Skills deployed to `~/.claude/skills/` or `~/.config/opencode/skills/`
Agent auto-discovers based on `description` field and "When to Use" section
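A hedged scaffold for the layout above. `new_skill` is a hypothetical helper: it writes only `name` and `description` frontmatter plus a placeholder "When to Use" section, creates `scripts/`, and leaves the optional `templates/` and `examples/` directories out:

```bash
# Hypothetical scaffold: ROOT/NAME/SKILL.md (YAML frontmatter) + scripts/.
new_skill() {
  local root="$1" name="$2" description="$3"
  local dir="$root/$name"
  mkdir -p "$dir/scripts" || return 1
  # Unquoted EOF so $name and $description are expanded into the file.
  cat > "$dir/SKILL.md" <<EOF
---
name: $name
description: $description
---

# $name

## When to Use

- (describe the user phrases that should trigger this skill)
EOF
}
```

Usage: `new_skill skills screenshot-latest "Find the most recent screenshot"`. Descriptions containing YAML-significant characters (such as `: `) would need quoting, which this sketch does not do.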

Wayland screencopy protocol limitations:

Only captures currently visible screen buffers
Windows on inactive workspaces are not rendered → not capturable
Compositor design choice whether to maintain thumbnail buffers
niri overview mode IS a render pass → windows become capturable during overview
No way to capture without making content visible (a design choice that also acts as a security safeguard)

Direct capture workflow possibilities:

Temp file (proven): `grim /tmp/screen.png` → AI reads with Read tool
Clipboard (untested): `grim - | wl-copy` → AI reads with `wl-paste`?
Base64 stdin (untested): `grim - | base64` → AI accepts as image data?
Overview toggle (proven): Brief flicker enables multi-workspace capture
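The clipboard path can at least be checked on the system side before asking whether the agent can use it. A hedged sketch using wl-clipboard's `wl-copy --type` and `wl-paste --list-types`; both function names are hypothetical:

```bash
# Hypothetical: put a full-screen capture on the Wayland clipboard as PNG.
copy_capture_to_clipboard() {
  grim - | wl-copy --type image/png
}

# Hypothetical: succeed if the clipboard currently advertises image/png.
clipboard_has_png() {
  wl-paste --list-types 2>/dev/null | grep -qx 'image/png'
}
```

Usage: `copy_capture_to_clipboard && clipboard_has_png && echo "PNG on clipboard"`. This only confirms the data is there; agent-side `wl-paste` support stays untested.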

Process and Workflow

What Worked Well

Testing one-liner solution BEFORE writing comprehensive spec (should have done this in session 1)
Creating analysis documents (RESET.md, COMPARISON.md) to capture learning
Using actual numbers (635 lines spec vs 22 lines code) to demonstrate over-engineering
Hands-on testing with grim, niri, and Read tool to validate capabilities
Documenting future enhancements separately so they don't block MVP
Keeping spec files as "what not to do" examples rather than deleting

What Was Challenging

Recognizing the over-engineering early enough (took 5 sessions to catch it)
Resisting the pull to "do it properly" with comprehensive specs
Admitting that 115 minutes of specification work should be abandoned
Distinguishing between "thorough planning" and "planning theater"
Balancing documentation quality (these analysis docs are also long!) with shipping
Investigating Wayland compositor internals to understand actual limitations

What I Would Do Differently

Test the one-liner solution in Session 1 before opening the spec template
Use complexity gate: "Can this be solved with <50 lines of code? Just write it."
Question every spec template section: "What happens if I skip this?"
Ship code first for simple problems, document after it works
Research actual constraints (Wayland protocol) before designing solutions

Learning and Insights

Technical Insights

Wayland security model and rendering:

Wayland's "not rendered = not capturable" is a feature, not a bug
Prevents background window spying (security win)
Compositors choose whether to keep thumbnail buffers (GNOME/KDE do, niri doesn't by default)
Overview modes are actual render passes, making capture possible
~450ms flicker is unavoidable if overview has animations

grim capabilities:

Can output PNG to stdout with `grim -` (opens direct injection possibilities)
Supports region capture with `-g "x,y WxH"` syntax
Supports specific output/monitor capture with `-o <output-name>`
Supports window capture with `-T <toplevel-id>` IF window is visible
Works with any Wayland compositor supporting screencopy protocol
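grim's `-g` takes the same `"X,Y WxH"` string used in the commands earlier. A hypothetical validating formatter, for example to turn window coordinates into a region (where those coordinates come from, such as niri window geometry, is left open here). It rejects negative values, which may be too strict if your output layout uses negative offsets:

```bash
# Hypothetical helper: format X Y W H as grim's -g geometry "X,Y WxH".
grim_geometry() {
  local x="$1" y="$2" w="$3" h="$4" v
  for v in "$x" "$y" "$w" "$h"; do
    case "$v" in
      ''|*[!0-9]*) echo "grim_geometry: expected non-negative integers" >&2; return 1 ;;
    esac
  done
  printf '%s,%s %sx%s\n' "$x" "$y" "$w" "$h"
}
```

Usage: `grim -g "$(grim_geometry 0 0 100 100)" - | file -`.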

AI image handling:

Read tool can directly ingest PNG files from any path
No need for clipboard or base64 encoding for file-based approach
Temp file approach (`/tmp/screen-*.png`) works perfectly
Opens door to "capture now, analyze immediately" workflows
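A hedged sketch of "capture now, analyze immediately": capture into a fresh `/tmp/screen-*.png` and print the path for the Read tool. `capture_to_temp` is hypothetical, and `mktemp --suffix` assumes GNU coreutils:

```bash
# Hypothetical: capture the whole screen (or region $1) to a unique temp PNG
# and print its path; removes the temp file if grim fails.
capture_to_temp() {
  local geometry="$1" tmp
  tmp=$(mktemp --suffix=.png "${TMPDIR:-/tmp}/screen-XXXXXX") || return 1
  if [ -n "$geometry" ]; then
    grim -g "$geometry" "$tmp" || { rm -f "$tmp"; return 1; }
  else
    grim "$tmp" || { rm -f "$tmp"; return 1; }
  fi
  printf '%s\n' "$tmp"
}
```

Usage: `path=$(capture_to_temp "0,0 800x600")`, then hand `$path` to the agent. Cleanup of old captures is left to the caller.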

Process Insights

Specification vs implementation balance:

Comprehensive specs valuable when: multiple teams, complex domain, high rework risk, unclear requirements
Code-first appropriate when: obvious solution, single developer, simple domain, low rework risk
This feature was code-first scenario treated as spec-first (root cause of waste)
5.2x time waste (115 min spec vs 22 min implement) is the cost of wrong approach

Template-driven development risks:

Templates create pressure to fill in every section
Answering template questions feels productive but may create unnecessary work
`/speckit.specify` tool powerful but needs complexity gate
"Did you test if this already works?" should be first question

Over-engineering indicators:

Task breakdown longer than expected code (82 tasks for 22-line script)
Configuration system for single constant value
Comprehensive test coverage before code exists
Features user didn't request ("time-based filtering", "Nth screenshot")
Specification longer than implementation (635 vs 185 lines)
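The first indicator is mechanical enough to check automatically. A hedged sketch of such a complexity gate; `task_gate` is hypothetical, and it assumes tasks are Markdown checkboxes (`- [ ] ...` / `- [x] ...`), which may not match every spec tool's output:

```bash
# Hypothetical complexity gate: fail when a task list has more tasks than the
# expected lines of code. $1 = tasks file, $2 = expected LOC.
task_gate() {
  local tasks_file="$1" expected_loc="$2" n
  [ -r "$tasks_file" ] || { echo "task_gate: cannot read $tasks_file" >&2; return 2; }
  # grep -c prints 0 (and exits 1) when nothing matches; n is still "0".
  n=$(grep -cE '^[[:space:]]*- \[[ xX]\]' "$tasks_file")
  if [ "$n" -gt "$expected_loc" ]; then
    echo "warning: $n tasks for ~$expected_loc expected lines of code; consider writing the code first" >&2
    return 1
  fi
  echo "ok: $n tasks for ~$expected_loc expected lines of code"
}
```

With this session's numbers (82 tasks, 22-line script) the gate would fail.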

Architectural Insights

Skills as agent interface:

SKILL.md is essentially an API contract for agent behavior
"When to Use" section is trigger detection logic
Helper scripts are implementation details agent can invoke
Skills compose (can reference other skills)
Deployment via symlink enables version control + system integration

Direct capture architectural patterns:

File-based: Proven, simple, works now (chosen for MVP)
Clipboard-based: Unknown AI support, worth testing
Stdin-based: Unknown AI support, more complex
Overview-toggle: Works but causes visible flicker
Metadata-only: No visuals but no flicker (niri windows JSON)

Future enhancement paths:

Real-time screen analysis (capture current screen on demand)
Multi-workspace search (toggle overview, capture, analyze all windows)
Window-specific capture (use niri window geometry + grim region)
Clipboard workflow (if AI supports wl-paste)
Zero-file capture (if AI supports stdin/base64 images)

Context for Future Work

Open Questions

Direct capture capabilities:

Can OpenCode/Claude Code read images from clipboard via `wl-paste`?
Can OpenCode/Claude Code accept base64-encoded image data as input?
Can OpenCode/Claude Code read image data from stdin?
What's the actual latency difference between file-based, clipboard, and temp-file capture?

niri compositor capabilities:

Can overview mode be triggered without animations for faster capture?
Does niri maintain any thumbnail buffers we could access directly?
Can we hook into niri's IPC to get notified when overview is fully rendered?
Are there niri config options to reduce overview transition time?

Skill deployment and usage:

How do users actually trigger skills in practice?
Is natural language detection reliable ("look at my screenshot")?
Should skill be invokable via explicit command ("/screenshot-latest")?
How to handle skill updates (symlink means changes propagate)?

Specification methodology:

How to formalize "complexity gate" for spec tool?
What metrics indicate spec-first vs code-first approach?
Can we detect over-engineering automatically (tasks > expected LOC)?
Should spec tool warn when solution already exists (grep codebase)?

Next Steps

Immediate (pending user decision):

Deploy skill to `~/.claude/skills/screenshot-latest` or `~/.config/opencode/skills/screenshot-latest`
Test with actual AI usage: "look at my last screenshot"
Gather user feedback on whether it solves the problem
Decide if direct capture enhancements are needed
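Deployment via symlink can be sketched as follows. `deploy_skill` is hypothetical; the second argument lets the same helper target `~/.config/opencode/skills` instead of the default `~/.claude/skills`:

```bash
# Hypothetical: symlink a skill directory from the repo into an agent's skills
# dir so edits under version control take effect immediately.
deploy_skill() {
  local src="$1" skills_dir="${2:-$HOME/.claude/skills}" name
  [ -f "$src/SKILL.md" ] || { echo "deploy_skill: $src has no SKILL.md" >&2; return 1; }
  name=$(basename "$src")
  mkdir -p "$skills_dir" || return 1
  # -n: replace an existing link instead of creating a new link inside its target.
  ln -sfn "$(cd "$src" && pwd)" "$skills_dir/$name"
}
```

Usage: `deploy_skill skills/screenshot-latest` or `deploy_skill skills/screenshot-latest ~/.config/opencode/skills`.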

Future enhancements (only if requested):

Test clipboard-based workflow: `grim - | wl-copy` → AI reads
Implement overview-toggle capture for multi-workspace analysis
Add custom directory support if users request it
Add Nth screenshot lookup if users request it
Investigate zero-file direct injection if latency becomes issue

Process improvements:

Add complexity gate to spec-kit tool usage documentation
Create decision framework flowchart (when to spec vs when to code)
Document this as case study in WORKFLOW.md
Consider adding "test-first" step to specification workflow

Related Work

Skills repository: `/home/dan/proj/skills`
Worklog skill: `~/.claude/skills/worklog/` (used to generate this document)
Spec-kit framework: `.specify/` directory
Screenshot specification (archived): `specs/001-screenshot-analysis/spec.md`
Screenshot implementation: `skills/screenshot-latest/`
OpenCode documentation: https://opencode.ai/docs (for future AI capability research)
Wayland screencopy protocol: https://gitlab.freedesktop.org/wayland/wayland-protocols (for understanding capture limitations)
niri compositor: https://github.com/YaLTeR/niri (for overview mode and IPC capabilities)

Raw Notes

User interaction highlights:

Session started with reviewing previous session's over-engineering summary
User immediately caught new over-engineering: "you're overengineering our overengineering fix"
Pivoted to focus on direct capture possibilities instead of analysis documents
User interested in capturing windows from inactive workspaces ("what about for what's not on the active workspace/windows")
Key question: "Is the problem 'Not rendered' or 'Not viewable because of security'"
Explored Alt-Tab-style live previews of workspaces/windows
Pivoted again when overview capture showed 450ms flicker: "preferred scenario would be making it invisible to the user"
User requested worklog at end of session

Research discoveries this session:

grim can output to stdout (verified with `file -`)
base64 encoding works for grim output
wl-copy/wl-paste work on the system
niri has overview mode (Mod+O keybinding)
Overview mode DOES render inactive workspace windows
Overview capture works but causes ~450ms visible flicker
AI Read tool successfully ingests PNG files directly
niri provides JSON metadata for all windows (IDs, titles, workspaces)

Key insight: Wayland limitation is rendering, not security

Compositors only render visible content by design (performance)
Alt-Tab previews on Windows work because DWM maintains thumbnail buffers
GNOME/KDE do maintain thumbnails for workspace switchers
niri doesn't maintain thumbnails BUT overview mode IS a render pass
This means capture IS possible via brief overview toggle
Tradeoff: visual content requires making it visible (Wayland by design)

Alternatives explored:

Fast flicker (~450ms overview toggle) - works, visible to user
Metadata only (niri JSON) - invisible, no visual content
Individual window capture - requires workspace switching, still visible
Invisible capture - not possible without compositor thumbnail buffers

Decision point reached: User wants invisible capture, which conflicts with Wayland's render-to-capture model. Options are:

Accept brief flicker for visual capture
Use metadata-only for invisible queries
Request/implement thumbnail buffer support in niri (major undertaking)

Session ended with request for worklog before deciding on approach.

Metrics and scale:

Specification documents: 635 lines (spec.md + plan.md + tasks.md)
Implementation: 185 lines total (22 lines code + 83 lines SKILL.md + 80 lines README + examples)
Analysis documents created: 5 files, ~2000+ lines documenting the learning
Time spent: Sessions 1-4 (spec) ~115 min, Sessions 5-6 (implement + research) ~90 min
Ratio: 3.4x as many lines of spec as of implementation (635 vs 185), 5.2x as much time on spec as on coding (115 vs 22 min)
Potential tasks avoided: 82 tasks from original breakdown

File tree created:
```
skills/screenshot-latest/
├── SKILL.md (83 lines - agent instructions)
├── README.md (user documentation)
├── scripts/
│   └── find-latest.sh (22 lines - the actual solution)
└── examples/
    └── example-output.txt

specs/001-screenshot-analysis/
├── spec.md (165 lines - archived as over-engineered)
├── plan.md (139 lines - archived as premature)
├── tasks.md (331 lines - archived as unnecessary)
├── RESET.md (analysis of over-engineering)
├── COMPARISON.md (spec vs implementation comparison)
├── RESOLUTION.md (feature closure)
└── FUTURE-ENHANCEMENT.md (direct capture research)
```

Session Metrics

Commits made: 1 (initial commit)
Files touched (uncommitted): 9 new files
Lines added: ~4500+ (implementation + analysis + worklog)
Lines of actual code: 22 (find-latest.sh)
Lines of documentation: ~4000+
Tests added: 0 (manual testing only)
Tests passing: 1/1 (manual test of find-latest.sh successful)
