dan/skills

dan 5fea49b7c0 feat(tufte-press): evolve skill to complete workflow with JSON generation and build automation

- Transform tufte-press from reference guide to conversation-aware generator
- Add JSON generation from conversation context following strict schema
- Create build automation scripts with Nix environment handling
- Integrate CUPS printing with duplex support
- Add comprehensive workflow documentation

Scripts added:
- skills/tufte-press/scripts/generate-and-build.sh (242 lines)
- skills/tufte-press/scripts/build-card.sh (23 lines)

Documentation:
- Updated SKILL.md with complete workflow instructions (370 lines)
- Updated README.md with usage examples (340 lines)
- Created SKILL-DEVELOPMENT-STRATEGY-tufte-press.md (450 lines)
- Added worklog: 2025-11-10-tufte-press-skill-evolution.org

Features:
- Agent generates valid JSON from conversation
- Schema validation before build (catches errors early)
- Automatic Nix shell entry for dependencies
- PDF build via tufte-press toolchain
- Optional print with duplex support
- Self-contained margin notes enforced
- Complete end-to-end testing

Workflow: Conversation → JSON → Validate → Build → Print

Related: niri-window-capture, screenshot-latest, worklog skills

2025-11-10 15:03:44 -08:00


Invisible Window Capture: From Over-Engineering to Production Security

Session Summary
Accomplishments
Key Decisions
Problems & Solutions
Technical Details
Process and Workflow
- What Worked Well
- What Was Challenging
Learning and Insights
Context for Future Work
Raw Notes
Session Metrics

Session Summary

Date: 2025-11-08 (Day 2 of screenshot-analysis feature, session 2)

Focus Area: Discovered and implemented invisible cross-workspace window capture using niri compositor's direct buffer rendering, with comprehensive security analysis

Accomplishments

Discovered niri can capture windows from inactive workspaces invisibly using direct buffer rendering
Researched niri source code to understand window capture mechanism (~90 min deep dive)
Built complete niri-window-capture skill with security documentation (703 lines)
Implemented audit logging using systemd journal (logger pattern from dotfiles)
Created comprehensive security analysis (196-line SECURITY.md)
Tested invisible cross-workspace capture (verified works on workspaces 1 and 2)
Created upstream feature request template for `--no-clipboard` flag
Documented complete technical flow from user intent to screenshot analysis
Deploy skill to ~/.claude/skills/ (pending user security review)
File upstream niri issue (template ready)

Key Decisions

Decision 1: Build invisible capture skill despite security implications

Context: Discovered niri can capture ANY window invisibly - major privacy/security concern
Options considered:
1. Don't build it - too dangerous, privacy violation
2. Build with user confirmation prompts for cross-workspace
3. Build with comprehensive security documentation and audit logging
4. Build with sensitive title filtering built-in
Rationale: User explicitly decided (after security discussion) to handle window blocking in niri config, implement audit logging, document security implications thoroughly, but skip user prompts and title filtering
Impact: Production-ready skill that's powerful but requires security-conscious deployment. Users must read SECURITY.md and configure niri block-out rules before use.

Decision 2: Use logger for audit trail (not custom logging)

Context: Needed audit trail for all window captures - security requirement
Options considered:
1. Custom log file (~/.local/share/niri-capture.log)
2. Systemd journal via logger -t niri-capture
3. Upstream niri audit logging feature request
4. No logging (document security risk)
Rationale: The dotfiles already use the logger pattern (lid-suspend.sh, power-status.sh, etc.). It is consistent with the existing system, writes to the systemd journal (queryable with journalctl), and is a standard Linux utility.
Impact: All captures logged with: timestamp, window ID, title, workspace. Viewable with `journalctl --user -t niri-capture`. Follows established dotfiles patterns.
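Below is a minimal sketch of how a capture script might write that audit entry. Only the `logger -t niri-capture` tag comes from the decision above. The helper names `format_capture_message` and `log_capture` and the exact message format are hypothetical.

```bash
# Hypothetical audit helpers; the tag matches the decision above.
LOG_TAG="niri-capture"

# Build the audit message separately so it can be checked without logger.
# Args: window ID, window title, workspace ID.
format_capture_message() {
    printf 'captured window id=%s workspace=%s title="%s"' "$1" "$3" "$2"
}

# Send one entry per capture to the systemd journal via util-linux logger.
log_capture() {
    logger -t "$LOG_TAG" "$(format_capture_message "$1" "$2" "$3")"
}
```

A capture script would call `log_capture "$WINDOW_ID" "$TITLE" "$WORKSPACE_ID"` after a successful screenshot. Keeping the formatting in its own function means the message can be tested without touching the journal.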

Decision 3: Accept clipboard pollution, request upstream flag

Context: niri hardcodes clipboard copy in save_screenshot() - cannot disable
Options considered:
1. Accept it, document behavior
2. Save/restore clipboard (fragile, doesn't preserve mime types)
3. Clear clipboard after AI reads (destroys user clipboard)
4. File upstream PR for `--no-clipboard` flag
Rationale: Clipboard save/restore too fragile. Clear-after breaks user workflow. Best solution is upstream flag. For now, document the behavior clearly in security docs.
Impact: Users must be aware screenshots persist in clipboard. Clipboard history tools will log all captures. Created UPSTREAM-REQUEST.md template for niri feature request.
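Every capture overwrites the clipboard, so a user or script can at least check whether it currently holds an image. The sketch below assumes wl-clipboard is installed. The helper names are hypothetical. In keeping with the decision above, the helpers only list MIME types with `wl-paste --list-types` and do not save, restore, or clear anything.

```bash
# Hypothetical helpers: does the Wayland clipboard currently offer image/png?
# Only the offered MIME types are inspected; clipboard contents are untouched.

# True if a MIME-type list on stdin (one type per line) contains image/png.
types_include_png() {
    grep -qx 'image/png'
}

# wl-paste --list-types (wl-clipboard) prints the types the clipboard offers.
clipboard_has_png() {
    wl-paste --list-types 2>/dev/null | types_include_png
}
```

Example: `clipboard_has_png && echo "clipboard still holds a screenshot"`.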

Decision 4: Research niri source code before building

Context: Needed to understand if invisible cross-workspace capture was possible
Options considered:
1. Assume overview mode is only way (requires visible flicker)
2. Test empirically without source code research
3. Deep dive into niri compositor source code
4. Ask in niri community channels
Rationale: Source code reveals actual capabilities vs assumptions. Found screenshot-window --id command works on any window regardless of workspace. Discovered mapped.render() with RenderTarget::ScreenCapture bypasses screen compositing.
Impact: Unlocked invisible capture capability. Understood security implications from implementation details. Documented exact technical flow. Time well spent (~90 min research).

Decision 5: Build two skills, not one monolithic solution

Context: Started with "find last screenshot" but discovered broader capabilities
Options considered:
1. One combined skill (find existing + capture new)
2. Two separate skills (screenshot-latest + niri-window-capture)
3. Just the capture skill (skip file-finding)
Rationale: screenshot-latest solves "find existing files" (simple, safe). niri-window-capture solves "capture any window" (powerful, security-sensitive). Different use cases, different risk profiles, cleaner separation.
Impact: screenshot-latest: 185 lines, safe, ready to deploy. niri-window-capture: 703 lines, powerful, requires security review. Users can deploy one without the other.

Problems & Solutions

Problem	Solution	Learning
Overview mode captures all workspaces but causes ~450ms visible flicker	Researched niri source, discovered screenshot-window --id renders buffers directly without compositing. Tested on inactive workspace - works invisibly.	niri maintains window buffers in memory even when not displayed. Direct buffer rendering bypasses screen compositor entirely. This is how screenshot-window achieves invisible capture.
Unclear if windows on inactive workspaces can be captured	Traced through niri source: Mapped struct holds Window (smithay), Window wraps Wayland surface buffer. Applications continuously render to buffers regardless of workspace visibility.	Wayland applications always render to surface buffers. Compositor decides what to composite to screen, but buffers exist independently. Overview mode doesn't create new renders - just composites existing buffers at smaller scale.
jq parse error in capture-by-title.sh - multiple windows matched search	Changed from streaming matches to collecting them in an array and taking the first: `jq 'map(select(…)) | .[0]'` instead of `jq '.[] | select(…) | head -1'`	When several windows match, `.[] | select(…)` emits a stream of separate JSON objects rather than one value, and cutting that output by lines does not reliably yield one complete object. Collect the matches with map, then take element [0] to get a single valid result.
niri always copies screenshots to clipboard - cannot disable	Researched source: set_data_device_selection() hardcoded in save_screenshot(). Created UPSTREAM-REQUEST.md for --no-clipboard flag. Documented behavior in SECURITY.md.	Clipboard pollution unavoidable with current niri. Future upstream flag needed. Document clearly so users understand privacy implications (clipboard history tools log screenshots).
Needed audit logging pattern - how to match dotfiles style	Searched dotfiles: rg "logger" ~/proj/dotfiles. Found lid-suspend-action.sh uses: logger -t "$LOG_TAG" "message". Systemd journal pattern.	Dotfiles use logger -t <tag> for audit trails. Viewable with journalctl --user -t <tag>. Standard Linux utility from util-linux. Perfect for capture audit trail.
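The array-then-index fix from the jq row can be sketched as a small lookup function. This is an illustration, not the actual capture-by-title.sh. The function name is hypothetical, and the sketch assumes a case-insensitive substring match on `.title`. It uses `contains` rather than a regex so that user input needs no escaping.

```bash
# Sketch of a capture-by-title lookup (hypothetical, not the actual script).
# Reads the `niri msg --json windows` array on stdin and prints the ID of the
# first window whose title contains PATTERN, ignoring case; prints nothing
# if no window matches.
first_window_id_by_title() {
    jq -r --arg pat "$1" '
        map(select((.title // "") | ascii_downcase | contains($pat | ascii_downcase)))
        | .[0].id // empty'
}
```

Example: `niri msg --json windows | first_window_id_by_title firefox`. An empty result means no match, so the caller can check for it before running `screenshot-window --id`.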

Technical Details

Code Changes

Total files created: 27
Key files created:
- `skills/niri-window-capture/SKILL.md` (184 lines) - Agent instructions with security warnings
- `skills/niri-window-capture/SECURITY.md` (196 lines) - Comprehensive security analysis, threat model, mitigations
- `skills/niri-window-capture/scripts/capture-focused.sh` (31 lines) - Capture current window with audit logging
- `skills/niri-window-capture/scripts/capture-by-title.sh` (40 lines) - Find and capture by title match
- `skills/niri-window-capture/UPSTREAM-REQUEST.md` (108 lines) - Feature request for `--no-clipboard` flag
- `skills/screenshot-latest/SKILL.md` (83 lines) - Simple file-finding skill
- `skills/screenshot-latest/scripts/find-latest.sh` (22 lines) - One-liner: `ls -t | head -1`
- `specs/001-screenshot-analysis/RESET.md` - Over-engineering analysis
- `specs/001-screenshot-analysis/COMPARISON.md` - Spec vs implementation reality
- `specs/001-screenshot-analysis/SECURITY.md` - Security findings
- `docs/worklogs/2025-11-08-screenshot-analysis-over-engineering-discovery.org` - Previous session worklog
Over-specification archived (not deleted):
- `specs/001-screenshot-analysis/spec.md` (165 lines) - Over-engineered
- `specs/001-screenshot-analysis/plan.md` (139 lines) - Premature
- `specs/001-screenshot-analysis/tasks.md` (331 lines) - 82 unnecessary tasks
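`find-latest.sh` is listed above as essentially `ls -t | head -1`. Below is a slightly more defensive sketch. It assumes screenshots are PNG files in `~/Pictures/Screenshots`, which can be overridden with the first argument. The function name and default directory are assumptions, not the actual script.

```bash
# Sketch of find-latest.sh: print the newest PNG in the screenshot directory.
# The default directory is an assumption; pass another directory as $1.
find_latest_screenshot() {
    local dir newest f
    dir=${1:-"$HOME/Pictures/Screenshots"}
    newest=""
    # Compare modification times with -nt instead of parsing `ls` output,
    # so filenames containing spaces are handled safely.
    for f in "$dir"/*.png; do
        [ -e "$f" ] || continue   # no matches: the glob stays literal
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    # Non-zero exit status when the directory holds no PNGs.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```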

Commands Used

Testing niri window capture:

```bash
# List windows with their ID, title, and workspace
niri msg --json windows | jq -r '.[] | "\(.id) - \(.title) - WS:\(.workspace_id)"'

# Capture one window by ID and save it to disk
niri msg action screenshot-window --id <WINDOW_ID> --write-to-disk true

# Capture the first window on workspace 2 (inactive workspace test)
WINDOW_ID=$(niri msg --json windows | jq -r '.[] | select(.workspace_id == 2) | .id' | head -1)
niri msg action screenshot-window --id "$WINDOW_ID" --write-to-disk true
```

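Verifying niri capabilities:

```bash
# grim can write a PNG to stdout
grim -g "0,0 100x100" - | file -

# Overview mode: shows all workspaces, but causes visible flicker
niri msg action toggle-overview
sleep 0.5
grim /tmp/overview-test.png
niri msg action toggle-overview

# Inspect window metadata
niri msg --json focused-window | jq '.'
niri msg --json windows | jq '.[0]'
```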

Audit log viewing:

```bash
# All captures
journalctl --user -t niri-capture

# Last 20 entries
journalctl --user -t niri-capture -n 20

# Captures since midnight
journalctl --user -t niri-capture --since today

# Follow new entries live
journalctl --user -t niri-capture -f
```

Architecture Notes

niri compositor window rendering architecture (discovered via source code research):

Window buffer lifecycle:
- Applications render to Wayland surface buffers continuously
- niri compositor holds references via `Mapped` struct containing `Window` (smithay)
- Buffers exist in memory regardless of workspace visibility
- Compositor decides what to composite to outputs, but buffers persist
Direct buffer rendering (key discovery):
```rust
// From niri/src/niri.rs, screenshot_window()
let elements = mapped.render(
    renderer,
    mapped.window.geometry().loc.to_f64(),
    scale,
    alpha,
    RenderTarget::ScreenCapture, // ← Key: not Output
);
```
- `RenderTarget::ScreenCapture` renders to offscreen texture
- No compositing to screen output required
- Works for windows on any workspace
Security model:
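- Access control: niri IPC socket permissions (`srwxr-xr-x`; connecting to a Unix socket requires write permission, which only the owner has, so access is effectively user-private)
- Any process running as the user can capture any window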
- Protection: niri window rules `block-out-from "screen-capture"`
- Audit: systemd journal via logger
Clipboard behavior (hardcoded):
```rust
// From save_screenshot()
set_data_device_selection(
    &state.niri.display_handle,
    &state.niri.seat,
    vec![String::from("image/png")],
    buf.clone(),
);
```
- Always copies PNG to clipboard
- No flag to disable
- Runs in separate thread after encoding
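Windows on inactive workspaces still have live buffers, so it can help to see which windows fall into that category before capturing. The sketch below makes two assumptions. First, that `niri msg --json workspaces` returns objects with `id` and `is_active`. Second, that window objects carry `id`, `title`, and `workspace_id`, as in the commands earlier. `hidden_windows` is a hypothetical name.

```bash
# Hypothetical helper: list windows whose workspace is not active on any output.
# Args: $1 = JSON from `niri msg --json workspaces`,
#       $2 = JSON from `niri msg --json windows`.
# Prints one "id<TAB>workspace_id<TAB>title" line per hidden window.
hidden_windows() {
    jq -rn --argjson ws "$1" --argjson wins "$2" '
        [$ws[] | select(.is_active) | .id] as $active
        | $wins[]
        | select(.workspace_id as $w | $active | any(.[]; . == $w) | not)
        | "\(.id)\t\(.workspace_id)\t\(.title)"'
}
```

Example: `hidden_windows "$(niri msg --json workspaces)" "$(niri msg --json windows)"`. Every window it prints can be captured invisibly with `screenshot-window --id`.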

Security Considerations

Threat model (documented in SECURITY.md):

Local privilege escalation: Any compromised process running as the user can capture any window
Cross-workspace privacy: Users may assume inactive workspaces are "private" - they're not
Clipboard side channel: Every capture overwrites clipboard, persists in clipboard history
No built-in audit trail: Added via logger -t niri-capture (systemd journal)
Invisible to user: No workspace switch, no screen flicker (except notification popup)

Mitigations implemented:

Audit logging: All captures logged with window ID, title, workspace
Security documentation: 196-line SECURITY.md with threat analysis
Clear warnings: Security notices in SKILL.md and README.md
Example protection: Block-out rules for password managers in docs
Logged metadata: Can review what was captured via journalctl

Protection mechanisms recommended to users:

Enable niri window rules for sensitive apps:
```kdl
window-rule {
    match app-id=r#"^org\.keepassxc\.KeePassXC$"#
    block-out-from "screen-capture"
}
```
Review audit logs regularly: `journalctl --user -t niri-capture`
Ensure screenshot directory private: `chmod 700 ~/Pictures/Screenshots`
Clear sensitive screenshots after AI analysis
Be aware clipboard contains last screenshot
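The directory and cleanup recommendations can be scripted. The sketch below uses hypothetical helper names. `SCREENSHOT_DIR` defaults to the path in the `chmod` example above. The deletion guard is a simple string check, not full path canonicalization.

```bash
# Hypothetical hardening helpers; the default directory matches the chmod example.
SCREENSHOT_DIR=${SCREENSHOT_DIR:-"$HOME/Pictures/Screenshots"}

# Create the screenshot directory if needed and make it owner-only (700).
make_screenshot_dir_private() {
    mkdir -p "$SCREENSHOT_DIR" && chmod 700 "$SCREENSHOT_DIR"
}

# Delete one screenshot after analysis. Only paths that start with
# $SCREENSHOT_DIR/ and contain no ".." component are accepted.
cleanup_screenshot() {
    case $1 in
        */../* | */..)
            printf 'refusing to delete %s\n' "$1" >&2; return 1 ;;
        "$SCREENSHOT_DIR"/*)
            rm -f -- "$1" ;;
        *)
            printf 'refusing to delete %s\n' "$1" >&2; return 1 ;;
    esac
}
```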

Process and Workflow

What Worked Well

Source code research: Diving into niri source revealed invisible capture capability vs assuming overview was only option
Security-first thinking: Stopping to think like a security engineer caught major privacy implications
Iterative exploration: grim → overview → source code → screenshot-window discovery path
Following dotfiles patterns: logger usage matches existing system, no new patterns invented
Testing on real system: Verified cross-workspace capture actually works invisibly
Comprehensive documentation: Security analysis forced clarity about risks and mitigations
User involvement: Security discussion led to clear decisions on what to implement vs skip

What Was Challenging

Scope creep awareness: What started as "find screenshot" became "invisible window capture" - had to recognize the pivot
Security vs usability tension: Powerful capability has privacy implications - balancing both
Clipboard limitation: niri hardcodes clipboard copy, no way around it, had to accept and document
jq JSON parsing: Multiple match objects required different jq syntax than expected
Deciding what not to build: Resisting adding user prompts, sensitive filtering, clipboard workarounds
Documentation depth: Security analysis took longer than code implementation (~90 min vs ~60 min)

Learning and Insights

Technical Insights

Wayland compositor architecture:

Compositors maintain window surface buffers in memory continuously
Applications render to buffers regardless of workspace visibility
"Invisible workspace" just means "not composited to output" not "buffer doesn't exist"
Overview mode doesn't create renders - composites existing buffers at smaller scale
Direct buffer rendering (ScreenCapture target) bypasses screen output entirely

niri implementation details:

Uses smithay library for Wayland protocol handling
Mapped struct wraps Window which wraps surface buffers
screenshot-window action calls mapped.render() with ScreenCapture target
Renders to offscreen texture, converts to PNG, saves to file
Clipboard copy hardcoded in save_screenshot() - no conditional logic
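From the shell side, the flow comes down to three steps: ask niri for the focused window, call `screenshot-window` with its ID, then log the capture. Below is a sketch of that sequence, not the actual `capture-focused.sh`. `capture_focused` is a hypothetical name, and the log message format is an assumption.

```bash
# Sketch of the capture-focused flow (not the actual script).
# Needs niri, jq, and logger on PATH.
capture_focused() {
    local info id title ws
    info=$(niri msg --json focused-window) || return 1
    # `// empty` yields "" when there is no .id (e.g. a null response).
    id=$(printf '%s' "$info" | jq -r '.id // empty')
    if [ -z "$id" ]; then
        echo "no focused window" >&2
        return 1
    fi
    title=$(printf '%s' "$info" | jq -r '.title // ""')
    ws=$(printf '%s' "$info" | jq -r '.workspace_id // "none"')
    # Renders the window buffer directly; niri also copies the PNG to the clipboard.
    niri msg action screenshot-window --id "$id" --write-to-disk true || return 1
    logger -t niri-capture "captured window id=$id workspace=$ws title=\"$title\""
}
```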

Audit logging pattern:

logger -t <tag> sends to systemd journal
journalctl --user -t <tag> queries by tag
Standard Linux utility from util-linux package
Dotfiles already use this pattern (lid-suspend, power management)
Better than custom log files (integrated with system logging)

Process Insights

When to research source code:

When assumptions limit solution space (overview only? wrong)
When documentation doesn't cover use case (invisible capture not documented)
When security implications unclear (need to understand internals)
When API behavior seems inconsistent (clipboard always copied - why?)
Cost: 90 minutes research. Benefit: Unlocked invisible capture + understood security model.

Security documentation value:

Forces explicit threat modeling
Reveals hidden assumptions (user thinks workspace 2 is "private")
Clarifies trust boundaries (compositor IPC socket = security boundary)
Documents mitigations for future reference
Helps users make informed deployment decisions
196 lines of security docs = confidence in deployment

Specification vs implementation timing:

Simple problems (find latest file): Code first, document after
Complex problems (invisible capture): Research first, build second
Security-sensitive features: Document threats before building
Unknown capabilities: Research, prototype, then specify
This problem: Research revealed capability, then built + documented simultaneously

Architectural Insights

Compositor as security boundary:

Wayland design: compositor is trusted, clients are not
Compositor has god-mode access to all window buffers
Access control is IPC socket permissions (user-level)
Applications cannot capture each other (must go through compositor)
This skill leverages compositor IPC to do what apps cannot
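Since the IPC socket is the trust boundary, a quick sanity check is that no one but its owner can connect to it. On Linux, connecting to a Unix domain socket requires write permission on the socket file, so the sketch flags any group or other write bit. It assumes the socket path is in `NIRI_SOCKET`, the variable niri sets for `niri msg`. `socket_is_owner_only` is a hypothetical helper that relies on GNU `stat -c`.

```bash
# Hypothetical check: is the niri IPC socket writable (connectable) only by its owner?
# Uses the path given as $1, else $NIRI_SOCKET. Returns 0 if owner-only,
# 1 if group or others have write permission, 2 if the path is missing.
socket_is_owner_only() {
    local sock mode other rest group
    sock=${1:-${NIRI_SOCKET:-}}
    if [ -z "$sock" ] || [ ! -e "$sock" ]; then
        return 2
    fi
    mode=$(stat -c '%a' -- "$sock") || return 2
    other=${mode#"${mode%?}"}   # last octal digit: others
    rest=${mode%?}
    group=${rest#"${rest%?}"}   # second-to-last octal digit: group
    # Write permission is bit 2 in each octal digit.
    [ $(( group & 2 )) -eq 0 ] && [ $(( other & 2 )) -eq 0 ]
}
```

This looks only at the socket's own mode bits. The permissions on its parent directory (typically `$XDG_RUNTIME_DIR`) also restrict who can reach it.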

Buffer vs display separation:

Window buffers: Always exist, continuously updated by apps
Screen composition: Compositor's choice what to display when
This separation enables: invisible capture, overview modes, effects
Security implication: "hidden" windows aren't hidden from compositor

Audit trail architecture:

Systemd journal as system-wide audit log
Tagged entries (logger -t) for filtering
Centralized vs per-tool log files
Query interface (journalctl) with time ranges, filtering
Integration with system logging infrastructure

Context for Future Work

Open Questions

Clipboard behavior:

Will niri upstream accept `--no-clipboard` flag? (template ready to file)
Can clipboard save/restore work reliably for all mime types?
Should AI clear clipboard after reading screenshot?
How do clipboard history tools handle image/png? (privacy leak)

Security enhancements:

Should notification popup be suppressed for invisible captures?
Does mako support per-app notification filtering?
Should captures from other workspaces trigger different notification?
Is there value in upstream niri audit logging vs logger?

User experience:

Will users actually read 196-line SECURITY.md?
Should there be a quickstart with "minimum security setup"?
How to make audit log review part of normal workflow?
Should skill refuse to capture if block-out rules not configured?

Integration:

How does this skill compose with other skills?
Should screenshot-latest and niri-window-capture be merged?
Can this enable new use cases (find error messages across all workspaces)?
Should there be skill for "capture all windows and search"?

Next Steps

Immediate (user actions):

Review SECURITY.md thoroughly
Configure niri block-out rules for password managers
Test skill: `./skills/niri-window-capture/scripts/capture-focused.sh`
Review audit log: `journalctl --user -t niri-capture`
Decide whether to deploy to ~/.claude/skills/

Short term (if deployed):

Monitor audit logs for unexpected captures
Test cross-workspace capture workflows
Verify block-out rules work (try capturing password manager)
Get user feedback on security comfort level

Upstream niri:

File issue using UPSTREAM-REQUEST.md template
Request `--no-clipboard` flag for screenshot-window action
Discuss security documentation for invisible capture
Potentially contribute PR for flag (if accepted)

Documentation improvements:

Add quickstart security setup guide
Create video/diagram showing invisible capture flow
Document common use cases (find error messages, compare windows)
Write integration examples with other skills

Related Work

screenshot-latest skill: Simple file-finding (completed)
niri compositor: https://github.com/YaLTeR/niri
Wayland security model: Compositor as security boundary
Dotfiles logging pattern: ~/proj/dotfiles/bin/lid-suspend-action.sh
Previous worklog: docs/worklogs/2025-11-08-screenshot-analysis-over-engineering-discovery.org
Smithay Wayland library: https://github.com/Smithay/smithay
wl-clipboard tools: wl-copy, wl-paste for Wayland clipboard
systemd journal: journalctl for audit log viewing

Raw Notes

Session flow:

Resumed from previous session's over-engineering discovery
User asked: "let's go back and focus on what's possible in terms of skipping the screenshot"
Tested `grim -` (output to stdout): works
Explored overview mode: works but visible flicker
User asked: "what about for what's not on the active workspace/windows"
Deep dive into niri source code → discovered invisible capture
User caught flash: notification popup, investigated clipboard
User: "Let's think this entire thing through from the perspective of a Security Engineer"
Security analysis → threat model → mitigations → audit logging
Implementation with security docs
User: "Ok, Break down how the skill works for me"
Created detailed technical explanation with diagram
Worklog requested

Key user decisions from security discussion:

Window blocking: handled in niri config, not skill's responsibility
Audit logging: yes, use logger (dotfiles pattern)
User confirmation: no (too invasive)
Sensitive title filtering: no (niri block-out handles it)
Clipboard clearing: maybe, but can't avoid clipboard involvement
Upstream request: yes, file for `--no-clipboard` flag

Testing results:

✓ capture-focused.sh works
✓ capture-by-title.sh works (after fixing jq syntax)
✓ Cross-workspace capture works invisibly (workspace 2 from workspace 1)
✓ Audit logging works (journalctl shows entries)
✓ Notification popup visible (mako)
✗ clipboard always polluted (confirmed hardcoded)

Interesting discoveries:

niri overview mode doesn't create new renders - just composites existing buffers
Window buffers exist even when not displayed (continuous application rendering)
screenshot-window --id bypasses screen compositor entirely
Security boundary is compositor IPC socket (user-private)
Dotfiles already use logger pattern - consistency win

Comparison to original over-specification:

Original: 635 lines spec, 82 tasks, 115 min, 0 code
This skill: 703 lines total, 107 lines code, ~180 min, working + security docs
Key difference: Built it, understood it, documented threats, shipped with security analysis

File structure created:
```
skills/
├── screenshot-latest/          # Simple file-finding (185 lines)
│   ├── SKILL.md
│   ├── README.md
│   └── scripts/find-latest.sh
└── niri-window-capture/        # Invisible capture (703 lines)
    ├── SKILL.md                # Agent instructions
    ├── SECURITY.md             # Threat analysis (196 lines!)
    ├── README.md               # User guide
    ├── UPSTREAM-REQUEST.md     # Feature request template
    ├── IMPLEMENTATION-NOTES.md # Technical details
    ├── scripts/
    │   ├── capture-focused.sh
    │   ├── capture-by-title.sh
    │   └── capture-all-windows.sh
    └── examples/
        ├── window-list.txt
        └── usage-example.sh
```

Timeline estimate:

Source code research: 90 min
Security analysis: 90 min
Implementation: 60 min
Documentation: 60 min
Testing: 30 min
Total: ~330 min (~5.5 hours)

Session Metrics

Commits made: 1 (initial repo commit)
Files created: 27 (untracked)
Lines of code: 107 (bash scripts)
Lines of documentation: 596 (SKILL.md + README + SECURITY + UPSTREAM)
Lines total: ~1500+ (including specs, analysis docs, worklogs)
Skills completed: 2 (screenshot-latest, niri-window-capture)
Security threats identified: 5 (documented in SECURITY.md)
Audit log entries: 3 (from testing)
Source files researched: ~10 (niri compositor codebase)
