Invisible Window Capture: From Over-Engineering to Production Security
- Session Summary
- Accomplishments
- Key Decisions
- Decision 1: Build invisible capture skill despite security implications
- Decision 2: Use logger for audit trail (not custom logging)
- Decision 3: Accept clipboard pollution, request upstream flag
- Decision 4: Research niri source code before building
- Decision 5: Build two skills, not one monolithic solution
- Problems & Solutions
- Technical Details
- Process and Workflow
- Learning and Insights
- Context for Future Work
- Raw Notes
- Session Metrics
Session Summary
Date: 2025-11-08 (Day 2 of screenshot-analysis feature, session 2)
Focus Area: Discovered and implemented invisible cross-workspace window capture using niri compositor's direct buffer rendering, with comprehensive security analysis
Accomplishments
- Discovered niri can capture windows from inactive workspaces invisibly using direct buffer rendering
- Researched niri source code to understand window capture mechanism (~90 min deep dive)
- Built complete niri-window-capture skill with security documentation (703 lines)
- Implemented audit logging using systemd journal (logger pattern from dotfiles)
- Created comprehensive security analysis (196-line SECURITY.md)
- Tested invisible cross-workspace capture (verified works on workspaces 1 and 2)
- Created upstream feature request template for --no-clipboard flag
- Documented complete technical flow from user intent to screenshot analysis
- Pending: deploy skill to ~/.claude/skills/ (awaiting user security review)
- Pending: file upstream niri issue (template ready)
Key Decisions
Decision 1: Build invisible capture skill despite security implications
- Context: Discovered niri can capture ANY window invisibly - major privacy/security concern
- Options considered:
  - Don't build it - too dangerous, privacy violation
  - Build with user confirmation prompts for cross-workspace
  - Build with comprehensive security documentation and audit logging
  - Build with sensitive title filtering built-in
- Rationale: User explicitly decided (after security discussion) to handle window blocking in niri config, implement audit logging, document security implications thoroughly, but skip user prompts and title filtering
- Impact: Production-ready skill that's powerful but requires security-conscious deployment. Users must read SECURITY.md and configure niri block-out rules before use.
Decision 2: Use logger for audit trail (not custom logging)
- Context: Needed audit trail for all window captures - security requirement
- Options considered:
  - Custom log file (~/.local/share/niri-capture.log)
  - Systemd journal via logger -t niri-capture
  - Upstream niri audit logging feature request
  - No logging (document security risk)
- Rationale: dotfiles already use the logger pattern (lid-suspend-action.sh, power-status.sh, etc.). Consistent with existing system, uses systemd journal (queryable with journalctl), standard Linux utility.
- Impact: All captures logged with: timestamp, window ID, title, workspace. Viewable with journalctl --user -t niri-capture. Follows established dotfiles patterns.
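A minimal sketch of what such an audit helper could look like. The tag `niri-capture` comes from this worklog; the function names and the message layout are illustrative assumptions, not the actual contents of the skill's scripts.

```bash
#!/usr/bin/env bash
# Sketch of an audit-log helper. LOG_TAG matches the tag used in this worklog;
# the function names and message layout are assumptions for illustration.
LOG_TAG="niri-capture"

# Build one single-line audit message from window metadata.
format_audit_message() {
    local window_id="$1" title="$2" workspace_id="$3"
    # Flatten newlines and tabs so each capture stays on one journal line.
    title="${title//$'\n'/ }"
    title="${title//$'\t'/ }"
    printf 'capture window_id=%s workspace=%s title="%s"' \
        "$window_id" "$workspace_id" "$title"
}

# Send the message to the journal; the journal records the timestamp itself.
audit_capture() {
    logger -t "$LOG_TAG" "$(format_audit_message "$@")"
}
```

Calling `audit_capture 42 "Firefox" 2` before a capture would then leave one tagged entry that `journalctl --user -t niri-capture` can show. Keeping the formatting separate from the `logger` call makes the message layout easy to check without touching the journal.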
Decision 3: Accept clipboard pollution, request upstream flag
- Context: niri hardcodes clipboard copy in save_screenshot() - cannot disable
- Options considered:
  - Accept it, document behavior
  - Save/restore clipboard (fragile, doesn't preserve mime types)
  - Clear clipboard after AI reads (destroys user clipboard)
  - File upstream PR for --no-clipboard flag
- Rationale: Clipboard save/restore too fragile. Clear-after breaks user workflow. Best solution is upstream flag. For now, document the behavior clearly in security docs.
- Impact: Users must be aware screenshots persist in clipboard. Clipboard history tools will log all captures. Created UPSTREAM-REQUEST.md template for niri feature request.
Decision 4: Research niri source code before building
- Context: Needed to understand if invisible cross-workspace capture was possible
- Options considered:
  - Assume overview mode is only way (requires visible flicker)
  - Test empirically without source code research
  - Deep dive into niri compositor source code
  - Ask in niri community channels
- Rationale: Source code reveals actual capabilities vs assumptions. Found screenshot-window --id command works on any window regardless of workspace. Discovered mapped.render() with RenderTarget::ScreenCapture bypasses screen compositing.
- Impact: Unlocked invisible capture capability. Understood security implications from implementation details. Documented exact technical flow. Time well spent (~90 min research).
Decision 5: Build two skills, not one monolithic solution
- Context: Started with "find last screenshot" but discovered broader capabilities
- Options considered:
  - One combined skill (find existing + capture new)
  - Two separate skills (screenshot-latest + niri-window-capture)
  - Just the capture skill (skip file-finding)
- Rationale: screenshot-latest solves "find existing files" (simple, safe). niri-window-capture solves "capture any window" (powerful, security-sensitive). Different use cases, different risk profiles, cleaner separation.
- Impact: screenshot-latest: 185 lines, safe, ready to deploy. niri-window-capture: 703 lines, powerful, requires security review. Users can deploy one without the other.
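To illustrate how small the "find existing files" half is, here is a hedged sketch of a latest-screenshot lookup. It is not the skill's find-latest.sh; the function name and default directory are assumptions, and it uses a glob with the `-nt` test rather than parsing `ls` output so that unusual filenames are handled.

```bash
#!/usr/bin/env bash
# Sketch only: the default directory and function name are assumptions.
# Print the most recently modified PNG in a directory, or fail if none exist.
find_latest_screenshot() {
    local dir="${1:-$HOME/Pictures/Screenshots}"
    local latest="" file
    for file in "$dir"/*.png; do
        [[ -e "$file" ]] || continue          # glob matched nothing
        if [[ -z "$latest" || "$file" -nt "$latest" ]]; then
            latest="$file"
        fi
    done
    [[ -n "$latest" ]] && printf '%s\n' "$latest"
}
```

A caller could run `find_latest_screenshot` with no arguments, or pass another directory; a non-zero exit status signals that no PNG was found.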
Problems & Solutions
| Problem | Solution | Learning |
|---|---|---|
| Overview mode captures all workspaces but causes ~450ms visible flicker | Researched niri source, discovered screenshot-window --id renders buffers directly without compositing. Tested on inactive workspace - works invisibly. | niri maintains window buffers in memory even when not displayed. Direct buffer rendering bypasses screen compositor entirely. This is how screenshot-window achieves invisible capture. |
| Unclear if windows on inactive workspaces can be captured | Traced through niri source: Mapped struct holds Window (smithay), Window wraps Wayland surface buffer. Applications continuously render to buffers regardless of workspace visibility. | Wayland applications always render to surface buffers. Compositor decides what to composite to screen, but buffers exist independently. Overview mode doesn't create new renders - just composites existing buffers at smaller scale. |
| jq parse error in capture-by-title.sh - multiple windows matched search | Changed from piping multiple objects to using jq map/select/first: `jq 'map(select(…)) \| .[0]'` instead of `jq '.[] \| select(…) \| head -1'` | When jq outputs multiple JSON objects, bash sees multiple lines but they're not valid as single JSON. Use jq array operations (map), then take the first element (`.[0]`) for single valid output. |
| niri always copies screenshots to clipboard - cannot disable | Researched source: set_data_device_selection() hardcoded in save_screenshot(). Created UPSTREAM-REQUEST.md for --no-clipboard flag. Documented behavior in SECURITY.md. | Clipboard pollution unavoidable with current niri. Future upstream flag needed. Document clearly so users understand privacy implications (clipboard history tools log screenshots). |
| Needed audit logging pattern - how to match dotfiles style | Searched dotfiles: rg "logger" ~/proj/dotfiles. Found lid-suspend-action.sh uses: logger -t "$LOG_TAG" "message". Systemd journal pattern. | Dotfiles use logger -t <tag> for audit trails. Viewable with journalctl --user -t <tag>. Standard Linux utility from util-linux. Perfect for capture audit trail. |
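The jq fix in the table can be sketched as follows. This is an illustration under assumptions, not the actual capture-by-title.sh: the function name and the case-insensitive substring match are hypothetical, and the input is assumed to be the JSON array printed by `niri msg --json windows`, with `id` and `title` fields as used elsewhere in this worklog.

```bash
#!/usr/bin/env bash
# Sketch: pick the first window whose title contains a search string.
# Reads a JSON array of windows on stdin and prints a single ID (or nothing).
# Function name and case-insensitive matching are assumptions.
first_window_id_by_title() {
    local needle="$1"
    jq -r --arg needle "$needle" '
        map(select((.title // "") | ascii_downcase
                   | contains($needle | ascii_downcase)))
        | .[0].id // empty
    '
}
```

Because the filter stays inside one array and takes `.[0]`, several matching windows still yield one value. A possible use: `id=$(niri msg --json windows | first_window_id_by_title "firefox")`, followed by `niri msg action screenshot-window --id "$id"` when `id` is non-empty.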
Technical Details
Code Changes
- Total files created: 27
- Key files created:
  - `skills/niri-window-capture/SKILL.md` (184 lines) - Agent instructions with security warnings
  - `skills/niri-window-capture/SECURITY.md` (196 lines) - Comprehensive security analysis, threat model, mitigations
  - `skills/niri-window-capture/scripts/capture-focused.sh` (31 lines) - Capture current window with audit logging
  - `skills/niri-window-capture/scripts/capture-by-title.sh` (40 lines) - Find and capture by title match
  - `skills/niri-window-capture/UPSTREAM-REQUEST.md` (108 lines) - Feature request for --no-clipboard flag
  - `skills/screenshot-latest/SKILL.md` (83 lines) - Simple file-finding skill
  - `skills/screenshot-latest/scripts/find-latest.sh` (22 lines) - One-liner: ls -t | head -1
  - `specs/001-screenshot-analysis/RESET.md` - Over-engineering analysis
  - `specs/001-screenshot-analysis/COMPARISON.md` - Spec vs implementation reality
  - `specs/001-screenshot-analysis/SECURITY.md` - Security findings
  - `docs/worklogs/2025-11-08-screenshot-analysis-over-engineering-discovery.org` - Previous session worklog
- Over-specification archived (not deleted):
  - `specs/001-screenshot-analysis/spec.md` (165 lines) - Over-engineered
  - `specs/001-screenshot-analysis/plan.md` (139 lines) - Premature
  - `specs/001-screenshot-analysis/tasks.md` (331 lines) - 82 unnecessary tasks
Commands Used
Testing niri window capture:
```bash
# List all windows with ID, title, and workspace
niri msg --json windows | jq -r '.[] | "\(.id) - \(.title) - WS:\(.workspace_id)"'
# Capture a specific window by ID
niri msg action screenshot-window --id <WINDOW_ID> --write-to-disk true
# Capture the first window found on workspace 2
WINDOW_ID=$(niri msg --json windows | jq -r '.[] | select(.workspace_id == 2) | .id' | head -1)
niri msg action screenshot-window --id "$WINDOW_ID" --write-to-disk true
```
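Verifying niri capabilities:
```bash
# Confirm grim can write a region capture to stdout
grim -g "0,0 100x100" - | file -
# Capture via overview mode (causes visible flicker)
niri msg action toggle-overview
sleep 0.5
grim /tmp/overview-test.png
niri msg action toggle-overview
# Inspect window metadata
niri msg --json focused-window | jq '.'
niri msg --json windows | jq '.[0]'
```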
Audit log viewing:
```bash
journalctl --user -t niri-capture
journalctl --user -t niri-capture -n 20
journalctl --user -t niri-capture --since today
journalctl --user -t niri-capture -f
```
Architecture Notes
niri compositor window rendering architecture (discovered via source code research):
- Window buffer lifecycle:
  - Applications render to Wayland surface buffers continuously
  - niri compositor holds references via `Mapped` struct containing `Window` (smithay)
  - Buffers exist in memory regardless of workspace visibility
  - Compositor decides what to composite to outputs, but buffers persist
- Direct buffer rendering (key discovery):
  ```rust
  // From niri/src/niri.rs screenshot_window()
  let elements = mapped.render(
      renderer,
      mapped.window.geometry().loc.to_f64(),
      scale,
      alpha,
      RenderTarget::ScreenCapture, // ← Key: not Output
  );
  ```
  - `RenderTarget::ScreenCapture` renders to offscreen texture
  - No compositing to screen output required
  - Works for windows on any workspace
- Security model:
  - Access control: niri IPC socket permissions (`srwxr-xr-x`; only the owning user has the write bit, which connecting to the socket requires on Linux)
  - Any process running as the user can capture any window
  - Protection: niri window rules `block-out-from "screen-capture"`
  - Audit: systemd journal via logger
- Clipboard behavior (hardcoded):
  ```rust
  // From save_screenshot()
  set_data_device_selection(
      &state.niri.display_handle,
      &state.niri.seat,
      vec![String::from("image/png")],
      buf.clone(),
  );
  ```
  - Always copies PNG to clipboard
  - No flag to disable
  - Runs in separate thread after encoding
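The access-control point in the security model above can be checked with a small helper. This is a sketch under assumptions: the function name is hypothetical, it relies on GNU `stat -c %a`, and it only inspects the write bits because on Linux connecting to a Unix socket requires write permission on the socket file.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if neither group nor others have write permission
# on the given path. The function name is hypothetical; requires GNU stat.
only_owner_can_write() {
    local path="$1" mode
    mode="$(stat -c '%a' "$path")" || return 2
    # Octal mode such as 755: test the group (020) and other (002) write bits.
    (( (8#$mode & 8#022) == 0 ))
}
```

On a running niri session this could be pointed at the IPC socket, for example `only_owner_can_write "$NIRI_SOCKET"`, assuming that variable holds the socket path in the current environment. A mode of `srwxr-xr-x` passes because group and others lack the write bit.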
Security Considerations
Threat model (documented in SECURITY.md):
- Local privilege escalation: Any compromised process as user can capture any window
- Cross-workspace privacy: Users may assume inactive workspaces are "private" - they're not
- Clipboard side channel: Every capture overwrites clipboard, persists in clipboard history
- No built-in audit trail: niri does not record captures, so the skill adds one via logger -t niri-capture (systemd journal)
- Invisible to user: No workspace switch, no screen flicker (except notification popup)
Mitigations implemented:
- Audit logging: All captures logged with window ID, title, workspace
- Security documentation: 196-line SECURITY.md with threat analysis
- Clear warnings: Security notices in SKILL.md and README.md
- Example protection: Block-out rules for password managers in docs
- Logged metadata: Can review what was captured via journalctl
Protection mechanisms recommended to users:
- Enable niri window rules for sensitive apps:
  ```kdl
  window-rule {
      match app-id=r#"^org\.keepassxc\.KeePassXC$"#
      block-out-from "screen-capture"
  }
  ```
- Review audit logs regularly: `journalctl --user -t niri-capture`
- Ensure screenshot directory private: `chmod 700 ~/Pictures/Screenshots`
- Clear sensitive screenshots after AI analysis
- Be aware clipboard contains last screenshot
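Two of the recommendations above lend themselves to a quick pre-flight check. The sketch below is illustrative only: the function names and the default config path are assumptions, and the `grep` is a crude text match rather than a KDL parser, so it would also match a commented-out rule.

```bash
#!/usr/bin/env bash
# Sketch of two pre-flight checks; function names and the default config
# path are assumptions, and the grep is a text match, not a KDL parser.

# Succeeds if the niri config mentions a screen-capture block-out rule.
has_block_out_rule() {
    local config="${1:-$HOME/.config/niri/config.kdl}"
    grep -Eq 'block-out-from[[:space:]]+"screen-capture"' "$config"
}

# Succeeds if the directory is accessible only by its owner (mode 700).
dir_is_private() {
    [[ "$(stat -c '%a' "$1")" == "700" ]]
}
```

For example, `has_block_out_rule && dir_is_private ~/Pictures/Screenshots` could gate a capture script, printing a reminder to read SECURITY.md when either check fails.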
Process and Workflow
What Worked Well
- Source code research: Diving into niri source revealed invisible capture capability vs assuming overview was only option
- Security-first thinking: Stopping to think like Security Engineer caught major privacy implications
- Iterative exploration: grim → overview → source code → screenshot-window discovery path
- Following dotfiles patterns: logger usage matches existing system, no new patterns invented
- Testing on real system: Verified cross-workspace capture actually works invisibly
- Comprehensive documentation: Security analysis forced clarity about risks and mitigations
- User involvement: Security discussion led to clear decisions on what to implement vs skip
What Was Challenging
- Scope creep awareness: Started with "find screenshot" became "invisible window capture" - had to recognize the pivot
- Security vs usability tension: Powerful capability has privacy implications - balancing both
- Clipboard limitation: niri hardcodes clipboard copy, no way around it, had to accept and document
- jq JSON parsing: Multiple match objects required different jq syntax than expected
- Deciding what not to build: Resisting adding user prompts, sensitive filtering, clipboard workarounds
- Documentation depth: Security analysis took longer than code implementation (~90 min vs ~60 min)
Learning and Insights
Technical Insights
Wayland compositor architecture:
- Compositors maintain window surface buffers in memory continuously
- Applications render to buffers regardless of workspace visibility
- "Invisible workspace" just means "not composited to output" not "buffer doesn't exist"
- Overview mode doesn't create renders - composites existing buffers at smaller scale
- Direct buffer rendering (ScreenCapture target) bypasses screen output entirely
niri implementation details:
- Uses smithay library for Wayland protocol handling
- Mapped struct wraps Window which wraps surface buffers
- screenshot-window action calls mapped.render() with ScreenCapture target
- Renders to offscreen texture, converts to PNG, saves to file
- Clipboard copy hardcoded in save_screenshot() - no conditional logic
Audit logging pattern:
- logger -t <tag> sends to systemd journal
- journalctl --user -t <tag> queries by tag
- Standard Linux utility from util-linux package
- Dotfiles already use this pattern (lid-suspend, power management)
- Better than custom log files (integrated with system logging)
Process Insights
When to research source code:
- When assumptions limit solution space (overview only? wrong)
- When documentation doesn't cover use case (invisible capture not documented)
- When security implications unclear (need to understand internals)
- When API behavior seems inconsistent (clipboard always copied - why?)
- Cost: 90 minutes research. Benefit: Unlocked invisible capture + understood security model.
Security documentation value:
- Forces explicit threat modeling
- Reveals hidden assumptions (user thinks workspace 2 is "private")
- Clarifies trust boundaries (compositor IPC socket = security boundary)
- Documents mitigations for future reference
- Helps users make informed deployment decisions
- 196 lines of security docs = confidence in deployment
Specification vs implementation timing:
- Simple problems (find latest file): Code first, document after
- Complex problems (invisible capture): Research first, build second
- Security-sensitive features: Document threats before building
- Unknown capabilities: Research, prototype, then specify
- This problem: Research revealed capability, then built + documented simultaneously
Architectural Insights
Compositor as security boundary:
- Wayland design: compositor is trusted, clients are not
- Compositor has god-mode access to all window buffers
- Access control is IPC socket permissions (user-level)
- Applications cannot capture each other (must go through compositor)
- This skill leverages compositor IPC to do what apps cannot
Buffer vs display separation:
- Window buffers: Always exist, continuously updated by apps
- Screen composition: Compositor's choice what to display when
- This separation enables: invisible capture, overview modes, effects
- Security implication: "hidden" windows aren't hidden from compositor
Audit trail architecture:
- Systemd journal as system-wide audit log
- Tagged entries (logger -t) for filtering
- Centralized vs per-tool log files
- Query interface (journalctl) with time ranges, filtering
- Integration with system logging infrastructure
Context for Future Work
Open Questions
Clipboard behavior:
- Will niri upstream accept --no-clipboard flag? (template ready to file)
- Can clipboard save/restore work reliably for all mime types?
- Should AI clear clipboard after reading screenshot?
- How do clipboard history tools handle image/png? (privacy leak)
Security enhancements:
- Should notification popup be suppressed for invisible captures?
- Does mako support per-app notification filtering?
- Should captures from other workspaces trigger different notification?
- Is there value in upstream niri audit logging vs logger?
User experience:
- Will users actually read 196-line SECURITY.md?
- Should there be a quickstart with "minimum security setup"?
- How to make audit log review part of normal workflow?
- Should skill refuse to capture if block-out rules not configured?
Integration:
- How does this skill compose with other skills?
- Should screenshot-latest and niri-window-capture be merged?
- Can this enable new use cases (find error messages across all workspaces)?
- Should there be skill for "capture all windows and search"?
Next Steps
Immediate (user actions):
- Review SECURITY.md thoroughly
- Configure niri block-out rules for password managers
- Test skill: `./skills/niri-window-capture/scripts/capture-focused.sh`
- Review audit log: `journalctl --user -t niri-capture`
- Decide whether to deploy to ~/.claude/skills/
Short term (if deployed):
- Monitor audit logs for unexpected captures
- Test cross-workspace capture workflows
- Verify block-out rules work (try capturing password manager)
- Get user feedback on security comfort level
Upstream niri:
- File issue using UPSTREAM-REQUEST.md template
- Request --no-clipboard flag for screenshot-window action
- Discuss security documentation for invisible capture
- Potentially contribute PR for flag (if accepted)
Documentation improvements:
- Add quickstart security setup guide
- Create video/diagram showing invisible capture flow
- Document common use cases (find error messages, compare windows)
- Write integration examples with other skills
Related Work
- screenshot-latest skill: Simple file-finding (completed)
- niri compositor: https://github.com/YaLTeR/niri
- Wayland security model: Compositor as security boundary
- Dotfiles logging pattern: ~/proj/dotfiles/bin/lid-suspend-action.sh
- Previous worklog: docs/worklogs/2025-11-08-screenshot-analysis-over-engineering-discovery.org
- Smithay Wayland library: https://github.com/Smithay/smithay
- wl-clipboard tools: wl-copy, wl-paste for Wayland clipboard
- systemd journal: journalctl for audit log viewing
Raw Notes
Session flow:
- Resumed from previous session's over-engineering discovery
- User asked: "let's go back and focus on what's possible in terms of skipping the screenshot"
- Tested grim - (stdout): works
- Explored overview mode: works but visible flicker
- User asked: "what about for what's not on the active workspace/windows"
- Deep dive into niri source code → discovered invisible capture
- User caught flash: notification popup, investigated clipboard
- User: "Let's think this entire thing through from the perspective of a Security Engineer"
- Security analysis → threat model → mitigations → audit logging
- Implementation with security docs
- User: "Ok, Break down how the skill works for me"
- Created detailed technical explanation with diagram
- Worklog requested
Key user decisions from security discussion:
- Window blocking: handled in niri config, not skill's responsibility
- Audit logging: yes, use logger (dotfiles pattern)
- User confirmation: no (too invasive)
- Sensitive title filtering: no (niri block-out handles it)
- Clipboard clearing: maybe, but can't avoid clipboard involvement
- Upstream request: yes, file for --no-clipboard flag
Testing results:
- ✓ capture-focused.sh works
- ✓ capture-by-title.sh works (after fixing jq syntax)
- ✓ Cross-workspace capture works invisibly (workspace 2 from workspace 1)
- ✓ Audit logging works (journalctl shows entries)
- ✓ Notification popup visible (mako)
- ✗ clipboard always polluted (confirmed hardcoded)
Interesting discoveries:
- niri overview mode doesn't create new renders - just composites existing buffers
- Window buffers exist even when not displayed (continuous application rendering)
- screenshot-window --id bypasses screen compositor entirely
- Security boundary is compositor IPC socket (user-private)
- Dotfiles already use logger pattern - consistency win
Comparison to original over-specification:
- Original: 635 lines spec, 82 tasks, 115 min, 0 code
- This skill: 703 lines total, 107 lines code, ~180 min, working + security docs
- Key difference: Built it, understood it, documented threats, shipped with security analysis
File structure created:
```
skills/
├── screenshot-latest/           # Simple file-finding (185 lines)
│   ├── SKILL.md
│   ├── README.md
│   └── scripts/find-latest.sh
└── niri-window-capture/         # Invisible capture (703 lines)
    ├── SKILL.md                 # Agent instructions
    ├── SECURITY.md              # Threat analysis (196 lines!)
    ├── README.md                # User guide
    ├── UPSTREAM-REQUEST.md      # Feature request template
    ├── IMPLEMENTATION-NOTES.md  # Technical details
    ├── scripts/
    │   ├── capture-focused.sh
    │   ├── capture-by-title.sh
    │   └── capture-all-windows.sh
    └── examples/
        ├── window-list.txt
        └── usage-example.sh
```
Timeline estimate:
- Source code research: 90 min
- Security analysis: 90 min
- Implementation: 60 min
- Documentation: 60 min
- Testing: 30 min
- Total: ~330 min (~5.5 hours)
Session Metrics
- Commits made: 1 (initial repo commit)
- Files created: 27 (untracked)
- Lines of code: 107 (bash scripts)
- Lines of documentation: 596 (SKILL.md + README + SECURITY + UPSTREAM)
- Lines total: ~1500+ (including specs, analysis docs, worklogs)
- Skills completed: 2 (screenshot-latest, niri-window-capture)
- Security threats identified: 5 (documented in SECURITY.md)
- Audit log entries: 3 (from testing)
- Source files researched: ~10 (niri compositor codebase)