skills/docs/worklogs/2025-11-08-invisible-window-capture-niri.org
dan 5fea49b7c0 feat(tufte-press): evolve skill to complete workflow with JSON generation and build automation
2025-11-10 15:03:44 -08:00


Invisible Window Capture: From Over-Engineering to Production Security

Session Summary

Date: 2025-11-08 (Day 2 of screenshot-analysis feature, session 2)

Focus Area: Discovered and implemented invisible cross-workspace window capture using niri compositor's direct buffer rendering, with comprehensive security analysis

Accomplishments

  • Discovered niri can capture windows from inactive workspaces invisibly using direct buffer rendering
  • Researched niri source code to understand the window capture mechanism (~90 min deep dive)
  • Built complete niri-window-capture skill with security documentation (703 lines)
  • Implemented audit logging using systemd journal (logger pattern from dotfiles)
  • Created comprehensive security analysis (196-line SECURITY.md)
  • Tested invisible cross-workspace capture (verified it works on workspaces 1 and 2)
  • Created upstream feature request template for no-clipboard flag
  • Documented complete technical flow from user intent to screenshot analysis
  • Pending: deploy skill to ~/.claude/skills/ (awaiting user security review)
  • Pending: file upstream niri issue (template ready)

Key Decisions

Decision 1: Build invisible capture skill despite security implications

  • Context: Discovered niri can capture ANY window invisibly - major privacy/security concern
  • Options considered:

    1. Don't build it - too dangerous, privacy violation
    2. Build with user confirmation prompts for cross-workspace
    3. Build with comprehensive security documentation and audit logging
    4. Build with sensitive title filtering built-in
  • Rationale: User explicitly decided (after security discussion) to handle window blocking in niri config, implement audit logging, document security implications thoroughly, but skip user prompts and title filtering
  • Impact: Production-ready skill that's powerful but requires security-conscious deployment. Users must read SECURITY.md and configure niri block-out rules before use.

Decision 2: Use logger for audit trail (not custom logging)

  • Context: Needed audit trail for all window captures - security requirement
  • Options considered:

    1. Custom log file (~/.local/share/niri-capture.log)
    2. Systemd journal via logger -t niri-capture
    3. Upstream niri audit logging feature request
    4. No logging (document security risk)
  • Rationale: The dotfiles already use the logger pattern (lid-suspend.sh, power-status.sh, etc.). Consistent with the existing system: it uses the systemd journal (queryable with journalctl) through a standard Linux utility.
  • Impact: All captures logged with timestamp, window ID, title, and workspace. Viewable with `journalctl --user -t niri-capture`. Follows established dotfiles patterns.
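A minimal sketch of what such an audit helper could look like, following the `logger -t <tag>` pattern. The function names and the message layout (`id=<n> workspace=<n> title="<title>"`) are illustrative assumptions, not the skill's actual code:

```bash
#!/usr/bin/env bash
# Hypothetical audit helper in the dotfiles style: logger -t <tag> "<message>".
# The message layout is an assumption chosen to be easy to grep and parse.

AUDIT_TAG="niri-capture"

# Build one audit line from window metadata (pure function, testable without a journal).
format_audit_entry() {
  local window_id="$1" workspace_id="$2" title="$3"
  printf 'captured window id=%s workspace=%s title="%s"\n' \
    "$window_id" "$workspace_id" "$title"
}

# Send the entry to the systemd journal under the audit tag.
audit_capture() {
  logger -t "$AUDIT_TAG" -- "$(format_audit_entry "$@")"
}
```

Usage: `audit_capture 42 2 "Mozilla Firefox"`, then `journalctl --user -t niri-capture -n 1` to confirm the entry. Keeping the formatting separate from the `logger` call lets the message format be checked without touching the journal.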

Decision 3: Accept clipboard pollution, request upstream flag

  • Context: niri hardcodes clipboard copy in save_screenshot() - cannot disable
  • Options considered:

    1. Accept it, document behavior
    2. Save/restore clipboard (fragile, doesn't preserve mime types)
    3. Clear clipboard after AI reads (destroys user clipboard)
    4. File upstream PR for no-clipboard flag
  • Rationale: Clipboard save/restore too fragile. Clear-after breaks user workflow. Best solution is upstream flag. For now, document the behavior clearly in security docs.
  • Impact: Users must be aware screenshots persist in clipboard. Clipboard history tools will log all captures. Created UPSTREAM-REQUEST.md template for niri feature request.
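Since the clipboard side effect cannot be turned off, a script can at least detect it. A small sketch, assuming wl-clipboard's `wl-paste --list-types` (which prints the MIME types the clipboard currently offers, one per line); the helper names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: detect whether the clipboard holds a PNG,
# e.g. the last niri screenshot. Requires wl-clipboard for the live check.

# Return 0 if the MIME type list on stdin contains exactly "image/png".
mime_list_has_png() {
  grep -qx 'image/png'
}

# Query the live Wayland clipboard (needs a running Wayland session).
clipboard_has_png() {
  wl-paste --list-types 2>/dev/null | mime_list_has_png
}
```

Usage: `clipboard_has_png && echo "clipboard still holds a screenshot"`. This only reports the state; it does not save or restore anything, which matches the decision to document rather than work around the behavior.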

Decision 4: Research niri source code before building

  • Context: Needed to understand if invisible cross-workspace capture was possible
  • Options considered:

    1. Assume overview mode is only way (requires visible flicker)
    2. Test empirically without source code research
    3. Deep dive into niri compositor source code
    4. Ask in niri community channels
  • Rationale: Source code reveals actual capabilities vs assumptions. Found that the screenshot-window --id command works on any window, regardless of workspace. Discovered that mapped.render() with RenderTarget::ScreenCapture bypasses screen compositing.
  • Impact: Unlocked invisible capture capability. Understood security implications from implementation details. Documented exact technical flow. Time well spent (~90 min research).

Decision 5: Build two skills, not one monolithic solution

  • Context: Started with "find last screenshot" but discovered broader capabilities
  • Options considered:

    1. One combined skill (find existing + capture new)
    2. Two separate skills (screenshot-latest + niri-window-capture)
    3. Just the capture skill (skip file-finding)
  • Rationale: screenshot-latest solves "find existing files" (simple, safe). niri-window-capture solves "capture any window" (powerful, security-sensitive). Different use cases, different risk profiles, cleaner separation.
  • Impact: screenshot-latest: 185 lines, safe, ready to deploy. niri-window-capture: 703 lines, powerful, requires security review. Users can deploy one without the other.

Problems & Solutions

Problem 1: Overview mode captures all workspaces but causes ~450ms visible flicker

  • Solution: Researched niri source and discovered that screenshot-window --id renders window buffers directly, without compositing. Tested on an inactive workspace: works invisibly.
  • Learning: niri keeps window buffers in memory even when they are not displayed. Direct buffer rendering bypasses the screen compositor entirely; this is how screenshot-window achieves invisible capture.

Problem 2: Unclear whether windows on inactive workspaces can be captured

  • Solution: Traced through niri source: the Mapped struct holds a Window (smithay), and Window wraps the Wayland surface buffer. Applications continuously render to buffers regardless of workspace visibility.
  • Learning: Wayland applications always render to surface buffers. The compositor decides what to composite to the screen, but the buffers exist independently. Overview mode doesn't create new renders; it just composites existing buffers at a smaller scale.

Problem 3: jq parse error in capture-by-title.sh when multiple windows matched the search

  • Solution: Switched from streaming multiple objects (`jq '.[] | select(…)' | head -1`) to jq array operations (`jq 'map(select(…)) | .[0]'`).
  • Learning: When jq outputs several JSON objects, the shell sees multiple lines that together are not a single valid JSON value (and `head -1` keeps only the first line of a pretty-printed object). Build an array with map/select, then take element [0] for a single valid result.

Problem 4: niri always copies screenshots to the clipboard, with no way to disable it

  • Solution: Researched source: set_data_device_selection() is hardcoded in save_screenshot(). Created UPSTREAM-REQUEST.md for a no-clipboard flag and documented the behavior in SECURITY.md.
  • Learning: Clipboard pollution is unavoidable with current niri; a future upstream flag is needed. Document it clearly so users understand the privacy implications (clipboard history tools log screenshots).

Problem 5: Needed an audit logging pattern matching the dotfiles style

  • Solution: Searched the dotfiles with rg "logger" ~/proj/dotfiles. Found that lid-suspend-action.sh uses logger -t "$LOG_TAG" "message" (systemd journal pattern).
  • Learning: Dotfiles use logger -t <tag> for audit trails, viewable with journalctl --user -t <tag>. logger is a standard Linux utility from util-linux, well suited to the capture audit trail.
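A sketch of the corrected lookup as a reusable function, returning exactly one window ID even when several titles match. The function name, the case-insensitive regex match via jq's `test($pat; "i")`, and the guard for windows whose title is null are assumptions, not the script's exact code:

```bash
#!/usr/bin/env bash
# Hypothetical helper modeled on the capture-by-title.sh fix.
# Reads `niri msg --json windows` output on stdin and prints the first
# matching window ID, or nothing if no title matches.
first_window_id_by_title() {
  local pattern="$1"
  # map(select(...)) builds one array; .[0] picks a single object;
  # (.title // "") avoids a jq error on windows with a null title;
  # "// empty" prints nothing instead of "null" when there is no match.
  jq -r --arg pat "$pattern" \
    'map(select((.title // "") | test($pat; "i"))) | .[0].id // empty'
}
```

Usage: `WINDOW_ID=$(niri msg --json windows | first_window_id_by_title 'firefox')`, then check `[ -n "$WINDOW_ID" ]` before capturing.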

Technical Details

Code Changes

  • Total files created: 27
  • Key files created:

    • `skills/niri-window-capture/SKILL.md` (184 lines) - Agent instructions with security warnings
    • `skills/niri-window-capture/SECURITY.md` (196 lines) - Comprehensive security analysis, threat model, mitigations
    • `skills/niri-window-capture/scripts/capture-focused.sh` (31 lines) - Capture current window with audit logging
    • `skills/niri-window-capture/scripts/capture-by-title.sh` (40 lines) - Find and capture by title match
    • `skills/niri-window-capture/UPSTREAM-REQUEST.md` (108 lines) - Feature request for no-clipboard flag
    • `skills/screenshot-latest/SKILL.md` (83 lines) - Simple file-finding skill
    • `skills/screenshot-latest/scripts/find-latest.sh` (22 lines) - Core logic is a one-liner: ls -t | head -1
    • `specs/001-screenshot-analysis/RESET.md` - Over-engineering analysis
    • `specs/001-screenshot-analysis/COMPARISON.md` - Spec vs implementation reality
    • `specs/001-screenshot-analysis/SECURITY.md` - Security findings
    • `docs/worklogs/2025-11-08-screenshot-analysis-over-engineering-discovery.org` - Previous session worklog
  • Over-specification archived (not deleted):

    • `specs/001-screenshot-analysis/spec.md` (165 lines) - Over-engineered
    • `specs/001-screenshot-analysis/plan.md` (139 lines) - Premature
    • `specs/001-screenshot-analysis/tasks.md` (331 lines) - 82 unnecessary tasks
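The core of find-latest.sh is the `ls -t | head -1` one-liner. A sketch of how it might be wrapped as a function; the default directory matches niri's default `screenshot-path` location (`~/Pictures/Screenshots`), but that setting can point elsewhere, and the `*.png` filter and function name are illustrative assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the find-latest one-liner.
# Prints the newest PNG in the given directory; returns 1 if there is none.
find_latest_screenshot() {
  local dir="${1:-$HOME/Pictures/Screenshots}"
  local latest
  # ls -t sorts by modification time, newest first; -1 forces one name per line.
  # If the glob matches nothing, ls fails quietly and $latest stays empty.
  latest=$(ls -t1 -- "$dir"/*.png 2>/dev/null | head -n 1)
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}
```

Usage: `find_latest_screenshot || echo "no screenshots found"`. Returning a non-zero status when nothing matches lets the agent distinguish "no file" from an empty path.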

Commands Used

Testing niri window capture:

```bash
# List all windows with their IDs, titles, and workspaces
niri msg --json windows | jq -r '.[] | "\(.id) - \(.title) - WS:\(.workspace_id)"'

# Capture a specific window by ID
niri msg action screenshot-window --id <WINDOW_ID> --write-to-disk true

# Capture the first window on workspace 2 (works from any workspace)
WINDOW_ID=$(niri msg --json windows | jq -r '.[] | select(.workspace_id == 2) | .id' | head -1)
niri msg action screenshot-window --id "$WINDOW_ID" --write-to-disk true
```
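The window lookup and the capture need to be two separate commands. In bash, `VAR=value cmd "$VAR"` expands `"$VAR"` before the prefix assignment takes effect, so a one-line version would pass an empty `--id`. A minimal demonstration, with a hypothetical `show_arg` function standing in for the niri call:

```bash
#!/usr/bin/env bash
# Demonstrates bash prefix-assignment expansion order.
# show_arg is a hypothetical stand-in for `niri msg action screenshot-window --id`.

show_arg() { printf 'id=[%s]\n' "$1"; }

WINDOW_ID=""

# One line: "$WINDOW_ID" is expanded before the prefix assignment is applied,
# and the assignment only lives in show_arg's environment anyway.
WINDOW_ID=42 show_arg "$WINDOW_ID"   # prints id=[]

# Two commands: the assignment completes first, so the expansion sees it.
WINDOW_ID=42
show_arg "$WINDOW_ID"                # prints id=[42]
```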

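Verifying niri capabilities:

```bash
# Confirm grim can write a PNG region to stdout
grim -g "0,0 100x100" - | file -

# Overview mode: toggle on, wait for the animation, capture the screen, toggle off
niri msg action toggle-overview
sleep 0.5
grim /tmp/overview-test.png
niri msg action toggle-overview

# Inspect window metadata
niri msg --json focused-window | jq '.'
niri msg --json windows | jq '.[0]'
```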

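Audit log viewing:

```bash
# All capture audit entries
journalctl --user -t niri-capture

# Last 20 entries
journalctl --user -t niri-capture -n 20

# Entries since midnight
journalctl --user -t niri-capture --since today

# Follow new entries live
journalctl --user -t niri-capture -f
```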
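For periodic review, the journal output can be summarized. A sketch using `journalctl -o cat` (message text only) and awk, under the assumption that each audit entry carries a `workspace=<n>` field; the actual message format written by capture-focused.sh may differ, and the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical summary: count captures per workspace from audit lines on stdin.
# Assumes entries contain a "workspace=<n>" token.
# Live usage: journalctl --user -t niri-capture -o cat | count_captures_by_workspace
count_captures_by_workspace() {
  awk '
    {
      # Find the workspace=<n> token and tally by its value.
      for (i = 1; i <= NF; i++)
        if ($i ~ /^workspace=/) { split($i, kv, "="); n[kv[2]]++ }
    }
    END { for (ws in n) printf "workspace %s: %d\n", ws, n[ws] }
  ' | sort   # awk iterates arrays in arbitrary order, so sort the report
}
```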

Architecture Notes

niri compositor window rendering architecture (discovered via source code research):

  1. Window buffer lifecycle:

    • Applications render to Wayland surface buffers continuously
    • niri compositor holds references via `Mapped` struct containing `Window` (smithay)
    • Buffers exist in memory regardless of workspace visibility
    • Compositor decides what to composite to outputs, but buffers persist
  2. Direct buffer rendering (key discovery):

    ```rust
    // From niri/src/niri.rs, screenshot_window()
    let elements = mapped.render(
        renderer,
        mapped.window.geometry().loc.to_f64(),
        scale,
        alpha,
        RenderTarget::ScreenCapture, // ← Key: not Output
    );
    ```

    • `RenderTarget::ScreenCapture` renders to offscreen texture
    • No compositing to screen output required
    • Works for windows on any workspace
  3. Security model:

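    • Access control: niri IPC socket permissions (`srwxr-xr-x`; connecting to a Unix socket requires write permission, which only the owner has, so access is effectively user-private)
    • Any process running as the user can capture any window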
    • Protection: niri window rules `block-out-from "screen-capture"`
    • Audit: systemd journal via logger
  4. Clipboard behavior (hardcoded):

    ```rust
    // From save_screenshot()
    set_data_device_selection(
        &state.niri.display_handle,
        &state.niri.seat,
        vec![String::from("image/png")],
        buf.clone(),
    );
    ```

    • Always copies PNG to clipboard
    • No flag to disable
    • Runs in separate thread after encoding
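The socket permission claim can be checked mechanically. Connecting to a Unix stream socket requires write permission on the socket file, so what matters is that group and other lack the `w` bit. A sketch assuming GNU `stat` and the `NIRI_SOCKET` variable that niri exports to the processes it spawns; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical check that the niri IPC socket is connectable only by its owner.

# Takes a symbolic mode such as "srwxr-xr-x" (as printed by `stat -c %A`).
mode_is_owner_only_writable() {
  local mode="$1"
  # 0-based offsets 5 and 8 are the group and other write bits.
  [ "${mode:5:1}" = "-" ] && [ "${mode:8:1}" = "-" ]
}

check_niri_socket() {
  if [ -z "${NIRI_SOCKET:-}" ]; then
    echo "NIRI_SOCKET is not set" >&2
    return 1
  fi
  local mode
  mode=$(stat -c '%A' -- "$NIRI_SOCKET") || return 1
  if mode_is_owner_only_writable "$mode"; then
    echo "ok: $NIRI_SOCKET ($mode)"
  else
    echo "warning: $NIRI_SOCKET is writable by group/other ($mode)" >&2
    return 1
  fi
}
```

Note that this only confirms the boundary is the user account: any process already running as the user passes the same check, which is exactly the threat described below.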

Security Considerations

Threat model (documented in SECURITY.md):

  • Local privilege escalation: Any compromised process running as the user can capture any window
  • Cross-workspace privacy: Users may assume inactive workspaces are "private" - they're not
  • Clipboard side channel: Every capture overwrites clipboard, persists in clipboard history
  • No audit trail by default: added via logger -t niri-capture (systemd journal)
  • Invisible to user: No workspace switch, no screen flicker (except notification popup)

Mitigations implemented:

  1. Audit logging: All captures logged with window ID, title, workspace
  2. Security documentation: 196-line SECURITY.md with threat analysis
  3. Clear warnings: Security notices in SKILL.md and README.md
  4. Example protection: Block-out rules for password managers in docs
  5. Logged metadata: Can review what was captured via journalctl

Protection mechanisms recommended to users:

  1. Enable niri window rules for sensitive apps:

    ```kdl
    window-rule {
        match app-id=r#"^org\.keepassxc\.KeePassXC$"#
        block-out-from "screen-capture"
    }
    ```

  2. Review audit logs regularly: `journalctl --user -t niri-capture`
  3. Ensure screenshot directory private: `chmod 700 ~/Pictures/Screenshots`
  4. Clear sensitive screenshots after AI analysis
  5. Be aware clipboard contains last screenshot

Process and Workflow

What Worked Well

  • Source code research: Diving into niri source revealed invisible capture capability vs assuming overview was only option
  • Security-first thinking: Stopping to think like a security engineer caught major privacy implications
  • Iterative exploration: grim → overview → source code → screenshot-window discovery path
  • Following dotfiles patterns: logger usage matches existing system, no new patterns invented
  • Testing on real system: Verified cross-workspace capture actually works invisibly
  • Comprehensive documentation: Security analysis forced clarity about risks and mitigations
  • User involvement: Security discussion led to clear decisions on what to implement vs skip

What Was Challenging

  • Scope creep awareness: What started as "find screenshot" became "invisible window capture"; had to recognize the pivot
  • Security vs usability tension: Powerful capability has privacy implications - balancing both
  • Clipboard limitation: niri hardcodes clipboard copy, no way around it, had to accept and document
  • jq JSON parsing: Multiple match objects required different jq syntax than expected
  • Deciding what not to build: Resisting adding user prompts, sensitive filtering, clipboard workarounds
  • Documentation depth: Security analysis took longer than code implementation (~90 min vs ~60 min)

Learning and Insights

Technical Insights

Wayland compositor architecture:

  • Compositors maintain window surface buffers in memory continuously
  • Applications render to buffers regardless of workspace visibility
  • "Invisible workspace" just means "not composited to output" not "buffer doesn't exist"
  • Overview mode doesn't create renders - composites existing buffers at smaller scale
  • Direct buffer rendering (ScreenCapture target) bypasses screen output entirely

niri implementation details:

  • Uses smithay library for Wayland protocol handling
  • Mapped struct wraps Window which wraps surface buffers
  • screenshot-window action calls mapped.render() with ScreenCapture target
  • Renders to offscreen texture, converts to PNG, saves to file
  • Clipboard copy hardcoded in save_screenshot() - no conditional logic

Audit logging pattern:

  • logger -t <tag> sends to systemd journal
  • journalctl --user -t <tag> queries by tag
  • Standard Linux utility from util-linux package
  • Dotfiles already use this pattern (lid-suspend, power management)
  • Better than custom log files (integrated with system logging)

Process Insights

When to research source code:

  • When assumptions limit solution space (overview only? wrong)
  • When documentation doesn't cover use case (invisible capture not documented)
  • When security implications unclear (need to understand internals)
  • When API behavior seems inconsistent (clipboard always copied - why?)
  • Cost: 90 minutes research. Benefit: Unlocked invisible capture + understood security model.

Security documentation value:

  • Forces explicit threat modeling
  • Reveals hidden assumptions (user thinks workspace 2 is "private")
  • Clarifies trust boundaries (compositor IPC socket = security boundary)
  • Documents mitigations for future reference
  • Helps users make informed deployment decisions
  • 196 lines of security docs = confidence in deployment

Specification vs implementation timing:

  • Simple problems (find latest file): Code first, document after
  • Complex problems (invisible capture): Research first, build second
  • Security-sensitive features: Document threats before building
  • Unknown capabilities: Research, prototype, then specify
  • This problem: Research revealed capability, then built + documented simultaneously

Architectural Insights

Compositor as security boundary:

  • Wayland design: compositor is trusted, clients are not
  • Compositor has god-mode access to all window buffers
  • Access control is IPC socket permissions (user-level)
  • Applications cannot capture each other (must go through compositor)
  • This skill leverages compositor IPC to do what apps cannot

Buffer vs display separation:

  • Window buffers: Always exist, continuously updated by apps
  • Screen composition: Compositor's choice what to display when
  • This separation enables: invisible capture, overview modes, effects
  • Security implication: "hidden" windows aren't hidden from compositor

Audit trail architecture:

  • Systemd journal as system-wide audit log
  • Tagged entries (logger -t) for filtering
  • Centralized vs per-tool log files
  • Query interface (journalctl) with time ranges, filtering
  • Integration with system logging infrastructure

Context for Future Work

Open Questions

Clipboard behavior:

  • Will niri upstream accept no-clipboard flag? (template ready to file)
  • Can clipboard save/restore work reliably for all mime types?
  • Should AI clear clipboard after reading screenshot?
  • How do clipboard history tools handle image/png? (privacy leak)

Security enhancements:

  • Should notification popup be suppressed for invisible captures?
  • Does mako support per-app notification filtering?
  • Should captures from other workspaces trigger different notification?
  • Is there value in upstream niri audit logging vs logger?

User experience:

  • Will users actually read 196-line SECURITY.md?
  • Should there be a quickstart with "minimum security setup"?
  • How to make audit log review part of normal workflow?
  • Should skill refuse to capture if block-out rules not configured?

Integration:

  • How does this skill compose with other skills?
  • Should screenshot-latest and niri-window-capture be merged?
  • Can this enable new use cases (find error messages across all workspaces)?
  • Should there be skill for "capture all windows and search"?

Next Steps

Immediate (user actions):

  1. Review SECURITY.md thoroughly
  2. Configure niri block-out rules for password managers
  3. Test skill: `./skills/niri-window-capture/scripts/capture-focused.sh`
  4. Review audit log: `journalctl --user -t niri-capture`
  5. Decide whether to deploy to ~/.claude/skills/

Short term (if deployed):

  1. Monitor audit logs for unexpected captures
  2. Test cross-workspace capture workflows
  3. Verify block-out rules work (try capturing password manager)
  4. Get user feedback on security comfort level

Upstream niri:

  1. File issue using UPSTREAM-REQUEST.md template
  2. Request no-clipboard flag for screenshot-window action
  3. Discuss security documentation for invisible capture
  4. Potentially contribute PR for flag (if accepted)

Documentation improvements:

  1. Add quickstart security setup guide
  2. Create video/diagram showing invisible capture flow
  3. Document common use cases (find error messages, compare windows)
  4. Write integration examples with other skills

Related Work

  • screenshot-latest skill: Simple file-finding (completed)
  • niri compositor: https://github.com/YaLTeR/niri
  • Wayland security model: Compositor as security boundary
  • Dotfiles logging pattern: ~/proj/dotfiles/bin/lid-suspend-action.sh
  • Previous worklog: docs/worklogs/2025-11-08-screenshot-analysis-over-engineering-discovery.org
  • Smithay Wayland library: https://github.com/Smithay/smithay
  • wl-clipboard tools: wl-copy, wl-paste for Wayland clipboard
  • systemd journal: journalctl for audit log viewing

Raw Notes

Session flow:

  1. Resumed from previous session's over-engineering discovery
  2. User asked: "let's go back and focus on what's possible in terms of skipping the screenshot"
  3. Tested `grim -` (PNG to stdout): works
  4. Explored overview mode: works but visible flicker
  5. User asked: "what about for what's not on the active workspace/windows"
  6. Deep dive into niri source code → discovered invisible capture
  7. User noticed a flash (the notification popup); investigated clipboard behavior
  8. User: "Let's think this entire thing through from the perspective of a Security Engineer"
  9. Security analysis → threat model → mitigations → audit logging
  10. Implementation with security docs
  11. User: "Ok, Break down how the skill works for me"
  12. Created detailed technical explanation with diagram
  13. Worklog requested

Key user decisions from security discussion:

  • Window blocking: handled in niri config, not skill's responsibility
  • Audit logging: yes, use logger (dotfiles pattern)
  • User confirmation: no (too invasive)
  • Sensitive title filtering: no (niri block-out handles it)
  • Clipboard clearing: maybe, but can't avoid clipboard involvement
  • Upstream request: yes, file for no-clipboard flag

Testing results:

  • ✓ capture-focused.sh works
  • ✓ capture-by-title.sh works (after fixing jq syntax)
  • ✓ Cross-workspace capture works invisibly (workspace 2 from workspace 1)
  • ✓ Audit logging works (journalctl shows entries)
  • ✓ Notification popup visible (mako)
  • ✗ clipboard always polluted (confirmed hardcoded)

Interesting discoveries:

  • niri overview mode doesn't create new renders - just composites existing buffers
  • Window buffers exist even when not displayed (continuous application rendering)
  • screenshot-window --id bypasses the screen compositor entirely
  • Security boundary is compositor IPC socket (user-private)
  • Dotfiles already use logger pattern - consistency win

Comparison to original over-specification:

  • Original: 635 lines spec, 82 tasks, 115 min, 0 code
  • This skill: 703 lines total, 107 lines code, ~180 min, working + security docs
  • Key difference: Built it, understood it, documented threats, shipped with security analysis

Files structure created:

```
skills/
├── screenshot-latest/          # Simple file-finding (185 lines)
│   ├── SKILL.md
│   ├── README.md
│   └── scripts/find-latest.sh
└── niri-window-capture/        # Invisible capture (703 lines)
    ├── SKILL.md                # Agent instructions
    ├── SECURITY.md             # Threat analysis (196 lines!)
    ├── README.md               # User guide
    ├── UPSTREAM-REQUEST.md     # Feature request template
    ├── IMPLEMENTATION-NOTES.md # Technical details
    ├── scripts/
    │   ├── capture-focused.sh
    │   ├── capture-by-title.sh
    │   └── capture-all-windows.sh
    └── examples/
        ├── window-list.txt
        └── usage-example.sh
```

Timeline estimate:

  • Source code research: 90 min
  • Security analysis: 90 min
  • Implementation: 60 min
  • Documentation: 60 min
  • Testing: 30 min
  • Total: ~330 min (~5.5 hours)

Session Metrics

  • Commits made: 1 (initial repo commit)
  • Files created: 27 (untracked)
  • Lines of code: 107 (bash scripts)
  • Lines of documentation: 596 (SKILL.md + README + SECURITY + UPSTREAM)
  • Lines total: ~1500+ (including specs, analysis docs, worklogs)
  • Skills completed: 2 (screenshot-latest, niri-window-capture)
  • Security threats identified: 5 (documented in SECURITY.md)
  • Audit log entries: 3 (from testing)
  • Source files researched: ~10 (niri compositor codebase)