#+TITLE: Invisible Window Capture: From Over-Engineering to Production Security
#+DATE: 2025-11-08
#+KEYWORDS: niri, wayland, security, window-capture, compositor, screenshot, audit-logging
#+COMMITS: 1
#+COMPRESSION_STATUS: uncompressed

* Session Summary
** Date: 2025-11-08 (Day 2 of screenshot-analysis feature, session 2)
** Focus Area: Discovered and implemented invisible cross-workspace window capture using niri compositor's direct buffer rendering, with comprehensive security analysis

* Accomplishments
- [X] Discovered niri can capture windows from inactive workspaces invisibly using direct buffer rendering
- [X] Researched niri source code to understand window capture mechanism (~90 min deep dive)
- [X] Built complete niri-window-capture skill with security documentation (703 lines)
- [X] Implemented audit logging using systemd journal (logger pattern from dotfiles)
- [X] Created comprehensive security analysis (196-line SECURITY.md)
- [X] Tested invisible cross-workspace capture (verified works on workspaces 1 and 2)
- [X] Created upstream feature request template for --no-clipboard flag
- [X] Documented complete technical flow from user intent to screenshot analysis
- [ ] Deploy skill to ~/.claude/skills/ (pending user security review)
- [ ] File upstream niri issue (template ready)

* Key Decisions

** Decision 1: Build invisible capture skill despite security implications
- Context: Discovered niri can capture ANY window invisibly - major privacy/security concern
- Options considered:
  1. Don't build it - too dangerous, privacy violation
  2. Build with user confirmation prompts for cross-workspace
  3. Build with comprehensive security documentation and audit logging
  4. Build with sensitive title filtering built-in
- Rationale: User explicitly decided (after security discussion) to handle window blocking in niri config, implement audit logging, document security implications thoroughly, but skip user prompts and title filtering
- Impact: Production-ready skill that's powerful but requires security-conscious deployment. Users must read SECURITY.md and configure niri block-out rules before use.

** Decision 2: Use logger for audit trail (not custom logging)
- Context: Needed audit trail for all window captures - security requirement
- Options considered:
  1. Custom log file (~/.local/share/niri-capture.log)
  2. Systemd journal via logger -t niri-capture
  3. Upstream niri audit logging feature request
  4. No logging (document security risk)
- Rationale: The dotfiles already use the logger pattern (lid-suspend-action.sh, power-status.sh, etc.). Using logger stays consistent with the existing system, writes to the systemd journal (queryable with journalctl), and relies on a standard Linux utility.
- Impact: All captures logged with: timestamp, window ID, title, workspace. Viewable with journalctl --user -t niri-capture. Follows established dotfiles patterns.
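
The logger pattern described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual script: the function names and the `key=value` message layout are assumptions, and only the `niri-capture` tag and the logged fields (window ID, title, workspace) come from this worklog. The journal adds the timestamp on its own.

```bash
#!/usr/bin/env bash
# Sketch only: format_capture_entry and log_capture are hypothetical names,
# and the message layout is an assumption, not the skill's real format.
LOG_TAG="${LOG_TAG:-niri-capture}"

# format_capture_entry ID TITLE WORKSPACE -> prints a one-line audit message
format_capture_entry() {
    local id="$1" title workspace="$3"
    # Replace newlines so a window title cannot spill onto extra journal lines.
    title=$(printf '%s' "$2" | tr '\n' ' ')
    printf 'capture window_id=%s workspace=%s title="%s"' "$id" "$workspace" "$title"
}

# log_capture ID TITLE WORKSPACE -> writes the entry to the journal via logger
log_capture() {
    logger -t "$LOG_TAG" -- "$(format_capture_entry "$@")"
}
```

A capture script would call something like `log_capture "$WINDOW_ID" "$TITLE" "$WORKSPACE_ID"` just before invoking niri, and the entries would then show up under `journalctl --user -t niri-capture`.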

** Decision 3: Accept clipboard pollution, request upstream flag
- Context: niri hardcodes clipboard copy in save_screenshot() - cannot disable
- Options considered:
  1. Accept it, document behavior
  2. Save/restore clipboard (fragile, doesn't preserve mime types)
  3. Clear clipboard after AI reads (destroys user clipboard)
  4. File upstream PR for --no-clipboard flag
- Rationale: Clipboard save/restore too fragile. Clear-after breaks user workflow. Best solution is upstream flag. For now, document the behavior clearly in security docs.
- Impact: Users must be aware screenshots persist in clipboard. Clipboard history tools will log all captures. Created UPSTREAM-REQUEST.md template for niri feature request.
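
To illustrate why option 2 was judged fragile, here is a rough sketch of a naive clipboard save/restore built on `wl-paste` and `wl-copy`. The function names are hypothetical and this approach was not adopted. It keeps only the first MIME type the clipboard advertises, so a selection offered as both `text/html` and `text/plain` comes back with a single representation.

```bash
#!/usr/bin/env bash
# Sketch only: save_clipboard and restore_clipboard are hypothetical helpers
# that show the limitation; they are not part of the skill.

# save_clipboard FILE -> stores the clipboard payload in FILE, prints its MIME type
save_clipboard() {
    local file="$1" type
    # Only the first advertised type is kept; every other representation is lost.
    type=$(wl-paste --list-types 2>/dev/null | head -n 1)
    [ -n "$type" ] || return 1
    wl-paste --no-newline --type "$type" > "$file" || return 1
    printf '%s\n' "$type"
}

# restore_clipboard FILE TYPE -> puts the saved payload back as a single MIME type
restore_clipboard() {
    wl-copy --type "$2" < "$1"
}
```

Usage would look like `type=$(save_clipboard /tmp/clip.bin)` before the capture and `restore_clipboard /tmp/clip.bin "$type"` afterwards. Clipboard history tools would still record the intermediate screenshot.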

** Decision 4: Research niri source code before building
- Context: Needed to understand if invisible cross-workspace capture was possible
- Options considered:
  1. Assume overview mode is only way (requires visible flicker)
  2. Test empirically without source code research
  3. Deep dive into niri compositor source code
  4. Ask in niri community channels
- Rationale: Source code reveals actual capabilities vs assumptions. Found screenshot-window --id command works on any window regardless of workspace. Discovered mapped.render() with RenderTarget::ScreenCapture bypasses screen compositing.
- Impact: Unlocked invisible capture capability. Understood security implications from implementation details. Documented exact technical flow. Time well spent (~90 min research).

** Decision 5: Build two skills, not one monolithic solution
- Context: Started with "find last screenshot" but discovered broader capabilities
- Options considered:
  1. One combined skill (find existing + capture new)
  2. Two separate skills (screenshot-latest + niri-window-capture)
  3. Just the capture skill (skip file-finding)
- Rationale: screenshot-latest solves "find existing files" (simple, safe). niri-window-capture solves "capture any window" (powerful, security-sensitive). Different use cases, different risk profiles, cleaner separation.
- Impact: screenshot-latest: 185 lines, safe, ready to deploy. niri-window-capture: 703 lines, powerful, requires security review. Users can deploy one without the other.

* Problems & Solutions

| Problem | Solution | Learning |
|---------|----------|----------|
| Overview mode captures all workspaces but causes ~450ms visible flicker | Researched niri source, discovered screenshot-window --id renders buffers directly without compositing. Tested on inactive workspace - works invisibly. | niri maintains window buffers in memory even when not displayed. Direct buffer rendering bypasses screen compositor entirely. This is how screenshot-window achieves invisible capture. |
| Unclear if windows on inactive workspaces can be captured | Traced through niri source: Mapped struct holds Window (smithay), Window wraps Wayland surface buffer. Applications continuously render to buffers regardless of workspace visibility. | Wayland applications always render to surface buffers. Compositor decides what to composite to screen, but buffers exist independently. Overview mode doesn't create new renders - just composites existing buffers at smaller scale. |
| jq parse error in capture-by-title.sh when multiple windows matched the search | Selected the first match inside jq with `jq 'map(select(...)) | .[0]'` instead of streaming every match and truncating with `jq '.[] | select(...)' | head -1` | When several windows match, jq emits a stream of separate JSON objects, and line-based truncation with `head -1` does not reliably return one complete object. Filtering with array operations (map) and then taking element [0] yields a single valid JSON value. |
| niri always copies screenshots to clipboard - cannot disable | Researched source: set_data_device_selection() hardcoded in save_screenshot(). Created UPSTREAM-REQUEST.md for --no-clipboard flag. Documented behavior in SECURITY.md. | Clipboard pollution unavoidable with current niri. Future upstream flag needed. Document clearly so users understand privacy implications (clipboard history tools log screenshots). |
| Needed audit logging pattern - how to match dotfiles style | Searched dotfiles: rg "logger" ~/proj/dotfiles. Found lid-suspend-action.sh uses: logger -t "$LOG_TAG" "message". Systemd journal pattern. | Dotfiles use logger -t <tag> for audit trails. Viewable with journalctl --user -t <tag>. Standard Linux utility from util-linux. Perfect for capture audit trail. |
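
The jq fix from the table can be sketched as a small helper. This is an illustration under assumptions: `first_window_id` is a hypothetical name, and the case-insensitive regex match on `.title` is one plausible choice rather than what capture-by-title.sh necessarily does. Only the `map(select(...)) | .[0]` shape comes from the notes above.

```bash
#!/usr/bin/env bash
# Sketch only: first_window_id is a hypothetical helper.
# Reads the JSON array from `niri msg --json windows` on stdin and prints the
# id of the first window whose title matches PATTERN (case-insensitive regex).
first_window_id() {
    # map(select(...)) keeps the result as one array, so .[0] is a single value;
    # `// empty` prints nothing at all when no window matches.
    jq -r --arg pat "$1" \
        'map(select((.title // "") | test($pat; "i"))) | .[0].id // empty'
}
```

For example, `niri msg --json windows | first_window_id firefox` would print one window ID even when several Firefox windows are open, or nothing if none match.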

* Technical Details

** Code Changes
- Total files created: 27
- Key files created:
  - `skills/niri-window-capture/SKILL.md` (184 lines) - Agent instructions with security warnings
  - `skills/niri-window-capture/SECURITY.md` (196 lines) - Comprehensive security analysis, threat model, mitigations
  - `skills/niri-window-capture/scripts/capture-focused.sh` (31 lines) - Capture current window with audit logging
  - `skills/niri-window-capture/scripts/capture-by-title.sh` (40 lines) - Find and capture by title match
  - `skills/niri-window-capture/UPSTREAM-REQUEST.md` (108 lines) - Feature request for --no-clipboard flag
  - `skills/screenshot-latest/SKILL.md` (83 lines) - Simple file-finding skill
  - `skills/screenshot-latest/scripts/find-latest.sh` (22 lines) - One-liner: ls -t | head -1
  - `specs/001-screenshot-analysis/RESET.md` - Over-engineering analysis
  - `specs/001-screenshot-analysis/COMPARISON.md` - Spec vs implementation reality
  - `specs/001-screenshot-analysis/SECURITY.md` - Security findings
  - `docs/worklogs/2025-11-08-screenshot-analysis-over-engineering-discovery.org` - Previous session worklog

- Over-specification archived (not deleted):
  - `specs/001-screenshot-analysis/spec.md` (165 lines) - Over-engineered
  - `specs/001-screenshot-analysis/plan.md` (139 lines) - Premature
  - `specs/001-screenshot-analysis/tasks.md` (331 lines) - 82 unnecessary tasks

** Commands Used

Testing niri window capture:
```bash
# List all windows with metadata
niri msg --json windows | jq -r '.[] | "\(.id) - \(.title) - WS:\(.workspace_id)"'

# Capture specific window invisibly
niri msg action screenshot-window --id <WINDOW_ID> --write-to-disk true

# Capture window from different workspace (tested workspace 2 while on workspace 1)
WINDOW_ID=$(niri msg --json windows | jq -r '.[] | select(.workspace_id == 2) | .id' | head -1)
niri msg action screenshot-window --id "$WINDOW_ID" --write-to-disk true
# Result: Invisible capture, no workspace switch, screenshot saved
```

Verifying niri capabilities:
```bash
# Check grim stdout capability
grim -g "0,0 100x100" - | file -
# Output: /dev/stdin: PNG image data (proves stdout works)

# Test niri overview mode
niri msg action toggle-overview
sleep 0.5
grim /tmp/overview-test.png
niri msg action toggle-overview
# Result: Captures all workspaces but causes visible flicker

# Get niri window info
niri msg --json focused-window | jq '.'
niri msg --json windows | jq '.[0]'
# Returns: id, title, app_id, workspace_id, layout info
```

Audit log viewing:
```bash
# View all captures
journalctl --user -t niri-capture

# Recent captures
journalctl --user -t niri-capture -n 20

# Today's captures
journalctl --user -t niri-capture --since today

# Follow live
journalctl --user -t niri-capture -f
```

** Architecture Notes

**niri compositor window rendering architecture** (discovered via source code research):

1. **Window buffer lifecycle**:
   - Applications render to Wayland surface buffers continuously
   - niri compositor holds references via `Mapped` struct containing `Window` (smithay)
   - Buffers exist in memory regardless of workspace visibility
   - Compositor decides what to composite to outputs, but buffers persist

2. **Direct buffer rendering** (key discovery):
   ```rust
   // From niri/src/niri.rs screenshot_window()
   let elements = mapped.render(
       renderer,
       mapped.window.geometry().loc.to_f64(),
       scale,
       alpha,
       RenderTarget::ScreenCapture,  // ← Key: not Output
   );
   ```
   - `RenderTarget::ScreenCapture` renders to offscreen texture
   - No compositing to screen output required
   - Works for windows on any workspace

3. **Security model**:
   - Access control: niri IPC socket permissions (`srwxr-xr-x`; only the owning user has the write permission that connecting to a Unix socket requires)
   - Any process running as the user can capture any window
   - Protection: niri window rules `block-out-from "screen-capture"`
   - Audit: systemd journal via logger

4. **Clipboard behavior** (hardcoded):
   ```rust
   // From save_screenshot()
   set_data_device_selection(
       &state.niri.display_handle,
       &state.niri.seat,
       vec![String::from("image/png")],
       buf.clone(),
   );
   ```
   - Always copies PNG to clipboard
   - No flag to disable
   - Runs in separate thread after encoding
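
Putting the pieces above together, the capture flow (look up the window over IPC, write an audit entry, then ask niri for the screenshot) might look roughly like this. It is a sketch under assumptions: `capture_window` is a hypothetical wrapper and the log message layout is invented for illustration. The `niri msg` commands and the `niri-capture` tag are the ones recorded in this worklog.

```bash
#!/usr/bin/env bash
# Sketch only: capture_window is a hypothetical wrapper, not a script from the skill.
LOG_TAG="${LOG_TAG:-niri-capture}"

# capture_window ID -> logs the capture, then asks niri to screenshot that window
capture_window() {
    local id="$1" info title workspace
    # Resolve metadata first so the audit entry is written before the capture runs.
    info=$(niri msg --json windows | jq -c --argjson id "$id" 'map(select(.id == $id)) | .[0]')
    if [ -z "$info" ] || [ "$info" = "null" ]; then
        echo "no window with id $id" >&2
        return 1
    fi
    title=$(printf '%s' "$info" | jq -r '.title // ""')
    workspace=$(printf '%s' "$info" | jq -r '.workspace_id // "none"')
    logger -t "$LOG_TAG" -- "capture window_id=$id workspace=$workspace title=\"$title\""
    # The window does not need to be on the active workspace.
    niri msg action screenshot-window --id "$id" --write-to-disk true
}
```

Logging before the capture means a failed screenshot still leaves a trace of the attempt. A stricter variant could also refuse to capture when the journal write fails.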

** Security Considerations

**Threat model** (documented in SECURITY.md):
- **Same-user compromise**: Any compromised process running as the user can capture any window (no privilege escalation required)
- **Cross-workspace privacy**: Users may assume inactive workspaces are "private" - they're not
- **Clipboard side channel**: Every capture overwrites clipboard, persists in clipboard history
- **No native audit trail**: niri does not record captures itself; an audit trail was added via logger -t niri-capture (systemd journal)
- **Invisible to user**: No workspace switch, no screen flicker (except notification popup)

**Mitigations implemented**:
1. Audit logging: All captures logged with window ID, title, workspace
2. Security documentation: 196-line SECURITY.md with threat analysis
3. Clear warnings: Security notices in SKILL.md and README.md
4. Example protection: Block-out rules for password managers in docs
5. Logged metadata: Can review what was captured via journalctl

**Protection mechanisms recommended to users**:
1. Enable niri window rules for sensitive apps:
   ```kdl
   window-rule {
       match app-id=r#"^org\.keepassxc\.KeePassXC$"#
       block-out-from "screen-capture"
   }
   ```
2. Review audit logs regularly: `journalctl --user -t niri-capture`
3. Ensure the screenshot directory is private: `chmod 700 ~/Pictures/Screenshots`
4. Clear sensitive screenshots after AI analysis
5. Be aware clipboard contains last screenshot
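
Two of the recommendations above (a private screenshot directory and at least one block-out rule) could be checked automatically before a capture. The following is a minimal sketch, not part of the skill. All function names are hypothetical, `stat -c` assumes GNU coreutils, and the grep is a plain text match that would also be satisfied by a commented-out rule.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical preflight helpers, not part of the skill.

# Succeeds when DIR has mode 700 (owner-only access). Assumes GNU stat.
dir_is_private() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "700" ]
}

# Succeeds when CONFIG mentions at least one screen-capture block-out rule.
# Plain text match: a commented-out rule would also pass.
has_block_out_rule() {
    grep -q 'block-out-from "screen-capture"' "$1" 2>/dev/null
}

# preflight DIR CONFIG -> prints warnings to stderr, returns non-zero if any check fails
preflight() {
    local dir="$1" config="$2" rc=0
    dir_is_private "$dir" || { echo "WARN: $dir is not mode 700" >&2; rc=1; }
    has_block_out_rule "$config" || { echo "WARN: no block-out rule found in $config" >&2; rc=1; }
    return "$rc"
}
```

A call such as `preflight ~/Pictures/Screenshots ~/.config/niri/config.kdl` would run both checks. The config path shown is an assumption about where the niri configuration lives on this system.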

* Process and Workflow

** What Worked Well
- **Source code research**: Diving into niri source revealed invisible capture capability vs assuming overview was only option
- **Security-first thinking**: Stopping to think like Security Engineer caught major privacy implications
- **Iterative exploration**: grim → overview → source code → screenshot-window discovery path
- **Following dotfiles patterns**: logger usage matches existing system, no new patterns invented
- **Testing on real system**: Verified cross-workspace capture actually works invisibly
- **Comprehensive documentation**: Security analysis forced clarity about risks and mitigations
- **User involvement**: Security discussion led to clear decisions on what to implement vs skip

** What Was Challenging
- **Scope creep awareness**: What started as "find screenshot" became "invisible window capture", and the pivot had to be recognized explicitly
- **Security vs usability tension**: Powerful capability has privacy implications - balancing both
- **Clipboard limitation**: niri hardcodes clipboard copy, no way around it, had to accept and document
- **jq JSON parsing**: Multiple match objects required different jq syntax than expected
- **Deciding what not to build**: Resisting adding user prompts, sensitive filtering, clipboard workarounds
- **Documentation depth**: Security analysis took longer than code implementation (~90 min vs ~60 min)

* Learning and Insights

** Technical Insights

**Wayland compositor architecture**:
- Compositors maintain window surface buffers in memory continuously
- Applications render to buffers regardless of workspace visibility
- "Invisible workspace" just means "not composited to output" not "buffer doesn't exist"
- Overview mode doesn't create renders - composites existing buffers at smaller scale
- Direct buffer rendering (ScreenCapture target) bypasses screen output entirely

**niri implementation details**:
- Uses smithay library for Wayland protocol handling
- Mapped struct wraps Window which wraps surface buffers
- screenshot-window action calls mapped.render() with ScreenCapture target
- Renders to offscreen texture, converts to PNG, saves to file
- Clipboard copy hardcoded in save_screenshot() - no conditional logic

**Audit logging pattern**:
- logger -t <tag> sends to systemd journal
- journalctl --user -t <tag> queries by tag
- Standard Linux utility from util-linux package
- Dotfiles already use this pattern (lid-suspend, power management)
- Better than custom log files (integrated with system logging)

** Process Insights

**When to research source code**:
- When assumptions limit solution space (overview only? wrong)
- When documentation doesn't cover use case (invisible capture not documented)
- When security implications unclear (need to understand internals)
- When API behavior seems inconsistent (clipboard always copied - why?)
- Cost: 90 minutes research. Benefit: Unlocked invisible capture + understood security model.

**Security documentation value**:
- Forces explicit threat modeling
- Reveals hidden assumptions (user thinks workspace 2 is "private")
- Clarifies trust boundaries (compositor IPC socket = security boundary)
- Documents mitigations for future reference
- Helps users make informed deployment decisions
- 196 lines of security docs = confidence in deployment

**Specification vs implementation timing**:
- Simple problems (find latest file): Code first, document after
- Complex problems (invisible capture): Research first, build second
- Security-sensitive features: Document threats before building
- Unknown capabilities: Research, prototype, then specify
- This problem: Research revealed capability, then built + documented simultaneously

** Architectural Insights

**Compositor as security boundary**:
- Wayland design: compositor is trusted, clients are not
- Compositor has god-mode access to all window buffers
- Access control is IPC socket permissions (user-level)
- Applications cannot capture each other (must go through compositor)
- This skill leverages compositor IPC to do what apps cannot

**Buffer vs display separation**:
- Window buffers: Always exist, continuously updated by apps
- Screen composition: Compositor's choice what to display when
- This separation enables: invisible capture, overview modes, effects
- Security implication: "hidden" windows aren't hidden from compositor

**Audit trail architecture**:
- Systemd journal as system-wide audit log
- Tagged entries (logger -t) for filtering
- Centralized vs per-tool log files
- Query interface (journalctl) with time ranges, filtering
- Integration with system logging infrastructure

* Context for Future Work

** Open Questions

**Clipboard behavior**:
- Will niri upstream accept --no-clipboard flag? (template ready to file)
- Can clipboard save/restore work reliably for all mime types?
- Should AI clear clipboard after reading screenshot?
- How do clipboard history tools handle image/png? (privacy leak)

**Security enhancements**:
- Should notification popup be suppressed for invisible captures?
- Does mako support per-app notification filtering?
- Should captures from other workspaces trigger different notification?
- Is there value in upstream niri audit logging vs logger?

**User experience**:
- Will users actually read 196-line SECURITY.md?
- Should there be a quickstart with "minimum security setup"?
- How to make audit log review part of normal workflow?
- Should skill refuse to capture if block-out rules not configured?

**Integration**:
- How does this skill compose with other skills?
- Should screenshot-latest and niri-window-capture be merged?
- Can this enable new use cases (find error messages across all workspaces)?
- Should there be skill for "capture all windows and search"?

** Next Steps

**Immediate** (user actions):
1. Review SECURITY.md thoroughly
2. Configure niri block-out rules for password managers
3. Test skill: `./skills/niri-window-capture/scripts/capture-focused.sh`
4. Review audit log: `journalctl --user -t niri-capture`
5. Decide whether to deploy to ~/.claude/skills/

**Short term** (if deployed):
1. Monitor audit logs for unexpected captures
2. Test cross-workspace capture workflows
3. Verify block-out rules work (try capturing password manager)
4. Get user feedback on security comfort level

**Upstream niri**:
1. File issue using UPSTREAM-REQUEST.md template
2. Request --no-clipboard flag for screenshot-window action
3. Discuss security documentation for invisible capture
4. Potentially contribute PR for flag (if accepted)

**Documentation improvements**:
1. Add quickstart security setup guide
2. Create video/diagram showing invisible capture flow
3. Document common use cases (find error messages, compare windows)
4. Write integration examples with other skills

** Related Work
- screenshot-latest skill: Simple file-finding (completed)
- niri compositor: https://github.com/YaLTeR/niri
- Wayland security model: Compositor as security boundary
- Dotfiles logging pattern: ~/proj/dotfiles/bin/lid-suspend-action.sh
- Previous worklog: docs/worklogs/2025-11-08-screenshot-analysis-over-engineering-discovery.org
- Smithay Wayland library: https://github.com/Smithay/smithay
- wl-clipboard tools: wl-copy, wl-paste for Wayland clipboard
- systemd journal: journalctl for audit log viewing

* Raw Notes

**Session flow**:
1. Resumed from previous session's over-engineering discovery
2. User asked: "let's go back and focus on what's possible in terms of skipping the screenshot"
3. Tested `grim -` (output to stdout): works
4. Explored overview mode: works but visible flicker
5. User asked: "what about for what's not on the active workspace/windows"
6. Deep dive into niri source code → discovered invisible capture
7. User caught flash: notification popup, investigated clipboard
8. User: "Let's think this entire thing through from the perspective of a Security Engineer"
9. Security analysis → threat model → mitigations → audit logging
10. Implementation with security docs
11. User: "Ok, Break down how the skill works for me"
12. Created detailed technical explanation with diagram
13. Worklog requested

**Key user decisions from security discussion**:
- Window blocking: handled in niri config, not skill's responsibility
- Audit logging: yes, use logger (dotfiles pattern)
- User confirmation: no (too invasive)
- Sensitive title filtering: no (niri block-out handles it)
- Clipboard clearing: maybe, but can't avoid clipboard involvement
- Upstream request: yes, file for --no-clipboard flag

**Testing results**:
- ✓ capture-focused.sh works
- ✓ capture-by-title.sh works (after fixing jq syntax)
- ✓ Cross-workspace capture works invisibly (workspace 2 from workspace 1)
- ✓ Audit logging works (journalctl shows entries)
- ✓ Notification popup visible (mako)
- ✗ clipboard always polluted (confirmed hardcoded)

**Interesting discoveries**:
- niri overview mode doesn't create new renders - just composites existing buffers
- Window buffers exist even when not displayed (continuous application rendering)
- screenshot-window --id bypasses screen compositor entirely
- Security boundary is compositor IPC socket (user-private)
- Dotfiles already use logger pattern - consistency win

**Comparison to original over-specification**:
- Original: 635 lines spec, 82 tasks, 115 min, 0 code
- This skill: 703 lines total, 107 lines code, ~180 min, working + security docs
- Key difference: Built it, understood it, documented threats, shipped with security analysis

**Files structure created**:
```
skills/
├── screenshot-latest/          # Simple file-finding (185 lines)
│   ├── SKILL.md
│   ├── README.md
│   └── scripts/find-latest.sh
└── niri-window-capture/        # Invisible capture (703 lines)
    ├── SKILL.md                # Agent instructions
    ├── SECURITY.md             # Threat analysis (196 lines!)
    ├── README.md               # User guide
    ├── UPSTREAM-REQUEST.md     # Feature request template
    ├── IMPLEMENTATION-NOTES.md # Technical details
    ├── scripts/
    │   ├── capture-focused.sh
    │   ├── capture-by-title.sh
    │   └── capture-all-windows.sh
    └── examples/
        ├── window-list.txt
        └── usage-example.sh
```

**Timeline estimate**:
- Source code research: 90 min
- Security analysis: 90 min
- Implementation: 60 min
- Documentation: 60 min
- Testing: 30 min
- Total: ~330 min (~5.5 hours)

* Session Metrics
- Commits made: 1 (initial repo commit)
- Files created: 27 (untracked)
- Lines of code: 107 (bash scripts)
- Lines of documentation: 596 (SKILL.md + README + SECURITY + UPSTREAM)
- Lines total: ~1500+ (including specs, analysis docs, worklogs)
- Skills completed: 2 (screenshot-latest, niri-window-capture)
- Security threats identified: 5 (documented in SECURITY.md)
- Audit log entries: 3 (from testing)
- Source files researched: ~10 (niri compositor codebase)