skills/specs/001-screenshot-analysis/spec.md

Feature Specification: Screenshot Analysis Skill

Feature Branch: 001-screenshot-analysis
Created: 2025-11-08
Status: Draft
Input: User description: "We want to start thinking about a skill that has the AI look at the last screenshot, it's mostly so we don't have to type 'they're in ~/Pictures/Screenshots' everytime."

Clarifications

Session 2025-11-08

  • Q: Configuration storage mechanism? → A: Skill-specific config file (e.g., ~/.config/opencode/skills/screenshot-analysis/config.json)
  • Q: Symlink handling behavior? → A: Ignore symlinks (skip any symlinked screenshot files)
  • Q: Same-timestamp file handling? → A: Use filename lexicographic ordering as tiebreaker

User Scenarios & Testing (mandatory)

User Story 1 - Quick Screenshot Analysis (Priority: P1)

A user takes a screenshot and immediately asks the AI agent to analyze it without having to specify the file path or location.

Why this priority: This is the core value proposition - eliminating the need to type file paths repeatedly. This single feature delivers immediate value and addresses the primary user pain point.

Independent Test: Can be fully tested by taking a screenshot, asking "analyze the last screenshot", and verifying the agent finds and analyzes the correct file without requiring a path.

Acceptance Scenarios:

  1. Given a screenshot was just taken and saved to ~/Pictures/Screenshots, When user requests "look at my last screenshot", Then the agent locates the most recent file and analyzes it
  2. Given multiple screenshots exist in the directory, When user requests screenshot analysis, Then the agent identifies and uses the most recently modified file (per FR-002)
  3. Given user asks "what's in my latest screenshot", When the skill executes, Then the agent reads the screenshot file and provides visual analysis

User Story 2 - Reference Previous Screenshots (Priority: P2)

A user wants to reference screenshots from earlier in the conversation or session without re-uploading or specifying paths.

Why this priority: Extends the basic functionality to support conversation continuity and reduces friction when working with multiple screenshots over time.

Independent Test: Take multiple screenshots over time, then reference them using relative terms like "the screenshot from 5 minutes ago" or "the second-to-last screenshot".

Acceptance Scenarios:

  1. Given three screenshots taken at different times, When user requests "show me the previous screenshot", Then the agent selects the second-most-recent file
  2. Given a screenshot from earlier in the session, When user requests "compare this to the earlier screenshot", Then the agent retrieves both the latest and a previous screenshot
  3. Given user asks for "screenshots from today", When the skill executes, Then the agent lists or analyzes all screenshots created today
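
The relative references in this story ("previous screenshot", "screenshots from today") could be resolved roughly as follows. This is a minimal sketch, not a defined part of the skill: the function names are hypothetical, it assumes GNU find (for -printf and -newermt), and it reads "from 5 minutes ago" as "modified within the last 5 minutes", which is only one possible interpretation.

```shell
# Hypothetical helpers for relative references; names are illustrative only.
# Assumes GNU find (-printf, -newermt). Filenames containing newlines are
# not handled.

# Print screenshot filenames in DIR, newest first. Extra arguments are passed
# to find as additional predicates. Ties on mtime: later filename first.
_ranked_screenshots() {
  local dir="$1" tab
  shift
  tab="$(printf '\t')"
  LC_ALL=C find "$dir/" -maxdepth 1 -type f \
    \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \) \
    "$@" -printf '%T@\t%f\n' |
    LC_ALL=C sort -t "$tab" -k1,1nr -k2,2r |
    cut -f 2-
}

# nth_screenshot DIR N: 1 = latest, 2 = "previous", and so on.
nth_screenshot() {
  local dir="$1" n="${2:-1}" name
  case "$n" in
    '' | 0* | *[!0-9]*)
      echo "N must be a positive integer, got: $n" >&2
      return 2
      ;;
  esac
  name="$(_ranked_screenshots "$dir" | sed -n "${n}p")"
  if [ -z "$name" ]; then
    echo "Fewer than $n screenshots in $dir" >&2
    return 1
  fi
  printf '%s\n' "$dir/$name"
}

# screenshots_since DIR WHEN: list screenshots modified after WHEN, newest
# first. WHEN is a GNU date string such as "5 minutes ago".
screenshots_since() {
  local dir="$1" when="$2" name
  _ranked_screenshots "$dir" -newermt "$when" |
    while IFS= read -r name; do
      printf '%s\n' "$dir/$name"
    done
}
```

Under these assumptions, "the previous screenshot" maps to nth_screenshot ~/Pictures/Screenshots 2, and "screenshots from today" maps to screenshots_since ~/Pictures/Screenshots "$(date +%F) 00:00".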

User Story 3 - Custom Screenshot Directory Support (Priority: P3)

A user who stores screenshots in a different location can configure the skill to use their preferred directory.

Why this priority: Enables flexibility for users with non-standard configurations, but the default location (~/Pictures/Screenshots) covers the majority use case.

Independent Test: Configure a custom screenshot directory, take a screenshot there, and verify the skill finds it correctly.

Acceptance Scenarios:

  1. Given user has configured a custom screenshot directory, When they request screenshot analysis, Then the skill searches the configured location instead of the default
  2. Given no custom directory is configured, When the skill executes, Then it defaults to ~/Pictures/Screenshots
  3. Given the configured directory doesn't exist, When the skill runs, Then it provides a clear error message and falls back to checking the default location

Edge Cases

  • What happens when ~/Pictures/Screenshots is empty (no screenshots exist)?
  • How does the system handle permission errors when reading the directory?
  • What if multiple screenshots have the same timestamp? (Resolved: use lexicographic filename ordering as tiebreaker per FR-002)
  • How does the skill behave if the screenshot file is corrupted or unreadable?
  • What if the user's system uses a different default screenshot location (e.g., macOS vs Linux)?
  • How does the skill handle very large screenshot files?
  • What if the directory contains symlinks to screenshot files? (Resolved: symlinks are ignored per FR-002a)
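
Several of these edge cases could be told apart before any analysis is attempted. The sketch below is illustrative only: the function names, the exit codes, and the 20 MiB size limit are assumptions, not values defined by this spec. It also treats only a zero-byte file as "corrupted", since real corruption detection would require decoding the image.

```shell
# Hypothetical pre-flight checks; exit codes and names are illustrative.

# check_screenshot_dir DIR
#   1 = directory missing, 2 = not readable, 3 = no screenshots, 0 = ok
check_screenshot_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "Screenshot directory does not exist: $dir" >&2
    return 1
  fi
  if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
    echo "Cannot read screenshot directory (check permissions): $dir" >&2
    return 2
  fi
  # -print -quit stops at the first match, so large directories stay cheap.
  if [ -z "$(find "$dir/" -maxdepth 1 -type f \
      \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \) \
      -print -quit)" ]; then
    echo "No PNG, JPG, or JPEG screenshots found in: $dir" >&2
    return 3
  fi
  return 0
}

# check_screenshot_file FILE [MAX_BYTES]
#   4 = unreadable, 5 = empty, 6 = larger than MAX_BYTES, 0 = ok
check_screenshot_file() {
  local file="$1" max_bytes="${2:-20971520}" size  # 20 MiB: assumed limit
  if [ ! -r "$file" ]; then
    echo "Screenshot is missing or unreadable: $file" >&2
    return 4
  fi
  size="$(wc -c <"$file")"
  if [ "$size" -eq 0 ]; then
    echo "Screenshot is empty (possibly corrupted): $file" >&2
    return 5
  fi
  if [ "$size" -gt "$max_bytes" ]; then
    echo "Screenshot is $size bytes, over the $max_bytes byte limit: $file" >&2
    return 6
  fi
  return 0
}
```

Separate exit codes let a caller produce a specific, actionable message for each case instead of a generic "not found".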

Requirements (mandatory)

Functional Requirements

  • FR-001: Skill MUST automatically locate the most recent screenshot file in ~/Pictures/Screenshots without user-provided path
  • FR-002: Skill MUST determine file recency based on file modification time; when multiple files have identical timestamps, it MUST use lexicographic filename ordering as the tiebreaker (the filename that sorts later is treated as more recent)
  • FR-002a: Skill MUST ignore symlinks when scanning for screenshot files (only consider regular files)
  • FR-003: Skill MUST support common screenshot formats (PNG, JPG, JPEG)
  • FR-004: Skill MUST provide clear error messages if no screenshots are found
  • FR-005: Skill MUST be invokable through natural language triggers (e.g., "look at my last screenshot", "analyze my recent screenshot")
  • FR-006: Skill MUST pass the screenshot file path to the agent's image analysis capability
  • FR-007: Skill MUST handle missing or inaccessible screenshot directory gracefully
  • FR-008: Skill SHOULD support relative time references (e.g., "screenshot from 5 minutes ago")
  • FR-009: Skill SHOULD allow configuration of custom screenshot directories via skill-specific config file (e.g., ~/.config/opencode/skills/screenshot-analysis/config.json or ~/.claude/skills/screenshot-analysis/config.json)
  • FR-010: Skill SHOULD support finding the Nth most recent screenshot (e.g., "previous screenshot", "second-to-last screenshot")
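
As an illustration of how FR-001 through FR-004 and FR-007 fit together, here is a minimal sketch of a discovery helper. It is not a prescribed implementation: the function name latest_screenshot is hypothetical, and the code assumes GNU find (for -printf) as available on typical Linux systems.

```shell
# Hypothetical helper (name not defined by this spec): print the path of the
# most recent screenshot in a directory. Assumes GNU find (-printf).
# Filenames containing newlines are not handled.
latest_screenshot() {
  local dir="${1:-$HOME/Pictures/Screenshots}" tab newest
  tab="$(printf '\t')"

  if [ ! -d "$dir" ]; then
    echo "Screenshot directory not found: $dir" >&2   # FR-007
    return 1
  fi

  # Each record is "<mtime as epoch seconds><TAB><filename>".
  # -type f keeps regular files only, so symlinks are skipped (FR-002a).
  # Sorting is newest mtime first, then filename in reverse byte order, so
  # the name that sorts later wins when timestamps are identical (FR-002).
  newest="$(
    LC_ALL=C find "$dir/" -maxdepth 1 -type f \
      \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \) \
      -printf '%T@\t%f\n' 2>/dev/null |
      LC_ALL=C sort -t "$tab" -k1,1nr -k2,2r |
      head -n 1 |
      cut -f 2-
  )"

  if [ -z "$newest" ]; then
    echo "No screenshots (PNG, JPG, JPEG) found in $dir" >&2   # FR-004
    return 1
  fi

  # FR-006: the caller hands this path to the agent's image analysis.
  printf '%s\n' "$dir/$newest"
}
```

Setting LC_ALL=C makes the tiebreak a plain byte comparison, which keeps the ordering independent of the user's locale.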

Key Entities

  • Screenshot File: Image file in the screenshots directory with metadata (path, timestamp, format)
  • Screenshot Directory: Configurable location where screenshots are stored (default: ~/Pictures/Screenshots)
  • Skill Configuration: Optional JSON config file at ~/.config/opencode/skills/screenshot-analysis/config.json (or ~/.claude/skills/screenshot-analysis/config.json for Claude Code) with fields: screenshot_dir (custom directory path)

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: Users can request screenshot analysis without typing file paths in 100% of cases where screenshots exist
  • SC-002: Skill correctly identifies the most recent screenshot in under 1 second for directories with up to 1000 files
  • SC-003: Skill successfully locates screenshots in 95% of user requests when screenshots exist
  • SC-004: Error messages are clear and actionable when screenshots cannot be found or accessed
  • SC-005: Skill reduces user keystrokes by an average of 40+ characters per screenshot analysis request by eliminating typed paths such as ~/Pictures/Screenshots/filename.png

Assumptions

Default Behavior

  • Users store screenshots in the standard location (~/Pictures/Screenshots) on Linux systems
  • Screenshot files carry filename timestamps or file modification times that allow reliable sorting by recency
  • The agent has image analysis capabilities (can read and analyze image files)

Technical Environment

  • File system is accessible and readable
  • Standard Unix/Linux file utilities are available
  • Screenshot files use standard image formats

User Interaction

  • Users will use natural language to request screenshot analysis
  • Users understand relative time references ("last", "latest", "recent", "previous")
  • Users expect immediate analysis without additional prompts

Configuration

  • Custom configuration is optional - defaults work for most users
  • Configuration stored in skill-specific JSON file at ~/.config/opencode/skills/screenshot-analysis/config.json or ~/.claude/skills/screenshot-analysis/config.json
  • Config file format: {"screenshot_dir": "/path/to/screenshots"}
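
The lookup and fallback described here and in User Story 3 might look like the sketch below. Several details are assumptions rather than spec: the function name is hypothetical, the OpenCode path is checked before the Claude Code path, a leading ~/ in the configured value is expanded, and the value is extracted with sed, which only suits the flat one-key format shown above. A real JSON parser would be more robust.

```shell
# Hypothetical resolver: print the directory the skill should search.
resolve_screenshot_dir() {
  local default_dir="$HOME/Pictures/Screenshots" cfg configured=""

  # Assumed lookup order: OpenCode config first, then Claude Code.
  for cfg in \
    "$HOME/.config/opencode/skills/screenshot-analysis/config.json" \
    "$HOME/.claude/skills/screenshot-analysis/config.json"
  do
    if [ -f "$cfg" ]; then
      # Simplified extraction for {"screenshot_dir": "..."}; it does not
      # handle JSON escapes or nested objects.
      configured="$(
        sed -n 's/.*"screenshot_dir"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
          "$cfg" | head -n 1
      )"
      break
    fi
  done

  # JSON strings are not shell-expanded, so expand a leading "~/" by hand.
  case "$configured" in
    '~/'*) configured="$HOME/${configured#??}" ;;
  esac

  if [ -n "$configured" ]; then
    if [ -d "$configured" ]; then
      printf '%s\n' "$configured"
      return 0
    fi
    echo "Configured screenshot directory does not exist: $configured" >&2
    echo "Falling back to default: $default_dir" >&2
  fi
  printf '%s\n' "$default_dir"
}
```

The warning goes to stderr and the chosen directory to stdout, so a caller can capture the path while still surfacing the error message to the user.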

Out of Scope

The following are explicitly NOT included in this feature:

  • Screenshot capture functionality (assumes screenshots already exist)
  • Image editing or manipulation
  • Screenshot organization or tagging
  • Screenshot upload to external services
  • Optical Character Recognition (OCR) - unless built into agent's image analysis
  • Screenshot comparison or diff functionality (may be future enhancement)
  • Cross-platform screenshot location detection (focuses on Linux ~/Pictures/Screenshots)
  • Screenshot history management or database

Dependencies

  • Agent must support image file analysis
  • File system access (read permissions on screenshots directory)
  • Bash scripting environment for helper scripts
  • Standard Unix tools (ls, find, stat for file operations)

Notes

  • This skill is a convenience wrapper that eliminates repetitive path typing
  • The actual image analysis is delegated to the agent's existing capabilities
  • Focus is on file discovery and path resolution, not image processing
  • Should work with both Claude Code and OpenCode agents