Feature Specification: Matrix Bot Framework (Maubot) Integration

Feature Branch: 003-maubot-integration
Created: 2025-10-26
Status: Draft
Input: User description: "Begin maubot feature spec. instagram bot is one of our goals."

Clarifications

Session 2025-10-26

  • Q: Instagram bot activation behavior - should it respond to all Instagram URLs, only when mentioned, or in designated rooms? → A: Bot responds to Instagram URLs only in designated bot-enabled rooms
  • Q: Bot error notification method - how should errors be communicated to administrators? → A: Error notification behavior based on severity levels (DEBUG/INFO logs only, WARN logs + dashboard visibility, ERROR/CRITICAL logs + dashboard + Matrix admin room notifications)
  • Q: Room enablement mechanism - how do administrators enable bot in specific rooms? → A: Edit bot configuration file with room IDs, restart bot instance
  • Q: Admin notification room configuration - should each bot have dedicated admin room, shared room, or reuse homeserver admin room? → A: Reuse Matrix homeserver admin room for bot ERROR/CRITICAL notifications
  • Q: Management interface authentication - single shared account, multi-user, or Matrix homeserver auth? → A: Single shared admin account (username/password configured in sops-nix secrets)
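
The severity-based notification rule from the second clarification can be sketched as a small routing function. This is only an illustration under stated assumptions: the sink labels (`log`, `dashboard`, `admin_room`) and the function name `sinks_for` are hypothetical, and the standard-library `logging` levels stand in for whatever level values the bot framework actually uses.

```python
import logging

# Hypothetical labels for the three destinations named in the clarification.
LOG, DASHBOARD, ADMIN_ROOM = "log", "dashboard", "admin_room"


def sinks_for(level: int) -> list[str]:
    """Return the destinations for a record at the given stdlib logging level."""
    sinks = [LOG]  # every severity is written to the logs
    if level >= logging.WARNING:
        sinks.append(DASHBOARD)  # WARN and above are also visible on the dashboard
    if level >= logging.ERROR:
        sinks.append(ADMIN_ROOM)  # ERROR/CRITICAL also notify the homeserver admin room
    return sinks
```

For example, `sinks_for(logging.WARNING)` yields the log and dashboard sinks but no admin room notification.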

User Scenarios & Testing (mandatory)

User Story 1 - Instagram Content Sharing to Matrix (Priority: P1)

A team member shares an Instagram post URL in a Matrix room, and the bot automatically fetches and displays the content (image, caption, metadata) directly in the chat, allowing team members to view and discuss Instagram content without leaving Matrix.

Why this priority: This is the core value proposition - bringing Instagram content into team communication. Demonstrates immediate utility of the bot framework and validates the integration works correctly.

Independent Test: Can be fully tested by posting an Instagram URL in a Matrix room and verifying the bot responds with content preview, delivering immediate value as an Instagram content viewer.

Acceptance Scenarios:

  1. Given Instagram bot is enabled in a specific Matrix room, When user posts "https://instagram.com/p/ABC123/" in that room, Then bot responds within 5 seconds with image, caption, and post metadata (like and comment counts)
  2. Given Instagram bot is NOT enabled in a Matrix room, When user posts Instagram URL in that room, Then bot ignores the URL and does not respond
  3. Given bot receives Instagram URL in enabled room, When content is a video, Then bot provides video thumbnail, caption, and download link
  4. Given bot receives Instagram URL in enabled room, When content is a carousel (multiple images), Then bot displays all images in sequence with navigation
  5. Given bot receives Instagram profile URL in enabled room, When URL is "https://instagram.com/username", Then bot displays profile info (bio, follower count, recent posts preview)
  6. Given bot encounters rate limiting in enabled room, When too many requests in short period, Then bot queues request and notifies user of delay
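
Scenarios 1 and 2 amount to a room allowlist check followed by URL matching. A minimal sketch, assuming a hypothetical in-memory allowlist `ENABLED_ROOMS` (in practice the room IDs would come from the bot configuration file) and a regular expression that covers only `/p/` post URLs; the room ID and all names are illustrative, not taken from any plugin.

```python
import re

# Hypothetical allowlist; real room IDs would be read from the bot configuration file.
ENABLED_ROOMS = {"!abc123:clarun.xyz"}

# Matches post URLs such as https://instagram.com/p/ABC123/ (with or without "www.").
POST_URL = re.compile(r"https?://(?:www\.)?instagram\.com/p/([A-Za-z0-9_-]+)/?")


def posts_to_fetch(room_id: str, body: str) -> list[str]:
    """Return post shortcodes found in a message, or [] if the room is not enabled."""
    if room_id not in ENABLED_ROOMS:
        return []  # scenario 2: the bot ignores URLs outside enabled rooms
    return POST_URL.findall(body)
```

Profile URLs (scenario 5) and other content types would need their own patterns; they are left out to keep the sketch short.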

User Story 2 - Bot Management Interface (Priority: P2)

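Platform administrators can create, configure, start, stop, and monitor bots through a web-based management interface without restarting the Maubot service. The one exception is changing a bot's enabled rooms, which requires editing the bot configuration file and restarting that bot instance (FR-010).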

Why this priority: Essential for operational management and enables non-developer administrators to manage bots. Required for long-term maintainability but bot can work without it initially.

Independent Test: Can be tested by accessing management interface, creating a test bot instance, and verifying it appears in Matrix - demonstrates full bot lifecycle management.

Acceptance Scenarios:

  1. Given administrator accesses Maubot management UI, When they log in with shared admin credentials, Then dashboard displays all bot instances, their status, and health metrics
  2. Given administrator wants to deploy Instagram bot, When they upload maubot plugin file (.mbp), Then plugin appears in available plugins list
  3. Given plugin is uploaded, When administrator creates new bot instance with Matrix user credentials and room subscription list, Then bot appears online in Matrix within 30 seconds and only responds in configured rooms
  4. Given administrator wants to change enabled rooms, When they edit bot configuration file with new room IDs and restart bot instance, Then bot begins responding only in newly configured rooms
  5. Given bot is running, When administrator clicks "Stop" button, Then bot goes offline and stops responding to commands
  6. Given bot encounters error, When viewing bot logs in UI, Then error messages are displayed with timestamps, severity level, and context
  7. Given bot experiences CRITICAL error, When error occurs, Then notification is sent to Matrix homeserver admin room with error details and affected bot instance

User Story 3 - Bot Framework Service Reliability (Priority: P2)

The Maubot service starts automatically on server boot, maintains bot instances across restarts, and recovers from failures without manual intervention.

Why this priority: Critical for production use but can be validated after basic functionality works. Prevents the bot framework from being a maintenance burden.

Independent Test: Can be tested by rebooting the server and verifying Maubot service auto-starts and all bot instances resume operation automatically.

Acceptance Scenarios:

  1. Given server reboots, When system comes back online, Then Maubot service starts automatically within 2 minutes and all bot instances reconnect to Matrix
  2. Given Matrix homeserver restarts, When homeserver is available again, Then bot instances re-establish connections and resume operation without manual intervention
  3. Given bot instance crashes, When Maubot detects failure, Then service attempts automatic restart with exponential backoff
  4. Given bot encounters persistent error (ERROR/CRITICAL severity), When restart attempts fail, Then service logs detailed diagnostics, updates dashboard status, and sends notification to Matrix homeserver admin room
  5. Given database connection lost, When connectivity is restored, Then Maubot reconnects automatically and restores bot state
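
The "exponential backoff" in scenario 3 can be illustrated with a short delay schedule. The base delay, growth factor, cap, and attempt count below are hypothetical defaults chosen for the example; this specification does not define them.

```python
def backoff_delays(
    base: float = 1.0, factor: float = 2.0, cap: float = 60.0, attempts: int = 6
) -> list[float]:
    """Seconds to wait before each restart attempt, growing by `factor` up to `cap`."""
    return [min(cap, base * factor**n) for n in range(attempts)]
```

With these defaults the first six waits are 1, 2, 4, 8, 16, and 32 seconds, and any later attempt waits the capped 60 seconds.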

User Story 4 - Additional Bot Deployment (Priority: P3)

Platform administrators can deploy additional custom bots beyond Instagram bot by uploading plugin files and configuring bot instances, enabling extensible bot functionality for future team needs.

Why this priority: Demonstrates platform extensibility and future-proofs the investment, but not required for initial value delivery. Can be added after Instagram bot proves value.

Independent Test: Can be tested by deploying a simple echo bot or reaction bot from maubot plugin repository and verifying it works independently.

Acceptance Scenarios:

  1. Given administrator has custom maubot plugin (.mbp file), When they upload via management interface, Then plugin is validated and added to available plugins
  2. Given plugin requires configuration, When creating bot instance, Then administrator can provide plugin-specific settings through UI
  3. Given multiple bot instances exist, When administrator views dashboard, Then all bots are clearly listed with their types, status, and resource usage
  4. Given bot requires database storage, When bot instance is created, Then Maubot automatically provisions isolated database for that bot
  5. Given plugin has dependencies, When uploading plugin, Then Maubot validates dependencies and reports missing requirements

Requirements (mandatory)

Functional Requirements

  • FR-001: System MUST extract and deploy maubot module from ops-base repository to ops-jrz1 infrastructure
  • FR-002: System MUST integrate Maubot with existing conduwuit Matrix homeserver on clarun.xyz
  • FR-003: System MUST provide web-based management interface on dedicated port (default: 29316) accessible to platform administrators via single shared admin account credentials stored in sops-nix secrets
  • FR-004: Maubot service MUST support automatic startup on system boot and auto-recovery from failures
  • FR-005: System MUST support Instagram bot plugin deployment with content fetching capabilities
  • FR-006: Instagram bot MUST fetch and display images, videos, captions, and metadata from Instagram URLs posted only in designated bot-enabled Matrix rooms (bot ignores URLs in rooms where it is not explicitly enabled)
  • FR-007: Instagram bot MUST handle rate limiting gracefully with user-friendly error messages
  • FR-008: System MUST support multiple bot instances running concurrently with isolated configurations (architecture supports 3+ instances per SC-002, production deploys 1 instance initially per quickstart.md)
  • FR-009: System MUST persist bot configurations and state to survive service restarts
  • FR-010: Administrators MUST be able to configure bot room subscriptions by editing bot configuration file with Matrix room IDs and restarting the bot instance
  • FR-011: System MUST provide health monitoring for bot instances with status indicators (health check API endpoint and dashboard status display via management interface)
  • FR-012: System MUST integrate with existing sops-nix secrets management for bot credentials
  • FR-013: System MUST support uploading and deploying additional maubot plugins (.mbp files) - functionality inherited from ops-base maubot.nix module, validated in T029
  • FR-014: System MUST provide logging capabilities for bot activity and errors accessible via management interface with severity-based propagation (DEBUG/INFO to logs only, WARN to logs and dashboard, ERROR/CRITICAL to logs, dashboard, and Matrix homeserver admin room)
  • FR-015: Bot accounts MUST be registered on the Matrix homeserver using registration tokens (conduwuit compatibility requirement; shared-secret registration is not supported)
  • FR-016: System MUST support per-bot database storage with automatic provisioning

Key Entities

  • Maubot Service: Plugin-based Matrix bot framework that manages multiple bot instances, provides management interface, and handles Matrix homeserver integration
  • Bot Instance: Individual bot deployment with specific configuration, Matrix user account, and plugin assignment (e.g., "instagram-bot-1")
  • Plugin: Packaged bot functionality (.mbp file) containing code, metadata, and dependencies (e.g., Instagram content fetcher, echo bot, reaction bot)
  • Bot Configuration: Settings specific to bot instance including Matrix credentials, plugin settings, room subscriptions (list of enabled room IDs), and command prefixes
  • Management Interface: Web UI for administrators to create, configure, monitor, and control bot instances, displaying logs with severity levels and real-time status updates
  • Admin Notification: ERROR and CRITICAL level bot notifications sent to existing Matrix homeserver admin room (shared with other platform notifications)
  • Bot Database: Per-instance isolated SQLite database for plugin state and data persistence
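
The relationship between a Bot Instance, its Bot Configuration, and its Bot Database can be pictured with two small data classes. This is an illustrative model only: the field names, the default command prefix, and the database file naming are assumptions for the example and do not reflect maubot's internal schema.

```python
from dataclasses import dataclass, field


@dataclass
class BotConfiguration:
    matrix_user_id: str  # Matrix account the bot logs in as
    plugin_settings: dict[str, str] = field(default_factory=dict)
    enabled_rooms: list[str] = field(default_factory=list)  # room subscriptions
    command_prefix: str = "!"  # hypothetical default


@dataclass
class BotInstance:
    instance_id: str  # e.g. "instagram-bot-1"
    plugin: str  # name of the plugin packaged in the .mbp file
    config: BotConfiguration
    database_path: str  # per-instance SQLite file (path layout is an assumption)
```

Keeping `enabled_rooms` on the configuration rather than on the instance mirrors the entity description, where room subscriptions are part of the bot's settings.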

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: Instagram bot responds to Instagram URLs with content preview within 5 seconds under normal conditions
  • SC-002: System supports at least 3 concurrent bot instances without performance degradation
  • SC-003: Maubot service maintains 99% uptime over 7-day testing period
  • SC-004: Bot instances automatically recover within 2 minutes after service restart
  • SC-005: Administrators can deploy a new bot instance from scratch in under 10 minutes
  • SC-006: Instagram bot successfully fetches content for 95% of public Instagram post URLs
  • SC-007: Management interface loads and displays bot status within 2 seconds
  • SC-008: System handles server reboot without data loss or manual intervention required

Validation Note: Five criteria have explicit task validation: SC-001 (T026), SC-002 (T042), SC-003 (T038), SC-004 (T034), and SC-008 (T034). SC-005, SC-006, and SC-007 are measured during the 7-day operational validation period (T038) and documented in the deployment worklog (T044).
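
To make SC-003 concrete, the downtime budget implied by 99% uptime over 7 days can be computed directly. The helper name below is hypothetical; the arithmetic is simply the total minutes in the period multiplied by the allowed failure fraction.

```python
def allowed_downtime_minutes(uptime_target: float, days: int) -> float:
    """Minutes of downtime permitted by an uptime target over a test period."""
    total_minutes = days * 24 * 60  # 7 days -> 10,080 minutes
    return total_minutes * (1 - uptime_target)
```

For the 7-day period at a 0.99 target this comes to roughly 100.8 minutes, so a single outage of about an hour and forty minutes would use up the whole budget.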

Scope (mandatory)

In Scope

  • Extract and adapt maubot.nix module from ops-base to ops-jrz1
  • Configure Maubot to integrate with conduwuit Matrix homeserver
  • Deploy Instagram bot plugin as primary use case
  • Set up management web interface with authentication
  • Implement health monitoring and auto-recovery mechanisms
  • Configure sops-nix secrets for bot credentials
  • Document bot deployment and management procedures including room subscription configuration workflow
  • Support for uploading additional maubot plugins

Out of Scope

  • Custom Instagram bot development (use existing maubot Instagram plugin from community)
  • Migration of other bots from ops-base besides Instagram bot
  • Advanced analytics or metrics dashboard for bot performance
  • Multi-homeserver support (only clarun.xyz)
  • Custom plugin development beyond Instagram bot deployment
  • Mobile app for bot management (web interface only)
  • Automatic Instagram authentication (manual token provisioning acceptable)
  • Real-time Instagram feed monitoring or notifications

Constraints (mandatory)

Technical Constraints

  • Must work with conduwuit Matrix homeserver (ops-base used continuwuity, may require compatibility testing)
  • Limited to Python 3.11 for maubot runtime (nixpkgs availability)
  • Instagram bot functionality depends on Instagram API/scraping availability and rate limits
  • Must adapt from ops-base VM-based deployment pattern to ops-jrz1 VPS single-host pattern
  • Dependent on deprecated olm-3.2.16 library for Matrix encryption (known CVEs, acceptable risk documented in ops-base)

Operational Constraints

  • Deployment must not disrupt existing services (Matrix homeserver, Slack bridge, Forgejo)
  • Management interface must be secured (single admin account authentication, localhost-only access)
  • Management interface credentials must be stored in sops-nix encrypted secrets
  • Bot Matrix accounts require registration tokens from homeserver
  • Instagram tokens may require periodic renewal based on Instagram API policies

Resource Constraints

  • Maubot service limited to 512M memory (as per ops-base configuration)
  • Additional database space required for bot state (estimated <100MB initially)
  • Management interface port 29316 must not conflict with existing services

Dependencies (mandatory)

External Dependencies

  • ops-base repository access to extract maubot.nix module and documentation
  • Instagram bot plugin from maubot community or ops-base implementation
  • Instagram authentication tokens (if required by current Instagram API policies)
  • Matrix homeserver registration token for bot user creation

Internal Dependencies

  • conduwuit Matrix homeserver must be operational on clarun.xyz
  • sops-nix secrets management must be configured for bot credentials
  • SQLite for bot state storage (decision per plan.md research: lightweight isolation better than shared PostgreSQL)
  • Existing NixOS infrastructure and deployment patterns

Blocking Issues

  • Need to verify conduwuit compatibility with maubot (ops-base used continuwuity)
  • Need to assess current Instagram API access requirements and scraping feasibility
  • Need to extract and adapt ops-base module configuration options from services.matrix-vm.maubot to services.dev-platform.maubot

Assumptions (mandatory)

  • Instagram content fetching remains technically feasible (no major Instagram API changes blocking access)
  • Maubot works with conduwuit Matrix homeserver with minimal or no modifications
  • ops-base maubot module can be adapted to VPS deployment with reasonable effort
  • Instagram bot plugin from ops-base is functional and can be reused or community plugin exists
  • Team accepts olm-3.2.16 security risk with documented mitigation plan (migration to vodozemac when available)
  • Bot traffic will remain under Instagram rate limits for small team usage (<100 requests/hour)
  • Single VPS deployment sufficient (no distributed bot architecture needed)
  • Single shared admin account sufficient for initial deployment (no multi-user management required)
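
The rate-limit assumption (<100 requests/hour) could be enforced on the bot side with a sliding-window counter. The class below is a sketch under that assumption, not the plugin's actual behavior; `HourlyBudget` and `try_acquire` are hypothetical names, and the caller supplies timestamps so the logic stays easy to test.

```python
from collections import deque


class HourlyBudget:
    """Sliding-window counter: allow at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int = 100, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self.sent: deque[float] = deque()  # timestamps of accepted requests

    def try_acquire(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False  # caller should queue the request and tell the user about the delay
```

A `False` result corresponds to the FR-007 path: the request is queued rather than sent, and the user is notified of the delay.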

Non-Goals (optional)

  • Automated Instagram post monitoring or scheduled fetching
  • Direct posting to Instagram from Matrix (read-only integration)
  • Instagram DM integration or two-way messaging
  • Advanced content moderation or filtering
  • Custom Instagram analytics or engagement tracking
  • Multi-tenant bot hosting for external teams
  • Commercial Instagram API integration (acceptable to use community scraping approaches)
  • Real-time Instagram notifications or webhooks

Known Limitations (optional)

The following edge cases are known limitations not addressed in MVP scope:

  • Deleted/private Instagram posts: Bot does not handle posts that become private or deleted after initial fetch (content remains in Matrix chat history)
  • Instagram rate limiting: System may experience delays during high-traffic periods (429 responses). FR-007 requires graceful handling with user notifications.
  • Matrix account credential expiry: Bot user account credentials are managed via registration tokens and do not expire automatically. Manual re-authentication required if revoked.
  • Instagram story URLs: 24-hour expiry stories not supported (yt-dlp limitation for ephemeral content)
  • Command collision: Multiple bot instances in same room may respond to overlapping triggers. Recommendation: enable only one bot per room or use distinct command prefixes.
  • Age-restricted/geo-blocked content: Instagram content with access restrictions may fail to fetch depending on VPS location and yt-dlp capabilities
  • Homeserver connection loss: If Maubot loses its connection to the Matrix homeserver, bot instances stop responding until the connection is restored (monitored via health checks in FR-011)
  • Database corruption: No automated backup/recovery. Recommendation: implement manual backup procedure for /var/lib/maubot/ during operational period.

Risks (optional)

Technical Risks

  • Risk: Instagram API/scraping methods may break with Instagram updates
    • Mitigation: Document bot as best-effort, plan for periodic maintenance, monitor Instagram bot community for updates
  • Risk: Conduwuit compatibility issues with maubot not discovered until integration
    • Mitigation: Test maubot registration and basic functionality early in implementation phase
  • Risk: olm-3.2.16 vulnerabilities may be exploited
    • Mitigation: Follow ops-base mitigation strategy - monitor for vodozemac migration, limit bot network exposure, document accepted risk

Operational Risks

  • Risk: Instagram rate limiting may impact bot responsiveness during high usage
    • Mitigation: Implement request queuing, user notifications for delays, consider rate limit monitoring
  • Risk: Bot management interface security breach could compromise Matrix homeserver
    • Mitigation: Require strong authentication, limit network exposure, regular security audits, use sops-nix for credential storage
  • Risk: Bot instance failure may go unnoticed without monitoring
    • Mitigation: Implement health checks, automated restarts, log monitoring, administrator alerts for persistent failures

Clarified Requirements (resolved 2025-10-26)

Instagram Authentication Approach

Decision: Use community scraping methods (instaloader, yt-dlp) for Instagram content fetching.

Rationale: Easier to set up immediately without requiring Facebook developer account approval. Acceptable for internal team use with understanding that scraping methods may require periodic updates if Instagram changes their interface.

Management Interface Network Exposure

Decision: Restrict management interface to localhost only, requiring SSH tunnel for remote administration.

Rationale: Maximizes security by eliminating network attack surface. Administrators already have SSH access for deployment, so tunnel setup is acceptable operational overhead for the security benefit.
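
As an illustration of the SSH tunnel workflow, the helper below builds the command an administrator might run to reach the localhost-only interface on port 29316. The SSH user `admin` and the use of `clarun.xyz` as the SSH host are assumptions for the example; `-N` (no remote command) and `-L` (local port forward) are standard OpenSSH options.

```python
import shlex


def tunnel_command(host: str, port: int = 29316, user: str = "admin") -> list[str]:
    """Build an ssh invocation forwarding a local port to the same port on the server's loopback."""
    # -N: do not run a remote command; -L: local_port:remote_host:remote_port
    return ["ssh", "-N", "-L", f"{port}:127.0.0.1:{port}", f"{user}@{host}"]


# Hypothetical host and user; prints a shell-safe command line.
print(shlex.join(tunnel_command("clarun.xyz")))
```

While the tunnel is open, the management interface would be reachable in a local browser at port 29316 on 127.0.0.1.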

Bot Instance Quantity Planning

Decision: Support single Instagram bot instance initially (1 instance).

Rationale: Minimal resource requirements, proves concept quickly, demonstrates value before scaling. Architecture can support additional instances later if needed without major rework.

Note: SC-002 requires validating 3-instance capability during testing to ensure architecture can scale when needed, but production deployment starts with single instance.