ops-jrz1/specs/003-maubot-integration/spec.md
Dan 8826d62bcc Add maubot integration and infrastructure updates
- maubot.nix: Declarative bot framework with plugin deployment
- backup.nix: Local backup service for Matrix/bridge data
- sna-instagram-bot: Instagram content bridge plugin
- beads: Issue tracking workflow integrated
- spec 004: Browser-based dev environment design
- nixpkgs bump: Oct 22 → Dec 2
- Fix maubot health check (401 = healthy)
2025-12-08 15:55:12 -08:00


# Feature Specification: Matrix Bot Framework (Maubot) Integration
**Feature Branch**: `003-maubot-integration`
**Created**: 2025-10-26
**Status**: Draft
**Input**: User description: "Begin maubot feature spec. instagram bot is one of our goals."
## Clarifications
### Session 2025-10-26
- Q: Instagram bot activation behavior - should it respond to all Instagram URLs, only when mentioned, or in designated rooms? → A: Bot responds to Instagram URLs only in designated bot-enabled rooms
- Q: Bot error notification method - how should errors be communicated to administrators? → A: Notification depends on severity level: DEBUG/INFO go to logs only; WARN goes to logs and the dashboard; ERROR/CRITICAL go to logs, the dashboard, and a Matrix admin room notification
- Q: Room enablement mechanism - how do administrators enable bot in specific rooms? → A: Edit bot configuration file with room IDs, restart bot instance
- Q: Admin notification room configuration - should each bot have dedicated admin room, shared room, or reuse homeserver admin room? → A: Reuse Matrix homeserver admin room for bot ERROR/CRITICAL notifications
- Q: Management interface authentication - single shared account, multi-user, or Matrix homeserver auth? → A: Single shared admin account (username/password configured in sops-nix secrets)
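
The severity-based routing agreed in this session can be sketched as a small lookup. This is a minimal illustration, assuming Python's standard `logging` levels stand in for the severities named above; the sink names (`logs`, `dashboard`, `matrix_admin_room`) are hypothetical identifiers, not part of maubot.

```python
import logging

# Hypothetical sink names: the spec names the destinations, not identifiers.
LOGS = "logs"
DASHBOARD = "dashboard"
ADMIN_ROOM = "matrix_admin_room"


def sinks_for(level: int) -> set[str]:
    """Return the destinations a record of the given severity should reach."""
    if level >= logging.ERROR:  # ERROR and CRITICAL
        return {LOGS, DASHBOARD, ADMIN_ROOM}
    if level >= logging.WARNING:  # WARN
        return {LOGS, DASHBOARD}
    return {LOGS}  # DEBUG and INFO
```

Keeping the mapping in one function means a later change to the policy (for example, a dedicated admin room per bot) touches a single place.
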
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Instagram Content Sharing to Matrix (Priority: P1)
A team member shares an Instagram post URL in a Matrix room, and the bot automatically fetches and displays the content (image, caption, metadata) directly in the chat, allowing team members to view and discuss Instagram content without leaving Matrix.
**Why this priority**: This is the core value proposition - bringing Instagram content into team communication. Demonstrates immediate utility of the bot framework and validates the integration works correctly.
**Independent Test**: Can be fully tested by posting an Instagram URL in a Matrix room and verifying the bot responds with content preview, delivering immediate value as an Instagram content viewer.
**Acceptance Scenarios**:
1. **Given** Instagram bot is enabled in a specific Matrix room, **When** user posts "https://instagram.com/p/ABC123/" in that room, **Then** bot responds within 5 seconds with image, caption, and post metadata (likes, comments count)
2. **Given** Instagram bot is NOT enabled in a Matrix room, **When** user posts Instagram URL in that room, **Then** bot ignores the URL and does not respond
3. **Given** bot receives Instagram URL in enabled room, **When** content is a video, **Then** bot provides video thumbnail, caption, and download link
4. **Given** bot receives Instagram URL in enabled room, **When** content is a carousel (multiple images), **Then** bot displays all images in sequence with navigation
5. **Given** bot receives Instagram profile URL in enabled room, **When** URL is "https://instagram.com/username", **Then** bot displays profile info (bio, follower count, recent posts preview)
6. **Given** bot encounters rate limiting in enabled room, **When** too many requests are made in a short period, **Then** bot queues the request and notifies the user of the delay
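
Scenarios 1, 2, and 5 boil down to two checks: is the room enabled, and what kind of Instagram URL is in the message. The sketch below is one possible way to express that, not the plugin's actual implementation. The room ID, the regular expressions, and the `classify` helper are assumptions made for illustration; other URL shapes (reels, stories) are deliberately left out.

```python
import re
from typing import Optional

# Hypothetical room ID; real values would come from the bot configuration file.
ENABLED_ROOMS = {"!abc123:clarun.xyz"}

# Post URLs: https://instagram.com/p/<shortcode>/
POST_RE = re.compile(r"https?://(?:www\.)?instagram\.com/p/([A-Za-z0-9_-]+)/?")
# Profile URLs: a single path segment, e.g. https://instagram.com/username
PROFILE_RE = re.compile(
    r"https?://(?:www\.)?instagram\.com/([A-Za-z0-9_.]+)/?(?=\s|$)"
)


def classify(room_id: str, body: str) -> Optional[tuple[str, str]]:
    """Return (kind, identifier) for the first Instagram URL, or None to ignore."""
    if room_id not in ENABLED_ROOMS:
        return None  # Scenario 2: bot stays silent in non-enabled rooms
    match = POST_RE.search(body)
    if match:
        return ("post", match.group(1))
    match = PROFILE_RE.search(body)
    if match:
        return ("profile", match.group(1))
    return None
```

Checking the room before parsing the message keeps the bot from doing any work in rooms where it is not enabled.
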
---
### User Story 2 - Bot Management Interface (Priority: P2)
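Platform administrators can create, start, stop, and monitor bots through a web-based management interface without restarting the Maubot service. Room subscription changes are the exception: they require editing the bot configuration file and restarting the bot instance (see FR-010).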
**Why this priority**: Essential for operational management and enables non-developer administrators to manage bots. Required for long-term maintainability but bot can work without it initially.
**Independent Test**: Can be tested by accessing management interface, creating a test bot instance, and verifying it appears in Matrix - demonstrates full bot lifecycle management.
**Acceptance Scenarios**:
1. **Given** administrator accesses Maubot management UI, **When** they log in with shared admin credentials, **Then** dashboard displays all bot instances, their status, and health metrics
2. **Given** administrator wants to deploy Instagram bot, **When** they upload maubot plugin file (.mbp), **Then** plugin appears in available plugins list
3. **Given** plugin is uploaded, **When** administrator creates new bot instance with Matrix user credentials and room subscription list, **Then** bot appears online in Matrix within 30 seconds and only responds in configured rooms
4. **Given** administrator wants to change enabled rooms, **When** they edit bot configuration file with new room IDs and restart bot instance, **Then** bot begins responding only in newly configured rooms
5. **Given** bot is running, **When** administrator clicks "Stop" button, **Then** bot goes offline and stops responding to commands
6. **Given** bot encounters error, **When** viewing bot logs in UI, **Then** error messages are displayed with timestamps, severity level, and context
7. **Given** bot experiences CRITICAL error, **When** error occurs, **Then** notification is sent to Matrix homeserver admin room with error details and affected bot instance
---
### User Story 3 - Bot Framework Service Reliability (Priority: P2)
The Maubot service starts automatically on server boot, maintains bot instances across restarts, and recovers from failures without manual intervention.
**Why this priority**: Critical for production use but can be validated after basic functionality works. Prevents the bot framework from being a maintenance burden.
**Independent Test**: Can be tested by rebooting the server and verifying Maubot service auto-starts and all bot instances resume operation automatically.
**Acceptance Scenarios**:
1. **Given** server reboots, **When** system comes back online, **Then** Maubot service starts automatically within 2 minutes and all bot instances reconnect to Matrix
2. **Given** Matrix homeserver restarts, **When** homeserver is available again, **Then** bot instances re-establish connections and resume operation without manual intervention
3. **Given** bot instance crashes, **When** Maubot detects failure, **Then** service attempts automatic restart with exponential backoff
4. **Given** bot encounters persistent error (ERROR/CRITICAL severity), **When** restart attempts fail, **Then** service logs detailed diagnostics, updates dashboard status, and sends notification to Matrix homeserver admin room
5. **Given** database connection lost, **When** connectivity is restored, **Then** Maubot reconnects automatically and restores bot state
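
Scenario 3 calls for restarts with exponential backoff but does not fix the schedule. A minimal sketch of such a schedule follows; the base delay, growth factor, cap, and attempt count are all assumed values chosen for illustration.

```python
def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    cap: float = 60.0,
    attempts: int = 6,
) -> list[float]:
    """Delay in seconds before each restart attempt, doubling up to a cap."""
    return [min(cap, base * factor**n) for n in range(attempts)]
```

With the defaults this yields delays of 1, 2, 4, 8, 16, and 32 seconds; the cap keeps later attempts from waiting indefinitely longer.
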
---
### User Story 4 - Additional Bot Deployment (Priority: P3)
Platform administrators can deploy additional custom bots beyond Instagram bot by uploading plugin files and configuring bot instances, enabling extensible bot functionality for future team needs.
**Why this priority**: Demonstrates platform extensibility and future-proofs the investment, but not required for initial value delivery. Can be added after Instagram bot proves value.
**Independent Test**: Can be tested by deploying a simple echo bot or reaction bot from maubot plugin repository and verifying it works independently.
**Acceptance Scenarios**:
1. **Given** administrator has custom maubot plugin (.mbp file), **When** they upload via management interface, **Then** plugin is validated and added to available plugins
2. **Given** plugin requires configuration, **When** creating bot instance, **Then** administrator can provide plugin-specific settings through UI
3. **Given** multiple bot instances exist, **When** administrator views dashboard, **Then** all bots are clearly listed with their types, status, and resource usage
4. **Given** bot requires database storage, **When** bot instance is created, **Then** Maubot automatically provisions isolated database for that bot
5. **Given** plugin has dependencies, **When** uploading plugin, **Then** Maubot validates dependencies and reports missing requirements
---
## Requirements *(mandatory)*
### Functional Requirements
- **FR-001**: System MUST extract and deploy maubot module from ops-base repository to ops-jrz1 infrastructure
- **FR-002**: System MUST integrate Maubot with existing conduwuit Matrix homeserver on clarun.xyz
- **FR-003**: System MUST provide web-based management interface on dedicated port (default: 29316) accessible to platform administrators via single shared admin account credentials stored in sops-nix secrets
- **FR-004**: Maubot service MUST support automatic startup on system boot and auto-recovery from failures
- **FR-005**: System MUST support Instagram bot plugin deployment with content fetching capabilities
- **FR-006**: Instagram bot MUST fetch and display images, videos, captions, and metadata from Instagram URLs posted only in designated bot-enabled Matrix rooms (bot ignores URLs in rooms where it is not explicitly enabled)
- **FR-007**: Instagram bot MUST handle rate limiting gracefully with user-friendly error messages
- **FR-008**: System MUST support multiple bot instances running concurrently with isolated configurations (architecture supports 3+ instances per SC-002, production deploys 1 instance initially per quickstart.md)
- **FR-009**: System MUST persist bot configurations and state to survive service restarts
- **FR-010**: Administrators MUST be able to configure bot room subscriptions by editing bot configuration file with Matrix room IDs and restarting the bot instance
- **FR-011**: System MUST provide health monitoring for bot instances with status indicators (health check API endpoint and dashboard status display via management interface)
- **FR-012**: System MUST integrate with existing sops-nix secrets management for bot credentials
- **FR-013**: System MUST support uploading and deploying additional maubot plugins (.mbp files) - functionality inherited from ops-base maubot.nix module, validated in T029
- **FR-014**: System MUST provide logging capabilities for bot activity and errors accessible via management interface with severity-based propagation (DEBUG/INFO to logs only, WARN to logs and dashboard, ERROR/CRITICAL to logs, dashboard, and Matrix homeserver admin room)
- **FR-015**: Bot Matrix accounts MUST be registered on the homeserver using registration tokens (conduwuit compatibility requirement; shared-secret registration is not supported)
- **FR-016**: System MUST support per-bot database storage with automatic provisioning
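
For the health check in FR-011, the commit summary for this change notes that a 401 response is treated as healthy: an unauthenticated probe that gets rejected still proves the service is up and answering. A minimal sketch of that interpretation, assuming the probe is a plain HTTP request and that any 2xx response also counts as healthy (the probed endpoint itself is not specified here):

```python
def is_healthy(status_code: int) -> bool:
    """Interpret the HTTP status returned by an unauthenticated health probe."""
    # 2xx: the endpoint answered normally.
    # 401: the service is running but rejected the unauthenticated request.
    return 200 <= status_code < 300 or status_code == 401
```

Gateway errors such as 502 or 503 still count as unhealthy, since they indicate the service behind the port is not responding.
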
### Key Entities
- **Maubot Service**: Plugin-based Matrix bot framework that manages multiple bot instances, provides management interface, and handles Matrix homeserver integration
- **Bot Instance**: Individual bot deployment with specific configuration, Matrix user account, and plugin assignment (e.g., "instagram-bot-1")
- **Plugin**: Packaged bot functionality (.mbp file) containing code, metadata, and dependencies (e.g., Instagram content fetcher, echo bot, reaction bot)
- **Bot Configuration**: Settings specific to bot instance including Matrix credentials, plugin settings, room subscriptions (list of enabled room IDs), and command prefixes
- **Management Interface**: Web UI for administrators to create, configure, monitor, and control bot instances, displaying logs with severity levels and real-time status updates
- **Admin Notification**: ERROR and CRITICAL level bot notifications sent to existing Matrix homeserver admin room (shared with other platform notifications)
- **Bot Database**: Per-instance isolated SQLite database for plugin state and data persistence
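
The relationship between a bot instance, its configuration, and its isolated database can be sketched as a small data structure. This is illustrative only: the field names, the plugin identifier, and the one-file-per-instance layout are assumptions, and maubot's own schema may differ. Only the `/var/lib/maubot` state directory and the `instagram-bot-1` example name come from this spec.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class BotInstance:
    """Hypothetical model of one bot deployment and its configuration."""

    instance_id: str  # e.g. "instagram-bot-1"
    plugin_id: str  # which uploaded plugin this instance runs
    matrix_user: str  # Matrix account the bot logs in as
    enabled_rooms: frozenset = frozenset()  # room IDs where the bot responds
    command_prefix: str = "!"

    def database_path(self, state_dir: str = "/var/lib/maubot") -> PurePosixPath:
        """Per-instance SQLite file (assumed layout: one file per instance)."""
        return PurePosixPath(state_dir) / f"{self.instance_id}.db"
```

Deriving the database path from the instance ID is one simple way to guarantee that two instances never share state.
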
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: Instagram bot responds to Instagram URLs with content preview within 5 seconds under normal conditions
- **SC-002**: System supports at least 3 concurrent bot instances without performance degradation
- **SC-003**: Maubot service maintains 99% uptime over 7-day testing period
- **SC-004**: Bot instances automatically recover within 2 minutes after service restart
- **SC-005**: Administrators can deploy a new bot instance from scratch in under 10 minutes
- **SC-006**: Instagram bot successfully fetches content for 95% of public Instagram post URLs
- **SC-007**: Management interface loads and displays bot status within 2 seconds
- **SC-008**: System handles server reboot without data loss or manual intervention required
**Validation Note**: SC-001 (T026), SC-002 (T042), SC-003 (T038), SC-004 (T034), and SC-008 (T034) have explicit task validation. SC-005, SC-006, and SC-007 are measured during the 7-day operational validation period (T038) and documented in the deployment worklog (T044).
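
To make the percentage targets concrete, the arithmetic behind SC-003 and SC-006 can be worked out directly. The helper names below are hypothetical; the inputs (99% over 7 days, 95% of requests) come from the criteria above.

```python
def allowed_downtime_minutes(uptime_pct: int, days: int) -> float:
    """Downtime budget implied by an uptime target over a test period."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_pct) / 100


def min_successes(success_pct: int, attempts: int) -> int:
    """Smallest number of successful fetches that meets the target (ceiling)."""
    return (success_pct * attempts + 99) // 100
```

For SC-003, `allowed_downtime_minutes(99, 7)` gives 100.8 minutes of permitted downtime across the 7-day period. For SC-006, a sample of 200 public post URLs would need at least `min_successes(95, 200)` = 190 successful fetches.
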
## Scope *(mandatory)*
### In Scope
- Extract and adapt maubot.nix module from ops-base to ops-jrz1
- Configure Maubot to integrate with conduwuit Matrix homeserver
- Deploy Instagram bot plugin as primary use case
- Set up management web interface with authentication
- Implement health monitoring and auto-recovery mechanisms
- Configure sops-nix secrets for bot credentials
- Document bot deployment and management procedures including room subscription configuration workflow
- Support for uploading additional maubot plugins
### Out of Scope
- Custom Instagram bot development (use existing maubot Instagram plugin from community)
- Migration of other bots from ops-base besides Instagram bot
- Advanced analytics or metrics dashboard for bot performance
- Multi-homeserver support (only clarun.xyz)
- Custom plugin development beyond Instagram bot deployment
- Mobile app for bot management (web interface only)
- Automatic Instagram authentication (manual token provisioning acceptable)
- Real-time Instagram feed monitoring or notifications
## Constraints *(mandatory)*
### Technical Constraints
- Must work with conduwuit Matrix homeserver (ops-base used continuwuity, so compatibility testing may be required)
- Limited to Python 3.11 for maubot runtime (nixpkgs availability)
- Instagram bot functionality depends on Instagram API/scraping availability and rate limits
- Must adapt from ops-base VM-based deployment pattern to ops-jrz1 VPS single-host pattern
- Dependent on deprecated olm-3.2.16 library for Matrix encryption (known CVEs, acceptable risk documented in ops-base)
### Operational Constraints
- Deployment must not disrupt existing services (Matrix homeserver, Slack bridge, Forgejo)
- Management interface must be secured (single admin account authentication, localhost-only access)
- Management interface credentials must be stored in sops-nix encrypted secrets
- Bot Matrix accounts require registration tokens from homeserver
- Instagram tokens may require periodic renewal based on Instagram API policies
### Resource Constraints
- Maubot service limited to 512M memory (as per ops-base configuration)
- Additional database space required for bot state (estimated <100MB initially)
- Management interface port 29316 must not conflict with existing services
## Dependencies *(mandatory)*
### External Dependencies
- ops-base repository access to extract maubot.nix module and documentation
- Instagram bot plugin from maubot community or ops-base implementation
- Instagram authentication tokens (if required by current Instagram API policies)
- Matrix homeserver registration token for bot user creation
### Internal Dependencies
- conduwuit Matrix homeserver must be operational on clarun.xyz
- sops-nix secrets management must be configured for bot credentials
- SQLite for bot state storage (decision per plan.md research: lightweight isolation better than shared PostgreSQL)
- Existing NixOS infrastructure and deployment patterns
### Blocking Issues
- Need to verify conduwuit compatibility with maubot (ops-base used continuwuity)
- Need to assess current Instagram API access requirements and scraping feasibility
- Need to extract and adapt ops-base module configuration options from `services.matrix-vm.maubot` to `services.dev-platform.maubot`
## Assumptions *(mandatory)*
- Instagram content fetching remains technically feasible (no major Instagram API changes blocking access)
- Maubot works with conduwuit Matrix homeserver with minimal or no modifications
- ops-base maubot module can be adapted to VPS deployment with reasonable effort
- Instagram bot plugin from ops-base is functional and can be reused or community plugin exists
- Team accepts olm-3.2.16 security risk with documented mitigation plan (migration to vodozemac when available)
- Bot traffic will remain under Instagram rate limits for small team usage (<100 requests/hour)
- Single VPS deployment sufficient (no distributed bot architecture needed)
- Single shared admin account sufficient for initial deployment (no multi-user management required)
## Non-Goals *(optional)*
- Automated Instagram post monitoring or scheduled fetching
- Direct posting to Instagram from Matrix (read-only integration)
- Instagram DM integration or two-way messaging
- Advanced content moderation or filtering
- Custom Instagram analytics or engagement tracking
- Multi-tenant bot hosting for external teams
- Commercial Instagram API integration (acceptable to use community scraping approaches)
- Real-time Instagram notifications or webhooks
## Known Limitations *(optional)*
The following edge cases are known limitations not addressed in MVP scope:
- **Deleted/private Instagram posts**: Bot does not handle posts that become private or deleted after initial fetch (content remains in Matrix chat history)
- **Instagram rate limiting**: System may experience delays during high-traffic periods (429 responses). FR-007 requires graceful handling with user notifications.
- **Matrix account credential expiry**: Bot user account credentials are managed via registration tokens and do not expire automatically. Manual re-authentication required if revoked.
- **Instagram story URLs**: 24-hour expiry stories not supported (yt-dlp limitation for ephemeral content)
- **Command collision**: Multiple bot instances in same room may respond to overlapping triggers. Recommendation: enable only one bot per room or use distinct command prefixes.
- **Age-restricted/geo-blocked content**: Instagram content with access restrictions may fail to fetch depending on VPS location and yt-dlp capabilities
- **Homeserver connection loss**: If Maubot loses connection to the Matrix homeserver, bot instances stop responding until the connection is restored (monitored via health checks in FR-011)
- **Database corruption**: No automated backup/recovery. Recommendation: implement manual backup procedure for /var/lib/maubot/ during operational period.
## Risks *(optional)*
### Technical Risks
- **Risk**: Instagram API/scraping methods may break with Instagram updates
- **Mitigation**: Document bot as best-effort, plan for periodic maintenance, monitor Instagram bot community for updates
- **Risk**: Conduwuit compatibility issues with maubot not discovered until integration
- **Mitigation**: Test maubot registration and basic functionality early in implementation phase
- **Risk**: olm-3.2.16 vulnerabilities may be exploited
- **Mitigation**: Follow ops-base mitigation strategy - monitor for vodozemac migration, limit bot network exposure, document accepted risk
### Operational Risks
- **Risk**: Instagram rate limiting may impact bot responsiveness during high usage
- **Mitigation**: Implement request queuing, user notifications for delays, consider rate limit monitoring
- **Risk**: Bot management interface security breach could compromise Matrix homeserver
- **Mitigation**: Require strong authentication, limit network exposure, regular security audits, use sops-nix for credential storage
- **Risk**: Bot instance failure may go unnoticed without monitoring
- **Mitigation**: Implement health checks, automated restarts, log monitoring, administrator alerts for persistent failures
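
The request-queuing mitigation for rate limiting could take the form of a sliding-window limiter that tells the bot how long to delay a request. The sketch below is one possible approach, not the plugin's behaviour. The class name is hypothetical, and the default budget of 100 requests per hour is borrowed from the usage assumption in this spec rather than from any published Instagram limit.

```python
from collections import deque


class HourlyLimiter:
    """Sliding-window limiter: at most max_requests per window_s seconds."""

    def __init__(self, max_requests: int = 100, window_s: float = 3600.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self._sent = deque()  # timestamps of requests still inside the window

    def delay_for(self, now: float) -> float:
        """Seconds to wait before sending; 0.0 means send now (and records it)."""
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return 0.0
        # Window is full: wait until the oldest request ages out.
        return self._sent[0] + self.window_s - now
```

A non-zero return value is the natural trigger for the user-facing delay notification: the bot can report the wait time and retry once it has elapsed.
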
## Clarified Requirements *(resolved 2025-10-26)*
### Instagram Authentication Approach
**Decision**: Use community scraping methods (instaloader, yt-dlp) for Instagram content fetching.
**Rationale**: Easier to set up immediately without requiring Facebook developer account approval. Acceptable for internal team use with understanding that scraping methods may require periodic updates if Instagram changes their interface.
### Management Interface Network Exposure
**Decision**: Restrict management interface to localhost only, requiring SSH tunnel for remote administration.
**Rationale**: Maximizes security by eliminating network attack surface. Administrators already have SSH access for deployment, so tunnel setup is acceptable operational overhead for the security benefit.
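
A deployment check can guard against accidentally binding the management interface to a public address. This is a minimal sketch using Python's standard `ipaddress` module; the `is_loopback_bind` helper is hypothetical and not part of maubot or the NixOS module.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """True if the configured listen address is reachable only from localhost."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a public hostname): treat as not loopback.
        return False
```

With the interface bound to loopback, remote administration would go through an SSH local forward such as `ssh -L 29316:localhost:29316 <user>@<host>`, where the user and host are placeholders.
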
### Bot Instance Quantity Planning
**Decision**: Deploy a single Instagram bot instance initially.
**Rationale**: Minimal resource requirements, proves concept quickly, demonstrates value before scaling. Architecture can support additional instances later if needed without major rework.
**Note**: SC-002 requires validating 3-instance capability during testing to ensure architecture can scale when needed, but production deployment starts with single instance.