# Data Model: Maubot Integration

**Feature**: 003-maubot-integration
**Date**: 2025-10-26
**Status**: Phase 1 design

## Overview

This document defines the data structures, state machines, and relationships for the maubot integration feature. Since maubot is an infrastructure service (not an application with user-facing data), the focus is on service configuration, runtime state, and operational entities.

---

## Core Entities

### 1. Maubot Service

**Description**: The maubot framework service that manages bot instances and provides the web-based management interface.

**Attributes**:
- `homeserver_url`: string (URL) - Matrix homeserver endpoint (e.g., `http://127.0.0.1:8008`)
- `server_name`: string (domain) - Matrix server domain (e.g., `clarun.xyz`)
- `port`: integer - Management interface port (default: 29316)
- `database_uri`: string - SQLite database path (e.g., `sqlite:///var/lib/maubot/bot.db`)
- `admin_username`: string - Admin UI login username
- `admin_password_hash`: string (secret) - Hashed admin password
- `secret_key`: string (secret) - Session signing key
- `config_path`: string (path) - Runtime config location (`/var/lib/maubot/config/config.yaml`)

**Relationships**:
- Has many: Bot Instances (1:N)
- Has many: Plugins (1:N)
- Connects to: Matrix Homeserver (1:1)

**State Machine**: N/A (service-level, managed by systemd)

**Validation Rules**:
- `homeserver_url` MUST use the IPv4 form `http://127.0.0.1:PORT` (not `localhost`, for conduwuit compatibility)
- `port` MUST NOT conflict with existing services (check: 8008 Matrix, 29319 Slack bridge, 3000 Forgejo)
- `admin_password_hash` MUST be bcrypt with cost >=12
- `secret_key` MUST be >=32 random bytes

**Storage**:
- NixOS module configuration: `/home/dan/proj/ops-jrz1/modules/maubot.nix`
- Runtime config: `/var/lib/maubot/config/config.yaml`
- Secrets: `/run/secrets/maubot-*` (sops-nix decrypted)

---

### 2. Bot Instance

**Description**: Individual bot deployment with specific configuration, Matrix user account, and plugin assignment.

**Attributes**:
- `id`: string (slug) - Instance identifier (e.g., `instagram-bot-1`)
- `type`: string - Plugin ID (e.g., `sna.instagram`)
- `primary_user`: string (MXID) - Matrix user ID (e.g., `@instagram-bot:clarun.xyz`)
- `enabled`: boolean - Whether bot is active
- `config`: object (JSON) - Plugin-specific configuration
  - For Instagram bot: `{"enabled": true, "max_file_size": 50000000, "room_subscriptions": ["!roomid1:clarun.xyz"]}`
- `access_token`: string (secret) - Matrix access token (ephemeral, stored in bot DB)
- `device_id`: string - Matrix device identifier
- `database_path`: string (optional) - Per-bot database if the plugin requires one (e.g., `/var/lib/maubot/plugins/instagram-bot-1.db`)

**Relationships**:
- Belongs to: Maubot Service (N:1)
- Uses: Plugin (N:1)
- Authenticated as: Matrix User (1:1)
- Subscribed to: Matrix Rooms (N:M via `room_subscriptions` config)

**State Machine**:
```
[created]
    ↓
[configured] ─→ disabled
    ↓               ↓
[enabled] ←─────────┘
    ↓
[running] ←→ [stopped]
    ↓
[failed] → [restarting]
```

**States**:
- `created`: Instance exists in maubot DB but not yet configured
- `configured`: Config provided, Matrix user created, not yet enabled
- `enabled`: Marked as active in config
- `running`: Bot process active, connected to Matrix, responding to events
- `stopped`: Manually stopped via management UI
- `failed`: Encountered error (logged to maubot service journal)
- `restarting`: Auto-recovery in progress

**Validation Rules**:
- `primary_user` MUST match pattern `@[a-z0-9-]+:clarun.xyz`
- `type` MUST reference an uploaded Plugin
- `config.room_subscriptions` MUST be an array of valid Matrix room IDs (format: `!...clarun.xyz`)
- `enabled=true` requires `access_token` to be set (bot authenticated)

**Storage**:
- Instance metadata: Maubot SQLite DB (`/var/lib/maubot/bot.db`, table: `instance`)
- Access tokens: Maubot SQLite DB (encrypted at rest)
- Plugin config: Maubot SQLite DB (JSON blob)

---

### 3. Plugin

**Description**: Packaged bot functionality (.mbp file) containing code, metadata, and dependencies.

**Attributes**:
- `id`: string - Plugin identifier (e.g., `sna.instagram`)
- `version`: string (semver) - Plugin version (e.g., `1.0.0`)
- `main_class`: string - Python class name (e.g., `InstagramBot`)
- `modules`: array[string] - Python module list (e.g., `["instagram_bot"]`)
- `dependencies`: array[string] - Python package dependencies (e.g., `["yt-dlp>=2023.1.6", "aiohttp"]`)
- `database`: boolean - Whether plugin requires dedicated database
- `config_schema`: object (JSON Schema) - Plugin configuration validation schema
- `upload_path`: string (path) - Storage location (e.g., `/var/lib/maubot/plugins/sna.instagram-v1.0.0.mbp`)

**Relationships**:
- Belongs to: Maubot Service (N:1)
- Used by: Bot Instances (1:N)

**State Machine**:
```
[uploaded]
    ↓
[validated] ─→ [rejected] (invalid metadata)
    ↓
[loaded] ←→ [disabled]
    ↓
[active] (used by >=1 running instance)
    ↓
[trashed] → [deleted]
```

**Validation Rules**:
- `id` MUST match pattern `[a-z][a-z0-9._-]+`
- `version` MUST be valid semver
- `main_class` MUST exist in provided modules
- `.mbp` file MUST be a valid zip containing `maubot.yaml` + Python files
- `dependencies` MUST be available in nixpkgs (e.g., yt-dlp is available, instaloader is not)

**Storage**:
- Active plugins: `/var/lib/maubot/plugins/`
- Trashed plugins: `/var/lib/maubot/trash/`
- Metadata: Maubot SQLite DB (table: `plugin`)

---

### 4. Bot Configuration

**Description**: Settings specific to a bot instance, including Matrix credentials, plugin settings, and room subscriptions.
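
Room subscriptions are the part of these settings most likely to be mistyped. The following is a minimal sketch of checking them, assuming the loose room-ID shape `^!.+:.+$` that this document uses for Matrix room IDs. The function name `invalid_room_ids` is hypothetical and not part of maubot.

```python
import re

# Loose Matrix room ID shape: "!" + opaque ID + ":" + server name.
# Assumption: this mirrors the pattern used for room IDs in this document.
ROOM_ID_PATTERN = re.compile(r"^!.+:.+$")


def invalid_room_ids(room_subscriptions):
    """Return the entries that are not strings or do not look like room IDs."""
    return [
        room
        for room in room_subscriptions
        if not isinstance(room, str) or not ROOM_ID_PATTERN.match(room)
    ]


# Illustrative input: one valid room ID, one room alias, one ID without a server part.
subscriptions = ["!abc123:clarun.xyz", "#alias:clarun.xyz", "!missing-server"]
print(invalid_room_ids(subscriptions))
```

Returning the offending entries, rather than a boolean, lets the caller log exactly which subscriptions were rejected.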

**Attributes**:
- `instance_id`: string (foreign key) - References Bot Instance
- `room_subscriptions`: array[string] - List of Matrix room IDs where bot is active
  - Example: `["!abc123:clarun.xyz", "!def456:clarun.xyz"]`
- `command_prefix`: string (optional) - Bot command trigger (e.g., `!instagram`, `!ig`)
- `enabled_features`: object - Feature flags for plugin
  - For Instagram bot: `{"auto_fetch": true, "rate_limiting": true, "caching": false}`
- `rate_limit_config`: object - Rate limiting parameters
  - Example: `{"max_requests_per_minute": 10, "burst_size": 3, "backoff_seconds": 30}`
- `error_notification_level`: string (enum) - Minimum severity for admin notifications
  - Values: `DEBUG`, `INFO`, `WARN`, `ERROR`, `CRITICAL`
  - Default: `ERROR` (per spec FR-013)

**Relationships**:
- Belongs to: Bot Instance (1:1)
- References: Matrix Rooms (N:M via `room_subscriptions`)

**Validation Rules**:
- `room_subscriptions` items MUST be valid Matrix room IDs
- `command_prefix` MUST NOT conflict with other bots (user responsibility)
- `error_notification_level` MUST be one of the valid enum values
- `rate_limit_config.max_requests_per_minute` MUST be >0 and <=60

**Storage**:
- Stored in Bot Instance config JSON blob
- Editable via:
  1. Maubot web UI (management interface)
  2. Direct config file edit + bot restart (per FR-010)

---

### 5. Admin Notification

**Description**: ERROR and CRITICAL level bot notifications sent to the Matrix homeserver admin room (shared with other platform notifications).
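
The severity gate in this description (only ERROR and CRITICAL reach the admin room) can be sketched as follows. This is an illustration under stated assumptions, not maubot's API: the names `Severity`, `AdminNotification`, and `build_notification` are hypothetical, the numeric level values are arbitrary, and the field names follow this entity's attributes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    """Log levels named in this document, ordered by increasing severity.

    The numeric values are an assumption; only the ordering matters here.
    """
    DEBUG = 10
    INFO = 20
    WARN = 30
    ERROR = 40
    CRITICAL = 50


@dataclass
class AdminNotification:
    source_instance: str
    severity: Severity
    message: str
    context: dict = field(default_factory=dict)
    # ISO 8601 timestamp in UTC, set when the notification is created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def build_notification(
    source_instance: str,
    severity: Severity,
    message: str,
    context: Optional[dict] = None,
) -> Optional[AdminNotification]:
    """Return a notification for ERROR/CRITICAL, or None for lower levels."""
    if severity < Severity.ERROR:
        return None  # DEBUG/INFO/WARN stay in the logs only
    if not message.strip():
        raise ValueError("message must be non-empty")
    return AdminNotification(source_instance, severity, message, context or {})
```

Returning `None` for lower severities keeps the decision in one place, so the caller only has to send whatever comes back.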

**Attributes**:
- `timestamp`: datetime (ISO 8601) - When notification was generated
- `source_instance`: string - Bot instance ID that triggered notification
- `severity`: string (enum) - Log level (`ERROR` or `CRITICAL`)
- `message`: string - Human-readable error description
- `context`: object (JSON) - Additional metadata
  - `room_id`: string (optional) - Matrix room where error occurred
  - `event_id`: string (optional) - Matrix event that triggered error
  - `exception_type`: string (optional) - Python exception class
  - `stack_trace`: string (optional) - Abbreviated stack trace (last 10 lines)

**Relationships**:
- Triggered by: Bot Instance (N:1)
- Sent to: Matrix Admin Room (N:1, shared room: defined in ops-jrz1 config)

**State Machine**: N/A (notifications are fire-and-forget events)

**Validation Rules**:
- `severity` MUST be `ERROR` or `CRITICAL` (DEBUG/INFO/WARN go to logs only per FR-013)
- `message` MUST be non-empty
- Matrix admin room MUST exist and bot MUST have send permission

**Storage**:
- Not persisted (real-time notification)
- Logged to systemd journal: `journalctl -u maubot.service`
- Visible in maubot management dashboard (recent notifications)

---

### 6. Bot Database

**Description**: Per-instance isolated SQLite database for plugin state and data persistence.
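
One way to keep databases isolated per instance is to derive each file path from the instance ID alone. The sketch below assumes the plugin database directory `/var/lib/maubot/plugins` used in this document's examples. The slug rule in `INSTANCE_ID` and the helper name `database_path_for` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Directory taken from this document's examples.
PLUGIN_DB_DIR = PurePosixPath("/var/lib/maubot/plugins")

# Hypothetical slug rule: lowercase letters, digits, and hyphens only.
# Rejecting everything else also rules out "/" and "..", so the result
# cannot escape PLUGIN_DB_DIR.
INSTANCE_ID = re.compile(r"[a-z0-9][a-z0-9-]*")


def database_path_for(instance_id: str) -> PurePosixPath:
    """Build the per-instance SQLite path, refusing unsafe instance IDs."""
    if not INSTANCE_ID.fullmatch(instance_id):
        raise ValueError(f"invalid instance id: {instance_id!r}")
    return PLUGIN_DB_DIR / f"{instance_id}.db"


print(database_path_for("instagram-bot-1"))
# → /var/lib/maubot/plugins/instagram-bot-1.db
```

Validating the ID before building the path avoids a separate containment check on the resulting path.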

**Attributes**:
- `instance_id`: string (foreign key) - References Bot Instance
- `database_path`: string (path) - SQLite file location (e.g., `/var/lib/maubot/plugins/instagram-bot-1.db`)
- `schema_version`: integer - Plugin-defined schema version
- `size_bytes`: integer - Database file size
- `last_accessed`: datetime - Last read/write timestamp

**Relationships**:
- Belongs to: Bot Instance (1:1, optional - only if plugin requires DB)
- Managed by: Plugin code (plugin-defined schema)

**State Machine**:
```
[initialized] (schema created)
    ↓
[active] (read/write operations)
    ↓
[migrating] (schema upgrade in progress)
    ↓
[active]
    ↓
[archived] (bot deleted, DB preserved)
```

**Validation Rules**:
- `database_path` MUST be within the `/var/lib/maubot/plugins/` directory
- Schema migrations MUST be handled by plugin code (not maubot framework)
- Database MUST be owned by `maubot` user/group

**Storage**:
- Location: `/var/lib/maubot/plugins/.db`
- Backup: Manual (part of `/var/lib/maubot/` directory backup)

---

## Relationships Diagram

```
┌─────────────────────┐
│ Matrix Homeserver   │
│ (conduwuit)         │
└──────────┬──────────┘
           │ authenticates
           │
┌──────────▼──────────┐
│  Maubot Service     │
│  ┌──────────────┐   │
│  │ Admin UI     │   │ ← admin login (sops-nix secrets)
│  │ :29316       │   │
│  └──────────────┘   │
│                     │
│  manages ↓          │
│                     │
│  ┌──────────────┐   │
│  │ Bot Instance │───┼──→ uses Plugin (.mbp)
│  │ (instagram)  │   │
│  └───┬──────────┘   │
│      │ has config   │
│      ↓              │
│  ┌──────────────┐   │
│  │ Bot Config   │   │
│  │ - rooms[]    │   │
│  │ - settings   │   │
│  └──────────────┘   │
│                     │
│  stores ↓           │
│                     │
│  ┌──────────────┐   │
│  │ Bot Database │   │ (optional, plugin-specific)
│  │ (SQLite)     │   │
│  └──────────────┘   │
└─────────────────────┘
           │ sends notifications
           ↓
┌─────────────────────┐
│ Matrix Admin Room   │ (shared with platform)
└─────────────────────┘
```

---

## Configuration File Structures

### Maubot Service Config

**File**: `/var/lib/maubot/config/config.yaml`

**Structure**:
```yaml
database: "sqlite:///var/lib/maubot/bot.db"

server:
  hostname: 0.0.0.0
  port: 29316

admins:
  admin:  # Replaced at runtime

homeservers:
  clarun.xyz:
    url: http://127.0.0.1:8008
    secret:  # Optional, for auto-registration

logging:
  level: INFO
  handlers:
    - type: journal  # Log to systemd journal

api_features:
  login: true
  plugin: true
  plugin_upload: true
  instance: true
  instance_database: true
  log: true
```

**Generation**:
1. Maubot example config generated via `maubot -c config.yaml -e`
2. Python script merges NixOS module overrides
3. Secrets injected from `$CREDENTIALS_DIRECTORY` (systemd LoadCredential)
4. Final config written to `/var/lib/maubot/config/config.yaml`

---

### Bot Instance Config

**Stored in**: Maubot SQLite DB (not file-based)

**Access methods**:
1. Maubot web UI (`http://localhost:29316/_matrix/maubot`)
2. Direct database edit (advanced, not recommended)
3. File-based config edit + restart (for room subscriptions per FR-010)

**Example config** (Instagram bot):
```json
{
  "enabled": true,
  "max_file_size": 50000000,
  "room_subscriptions": [
    "!abc123def:clarun.xyz",
    "!xyz789ghi:clarun.xyz"
  ],
  "rate_limiting": {
    "enabled": true,
    "max_requests_per_minute": 10,
    "backoff_seconds": 30
  },
  "error_notification_level": "ERROR"
}
```

---

### Plugin Metadata

**File**: `maubot.yaml` (inside .mbp archive)

**Structure**:
```yaml
id: sna.instagram
version: 1.0.0
license: MIT
modules:
  - instagram_bot
main_class: InstagramBot
database: false  # Plugin doesn't use dedicated DB
config: true     # Plugin accepts configuration

config_schema:
  type: object
  properties:
    enabled:
      type: boolean
      default: true
    max_file_size:
      type: integer
      default: 50000000
    room_subscriptions:
      type: array
      items:
        type: string
        pattern: "^!.+:.+$"

dependencies:
  - yt-dlp>=2023.1.6
  - aiohttp
  - pillow
```

---

## State Persistence

### Service State

**Location**: `/var/lib/maubot/bot.db` (SQLite)

**Tables**:
- `instance` - Bot instance metadata
- `plugin` - Uploaded plugin metadata
- `client` - Matrix client credentials (access tokens)
- `log` - Recent bot activity logs

**Backup strategy**:
- Included in `/var/lib/maubot/` directory backup
- Rollback via NixOS generations (service config)
- Database can be wiped and rebuilt from scratch (bot re-registration required)

### Runtime State

**Location**: Memory (maubot service process)

**Contents**:
- Active bot instances (Python objects)
- Matrix client connections (aiohttp sessions)
- Event handlers (registered callbacks)
- Plugin instances (loaded Python classes)

**Recovery**:
- Automatic on maubot service restart
- Bot instances reconnect to Matrix
- Plugin state reloaded from DB (if applicable)

---

## Security Model

### Secrets Hierarchy

1. **Service-level secrets** (sops-nix encrypted):
   - `maubot-admin-password` - Management UI login
   - `maubot-secret-key` - Session signing
   - `matrix-registration-token` - Bot user creation (reused from Matrix homeserver)

2. **Bot-level secrets** (stored in maubot DB):
   - Matrix access tokens (per bot instance)
   - Matrix device IDs
   - Plugin-specific credentials (if any)

3. **Runtime secrets** (ephemeral):
   - Active session tokens (management UI)
   - Matrix sync tokens (E2EE keys if enabled)

### Permissions

**File permissions**:
```
/var/lib/maubot/                     → drwxr-x--- maubot:maubot
/var/lib/maubot/config/              → drwx------ maubot:maubot
/var/lib/maubot/config/config.yaml   → -rw------- maubot:maubot (contains secrets)
/var/lib/maubot/bot.db               → -rw-r----- maubot:maubot
/var/lib/maubot/plugins/             → drwxr-xr-x maubot:maubot
/run/secrets/maubot-*                → -r-------- maubot:maubot (0400)
```

**Network access**:
- Management interface: localhost:29316 only (SSH tunnel required for remote access per spec)
- Matrix homeserver: 127.0.0.1:8008 (IPv4, conduwuit compatibility)
- No external network access (except Matrix federation via homeserver)

---

## Operational Entities

### Health Check State

**Attributes**:
- `last_check_timestamp`: datetime
- `service_status`: enum (`healthy`, `degraded`, `failed`)
- `maubot_version_endpoint`: boolean - Whether `/maubot/v1/version` is accessible
- `active_instances_count`: integer
- `failed_instances`: array[string] - Instance IDs with errors
- `last_successful_message_timestamp`: datetime (per bot instance)

**Storage**: Systemd timer state + systemd journal logs

**Health indicators** (per spec SC-003):
- Service responds to HTTP health check (curl to version endpoint)
- Active instances count matches enabled instances count
- No ERROR/CRITICAL logs in last 5 minutes
- All enabled bots have recent Matrix sync activity (<10 minutes)

---

## Data Flow Diagrams

### Instagram URL Processing Flow

```
1. User posts Instagram URL in Matrix room
   ↓
2. Matrix homeserver distributes event to all clients
   ↓
3. Bot instance receives event (if subscribed to that room)
   ↓
4. Plugin regex matches Instagram URL pattern
   ↓
5. Plugin calls yt-dlp extraction (async thread pool)
   ↓
6. yt-dlp downloads media to temporary directory
   ↓
7. Plugin uploads media to Matrix homeserver
   ↓
8. Plugin sends Matrix message event with media attachment
   ↓
9. Cleanup temporary files
   ↓
10. Log extraction success/failure (severity-based notification if ERROR/CRITICAL)
```

### Bot Registration Flow

```
1. Admin accesses maubot web UI via SSH tunnel
   ↓
2. Create new bot client (provide Matrix user ID)
   ↓
3. Maubot attempts registration via conduwuit registration token
   ↓
4. If successful: Access token stored in maubot DB
   ↓
5. Create bot instance (select plugin, provide config)
   ↓
6. Bot connects to Matrix homeserver
   ↓
7. Bot joins configured rooms (from room_subscriptions)
   ↓
8. Bot starts listening for events
```

---

## Validation Rules Summary

### Configuration Validation

- All Matrix room IDs MUST match pattern `!.+:.+`
- Homeserver URL MUST be `http://127.0.0.1:PORT` (IPv4, not localhost)
- Admin password MUST meet minimum strength (length >=16, bcrypt cost >=12)
- Plugin IDs MUST be unique within the maubot instance
- File paths MUST be absolute and within permitted directories

### Runtime Validation

- Bot instances CANNOT start without a valid Matrix access token
- Room subscriptions MUST reference existing rooms (checked at runtime, logged if invalid)
- Plugin dependencies MUST be available in the NixOS environment
- Rate limiting MUST be enforced before external API calls (Instagram)

### Security Validation

- Secrets MUST NEVER appear in logs or in the NixOS module configuration (placeholders only); the runtime `config.yaml` receives injected secrets and is readable only by the `maubot` user
- Management interface MUST be reachable from localhost only (it may bind `0.0.0.0` inside the container, but MUST NOT be exposed externally)
- Database files MUST have restrictive permissions (0600 or 0640)
- ERROR/CRITICAL notifications MUST include sanitized context (no credentials in stack traces)

---

## Migration Strategy

### From ops-base to ops-jrz1

**Data migration**: Not required (fresh deployment)

**Configuration migration**:
1. Extract maubot.nix module from ops-base
2. Adapt namespace: `services.matrix-vm.maubot` → `services.maubot`
3. Update homeserver URL: `continuwuity` → `conduwuit`
4. Remove registration_secrets (not supported by conduwuit)
5. Add registration token configuration

**Plugin migration**:
1. Copy Instagram bot .mbp file from ops-base: `/home/dan/proj/sna/sna-instagram-bot.mbp`
2. Upload to ops-jrz1 maubot via web UI or API
3. Create bot instance with room subscriptions
4. Test content fetching in designated rooms

**No database migration needed** (SQLite DB created fresh on ops-jrz1)

---

## Capacity Planning

### Single Instagram Bot Instance

**Estimated resource usage**:
- Memory: ~100MB (maubot service + bot instance + yt-dlp subprocess)
- Disk:
  - Maubot DB: <10MB (metadata only)
  - Plugins: ~1MB per .mbp file
  - Temporary files: Up to 50MB (during media download, auto-cleanup)
- CPU: Burst during media extraction (yt-dlp), idle otherwise
- Network: <1GB/day (assuming <20 Instagram fetches/day at ~50MB each)

**Scale validation** (per SC-002):
- Maubot service supports 3+ concurrent instances without degradation
- Each additional bot: ~50MB memory, minimal CPU/network impact
- Shared resources: Maubot DB (SQLite supports concurrent reads), management UI

---

**Status**: Data model complete. Ready for quickstart.md generation.