# Data Model: Maubot Integration
**Feature**: 003-maubot-integration
**Date**: 2025-10-26
**Status**: Phase 1 design
## Overview
This document defines the data structures, state machines, and relationships for the maubot integration feature. Since maubot is an infrastructure service (not an application with user-facing data), the focus is on service configuration, runtime state, and operational entities.
---
## Core Entities
### 1. Maubot Service
**Description**: The maubot framework service that manages bot instances and provides the web-based management interface.
**Attributes**:
- `homeserver_url`: string (URL) - Matrix homeserver endpoint (e.g., `http://127.0.0.1:8008`)
- `server_name`: string (domain) - Matrix server domain (e.g., `clarun.xyz`)
- `port`: integer - Management interface port (default: 29316)
- `database_uri`: string - SQLite database path (e.g., `sqlite:///var/lib/maubot/bot.db`)
- `admin_username`: string - Admin UI login username
- `admin_password_hash`: string (secret) - Hashed admin password
- `secret_key`: string (secret) - Session signing key
- `config_path`: string (path) - Runtime config location (`/var/lib/maubot/config/config.yaml`)
**Relationships**:
- Has many: Bot Instances (1:N)
- Has many: Plugins (1:N)
- Connects to: Matrix Homeserver (1:1)
**State Machine**: N/A (service-level, managed by systemd)
**Validation Rules**:
- `homeserver_url` MUST use the IPv4 loopback address, `http://127.0.0.1:PORT` (not `localhost`, for conduwuit compatibility)
- `port` MUST NOT conflict with existing services (check: 8008 Matrix, 29319 Slack bridge, 3000 Forgejo)
- `admin_password_hash` MUST be bcrypt with cost >=12
- `secret_key` MUST be >=32 bytes random
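
The rules above can be checked mechanically before the service starts. The following is a minimal sketch, not the module's actual validation code: the function name `validate_service_config` and the `RESERVED_PORTS` table are illustrative, and the bcrypt check only inspects the textual hash format (`$2b$<cost>$...`), not the password itself.

```python
import re
from urllib.parse import urlsplit

# Ports already in use on the host, taken from the validation rules above.
RESERVED_PORTS = {8008: "Matrix", 29319: "Slack bridge", 3000: "Forgejo"}

# A bcrypt hash is "$2b$<2-digit cost>$" followed by 53 salt+digest characters.
BCRYPT_RE = re.compile(r"^\$2[aby]\$(\d{2})\$[./A-Za-z0-9]{53}$")


def validate_service_config(homeserver_url: str, port: int,
                            admin_password_hash: str,
                            secret_key: bytes) -> list[str]:
    """Return a list of rule violations (empty when the config passes)."""
    errors = []
    parts = urlsplit(homeserver_url)
    if parts.hostname != "127.0.0.1" or parts.port is None:
        errors.append("homeserver_url must be http://127.0.0.1:PORT")
    if port in RESERVED_PORTS:
        errors.append(f"port {port} conflicts with {RESERVED_PORTS[port]}")
    match = BCRYPT_RE.match(admin_password_hash)
    if match is None or int(match.group(1)) < 12:
        errors.append("admin_password_hash must be bcrypt with cost >= 12")
    if len(secret_key) < 32:
        errors.append("secret_key must be at least 32 bytes")
    return errors
```

Returning a list rather than raising on the first failure lets a deployment check report every violation in one pass.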
**Storage**:
- NixOS module configuration: `/home/dan/proj/ops-jrz1/modules/maubot.nix`
- Runtime config: `/var/lib/maubot/config/config.yaml`
- Secrets: `/run/secrets/maubot-*` (sops-nix decrypted)
---
### 2. Bot Instance
**Description**: Individual bot deployment with specific configuration, Matrix user account, and plugin assignment.
**Attributes**:
- `id`: string (slug) - Instance identifier (e.g., `instagram-bot-1`)
- `type`: string - Plugin ID (e.g., `sna.instagram`)
- `primary_user`: string (MXID) - Matrix user ID (e.g., `@instagram-bot:clarun.xyz`)
- `enabled`: boolean - Whether bot is active
- `config`: object (JSON) - Plugin-specific configuration
  - For Instagram bot: `{"enabled": true, "max_file_size": 50000000, "room_subscriptions": ["!roomid1:clarun.xyz"]}`
- `access_token`: string (secret) - Matrix access token (ephemeral, stored in bot DB)
- `device_id`: string - Matrix device identifier
- `database_path`: string (optional) - Per-bot database if plugin requires (e.g., `/var/lib/maubot/plugins/instagram-bot-1.db`)
**Relationships**:
- Belongs to: Maubot Service (N:1)
- Uses: Plugin (N:1)
- Authenticated as: Matrix User (1:1)
- Subscribed to: Matrix Rooms (N:M via room_subscriptions config)
**State Machine**:
```
[created]
    ↓
[configured] ─→ [disabled]
    ↓               ↓
[enabled] ←─────────┘
    ↓
[running] ←→ [stopped]
    ↓
[failed] → [restarting]
```
**States**:
- `created`: Instance exists in maubot DB but not yet configured
- `configured`: Config provided, Matrix user created, not yet enabled
- `enabled`: Marked as active in config
- `running`: Bot process active, connected to Matrix, responding to events
- `stopped`: Manually stopped via management UI
- `failed`: Encountered error (logged to maubot service journal)
- `restarting`: Auto-recovery in progress
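
One reading of the diagram and state list above can be written as a transition table. This is a sketch under stated assumptions: the `TRANSITIONS` mapping and the `transition` helper are illustrative names, and the `restarting → running` edge is assumed, since the diagram does not show where `restarting` leads.

```python
# Allowed next states for each bot instance state.
TRANSITIONS = {
    "created": {"configured"},
    "configured": {"enabled", "disabled"},
    "disabled": {"enabled"},
    "enabled": {"running"},
    "running": {"stopped", "failed"},
    "stopped": {"running"},
    "failed": {"restarting"},
    "restarting": {"running"},  # assumption: not shown in the diagram
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not in the table."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```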
**Validation Rules**:
- `primary_user` MUST match pattern `@[a-z0-9-]+:clarun.xyz`
- `type` MUST reference an uploaded Plugin
- `config.room_subscriptions` MUST be an array of valid Matrix room IDs (pattern `!.+:.+`, e.g., `!abc123:clarun.xyz`)
- `enabled=true` requires `access_token` to be set (bot authenticated)
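
As an illustration, the four rules above translate into a small checker. The function `validate_instance` and its dictionary layout are hypothetical; only the patterns and field names come from this document.

```python
import re

MXID_RE = re.compile(r"^@[a-z0-9-]+:clarun\.xyz$")
ROOM_ID_RE = re.compile(r"^!.+:.+$")


def validate_instance(instance: dict, uploaded_plugins: set[str]) -> list[str]:
    """Return rule violations for one bot instance (empty when valid)."""
    errors = []
    if not MXID_RE.match(instance.get("primary_user", "")):
        errors.append("primary_user must match @[a-z0-9-]+:clarun.xyz")
    if instance.get("type") not in uploaded_plugins:
        errors.append("type must reference an uploaded plugin")
    rooms = instance.get("config", {}).get("room_subscriptions", [])
    if not isinstance(rooms, list) or not all(
        isinstance(room, str) and ROOM_ID_RE.match(room) for room in rooms
    ):
        errors.append("room_subscriptions must be a list of Matrix room IDs")
    if instance.get("enabled") and not instance.get("access_token"):
        errors.append("enabled=true requires access_token")
    return errors
```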
**Storage**:
- Instance metadata: Maubot SQLite DB (`/var/lib/maubot/bot.db` table: `instance`)
- Access tokens: Maubot SQLite DB (encrypted at rest)
- Plugin config: Maubot SQLite DB (JSON blob)
---
### 3. Plugin
**Description**: Packaged bot functionality (.mbp file) containing code, metadata, and dependencies.
**Attributes**:
- `id`: string - Plugin identifier (e.g., `sna.instagram`)
- `version`: string (semver) - Plugin version (e.g., `1.0.0`)
- `main_class`: string - Python class name (e.g., `InstagramBot`)
- `modules`: array[string] - Python module list (e.g., `["instagram_bot"]`)
- `dependencies`: array[string] - Python package dependencies (e.g., `["yt-dlp>=2023.1.6", "aiohttp"]`)
- `database`: boolean - Whether plugin requires dedicated database
- `config_schema`: object (JSON Schema) - Plugin configuration validation schema
- `upload_path`: string (path) - Storage location (e.g., `/var/lib/maubot/plugins/sna.instagram-v1.0.0.mbp`)
**Relationships**:
- Belongs to: Maubot Service (N:1)
- Used by: Bot Instances (1:N)
**State Machine**:
```
[uploaded]
    ↓
[validated] ─→ [rejected] (invalid metadata)
    ↓
[loaded] ←→ [disabled]
    ↓
[active] (used by >=1 running instance)
    ↓
[trashed] → [deleted]
```
**Validation Rules**:
- `id` MUST match pattern `[a-z][a-z0-9._-]+`
- `version` MUST be valid semver
- `main_class` MUST exist in provided modules
- `.mbp` file MUST be valid zip containing `maubot.yaml` + Python files
- `dependencies` MUST be available in nixpkgs (e.g., yt-dlp is available, instaloader is not)
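
A rough pre-upload check for the archive and identifier rules might look like the sketch below. It uses only the standard library, so it confirms that `maubot.yaml` is present without parsing it; `check_mbp`, `check_identity`, and the simplified semver pattern are assumptions for illustration, not maubot's own validation.

```python
import io
import re
import zipfile

PLUGIN_ID_RE = re.compile(r"^[a-z][a-z0-9._-]+$")
# Simplified semver: MAJOR.MINOR.PATCH with optional pre-release/build parts.
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def check_mbp(data: bytes) -> list[str]:
    """Check that the bytes are a zip holding maubot.yaml and Python files."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return ["not a valid zip archive"]
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    errors = []
    if "maubot.yaml" not in names:
        errors.append("maubot.yaml missing")
    if not any(name.endswith(".py") for name in names):
        errors.append("no Python files found")
    return errors


def check_identity(plugin_id: str, version: str) -> list[str]:
    """Check the plugin id pattern and version format."""
    errors = []
    if not PLUGIN_ID_RE.match(plugin_id):
        errors.append("id must match [a-z][a-z0-9._-]+")
    if not SEMVER_RE.match(version):
        errors.append("version must be valid semver")
    return errors
```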
**Storage**:
- Active plugins: `/var/lib/maubot/plugins/`
- Trashed plugins: `/var/lib/maubot/trash/`
- Metadata: Maubot SQLite DB (table: `plugin`)
---
### 4. Bot Configuration
**Description**: Settings specific to bot instance including Matrix credentials, plugin settings, and room subscriptions.
**Attributes**:
- `instance_id`: string (foreign key) - References Bot Instance
- `room_subscriptions`: array[string] - List of Matrix room IDs where bot is active
  - Example: `["!abc123:clarun.xyz", "!def456:clarun.xyz"]`
- `command_prefix`: string (optional) - Bot command trigger (e.g., `!instagram`, `!ig`)
- `enabled_features`: object - Feature flags for plugin
  - For Instagram bot: `{"auto_fetch": true, "rate_limiting": true, "caching": false}`
- `rate_limit_config`: object - Rate limiting parameters
  - Example: `{"max_requests_per_minute": 10, "burst_size": 3, "backoff_seconds": 30}`
- `error_notification_level`: string (enum) - Minimum severity for admin notifications
  - Values: `DEBUG`, `INFO`, `WARN`, `ERROR`, `CRITICAL`
  - Default: `ERROR` (per spec FR-013)
**Relationships**:
- Belongs to: Bot Instance (1:1)
- References: Matrix Rooms (N:M via room_subscriptions)
**Validation Rules**:
- `room_subscriptions` items MUST be valid Matrix room IDs
- `command_prefix` MUST NOT conflict with other bots (user responsibility)
- `error_notification_level` MUST be one of valid enum values
- `rate_limit_config.max_requests_per_minute` MUST be >0 and <=60
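
The enum and rate-limit rules can be sketched as follows. The names `validate_bot_config` and `should_notify` are hypothetical; the severity ordering is taken from the order in which the values are listed above.

```python
# Severity levels in ascending order, as listed in the attributes above.
LEVELS = ("DEBUG", "INFO", "WARN", "ERROR", "CRITICAL")


def validate_bot_config(config: dict) -> list[str]:
    """Return rule violations for the notification and rate-limit settings."""
    errors = []
    level = config.get("error_notification_level", "ERROR")  # default per FR-013
    if level not in LEVELS:
        errors.append(f"error_notification_level must be one of {LEVELS}")
    rpm = config.get("rate_limit_config", {}).get("max_requests_per_minute")
    if rpm is not None:
        is_int = isinstance(rpm, int) and not isinstance(rpm, bool)
        if not is_int or not 0 < rpm <= 60:
            errors.append("max_requests_per_minute must be >0 and <=60")
    return errors


def should_notify(severity: str, threshold: str = "ERROR") -> bool:
    """True when severity is at or above the configured minimum level."""
    return LEVELS.index(severity) >= LEVELS.index(threshold)
```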
**Storage**:
- Stored in Bot Instance config JSON blob
- Editable via:
  1. Maubot web UI (management interface)
  2. Direct config file edit + bot restart (per FR-010)
---
### 5. Admin Notification
**Description**: ERROR and CRITICAL level bot notifications sent to Matrix homeserver admin room (shared with other platform notifications).
**Attributes**:
- `timestamp`: datetime (ISO 8601) - When notification was generated
- `source_instance`: string - Bot instance ID that triggered notification
- `severity`: string (enum) - Log level (`ERROR` or `CRITICAL`)
- `message`: string - Human-readable error description
- `context`: object (JSON) - Additional metadata
  - `room_id`: string (optional) - Matrix room where error occurred
  - `event_id`: string (optional) - Matrix event that triggered error
  - `exception_type`: string (optional) - Python exception class
  - `stack_trace`: string (optional) - Abbreviated stack trace (last 10 lines)
**Relationships**:
- Triggered by: Bot Instance (N:1)
- Sent to: Matrix Admin Room (N:1, shared room: defined in ops-jrz1 config)
**State Machine**: N/A (notifications are fire-and-forget events)
**Validation Rules**:
- `severity` MUST be `ERROR` or `CRITICAL` (DEBUG/INFO/WARN go to logs only per FR-013)
- `message` MUST be non-empty
- Matrix admin room MUST exist and bot MUST have send permission
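
To make the entity concrete, here is a hedged sketch of how a notification payload could be assembled from a caught exception. `build_notification` is a hypothetical helper; it enforces the severity and message rules and keeps only the last 10 lines of the traceback, but it does not scrub credentials from the trace.

```python
import traceback
from datetime import datetime, timezone


def build_notification(source_instance, severity, message,
                       exc=None, room_id=None, event_id=None):
    """Assemble an admin notification dict following the attributes above."""
    if severity not in ("ERROR", "CRITICAL"):
        raise ValueError("only ERROR and CRITICAL are sent to the admin room")
    if not message.strip():
        raise ValueError("message must be non-empty")
    context = {}
    if room_id:
        context["room_id"] = room_id
    if event_id:
        context["event_id"] = event_id
    if exc is not None:
        context["exception_type"] = type(exc).__name__
        formatted = traceback.format_exception(type(exc), exc, exc.__traceback__)
        lines = "".join(formatted).splitlines()
        context["stack_trace"] = "\n".join(lines[-10:])  # last 10 lines only
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source_instance": source_instance,
        "severity": severity,
        "message": message,
        "context": context,
    }
```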
**Storage**:
- Not persisted (real-time notification)
- Logged to systemd journal: `journalctl -u maubot.service`
- Visible in maubot management dashboard (recent notifications)
---
### 6. Bot Database
**Description**: Per-instance isolated SQLite database for plugin state and data persistence.
**Attributes**:
- `instance_id`: string (foreign key) - References Bot Instance
- `database_path`: string (path) - SQLite file location (e.g., `/var/lib/maubot/plugins/instagram-bot-1.db`)
- `schema_version`: integer - Plugin-defined schema version
- `size_bytes`: integer - Database file size
- `last_accessed`: datetime - Last read/write timestamp
**Relationships**:
- Belongs to: Bot Instance (1:1, optional - only if plugin requires DB)
- Managed by: Plugin code (plugin-defined schema)
**State Machine**:
```
[initialized] (schema created)
    ↓
[active] (read/write operations)
    ↓
[migrating] (schema upgrade in progress)
    ↓
[active]
    ↓
[archived] (bot deleted, DB preserved)
```
**Validation Rules**:
- `database_path` MUST be within `/var/lib/maubot/plugins/` directory
- Schema migrations MUST be handled by plugin code (not maubot framework)
- Database MUST be owned by `maubot` user/group
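
The directory rule is easy to get wrong when a path contains `..` segments. The sketch below shows one way to check it; `is_allowed_db_path` is an illustrative name, and the check is purely lexical, so it does not resolve symlinks or verify ownership.

```python
import posixpath
from pathlib import PurePosixPath

PLUGIN_DB_ROOT = PurePosixPath("/var/lib/maubot/plugins")


def is_allowed_db_path(database_path: str) -> bool:
    """True when the normalized path lies inside the plugin DB directory."""
    # normpath collapses "." and ".." segments before the containment test.
    normalized = PurePosixPath(posixpath.normpath(database_path))
    return normalized.is_absolute() and PLUGIN_DB_ROOT in normalized.parents
```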
**Storage**:
- Location: `/var/lib/maubot/plugins/<instance-id>.db`
- Backup: Manual (part of `/var/lib/maubot/` directory backup)
---
## Relationships Diagram
```
┌─────────────────────┐
│  Matrix Homeserver  │
│     (conduwuit)     │
└──────────┬──────────┘
           │ authenticates
┌──────────▼──────────┐
│   Maubot Service    │
│  ┌──────────────┐   │
│  │   Admin UI   │   │  ← admin login (sops-nix secrets)
│  │    :29316    │   │
│  └──────────────┘   │
│                     │
│   manages ↓         │
│                     │
│  ┌──────────────┐   │
│  │ Bot Instance │───┼──→ uses Plugin (.mbp)
│  │ (instagram)  │   │
│  └───┬──────────┘   │
│      │ has config   │
│      ↓              │
│  ┌──────────────┐   │
│  │  Bot Config  │   │
│  │ - rooms[]    │   │
│  │ - settings   │   │
│  └──────────────┘   │
│                     │
│   stores ↓          │
│                     │
│  ┌──────────────┐   │
│  │ Bot Database │   │  (optional, plugin-specific)
│  │   (SQLite)   │   │
│  └──────────────┘   │
└─────────────────────┘
           │ sends notifications
┌─────────────────────┐
│  Matrix Admin Room  │  (shared with platform)
└─────────────────────┘
```
---
## Configuration File Structures
### Maubot Service Config
**File**: `/var/lib/maubot/config/config.yaml`
**Structure**:
```yaml
database: "sqlite:///var/lib/maubot/bot.db"
server:
  hostname: 0.0.0.0
  port: 29316
admins:
  admin: <INJECTED_FROM_CREDENTIALS_DIRECTORY>  # Replaced at runtime
homeservers:
  clarun.xyz:
    url: http://127.0.0.1:8008
    secret: <INJECTED_REGISTRATION_TOKEN>  # Optional, for auto-registration
logging:
  level: INFO
  handlers:
    - type: journal  # Log to systemd journal
api_features:
  login: true
  plugin: true
  plugin_upload: true
  instance: true
  instance_database: true
  log: true
```
**Generation**:
1. Maubot example config generated via `maubot -c config.yaml -e`
2. Python script merges NixOS module overrides
3. Secrets injected from `$CREDENTIALS_DIRECTORY` (systemd LoadCredential)
4. Final config written to `/var/lib/maubot/config/config.yaml`
---
### Bot Instance Config
**Stored in**: Maubot SQLite DB (not file-based)
**Access methods**:
1. Maubot web UI (http://localhost:29316/_matrix/maubot)
2. Direct database edit (advanced, not recommended)
3. File-based config edit + restart (for room subscriptions per FR-010)
**Example config** (Instagram bot):
```json
{
  "enabled": true,
  "max_file_size": 50000000,
  "room_subscriptions": [
    "!abc123def:clarun.xyz",
    "!xyz789ghi:clarun.xyz"
  ],
  "rate_limiting": {
    "enabled": true,
    "max_requests_per_minute": 10,
    "backoff_seconds": 30
  },
  "error_notification_level": "ERROR"
}
```
---
### Plugin Metadata
**File**: `maubot.yaml` (inside .mbp archive)
**Structure**:
```yaml
id: sna.instagram
version: 1.0.0
license: MIT
modules:
  - instagram_bot
main_class: InstagramBot
database: false  # Plugin doesn't use dedicated DB
config: true     # Plugin accepts configuration
config_schema:
  type: object
  properties:
    enabled:
      type: boolean
      default: true
    max_file_size:
      type: integer
      default: 50000000
    room_subscriptions:
      type: array
      items:
        type: string
        pattern: "^!.+:.+$"
dependencies:
  - yt-dlp>=2023.1.6
  - aiohttp
  - pillow
```
---
## State Persistence
### Service State
**Location**: `/var/lib/maubot/bot.db` (SQLite)
**Tables**:
- `instance` - Bot instance metadata
- `plugin` - Uploaded plugin metadata
- `client` - Matrix client credentials (access tokens)
- `log` - Recent bot activity logs
**Backup strategy**:
- Included in `/var/lib/maubot/` directory backup
- Rollback via NixOS generations (service config)
- Database can be wiped and rebuilt from scratch (bot re-registration required)
### Runtime State
**Location**: Memory (maubot service process)
**Contents**:
- Active bot instances (Python objects)
- Matrix client connections (aiohttp sessions)
- Event handlers (registered callbacks)
- Plugin instances (loaded Python classes)
**Recovery**:
- Automatic on maubot service restart
- Bot instances reconnect to Matrix
- Plugin state reloaded from DB (if applicable)
---
## Security Model
### Secrets Hierarchy
1. **Service-level secrets** (sops-nix encrypted):
   - `maubot-admin-password` - Management UI login
   - `maubot-secret-key` - Session signing
   - `matrix-registration-token` - Bot user creation (reused from Matrix homeserver)
2. **Bot-level secrets** (stored in maubot DB):
   - Matrix access tokens (per bot instance)
   - Matrix device IDs
   - Plugin-specific credentials (if any)
3. **Runtime secrets** (ephemeral):
   - Active session tokens (management UI)
   - Matrix sync tokens (E2EE keys if enabled)
### Permissions
**File permissions**:
```
/var/lib/maubot/ → drwxr-x--- maubot:maubot
/var/lib/maubot/config/ → drwx------ maubot:maubot
/var/lib/maubot/config/config.yaml → -rw------- maubot:maubot (contains secrets)
/var/lib/maubot/bot.db → -rw-r----- maubot:maubot
/var/lib/maubot/plugins/ → drwxr-xr-x maubot:maubot
/run/secrets/maubot-* → -r-------- maubot:maubot (0400)
```
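
A deployment check could verify these modes with a small helper. The sketch below is an assumption about how such a check might look, not part of the module; `mode_is_at_most` passes when the file grants no permission bits beyond the allowed mode.

```python
import os
import stat


def mode_is_at_most(path: str, allowed_mode: int) -> bool:
    """True when the file has no permission bits outside allowed_mode."""
    actual = stat.S_IMODE(os.stat(path).st_mode)
    return actual & ~allowed_mode == 0
```

For example, `mode_is_at_most("/var/lib/maubot/bot.db", 0o640)` would accept a file with mode `0600` or `0640` and reject one that is world-readable.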
**Network access**:
- Management interface: localhost:29316 only (SSH tunnel required for remote access per spec)
- Matrix homeserver: `127.0.0.1:8008` (IPv4 loopback rather than `localhost`, for conduwuit compatibility)
- No external network access (except Matrix federation via homeserver)
---
## Operational Entities
### Health Check State
**Attributes**:
- `last_check_timestamp`: datetime
- `service_status`: enum (`healthy`, `degraded`, `failed`)
- `maubot_version_endpoint`: boolean - `/_matrix/maubot/v1/version` accessible
- `active_instances_count`: integer
- `failed_instances`: array[string] - Instance IDs with errors
- `last_successful_message_timestamp`: datetime (per bot instance)
**Storage**: Systemd timer state + systemd journal logs
**Health indicators** (per spec SC-003):
- Service responds to HTTP health check (curl to version endpoint)
- Active instances count matches enabled instances count
- No ERROR/CRITICAL logs in last 5 minutes
- All enabled bots have recent Matrix sync activity (<10 minutes)
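
The indicators can be folded into a single `service_status` value. The mapping below is an assumption made for illustration, since the spec lists the indicators but not how they combine: an unreachable version endpoint is treated as `failed`, and any other unmet indicator as `degraded`.

```python
from datetime import datetime, timedelta


def classify_health(version_ok: bool, enabled_count: int, active_count: int,
                    recent_error_count: int,
                    last_sync_by_instance: dict[str, datetime],
                    now: datetime) -> str:
    """Map the health indicators to healthy, degraded, or failed."""
    if not version_ok:
        return "failed"
    # Instances whose last Matrix sync is 10 minutes old or more.
    stale = [
        instance_id
        for instance_id, last_sync in last_sync_by_instance.items()
        if now - last_sync >= timedelta(minutes=10)
    ]
    if active_count != enabled_count or recent_error_count > 0 or stale:
        return "degraded"
    return "healthy"
```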
---
## Data Flow Diagrams
### Instagram URL Processing Flow
```
1. User posts Instagram URL in Matrix room
2. Matrix homeserver distributes event to all clients
3. Bot instance receives event (if subscribed to that room)
4. Plugin regex matches Instagram URL pattern
5. Plugin calls yt-dlp extraction (async thread pool)
6. yt-dlp downloads media to temporary directory
7. Plugin uploads media to Matrix homeserver
8. Plugin sends Matrix message event with media attachment
9. Cleanup temporary files
10. Log extraction success/failure (severity-based notification if ERROR/CRITICAL)
```
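
Steps 4 through 9 of this flow can be sketched with the standard library alone. This is not the plugin's code: the URL pattern is a hypothetical example, and `extract` and `upload` are caller-supplied stand-ins for the yt-dlp call and the Matrix media upload. The blocking extractor runs in a worker thread via `asyncio.to_thread`, and the temporary directory is removed when each URL is done.

```python
import asyncio
import re
import tempfile
from pathlib import Path

# Hypothetical pattern; the real plugin's regex may differ.
INSTAGRAM_URL_RE = re.compile(
    r"https?://(?:www\.)?instagram\.com/(?:p|reel|tv)/[A-Za-z0-9_-]+/?"
)


def find_instagram_urls(body: str) -> list[str]:
    """Return every Instagram post URL found in a message body."""
    return INSTAGRAM_URL_RE.findall(body)


async def process_message(body: str, extract, upload) -> list[str]:
    """Download and upload media for each matched URL.

    extract(url, directory) is a blocking callable returning a Path;
    upload(path) is a coroutine returning the uploaded content URI.
    """
    uploaded = []
    for url in find_instagram_urls(body):
        # The directory and its contents are deleted when the block exits.
        with tempfile.TemporaryDirectory() as tmp:
            media_path = await asyncio.to_thread(extract, url, Path(tmp))
            uploaded.append(await upload(media_path))
    return uploaded
```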
### Bot Registration Flow
```
1. Admin accesses maubot web UI via SSH tunnel
2. Create new bot client (provide Matrix user ID)
3. Maubot attempts registration via conduwuit registration token
4. If successful: Access token stored in maubot DB
5. Create bot instance (select plugin, provide config)
6. Bot connects to Matrix homeserver
7. Bot joins configured rooms (from room_subscriptions)
8. Bot starts listening for events
```
---
## Validation Rules Summary
### Configuration Validation
- All Matrix room IDs MUST match pattern `!.+:.+`
- Homeserver URL MUST be `http://127.0.0.1:PORT` (IPv4, not localhost)
- Admin password MUST meet minimum strength (length >=16, bcrypt cost >=12)
- Plugin IDs MUST be globally unique within maubot instance
- File paths MUST be absolute and within permitted directories
### Runtime Validation
- Bot instances CANNOT start without valid Matrix access token
- Room subscriptions MUST reference existing rooms (checked at runtime, logged if invalid)
- Plugin dependencies MUST be available in NixOS environment
- Rate limiting MUST be enforced before external API calls (Instagram)
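
For the rate-limiting rule, one possible shape is a sliding-window limiter keyed on `max_requests_per_minute`. The class below is a sketch with a hypothetical name, not the plugin's implementation, and it ignores `burst_size` and `backoff_seconds`. The clock is injectable so the behaviour can be tested without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests_per_minute calls in any 60-second window."""

    def __init__(self, max_requests_per_minute: int, clock=time.monotonic):
        self.limit = max_requests_per_minute
        self.clock = clock
        self.calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Record and allow the call if the window has room, else refuse."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True
```

A caller would check `try_acquire()` before each external fetch and skip or delay the request when it returns `False`.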
### Security Validation
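- Secrets MUST NEVER appear in logs or in version-controlled config files (placeholders only); the runtime `config.yaml` holds the injected values and is readable by the `maubot` user only
- Management interface MUST NOT be reachable from outside the host: bind to localhost, or to `0.0.0.0` only inside a container whose port is not exposed externally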
- Database files MUST have restrictive permissions (0600 or 0640)
- ERROR/CRITICAL notifications MUST include sanitized context (no credentials in stack traces)
---
## Migration Strategy
### From ops-base to ops-jrz1
**Data migration**: Not required (fresh deployment)
**Configuration migration**:
1. Extract maubot.nix module from ops-base
2. Adapt namespace: `services.matrix-vm.maubot` → `services.maubot`
3. Update homeserver URL: `continuwuity` → `conduwuit`
4. Remove registration_secrets (not supported by conduwuit)
5. Add registration token configuration
**Plugin migration**:
1. Copy Instagram bot .mbp file from ops-base: `/home/dan/proj/sna/sna-instagram-bot.mbp`
2. Upload to ops-jrz1 maubot via web UI or API
3. Create bot instance with room subscriptions
4. Test content fetching in designated rooms
**No database migration needed** (SQLite DB created fresh on ops-jrz1)
---
## Capacity Planning
### Single Instagram Bot Instance
**Estimated resource usage**:
- Memory: ~100MB (maubot service + bot instance + yt-dlp subprocess)
- Disk:
- Maubot DB: <10MB (metadata only)
- Plugins: ~1MB per .mbp file
- Temporary files: Up to 50MB (during media download, auto-cleanup)
- CPU: Burst during media extraction (yt-dlp), idle otherwise
- Network: <1GB/day (assuming <20 Instagram fetches/day at ~50MB each)
**Scale validation** (per SC-002):
- Maubot service supports 3+ concurrent instances without degradation
- Each additional bot: ~50MB memory, minimal CPU/network impact
- Shared resources: Maubot DB (SQLite supports concurrent reads), management UI
---
**Status**: Data model complete. Ready for quickstart.md generation.