Data Model: Maubot Integration

Feature: 003-maubot-integration | Date: 2025-10-26 | Status: Phase 1 design

Overview

This document defines the data structures, state machines, and relationships for the maubot integration feature. Since maubot is an infrastructure service (not an application with user-facing data), the focus is on service configuration, runtime state, and operational entities.


Core Entities

1. Maubot Service

Description: The maubot framework service that manages bot instances and provides the web-based management interface.

Attributes:

  • homeserver_url: string (URL) - Matrix homeserver endpoint (e.g., http://127.0.0.1:8008)
  • server_name: string (domain) - Matrix server domain (e.g., clarun.xyz)
  • port: integer - Management interface port (default: 29316)
  • database_uri: string - SQLite database path (e.g., sqlite:///var/lib/maubot/bot.db)
  • admin_username: string - Admin UI login username
  • admin_password_hash: string (secret) - Hashed admin password
  • secret_key: string (secret) - Session signing key
  • config_path: string (path) - Runtime config location (/var/lib/maubot/config/config.yaml)

Relationships:

  • Has many: Bot Instances (1:N)
  • Has many: Plugins (1:N)
  • Connects to: Matrix Homeserver (1:1)

State Machine: N/A (service-level, managed by systemd)

Validation Rules:

  • homeserver_url MUST be http://127.0.0.1:PORT (IPv4 loopback, not localhost, for conduwuit compatibility)
  • port MUST NOT conflict with existing services (check: 8008 Matrix, 29319 Slack bridge, 3000 Forgejo)
  • admin_password_hash MUST be bcrypt with cost >=12
  • secret_key MUST be >=32 bytes random
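These rules can be checked mechanically before the module writes config.yaml. The following is a minimal Python sketch. It assumes the values arrive as plain strings, integers and bytes. The helper name validate_service_config is hypothetical, and the reserved ports are the ones listed above. The bcrypt cost is read from the standard modular-crypt prefix (for example $2b$12$).

```python
import re
from urllib.parse import urlsplit

# Ports already claimed on this host, taken from the rule above.
RESERVED_PORTS = {8008: "matrix", 29319: "slack-bridge", 3000: "forgejo"}
# bcrypt hashes look like $2b$12$<22-char salt + 31-char digest>; group 1 is the cost.
BCRYPT_RE = re.compile(r"^\$2[aby]\$(\d{2})\$[./A-Za-z0-9]{53}$")


def validate_service_config(homeserver_url: str, port: int,
                            admin_password_hash: str, secret_key: bytes) -> list[str]:
    """Return rule violations for the service settings (empty list means valid)."""
    errors = []
    parts = urlsplit(homeserver_url)
    if parts.scheme != "http" or parts.hostname != "127.0.0.1" or parts.port is None:
        errors.append("homeserver_url must be http://127.0.0.1:PORT (not localhost)")
    if port in RESERVED_PORTS:
        errors.append(f"port {port} conflicts with {RESERVED_PORTS[port]}")
    match = BCRYPT_RE.match(admin_password_hash)
    if match is None:
        errors.append("admin_password_hash is not a bcrypt hash")
    elif int(match.group(1)) < 12:
        errors.append("bcrypt cost must be >= 12")
    if len(secret_key) < 32:
        errors.append("secret_key must be at least 32 bytes")
    return errors
```

The standard library's secrets.token_bytes(32) produces a key that satisfies the last rule.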

Storage:

  • NixOS module configuration: /home/dan/proj/ops-jrz1/modules/maubot.nix
  • Runtime config: /var/lib/maubot/config/config.yaml
  • Secrets: /run/secrets/maubot-* (sops-nix decrypted)

2. Bot Instance

Description: Individual bot deployment with specific configuration, Matrix user account, and plugin assignment.

Attributes:

  • id: string (slug) - Instance identifier (e.g., instagram-bot-1)
  • type: string - Plugin ID (e.g., sna.instagram)
  • primary_user: string (MXID) - Matrix user ID (e.g., @instagram-bot:clarun.xyz)
  • enabled: boolean - Whether bot is active
  • config: object (JSON) - Plugin-specific configuration
    • For Instagram bot: {"enabled": true, "max_file_size": 50000000, "room_subscriptions": ["!roomid1:clarun.xyz"]}
  • access_token: string (secret) - Matrix access token (ephemeral, stored in bot DB)
  • device_id: string - Matrix device identifier
  • database_path: string (optional) - Per-bot database if plugin requires (e.g., /var/lib/maubot/plugins/instagram-bot-1.db)

Relationships:

  • Belongs to: Maubot Service (N:1)
  • Uses: Plugin (N:1)
  • Authenticated as: Matrix User (1:1)
  • Subscribed to: Matrix Rooms (N:M via room_subscriptions config)

State Machine:

  [created]
     ↓
  [configured] ─→ [disabled]
     ↓              ↓
  [enabled] ←───────┘
     ↓
  [running] ←→ [stopped]
     ↓
  [failed] → [restarting]

States:

  • created: Instance exists in maubot DB but not yet configured
  • configured: Config provided, Matrix user created, not yet enabled
  • disabled: Configured but switched off (enabled=false); can later be enabled
  • enabled: Marked as active in config
  • running: Bot process active, connected to Matrix, responding to events
  • stopped: Manually stopped via management UI
  • failed: Encountered error (logged to maubot service journal)
  • restarting: Auto-recovery in progress
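The diagram's transitions can be encoded as a lookup table. Illegal moves, such as going from created straight to running, are then rejected. The following is a minimal Python sketch. The table mirrors the arrows in the diagram. The exits from restarting (back to running, or back to failed) are an assumption, because the diagram does not say where auto-recovery leads.

```python
from enum import Enum


class InstanceState(Enum):
    CREATED = "created"
    CONFIGURED = "configured"
    DISABLED = "disabled"
    ENABLED = "enabled"
    RUNNING = "running"
    STOPPED = "stopped"
    FAILED = "failed"
    RESTARTING = "restarting"


S = InstanceState
# Allowed transitions: one entry per arrow in the state diagram.
TRANSITIONS = {
    S.CREATED: {S.CONFIGURED},
    S.CONFIGURED: {S.ENABLED, S.DISABLED},
    S.DISABLED: {S.ENABLED},
    S.ENABLED: {S.RUNNING},
    S.RUNNING: {S.STOPPED, S.FAILED},
    S.STOPPED: {S.RUNNING},
    S.FAILED: {S.RESTARTING},
    # Assumption: the diagram leaves the outcome of auto-recovery unspecified.
    S.RESTARTING: {S.RUNNING, S.FAILED},
}


def transition(current: InstanceState, target: InstanceState) -> InstanceState:
    """Return the new state, or raise ValueError for a move the diagram does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```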

Validation Rules:

  • primary_user MUST match the pattern ^@[a-z0-9-]+:clarun\.xyz$
  • type MUST reference an uploaded Plugin
  • config.room_subscriptions MUST be an array of valid Matrix room IDs (format: !<opaque_id>:clarun.xyz)
  • enabled=true requires access_token to be set (bot authenticated)
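Here is a sketch of these rules as a Python check. It assumes the instance record is a plain dict keyed by the attribute names listed above. The regular expressions transcribe this section's patterns, with the dot in clarun.xyz escaped. The helper name validate_instance is hypothetical. The caller supplies the set of uploaded plugin IDs as known_plugins.

```python
import re

MXID_RE = re.compile(r"^@[a-z0-9-]+:clarun\.xyz$")
ROOM_ID_RE = re.compile(r"^![^:]+:clarun\.xyz$")


def validate_instance(instance: dict, known_plugins: set[str]) -> list[str]:
    """Return rule violations for a bot instance record (empty list means valid)."""
    errors = []
    if not MXID_RE.match(instance.get("primary_user", "")):
        errors.append("primary_user must look like @name:clarun.xyz")
    if instance.get("type") not in known_plugins:
        errors.append("type must reference an uploaded plugin")
    rooms = instance.get("config", {}).get("room_subscriptions", [])
    if not isinstance(rooms, list) or not all(
        isinstance(r, str) and ROOM_ID_RE.match(r) for r in rooms
    ):
        errors.append("room_subscriptions must be a list of !id:clarun.xyz room IDs")
    # An enabled bot must already be authenticated.
    if instance.get("enabled") and not instance.get("access_token"):
        errors.append("enabled=true requires an access_token")
    return errors
```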

Storage:

  • Instance metadata: Maubot SQLite DB (/var/lib/maubot/bot.db table: instance)
  • Access tokens: Maubot SQLite DB (table: client); maubot does not encrypt them, so protection relies on the bot.db file permissions
  • Plugin config: Maubot SQLite DB (JSON blob)

3. Plugin

Description: Packaged bot functionality (.mbp file) containing code, metadata, and dependencies.

Attributes:

  • id: string - Plugin identifier (e.g., sna.instagram)
  • version: string (semver) - Plugin version (e.g., 1.0.0)
  • main_class: string - Python class name (e.g., InstagramBot)
  • modules: array[string] - Python module list (e.g., ["instagram_bot"])
  • dependencies: array[string] - Python package dependencies (e.g., ["yt-dlp>=2023.1.6", "aiohttp"])
  • database: boolean - Whether plugin requires dedicated database
  • config_schema: object (JSON Schema) - Plugin configuration validation schema
  • upload_path: string (path) - Storage location (e.g., /var/lib/maubot/plugins/sna.instagram-v1.0.0.mbp)

Relationships:

  • Belongs to: Maubot Service (N:1)
  • Used by: Bot Instances (1:N)

State Machine:

  [uploaded]
     ↓
  [validated] ─→ [rejected] (invalid metadata)
     ↓
  [loaded] ←→ [disabled]
     ↓
  [active] (used by >=1 running instance)
     ↓
  [trashed] → [deleted]

Validation Rules:

  • id MUST match pattern [a-z][a-z0-9._-]+
  • version MUST be valid semver
  • main_class MUST exist in provided modules
  • .mbp file MUST be valid zip containing maubot.yaml + Python files
  • dependencies MUST be available in nixpkgs (e.g., yt-dlp is available, instaloader is not)
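Some of these rules can be checked on the .mbp archive before upload. The following stdlib-only Python sketch does three things:

- opens the archive with zipfile;
- confirms that maubot.yaml and at least one .py file are present;
- applies the id pattern and a simplified semver pattern to values the caller has already read from maubot.yaml.

YAML parsing needs a third-party library, so it is left out. The function name and the simplified semver expression are assumptions. The main_class and nixpkgs-availability rules are not covered.

```python
import io
import re
import zipfile

PLUGIN_ID_RE = re.compile(r"^[a-z][a-z0-9._-]+$")
# Simplified semver: MAJOR.MINOR.PATCH with optional -prerelease and +build suffixes.
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def check_mbp(data: bytes, plugin_id: str, version: str) -> list[str]:
    """Validate an .mbp archive (as bytes) plus its declared id and version."""
    errors = []
    if not PLUGIN_ID_RE.match(plugin_id):
        errors.append(f"invalid plugin id: {plugin_id!r}")
    if not SEMVER_RE.match(version):
        errors.append(f"version is not semver: {version!r}")
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = archive.namelist()
    except zipfile.BadZipFile:
        return errors + ["not a valid zip archive"]
    if "maubot.yaml" not in names:
        errors.append("maubot.yaml missing from archive root")
    if not any(name.endswith(".py") for name in names):
        errors.append("no Python files in archive")
    return errors
```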

Storage:

  • Active plugins: /var/lib/maubot/plugins/
  • Trashed plugins: /var/lib/maubot/trash/
  • Metadata: Maubot SQLite DB (table: plugin)

4. Bot Configuration

Description: Settings specific to bot instance including Matrix credentials, plugin settings, and room subscriptions.

Attributes:

  • instance_id: string (foreign key) - References Bot Instance
  • room_subscriptions: array[string] - List of Matrix room IDs where bot is active
    • Example: ["!abc123:clarun.xyz", "!def456:clarun.xyz"]
  • command_prefix: string (optional) - Bot command trigger (e.g., !instagram, !ig)
  • enabled_features: object - Feature flags for plugin
    • For Instagram bot: {"auto_fetch": true, "rate_limiting": true, "caching": false}
  • rate_limit_config: object - Rate limiting parameters
    • Example: {"max_requests_per_minute": 10, "burst_size": 3, "backoff_seconds": 30}
  • error_notification_level: string (enum) - Minimum severity for admin notifications
    • Values: DEBUG, INFO, WARN, ERROR, CRITICAL
    • Default: ERROR (per spec FR-013)

Relationships:

  • Belongs to: Bot Instance (1:1)
  • References: Matrix Rooms (N:M via room_subscriptions)

Validation Rules:

  • room_subscriptions items MUST be valid Matrix room IDs
  • command_prefix MUST NOT conflict with other bots (user responsibility)
  • error_notification_level MUST be one of valid enum values
  • rate_limit_config.max_requests_per_minute MUST be >0 and <=60
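The following is a minimal sketch of the two mechanically checkable rules. It assumes the configuration is a dict shaped like the attributes above. The severity enum is modeled as an IntEnum, which provides the ordering needed to compare an event against error_notification_level. Severity, validate_bot_config and meets_threshold are hypothetical names. Treating a missing rate_limit_config as invalid is also an assumption.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Numeric values mirror Python's logging levels.
    DEBUG = 10
    INFO = 20
    WARN = 30
    ERROR = 40
    CRITICAL = 50


def validate_bot_config(config: dict) -> list[str]:
    """Check error_notification_level and the rate-limit bounds from the rules above."""
    errors = []
    level = config.get("error_notification_level", "ERROR")  # default per FR-013
    if level not in Severity.__members__:
        errors.append(f"unknown error_notification_level: {level!r}")
    rpm = config.get("rate_limit_config", {}).get("max_requests_per_minute")
    # type() check rejects bools; a missing value is treated as invalid.
    if type(rpm) is not int or not 0 < rpm <= 60:
        errors.append("max_requests_per_minute must be an integer in 1..60")
    return errors


def meets_threshold(severity: str, configured_level: str) -> bool:
    """True when an event of `severity` is at or above the configured level."""
    return Severity[severity] >= Severity[configured_level]
```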

Storage:

  • Stored in Bot Instance config JSON blob
  • Editable via:
    1. Maubot web UI (management interface)
    2. Direct config file edit + bot restart (per FR-010)

5. Admin Notification

Description: ERROR and CRITICAL level bot notifications sent to Matrix homeserver admin room (shared with other platform notifications).

Attributes:

  • timestamp: datetime (ISO 8601) - When notification was generated
  • source_instance: string - Bot instance ID that triggered notification
  • severity: string (enum) - Log level (ERROR or CRITICAL)
  • message: string - Human-readable error description
  • context: object (JSON) - Additional metadata
    • room_id: string (optional) - Matrix room where error occurred
    • event_id: string (optional) - Matrix event that triggered error
    • exception_type: string (optional) - Python exception class
    • stack_trace: string (optional) - Abbreviated stack trace (last 10 lines)

Relationships:

  • Triggered by: Bot Instance (N:1)
  • Sent to: Matrix Admin Room (N:1, shared room: defined in ops-jrz1 config)

State Machine: N/A (notifications are fire-and-forget events)

Validation Rules:

  • severity MUST be ERROR or CRITICAL (DEBUG/INFO/WARN go to logs only per FR-013)
  • message MUST be non-empty
  • Matrix admin room MUST exist and bot MUST have send permission
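Several requirements can be enforced in one place before a notification is sent:

- only ERROR and CRITICAL severities;
- a non-empty message;
- a stack trace cut to its last 10 lines;
- the sanitization rule under Security Validation.

The following Python sketch works under stated assumptions. The two redaction patterns are hypothetical examples, an access_token= query parameter and a Bearer header. They would need to match the credential formats that actually appear in this deployment's tracebacks. build_notification is a hypothetical name, and delivery to the admin room is out of scope.

```python
import re
from datetime import datetime, timezone

# Hypothetical credential patterns; extend to match real token formats.
REDACTIONS = [
    (re.compile(r"(access_token=)[^&\s]+"), r"\1<redacted>"),
    (re.compile(r"(Bearer\s+)\S+"), r"\1<redacted>"),
]


def sanitize(text: str) -> str:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def build_notification(source_instance: str, severity: str, message: str,
                       stack_trace: str = "", **context) -> dict:
    """Assemble an Admin Notification record, enforcing the validation rules above."""
    if severity not in ("ERROR", "CRITICAL"):
        raise ValueError("only ERROR and CRITICAL are sent to the admin room")
    if not message.strip():
        raise ValueError("message must be non-empty")
    if stack_trace:
        # Keep only the last 10 lines of the trace, then scrub credentials.
        context["stack_trace"] = sanitize("\n".join(stack_trace.splitlines()[-10:]))
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source_instance": source_instance,
        "severity": severity,
        "message": sanitize(message),
        "context": context,
    }
```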

Storage:

  • Not persisted (real-time notification)
  • Logged to systemd journal: journalctl -u maubot.service
  • Visible in maubot management dashboard (recent notifications)

6. Bot Database

Description: Per-instance isolated SQLite database for plugin state and data persistence.

Attributes:

  • instance_id: string (foreign key) - References Bot Instance
  • database_path: string (path) - SQLite file location (e.g., /var/lib/maubot/plugins/instagram-bot-1.db)
  • schema_version: integer - Plugin-defined schema version
  • size_bytes: integer - Database file size
  • last_accessed: datetime - Last read/write timestamp

Relationships:

  • Belongs to: Bot Instance (1:1, optional - only if plugin requires DB)
  • Managed by: Plugin code (plugin-defined schema)

State Machine:

  [initialized] (schema created)
     ↓
  [active] (read/write operations)
     ↓
  [migrating] (schema upgrade in progress)
     ↓
  [active]
     ↓
  [archived] (bot deleted, DB preserved)
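Schema migrations belong to plugin code, so a plugin that keeps its own SQLite file needs a small upgrade routine. One stdlib-only way to track schema_version is SQLite's PRAGMA user_version header field, sketched below in Python. The fetch_cache table and the migration list are hypothetical, and maubot's own database helpers may be preferable in a real plugin. Each step runs in an explicit transaction. A failed upgrade therefore leaves the database at its previous version, which matches the [migrating] → [active] loop in the diagram.

```python
import sqlite3

# Hypothetical schema history: entry i upgrades the schema from version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE fetch_cache (url TEXT PRIMARY KEY, mxc_uri TEXT NOT NULL)",
    "ALTER TABLE fetch_cache ADD COLUMN fetched_at INTEGER",
]


def current_version(conn: sqlite3.Connection) -> int:
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> int:
    """Apply pending migrations one transaction at a time; return the final version."""
    version = current_version(conn)
    for target, statement in enumerate(migrations[version:], start=version + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(statement)
            conn.execute(f"PRAGMA user_version = {int(target)}")
            conn.commit()
        except Exception:
            conn.rollback()  # schema and user_version stay at the previous step
            raise
    return current_version(conn)
```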

Validation Rules:

  • database_path MUST be within /var/lib/maubot/plugins/ directory
  • Schema migrations MUST be handled by plugin code (not maubot framework)
  • Database MUST be owned by maubot user/group
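The location rule and the 0600/0640 permission rule under Security Validation can be checked with the standard library. The following Python sketch assumes the plugin root is /var/lib/maubot/plugins, as stated above. It normalizes the path first, so ".." segments cannot escape the directory. It does not resolve symlinks or check ownership by the maubot user. The function names are hypothetical.

```python
import posixpath
import stat
from pathlib import PurePosixPath

PLUGIN_DB_ROOT = PurePosixPath("/var/lib/maubot/plugins")
ALLOWED_MODES = {0o600, 0o640}


def db_path_allowed(path: str, root: PurePosixPath = PLUGIN_DB_ROOT) -> bool:
    """True if `path` is absolute and stays inside `root` after collapsing '..'."""
    if not path.startswith("/"):
        return False
    normalized = PurePosixPath(posixpath.normpath(path))
    return root in normalized.parents


def db_mode_ok(st_mode: int) -> bool:
    """True if the permission bits of an os.stat() st_mode are 0600 or 0640."""
    return stat.S_IMODE(st_mode) in ALLOWED_MODES
```

On a live host the mode check would be called as db_mode_ok(os.stat(path).st_mode).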

Storage:

  • Location: /var/lib/maubot/plugins/<instance-id>.db
  • Backup: Manual (part of /var/lib/maubot/ directory backup)

Relationships Diagram

┌─────────────────────┐
│  Matrix Homeserver  │
│   (conduwuit)       │
└──────────┬──────────┘
           │ authenticates
           │
┌──────────▼──────────┐
│  Maubot Service     │
│  ┌──────────────┐   │
│  │ Admin UI     │   │ ← admin login (sops-nix secrets)
│  │ :29316       │   │
│  └──────────────┘   │
│                     │
│  manages ↓          │
│                     │
│  ┌──────────────┐   │
│  │ Bot Instance │───┼──→ uses Plugin (.mbp)
│  │ (instagram)  │   │
│  └───┬──────────┘   │
│      │ has config   │
│      ↓              │
│  ┌──────────────┐   │
│  │ Bot Config   │   │
│  │ - rooms[]    │   │
│  │ - settings   │   │
│  └──────────────┘   │
│                     │
│  stores ↓           │
│                     │
│  ┌──────────────┐   │
│  │ Bot Database │   │ (optional, plugin-specific)
│  │ (SQLite)     │   │
│  └──────────────┘   │
└─────────────────────┘
           │ sends notifications
           ↓
┌─────────────────────┐
│  Matrix Admin Room  │ (shared with platform)
└─────────────────────┘

Configuration File Structures

Maubot Service Config

File: /var/lib/maubot/config/config.yaml

Structure:

database: "sqlite:///var/lib/maubot/bot.db"

server:
  hostname: 0.0.0.0
  port: 29316

admins:
  admin: <INJECTED_FROM_CREDENTIALS_DIRECTORY>  # Replaced at runtime

homeservers:
  clarun.xyz:
    url: http://127.0.0.1:8008
    secret: <INJECTED_REGISTRATION_TOKEN>  # Optional, for auto-registration

logging:
  level: INFO
  handlers:
    - type: journal  # Log to systemd journal

api_features:
  login: true
  plugin: true
  plugin_upload: true
  instance: true
  instance_database: true
  log: true

Generation:

  1. Maubot example config generated via maubot -c config.yaml -e
  2. Python script merges NixOS module overrides
  3. Secrets injected from $CREDENTIALS_DIRECTORY (systemd LoadCredential)
  4. Final config written to /var/lib/maubot/config/config.yaml
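Steps 2 and 3 amount to a recursive dictionary merge followed by placeholder substitution. The following Python sketch shows that logic. It assumes the YAML has already been loaded into dicts; loading and dumping YAML needs a third-party library and is omitted. The mapping from placeholder to credential ID is hypothetical. systemd exposes each LoadCredential= entry as a file, named after its credential ID, inside $CREDENTIALS_DIRECTORY.

```python
import os
from pathlib import Path

# Hypothetical placeholder -> credential ID mapping (IDs as given to LoadCredential=).
PLACEHOLDERS = {
    "<INJECTED_FROM_CREDENTIALS_DIRECTORY>": "maubot-admin-password",
    "<INJECTED_REGISTRATION_TOKEN>": "matrix-registration-token",
}


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a new dict: nested dicts are merged, any other override value wins."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def inject_secrets(node, cred_dir: Path):
    """Recursively replace placeholder strings with credential file contents."""
    if isinstance(node, dict):
        return {k: inject_secrets(v, cred_dir) for k, v in node.items()}
    if isinstance(node, list):
        return [inject_secrets(v, cred_dir) for v in node]
    if isinstance(node, str) and node in PLACEHOLDERS:
        return (cred_dir / PLACEHOLDERS[node]).read_text().strip()
    return node


def render(base: dict, overrides: dict) -> dict:
    """Steps 2 and 3: merge NixOS overrides, then inject secrets from systemd."""
    cred_dir = Path(os.environ["CREDENTIALS_DIRECTORY"])
    return inject_secrets(deep_merge(base, overrides), cred_dir)
```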

Bot Instance Config

Stored in: Maubot SQLite DB (not file-based)

Access methods:

  1. Maubot web UI (http://localhost:29316/_matrix/maubot)
  2. Direct database edit (advanced, not recommended)
  3. File-based config edit + restart (for room subscriptions per FR-010)

Example config (Instagram bot):

{
  "enabled": true,
  "max_file_size": 50000000,
  "room_subscriptions": [
    "!abc123def:clarun.xyz",
    "!xyz789ghi:clarun.xyz"
  ],
  "rate_limiting": {
    "enabled": true,
    "max_requests_per_minute": 10,
    "backoff_seconds": 30
  },
  "error_notification_level": "ERROR"
}

Plugin Metadata

File: maubot.yaml (inside .mbp archive)

Structure:

id: sna.instagram
version: 1.0.0
license: MIT
modules:
  - instagram_bot
main_class: InstagramBot

database: false  # Plugin doesn't use dedicated DB

config: true  # Plugin accepts configuration
config_schema:
  type: object
  properties:
    enabled:
      type: boolean
      default: true
    max_file_size:
      type: integer
      default: 50000000
    room_subscriptions:
      type: array
      items:
        type: string
        pattern: "^!.+:.+$"

dependencies:
  - yt-dlp>=2023.1.6
  - aiohttp
  - pillow

State Persistence

Service State

Location: /var/lib/maubot/bot.db (SQLite)

Tables:

  • instance - Bot instance metadata
  • plugin - Uploaded plugin metadata
  • client - Matrix client credentials (access tokens)
  • log - Recent bot activity logs

Backup strategy:

  • Included in /var/lib/maubot/ directory backup
  • Rollback via NixOS generations (service config)
  • Database can be wiped and rebuilt from scratch (bot re-registration required)

Runtime State

Location: Memory (maubot service process)

Contents:

  • Active bot instances (Python objects)
  • Matrix client connections (aiohttp sessions)
  • Event handlers (registered callbacks)
  • Plugin instances (loaded Python classes)

Recovery:

  • Automatic on maubot service restart
  • Bot instances reconnect to Matrix
  • Plugin state reloaded from DB (if applicable)

Security Model

Secrets Hierarchy

  1. Service-level secrets (sops-nix encrypted):

    • maubot-admin-password - Management UI login
    • maubot-secret-key - Session signing
    • matrix-registration-token - Bot user creation (reused from Matrix homeserver)
  2. Bot-level secrets (stored in maubot DB):

    • Matrix access tokens (per bot instance)
    • Matrix device IDs
    • Plugin-specific credentials (if any)
  3. Runtime secrets (ephemeral):

    • Active session tokens (management UI)
    • Matrix sync tokens (E2EE keys if enabled)

Permissions

File permissions:

/var/lib/maubot/                   → drwxr-x--- maubot:maubot
/var/lib/maubot/config/            → drwx------ maubot:maubot
/var/lib/maubot/config/config.yaml → -rw------- maubot:maubot (contains secrets)
/var/lib/maubot/bot.db             → -rw-r----- maubot:maubot
/var/lib/maubot/plugins/           → drwxr-xr-x maubot:maubot
/run/secrets/maubot-*              → -r-------- maubot:maubot (0400)

Network access:

  • Management interface: localhost:29316 only (SSH tunnel required for remote access per spec)
  • Matrix homeserver: localhost:8008 (IPv4, conduwuit compatibility)
  • No other inbound network access; outbound traffic is limited to the homeserver (which handles federation) and plugin fetches such as yt-dlp downloading Instagram media

Operational Entities

Health Check State

Attributes:

  • last_check_timestamp: datetime
  • service_status: enum (healthy, degraded, failed)
  • maubot_version_endpoint: boolean - /_matrix/maubot/v1/version accessible
  • active_instances_count: integer
  • failed_instances: array[string] - Instance IDs with errors
  • last_successful_message_timestamp: datetime (per bot instance)

Storage: Systemd timer state + systemd journal logs

Health indicators (per spec SC-003):

  • Service responds to HTTP health check (curl to version endpoint)
  • Active instances count matches enabled instances count
  • No ERROR/CRITICAL logs in last 5 minutes
  • All enabled bots have recent Matrix sync activity (<10 minutes)
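The four indicators can be folded into the service_status enum from the attribute list. The following Python sketch rests on assumptions this spec does not state:

- an unreachable version endpoint means failed;
- any other failing indicator means degraded;
- the endpoint answers unauthenticated GET requests under /_matrix/maubot/v1 on port 29316.

The caller is assumed to count ERROR/CRITICAL journal entries from the last 5 minutes. The function names are hypothetical.

```python
import urllib.request
from datetime import datetime, timedelta

VERSION_URL = "http://127.0.0.1:29316/_matrix/maubot/v1/version"


def version_endpoint_ok(url: str = VERSION_URL, timeout: float = 5.0) -> bool:
    """Equivalent of the curl probe: True on a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return False


def service_status(endpoint_ok: bool, active: int, enabled: int,
                   recent_error_logs: int, last_sync: dict[str, datetime],
                   now: datetime) -> str:
    """Map the health indicators onto healthy / degraded / failed."""
    if not endpoint_ok:
        return "failed"
    stale = [bot for bot, ts in last_sync.items() if now - ts > timedelta(minutes=10)]
    if active != enabled or recent_error_logs > 0 or stale:
        return "degraded"
    return "healthy"
```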

Data Flow Diagrams

Instagram URL Processing Flow

1. User posts Instagram URL in Matrix room
   ↓
2. Matrix homeserver delivers the event to the clients of all room members (via /sync)
   ↓
3. Bot instance receives event (if subscribed to that room)
   ↓
4. Plugin regex matches Instagram URL pattern
   ↓
5. Plugin calls yt-dlp extraction (async thread pool)
   ↓
6. yt-dlp downloads media to temporary directory
   ↓
7. Plugin uploads media to Matrix homeserver
   ↓
8. Plugin sends Matrix message event with media attachment
   ↓
9. Cleanup temporary files
   ↓
10. Log extraction success/failure (severity-based notification if ERROR/CRITICAL)
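Steps 4 to 9 can be sketched as one coroutine. The Python below is deliberately library-agnostic. The fetch and upload callables are hypothetical stand-ins for the yt-dlp extraction and the Matrix media upload. The Instagram URL regex is an assumption that covers post, reel and tv links. asyncio.to_thread runs the blocking download off the event loop, playing the part of step 5's thread pool. tempfile.TemporaryDirectory guarantees step 9's cleanup even when a step fails. In a maubot plugin, the regex would more likely be attached to a passive command handler than called by hand.

```python
import asyncio
import re
import tempfile
from collections.abc import Awaitable, Callable
from pathlib import Path

# Assumed URL shape: instagram.com/{p,reel,reels,tv}/<shortcode>
INSTAGRAM_URL_RE = re.compile(
    r"https?://(?:www\.)?instagram\.com/(?:p|reels?|tv)/[A-Za-z0-9_-]+/?"
)


def find_instagram_urls(text: str) -> list[str]:
    """Step 4: return every Instagram URL in a message body."""
    return INSTAGRAM_URL_RE.findall(text)


async def process_url(
    url: str,
    fetch: Callable[[str, Path], Path],        # blocking download (hypothetical yt-dlp wrapper)
    upload: Callable[[Path], Awaitable[str]],  # async upload returning an mxc:// URI (hypothetical)
) -> str:
    """Steps 5-9: download in a worker thread, upload, then remove the temp directory."""
    # The directory and its contents are deleted on exit, even if fetch or upload raises.
    with tempfile.TemporaryDirectory(prefix="maubot-ig-") as tmp:
        media = await asyncio.to_thread(fetch, url, Path(tmp))
        return await upload(media)
```

The returned mxc:// URI is what step 8 would reference in the message event.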

Bot Registration Flow

1. Admin accesses maubot web UI via SSH tunnel
   ↓
2. Create new bot client (provide Matrix user ID)
   ↓
3. Maubot attempts registration via conduwuit registration token
   ↓
4. If successful: Access token stored in maubot DB
   ↓
5. Create bot instance (select plugin, provide config)
   ↓
6. Bot connects to Matrix homeserver
   ↓
7. Bot joins configured rooms (from room_subscriptions)
   ↓
8. Bot starts listening for events
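At the protocol level, step 3 corresponds to the Matrix client-server endpoint POST /_matrix/client/v3/register. It uses the m.login.registration_token user-interactive auth stage, which is the spec's mechanism for token-gated registration. The Python sketch below only builds the two request bodies. The first call carries no auth dict, so the server is expected to answer HTTP 401 with a session ID. The retry repeats the request with the token and that session. This document does not settle whether maubot issues these calls itself or the bot account is registered beforehand. The helper names are hypothetical.

```python
def register_request(localpart: str, password: str, device_name: str = "maubot") -> dict:
    """First /register body: no auth dict, so the server replies 401 with its flows."""
    return {
        "username": localpart,
        "password": password,
        "initial_device_display_name": device_name,
    }


def register_with_token(first: dict, session: str, registration_token: str) -> dict:
    """Retry body: the same fields plus the m.login.registration_token auth stage."""
    body = dict(first)
    body["auth"] = {
        "type": "m.login.registration_token",
        "token": registration_token,
        "session": session,
    }
    return body
```

inhibit_login is left at its default of false. A successful response then includes the access_token and device_id that step 4 stores.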

Validation Rules Summary

Configuration Validation

  • All Matrix room IDs MUST match pattern !.+:.+
  • Homeserver URL MUST be http://127.0.0.1:PORT (IPv4, not localhost)
  • Admin password MUST meet minimum strength (length >=16, bcrypt cost >=12)
  • Plugin IDs MUST be globally unique within maubot instance
  • File paths MUST be absolute and within permitted directories

Runtime Validation

  • Bot instances CANNOT start without valid Matrix access token
  • Room subscriptions MUST reference existing rooms (checked at runtime, logged if invalid)
  • Plugin dependencies MUST be available in NixOS environment
  • Rate limiting MUST be enforced before external API calls (Instagram)
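A token bucket is one common way to enforce rate_limit_config before each Instagram request. burst_size tokens are available up front, and they refill at max_requests_per_minute / 60 per second. The Python sketch below is an illustration, not the plugin's implementation. The clock is injectable so the behavior can be tested without sleeping. How backoff_seconds is applied after a refusal (for example, waiting that long before retrying) is left to the caller, as an assumption.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    max_requests_per_minute: int
    burst_size: int
    clock: Callable[[], float] = time.monotonic
    _tokens: float = field(init=False)
    _last: float = field(init=False)

    def __post_init__(self) -> None:
        self._tokens = float(self.burst_size)  # start with a full burst allowance
        self._last = self.clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; refill first based on elapsed time."""
        now = self.clock()
        refill = (now - self._last) * self.max_requests_per_minute / 60.0
        self._tokens = min(float(self.burst_size), self._tokens + refill)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```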

Security Validation

  • Secrets MUST NEVER appear in logs or config files (placeholders only)
  • Management interface MUST be reachable from localhost only; if it binds 0.0.0.0 (e.g. inside a container), port 29316 MUST NOT be exposed externally
  • Database files MUST have restrictive permissions (0600 or 0640)
  • ERROR/CRITICAL notifications MUST include sanitized context (no credentials in stack traces)

Migration Strategy

From ops-base to ops-jrz1

Data migration: Not required (fresh deployment)

Configuration migration:

  1. Extract maubot.nix module from ops-base
  2. Adapt namespace: services.matrix-vm.maubot → services.maubot
  3. Update homeserver URL: continuwuity → conduwuit
  4. Remove registration_secrets (not supported by conduwuit)
  5. Add registration token configuration

Plugin migration:

  1. Copy Instagram bot .mbp file from ops-base: /home/dan/proj/sna/sna-instagram-bot.mbp
  2. Upload to ops-jrz1 maubot via web UI or API
  3. Create bot instance with room subscriptions
  4. Test content fetching in designated rooms

No database migration needed (SQLite DB created fresh on ops-jrz1)


Capacity Planning

Single Instagram Bot Instance

Estimated resource usage:

  • Memory: ~100MB (maubot service + bot instance + yt-dlp subprocess)
  • Disk:
    • Maubot DB: <10MB (metadata only)
    • Plugins: ~1MB per .mbp file
    • Temporary files: Up to 50MB (during media download, auto-cleanup)
  • CPU: Burst during media extraction (yt-dlp), idle otherwise
  • Network: <1GB/day (assuming <20 Instagram fetches/day at ~50MB each)

Scale validation (per SC-002):

  • Maubot service supports 3+ concurrent instances without degradation
  • Each additional bot: ~50MB memory, minimal CPU/network impact
  • Shared resources: Maubot DB (SQLite supports concurrent reads), management UI
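The figures above combine into a quick back-of-envelope estimate. The Python sketch below assumes the ~100MB figure covers the service plus the first bot, as listed under Single Instagram Bot Instance. It also assumes each additional bot adds ~50MB and each fetch moves at most the 50MB max_file_size. The function names are hypothetical.

```python
BASE_MB = 100           # maubot service + first bot instance + yt-dlp
PER_EXTRA_BOT_MB = 50   # each additional bot instance
MAX_MEDIA_MB = 50       # max_file_size of 50,000,000 bytes in the example config


def memory_mb(instances: int) -> int:
    """Estimated resident memory for `instances` bots sharing one maubot service."""
    return BASE_MB + PER_EXTRA_BOT_MB * max(instances - 1, 0)


def daily_network_mb(fetches_per_day: int, mb_per_fetch: int = MAX_MEDIA_MB) -> int:
    """Worst-case daily transfer if every fetch hits the size cap."""
    return fetches_per_day * mb_per_fetch


print(memory_mb(3))          # 200
print(daily_network_mb(20))  # 1000, i.e. the ~1GB/day ceiling
```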

Status: Data model complete. Ready for quickstart.md generation.