diff --git a/specs/003-maubot-integration/checklists/requirements.md b/specs/003-maubot-integration/checklists/requirements.md new file mode 100644 index 0000000..b6ac4ca --- /dev/null +++ b/specs/003-maubot-integration/checklists/requirements.md @@ -0,0 +1,42 @@ +# Specification Quality Checklist: Matrix Bot Framework (Maubot) Integration + +**Purpose**: Validate specification completeness and quality before proceeding to planning +**Created**: 2025-10-26 +**Feature**: [spec.md](../spec.md) + +## Content Quality + +- [x] No implementation details (languages, frameworks, APIs) +- [x] Focused on user value and business needs +- [x] Written for non-technical stakeholders +- [x] All mandatory sections completed + +## Requirement Completeness + +- [x] No [NEEDS CLARIFICATION] markers remain +- [x] Requirements are testable and unambiguous +- [x] Success criteria are measurable +- [x] Success criteria are technology-agnostic (no implementation details) +- [x] All acceptance scenarios are defined +- [x] Edge cases are identified +- [x] Scope is clearly bounded +- [x] Dependencies and assumptions identified + +## Feature Readiness + +- [x] All functional requirements have clear acceptance criteria +- [x] User scenarios cover primary flows +- [x] Feature meets measurable outcomes defined in Success Criteria +- [x] No implementation details leak into specification + +## Notes + +**All clarification questions resolved** (2025-10-26): + +1. ✅ Instagram authentication approach: Community scraping methods (instaloader, yt-dlp) +2. ✅ Management interface network exposure: Localhost only (SSH tunnel for remote access) +3. ✅ Bot instance quantity planning: Single Instagram bot instance initially + +**Specification is complete and ready for planning phase.** + +All checklist items pass validation. No blocking issues identified. diff --git a/specs/003-maubot-integration/data-model.md b/specs/003-maubot-integration/data-model.md new file mode 100644 index 0000000..fce2282 --- /dev/null +++ b/specs/003-maubot-integration/data-model.md @@ -0,0 +1,625 @@ +# Data Model: Maubot Integration + +**Feature**: 003-maubot-integration +**Date**: 2025-10-26 +**Status**: Phase 1 design + +## Overview + +This document defines the data structures, state machines, and relationships for the maubot integration feature. Since maubot is an infrastructure service (not an application with user-facing data), the focus is on service configuration, runtime state, and operational entities. + +--- + +## Core Entities + +### 1. Maubot Service + +**Description**: The maubot framework service that manages bot instances and provides the web-based management interface. + +**Attributes**: +- `homeserver_url`: string (URL) - Matrix homeserver endpoint (e.g., `http://127.0.0.1:8008`) +- `server_name`: string (domain) - Matrix server domain (e.g., `clarun.xyz`) +- `port`: integer - Management interface port (default: 29316) +- `database_uri`: string - SQLite database path (e.g., `sqlite:///var/lib/maubot/bot.db`) +- `admin_username`: string - Admin UI login username +- `admin_password_hash`: string (secret) - Hashed admin password +- `secret_key`: string (secret) - Session signing key +- `config_path`: string (path) - Runtime config location (`/var/lib/maubot/config/config.yaml`) + +**Relationships**: +- Has many: Bot Instances (1:N) +- Has many: Plugins (1:N) +- Connects to: Matrix Homeserver (1:1) + +**State Machine**: N/A (service-level, managed by systemd) + +**Validation Rules**: +- `homeserver_url` MUST be IPv4 `127.0.0.1:PORT` (not localhost - conduwuit compatibility) +- `port` MUST NOT conflict with existing services (check: 8008 Matrix, 29319 Slack bridge, 3000 Forgejo) +- `admin_password_hash` MUST be bcrypt with cost >=12 +- `secret_key` MUST be >=32 bytes random + +**Storage**: +- NixOS module configuration: `/home/dan/proj/ops-jrz1/modules/maubot.nix` +- Runtime config: `/var/lib/maubot/config/config.yaml` +- Secrets: `/run/secrets/maubot-*` (sops-nix decrypted) + +--- + +### 2. Bot Instance + +**Description**: Individual bot deployment with specific configuration, Matrix user account, and plugin assignment. + +**Attributes**: +- `id`: string (slug) - Instance identifier (e.g., `instagram-bot-1`) +- `type`: string - Plugin ID (e.g., `sna.instagram`) +- `primary_user`: string (MXID) - Matrix user ID (e.g., `@instagram-bot:clarun.xyz`) +- `enabled`: boolean - Whether bot is active +- `config`: object (JSON) - Plugin-specific configuration + - For Instagram bot: `{"enabled": true, "max_file_size": 50000000, "room_subscriptions": ["!roomid1:clarun.xyz"]}` +- `access_token`: string (secret) - Matrix access token (ephemeral, stored in bot DB) +- `device_id`: string - Matrix device identifier +- `database_path`: string (optional) - Per-bot database if plugin requires (e.g., `/var/lib/maubot/plugins/instagram-bot-1.db`) + +**Relationships**: +- Belongs to: Maubot Service (N:1) +- Uses: Plugin (N:1) +- Authenticated as: Matrix User (1:1) +- Subscribed to: Matrix Rooms (N:M via room_subscriptions config) + +**State Machine**: +``` + [created] + ↓ + [configured] ─→ disabled + ↓ ↓ + [enabled] ←───────┘ + ↓ + [running] ←→ [stopped] + ↓ + [failed] → [restarting] +``` + +**States**: +- `created`: Instance exists in maubot DB but not yet configured +- `configured`: Config provided, Matrix user created, not yet enabled +- `enabled`: Marked as active in config +- `running`: Bot process active, connected to Matrix, responding to events +- `stopped`: Manually stopped via management UI +- `failed`: Encountered error (logged to maubot service journal) +- `restarting`: Auto-recovery in progress + +**Validation Rules**: +- `primary_user` MUST match pattern `@[a-z0-9-]+:clarun.xyz` +- `type` MUST reference an uploaded Plugin +- `config.room_subscriptions` MUST be array of valid Matrix room IDs (format: `!...clarun.xyz`) +- `enabled=true` requires `access_token` to be set (bot authenticated) + +**Storage**: +- Instance metadata: Maubot SQLite DB (`/var/lib/maubot/bot.db` table: `instance`) +- Access tokens: Maubot SQLite DB (encrypted at rest) +- Plugin config: Maubot SQLite DB (JSON blob) + +--- + +### 3. Plugin + +**Description**: Packaged bot functionality (.mbp file) containing code, metadata, and dependencies. + +**Attributes**: +- `id`: string - Plugin identifier (e.g., `sna.instagram`) +- `version`: string (semver) - Plugin version (e.g., `1.0.0`) +- `main_class`: string - Python class name (e.g., `InstagramBot`) +- `modules`: array[string] - Python module list (e.g., `["instagram_bot"]`) +- `dependencies`: array[string] - Python package dependencies (e.g., `["yt-dlp>=2023.1.6", "aiohttp"]`) +- `database`: boolean - Whether plugin requires dedicated database +- `config_schema`: object (JSON Schema) - Plugin configuration validation schema +- `upload_path`: string (path) - Storage location (e.g., `/var/lib/maubot/plugins/sna.instagram-v1.0.0.mbp`) + +**Relationships**: +- Belongs to: Maubot Service (N:1) +- Used by: Bot Instances (1:N) + +**State Machine**: +``` + [uploaded] + ↓ + [validated] ─→ [rejected] (invalid metadata) + ↓ + [loaded] ←→ [disabled] + ↓ + [active] (used by >=1 running instance) + ↓ + [trashed] → [deleted] +``` + +**Validation Rules**: +- `id` MUST match pattern `[a-z][a-z0-9._-]+` +- `version` MUST be valid semver +- `main_class` MUST exist in provided modules +- `.mbp` file MUST be valid zip containing `maubot.yaml` + Python files +- `dependencies` MUST be available in nixpkgs (e.g., yt-dlp is available, instaloader is not) + +**Storage**: +- Active plugins: `/var/lib/maubot/plugins/` +- Trashed plugins: `/var/lib/maubot/trash/` +- Metadata: Maubot SQLite DB (table: `plugin`) + +--- + +### 4. Bot Configuration + +**Description**: Settings specific to bot instance including Matrix credentials, plugin settings, and room subscriptions. + +**Attributes**: +- `instance_id`: string (foreign key) - References Bot Instance +- `room_subscriptions`: array[string] - List of Matrix room IDs where bot is active + - Example: `["!abc123:clarun.xyz", "!def456:clarun.xyz"]` +- `command_prefix`: string (optional) - Bot command trigger (e.g., `!instagram`, `!ig`) +- `enabled_features`: object - Feature flags for plugin + - For Instagram bot: `{"auto_fetch": true, "rate_limiting": true, "caching": false}` +- `rate_limit_config`: object - Rate limiting parameters + - Example: `{"max_requests_per_minute": 10, "burst_size": 3, "backoff_seconds": 30}` +- `error_notification_level`: string (enum) - Minimum severity for admin notifications + - Values: `DEBUG`, `INFO`, `WARN`, `ERROR`, `CRITICAL` + - Default: `ERROR` (per spec FR-013) + +**Relationships**: +- Belongs to: Bot Instance (1:1) +- References: Matrix Rooms (N:M via room_subscriptions) + +**Validation Rules**: +- `room_subscriptions` items MUST be valid Matrix room IDs +- `command_prefix` MUST NOT conflict with other bots (user responsibility) +- `error_notification_level` MUST be one of valid enum values +- `rate_limit_config.max_requests_per_minute` MUST be >0 and <=60 + +**Storage**: +- Stored in Bot Instance config JSON blob +- Editable via: + 1. Maubot web UI (management interface) + 2. Direct config file edit + bot restart (per FR-010) + +--- + +### 5. Admin Notification + +**Description**: ERROR and CRITICAL level bot notifications sent to Matrix homeserver admin room (shared with other platform notifications). + +**Attributes**: +- `timestamp`: datetime (ISO 8601) - When notification was generated +- `source_instance`: string - Bot instance ID that triggered notification +- `severity`: string (enum) - Log level (`ERROR` or `CRITICAL`) +- `message`: string - Human-readable error description +- `context`: object (JSON) - Additional metadata + - `room_id`: string (optional) - Matrix room where error occurred + - `event_id`: string (optional) - Matrix event that triggered error + - `exception_type`: string (optional) - Python exception class + - `stack_trace`: string (optional) - Abbreviated stack trace (last 10 lines) + +**Relationships**: +- Triggered by: Bot Instance (N:1) +- Sent to: Matrix Admin Room (N:1, shared room: defined in ops-jrz1 config) + +**State Machine**: N/A (notifications are fire-and-forget events) + +**Validation Rules**: +- `severity` MUST be `ERROR` or `CRITICAL` (DEBUG/INFO/WARN go to logs only per FR-013) +- `message` MUST be non-empty +- Matrix admin room MUST exist and bot MUST have send permission + +**Storage**: +- Not persisted (real-time notification) +- Logged to systemd journal: `journalctl -u maubot.service` +- Visible in maubot management dashboard (recent notifications) + +--- + +### 6. Bot Database + +**Description**: Per-instance isolated SQLite database for plugin state and data persistence. + +**Attributes**: +- `instance_id`: string (foreign key) - References Bot Instance +- `database_path`: string (path) - SQLite file location (e.g., `/var/lib/maubot/plugins/instagram-bot-1.db`) +- `schema_version`: integer - Plugin-defined schema version +- `size_bytes`: integer - Database file size +- `last_accessed`: datetime - Last read/write timestamp + +**Relationships**: +- Belongs to: Bot Instance (1:1, optional - only if plugin requires DB) +- Managed by: Plugin code (plugin-defined schema) + +**State Machine**: +``` + [initialized] (schema created) + ↓ + [active] (read/write operations) + ↓ + [migrating] (schema upgrade in progress) + ↓ + [active] + ↓ + [archived] (bot deleted, DB preserved) +``` + +**Validation Rules**: +- `database_path` MUST be within `/var/lib/maubot/plugins/` directory +- Schema migrations MUST be handled by plugin code (not maubot framework) +- Database MUST be owned by `maubot` user/group + +**Storage**: +- Location: `/var/lib/maubot/plugins/.db` +- Backup: Manual (part of `/var/lib/maubot/` directory backup) + +--- + +## Relationships Diagram + +``` +┌─────────────────────┐ +│ Matrix Homeserver │ +│ (conduwuit) │ +└──────────┬──────────┘ + │ authenticates + │ +┌──────────▼──────────┐ +│ Maubot Service │ +│ ┌──────────────┐ │ +│ │ Admin UI │ │ ← admin login (sops-nix secrets) +│ │ :29316 │ │ +│ └──────────────┘ │ +│ │ +│ manages ↓ │ +│ │ +│ ┌──────────────┐ │ +│ │ Bot Instance │───┼──→ uses Plugin (.mbp) +│ │ (instagram) │ │ +│ └───┬──────────┘ │ +│ │ has config │ +│ ↓ │ +│ ┌──────────────┐ │ +│ │ Bot Config │ │ +│ │ - rooms[] │ │ +│ │ - settings │ │ +│ └──────────────┘ │ +│ │ +│ stores ↓ │ +│ │ +│ ┌──────────────┐ │ +│ │ Bot Database │ │ (optional, plugin-specific) +│ │ (SQLite) │ │ +│ └──────────────┘ │ +└─────────────────────┘ + │ sends notifications + ↓ +┌─────────────────────┐ +│ Matrix Admin Room │ (shared with platform) +└─────────────────────┘ +``` + +--- + +## Configuration File Structures + +### Maubot Service Config + +**File**: `/var/lib/maubot/config/config.yaml` + +**Structure**: +```yaml +database: "sqlite:///var/lib/maubot/bot.db" + +server: + hostname: 0.0.0.0 + port: 29316 + +admins: + admin: # Replaced at runtime + +homeservers: + clarun.xyz: + url: http://127.0.0.1:8008 + secret: # Optional, for auto-registration + +logging: + level: INFO + handlers: + - type: journal # Log to systemd journal + +api_features: + login: true + plugin: true + plugin_upload: true + instance: true + instance_database: true + log: true +``` + +**Generation**: +1. Maubot example config generated via `maubot -c config.yaml -e` +2. Python script merges NixOS module overrides +3. Secrets injected from `$CREDENTIALS_DIRECTORY` (systemd LoadCredential) +4. Final config written to `/var/lib/maubot/config/config.yaml` + +--- + +### Bot Instance Config + +**Stored in**: Maubot SQLite DB (not file-based) + +**Access methods**: +1. Maubot web UI (http://localhost:29316/_matrix/maubot) +2. Direct database edit (advanced, not recommended) +3. File-based config edit + restart (for room subscriptions per FR-010) + +**Example config** (Instagram bot): +```json +{ + "enabled": true, + "max_file_size": 50000000, + "room_subscriptions": [ + "!abc123def:clarun.xyz", + "!xyz789ghi:clarun.xyz" + ], + "rate_limiting": { + "enabled": true, + "max_requests_per_minute": 10, + "backoff_seconds": 30 + }, + "error_notification_level": "ERROR" +} +``` + +--- + +### Plugin Metadata + +**File**: `maubot.yaml` (inside .mbp archive) + +**Structure**: +```yaml +id: sna.instagram +version: 1.0.0 +license: MIT +modules: + - instagram_bot +main_class: InstagramBot + +database: false # Plugin doesn't use dedicated DB + +config: true # Plugin accepts configuration +config_schema: + type: object + properties: + enabled: + type: boolean + default: true + max_file_size: + type: integer + default: 50000000 + room_subscriptions: + type: array + items: + type: string + pattern: "^!.+:.+$" + +dependencies: + - yt-dlp>=2023.1.6 + - aiohttp + - pillow +``` + +--- + +## State Persistence + +### Service State + +**Location**: `/var/lib/maubot/bot.db` (SQLite) + +**Tables**: +- `instance` - Bot instance metadata +- `plugin` - Uploaded plugin metadata +- `client` - Matrix client credentials (access tokens) +- `log` - Recent bot activity logs + +**Backup strategy**: +- Included in `/var/lib/maubot/` directory backup +- Rollback via NixOS generations (service config) +- Database can be wiped and rebuilt from scratch (bot re-registration required) + +### Runtime State + +**Location**: Memory (maubot service process) + +**Contents**: +- Active bot instances (Python objects) +- Matrix client connections (aiohttp sessions) +- Event handlers (registered callbacks) +- Plugin instances (loaded Python classes) + +**Recovery**: +- Automatic on maubot service restart +- Bot instances reconnect to Matrix +- Plugin state reloaded from DB (if applicable) + +--- + +## Security Model + +### Secrets Hierarchy + +1. **Service-level secrets** (sops-nix encrypted): + - `maubot-admin-password` - Management UI login + - `maubot-secret-key` - Session signing + - `matrix-registration-token` - Bot user creation (reused from Matrix homeserver) + +2. **Bot-level secrets** (stored in maubot DB): + - Matrix access tokens (per bot instance) + - Matrix device IDs + - Plugin-specific credentials (if any) + +3. **Runtime secrets** (ephemeral): + - Active session tokens (management UI) + - Matrix sync tokens (E2EE keys if enabled) + +### Permissions + +**File permissions**: +``` +/var/lib/maubot/ → drwxr-x--- maubot:maubot +/var/lib/maubot/config/ → drwx------ maubot:maubot +/var/lib/maubot/config/config.yaml → -rw------- maubot:maubot (contains secrets) +/var/lib/maubot/bot.db → -rw-r----- maubot:maubot +/var/lib/maubot/plugins/ → drwxr-xr-x maubot:maubot +/run/secrets/maubot-* → -r-------- maubot:maubot (0400) +``` + +**Network access**: +- Management interface: localhost:29316 only (SSH tunnel required for remote access per spec) +- Matrix homeserver: localhost:8008 (IPv4, conduwuit compatibility) +- No external network access (except Matrix federation via homeserver) + +--- + +## Operational Entities + +### Health Check State + +**Attributes**: +- `last_check_timestamp`: datetime +- `service_status`: enum (`healthy`, `degraded`, `failed`) +- `maubot_version_endpoint`: boolean - `/maubot/v1/version` accessible +- `active_instances_count`: integer +- `failed_instances`: array[string] - Instance IDs with errors +- `last_successful_message_timestamp`: datetime (per bot instance) + +**Storage**: Systemd timer state + systemd journal logs + +**Health indicators** (per spec SC-003): +- Service responds to HTTP health check (curl to version endpoint) +- Active instances count matches enabled instances count +- No ERROR/CRITICAL logs in last 5 minutes +- All enabled bots have recent Matrix sync activity (<10 minutes) + +--- + +## Data Flow Diagrams + +### Instagram URL Processing Flow + +``` +1. User posts Instagram URL in Matrix room + ↓ +2. Matrix homeserver distributes event to all clients + ↓ +3. Bot instance receives event (if subscribed to that room) + ↓ +4. Plugin regex matches Instagram URL pattern + ↓ +5. Plugin calls yt-dlp extraction (async thread pool) + ↓ +6. yt-dlp downloads media to temporary directory + ↓ +7. Plugin uploads media to Matrix homeserver + ↓ +8. Plugin sends Matrix message event with media attachment + ↓ +9. Cleanup temporary files + ↓ +10. Log extraction success/failure (severity-based notification if ERROR/CRITICAL) +``` + +### Bot Registration Flow + +``` +1. Admin accesses maubot web UI via SSH tunnel + ↓ +2. Create new bot client (provide Matrix user ID) + ↓ +3. Maubot attempts registration via conduwuit registration token + ↓ +4. If successful: Access token stored in maubot DB + ↓ +5. Create bot instance (select plugin, provide config) + ↓ +6. Bot connects to Matrix homeserver + ↓ +7. Bot joins configured rooms (from room_subscriptions) + ↓ +8. Bot starts listening for events +``` + +--- + +## Validation Rules Summary + +### Configuration Validation + +- All Matrix room IDs MUST match pattern `!.+:.+` +- Homeserver URL MUST be `http://127.0.0.1:PORT` (IPv4, not localhost) +- Admin password MUST meet minimum strength (length >=16, bcrypt cost >=12) +- Plugin IDs MUST be globally unique within maubot instance +- File paths MUST be absolute and within permitted directories + +### Runtime Validation + +- Bot instances CANNOT start without valid Matrix access token +- Room subscriptions MUST reference existing rooms (checked at runtime, logged if invalid) +- Plugin dependencies MUST be available in NixOS environment +- Rate limiting MUST be enforced before external API calls (Instagram) + +### Security Validation + +- Secrets MUST NEVER appear in logs or config files (placeholders only) +- Management interface MUST bind localhost only (0.0.0.0 for within-container, but not exposed externally) +- Database files MUST have restrictive permissions (0600 or 0640) +- ERROR/CRITICAL notifications MUST include sanitized context (no credentials in stack traces) + +--- + +## Migration Strategy + +### From ops-base to ops-jrz1 + +**Data migration**: Not required (fresh deployment) + +**Configuration migration**: +1. Extract maubot.nix module from ops-base +2. Adapt namespace: `services.matrix-vm.maubot` → `services.maubot` +3. Update homeserver URL: `continuwuity` → `conduwuit` +4. Remove registration_secrets (not supported by conduwuit) +5. Add registration token configuration + +**Plugin migration**: +1. Copy Instagram bot .mbp file from ops-base: `/home/dan/proj/sna/sna-instagram-bot.mbp` +2. Upload to ops-jrz1 maubot via web UI or API +3. Create bot instance with room subscriptions +4. Test content fetching in designated rooms + +**No database migration needed** (SQLite DB created fresh on ops-jrz1) + +--- + +## Capacity Planning + +### Single Instagram Bot Instance + +**Estimated resource usage**: +- Memory: ~100MB (maubot service + bot instance + yt-dlp subprocess) +- Disk: + - Maubot DB: <10MB (metadata only) + - Plugins: ~1MB per .mbp file + - Temporary files: Up to 50MB (during media download, auto-cleanup) +- CPU: Burst during media extraction (yt-dlp), idle otherwise +- Network: <1GB/day (assuming <20 Instagram fetches/day at ~50MB each) + +**Scale validation** (per SC-002): +- Maubot service supports 3+ concurrent instances without degradation +- Each additional bot: ~50MB memory, minimal CPU/network impact +- Shared resources: Maubot DB (SQLite supports concurrent reads), management UI + +--- + +**Status**: Data model complete. Ready for quickstart.md generation. diff --git a/specs/003-maubot-integration/research.md b/specs/003-maubot-integration/research.md new file mode 100644 index 0000000..e5278e0 --- /dev/null +++ b/specs/003-maubot-integration/research.md @@ -0,0 +1,527 @@ +# Research Findings: Maubot Integration + +**Feature**: 003-maubot-integration +**Date**: 2025-10-26 +**Status**: Phase 0 complete + +## Overview + +Research conducted to resolve technical unknowns for extracting maubot from ops-base and deploying to ops-jrz1 with Instagram bot functionality. + +--- + +## Decision 1: Maubot-Conduwuit Compatibility + +### Decision +**YES - Maubot is fully compatible with conduwuit** with registration method modifications + +### Rationale +- ops-base successfully runs maubot 0.5.2+ on continuwuity (conduwuit fork) at matrix.talu.uno +- Over 10 production maubot instances confirmed working with conduwuit +- Maubot uses standard Matrix Client-Server API (homeserver-agnostic) +- ops-jrz1 conduwuit (0.5.0-rc.8) supports all required Matrix APIs + +### Key Finding: Registration Method Differs +**ops-base pattern (continuwuity)**: +```nix +registration_secrets: + matrix.talu.uno: + url: http://127.0.0.1:6167 + secret: REPLACE_REGISTRATION_SECRET # Shared secret registration +``` + +**ops-jrz1 requirement (conduwuit)**: +- Conduwuit does NOT support `registration_shared_secret` like Synapse +- Must use **registration tokens** or **admin room commands** for bot user creation + +### Recommended Approach +**Registration Token Method** (simpler, more secure): +1. Configure conduwuit with registration token (from sops-nix) +2. During bot client creation in maubot web UI, provide registration token +3. Bot registers via standard Matrix client registration API + +**Alternative: Admin Room Commands**: +``` +!admin users create-user maubot-bot-1 +# Returns generated password +``` + +### Integration Pattern +- Remove `registration_secrets` section from maubot config +- Remove `registrationSecretFile` option from NixOS module +- Document registration token workflow in quickstart.md + +### Compatibility Notes +- **Database**: SQLite works (no changes needed) +- **Network**: Use IPv4 `127.0.0.1:8008` (not `localhost` - conduwuit binds IPv4 only) +- **Encryption**: maubot 0.5.2+ supports E2EE with conduwuit +- **Appservice**: Maubot bots are regular users, not appservice users (no appservice registration needed) + +### Known Issues (Resolved) +- maubot < 0.5.2 had bug causing excessive key uploads (fixed in 0.5.2+) +- Use latest stable maubot from nixpkgs + +### References +- ops-base maubot.nix:387 +- ops-base maubot-deployment-instructions.md +- ops-base conduwuit admin room discovery worklog + +--- + +## Decision 2: Instagram Content Fetching + +### Decision +**Use yt-dlp (primary) for Instagram content extraction** + +### Rationale +- ops-base Instagram bot uses yt-dlp >=2023.1.6 (available in nixpkgs) +- Proven working implementation at `/home/dan/proj/sna/instagram_bot.py` +- Packaged as `sna-instagram-bot.mbp` and deployed successfully +- Source bot had instaloader fallback, but instaloader not in nixpkgs (yt-dlp-only mode in production) + +### Implementation Pattern + +**Extraction Architecture**: +```python +class InstagramBot(Plugin): # Inherits from maubot.Plugin + + @event.on(EventType.ROOM_MESSAGE) + async def handle_message(self, event: MessageEvent): + # 1. Detect Instagram URLs via regex + # 2. Extract content with yt-dlp (async thread pool) + # 3. Upload media to Matrix homeserver + # 4. Send to room with metadata (caption, uploader, dimensions) +``` + +**Content Types Supported**: +- Posts (images) +- Reels (videos) +- IGTV (videos) +- Stories (if publicly accessible) + +**File Handling**: +- Temporary directory for downloads (auto-cleanup) +- Max file size: 50MB (configurable) +- Supported formats: mp4, jpg, jpeg, png, webp +- MIME type detection for proper Matrix msgtype + +**Metadata Extraction**: +- Title, description, uploader +- Dimensions (width x height) +- Duration (for videos) +- Posted as separate text message after media + +### Rate Limiting Strategy + +**Current State**: No rate limiting implemented in ops-base bot + +**Risks**: +- Burst of URLs in high-traffic room could trigger Instagram rate limits +- No request tracking, queuing, or throttling +- Extraction failures logged but no retry logic + +**Recommendations for 003-maubot-integration**: +1. Add per-room request tracking +2. Implement exponential backoff on extraction failures +3. Queue URLs and process with delays (e.g., 5 seconds between requests) +4. Add configuration for max requests/minute +5. Monitor extraction failure rates as health indicator + +### Known Limitations + +1. **Instagram API changes**: yt-dlp requires updates when Instagram changes interface +2. **Private content**: Cannot access private posts/stories (public only) +3. **Rate limiting exposure**: Heavy usage may cause temporary failures +4. **No retry logic**: Failed extractions not queued for later attempt +5. **File size limits**: 50MB hard limit, Matrix homeserver may have separate limits +6. **No caching**: Frequently shared URLs re-extracted every time + +### Plugin Packaging + +**Format**: `.mbp` archive (zip file) + +**Structure**: +``` +sna-instagram-bot.mbp: + instagram_bot.py (11,643 bytes) + maubot.yaml (plugin metadata) + README.md (documentation) +``` + +**Metadata** (maubot.yaml): +```yaml +id: sna.instagram +version: 1.0.0 +main_class: InstagramBot +modules: [instagram_bot] +``` + +**Creation**: +```bash +cd /path/to/plugin +zip -r instagram-bot.mbp instagram_bot.py maubot.yaml README.md +``` + +**Deployment Methods**: +1. **API upload** (automated): + ```bash + curl -X POST \ + -H "Authorization: Bearer $TOKEN" \ + -F "file=@instagram-bot.mbp" \ + "http://localhost:29316/_matrix/maubot/v1/plugins/upload" + ``` + +2. **Web UI** (manual): Upload via http://localhost:29316/_matrix/maubot (SSH tunnel) + +### Source Files to Adapt +- Plugin source: `/home/dan/proj/sna/instagram_bot.py` +- Plugin package: `/home/dan/proj/sna/sna-instagram-bot.mbp` +- Deployment scripts: `/home/dan/proj/ops-base/scripts/*instagram-bot.sh` + +### Alternatives Considered + +**instaloader**: +- Rejected: Not available in nixpkgs +- ops-base bot had fallback support, but unused in production + +**Official Instagram API**: +- Rejected: Requires Facebook developer approval (per spec clarifications) +- Community scraping approach acceptable for internal team use + +--- + +## Decision 3: NixOS Module Adaptation Strategy + +### Decision +**Two-layer module pattern** matching mautrix-slack architecture + +### Rationale +- ops-jrz1 established pattern with mautrix-slack module +- Low-level module (`services.maubot`) provides full configuration surface +- High-level wrapper (`services.dev-platform.maubot`) simplifies common usage +- Consistent with existing infrastructure patterns + +### Source Pattern: ops-base maubot.nix + +**Module namespace**: `services.matrix-vm.maubot` + +**Key characteristics**: +- Runtime config generation with placeholder substitution +- systemd `LoadCredential` for secrets injection +- Python script in `ExecStartPre` replaces placeholders +- SQLite database at `/var/lib/maubot/bot.db` +- Timer-based health monitoring (5min check + 10min auto-restart) +- Config template at `/etc/maubot/config.yaml` → runtime config at `/run/maubot/config.yaml` + +**Secrets pattern**: +```nix +LoadCredential = [ + "admin-password:${cfg.adminPasswordFile}" + "secret-key:${cfg.secretKeyFile}" + "registration-secret:${cfg.registrationSecretFile}" # REMOVE for conduwuit +]; +``` + +### Target Pattern: ops-jrz1 Services + +**mautrix-slack.nix pattern**: +- Module namespace: `services.mautrix-slack` (low-level) +- Wrapper: `services.dev-platform.slackBridge` in `modules/dev-services.nix` +- Config: Example config generation + YAML merging via Python +- Database: PostgreSQL via unix socket +- Secrets: No LoadCredential (tokens from interactive login) +- State: `/var/lib/mautrix_slack/config/config.yaml` (within StateDirectory) + +**Adaptation decisions**: + +| Aspect | ops-base | ops-jrz1 Target | +|--------|----------|-----------------| +| **Namespace** | `services.matrix-vm.maubot` | `services.maubot` + `services.dev-platform.maubot` | +| **Config location** | `/run/maubot/config.yaml` | `/var/lib/maubot/config/config.yaml` | +| **Config approach** | Template substitution | Example config + YAML merge + secret substitution | +| **Secrets** | LoadCredential + Python replacement | LoadCredential + Python replacement (retain ops-base pattern) | +| **Database** | SQLite `/var/lib/maubot/bot.db` | SQLite (same path) | +| **Logs** | File + journal | Journal only (StandardOutput) | +| **State** | Manual StateDirectory + tmpfiles | `StateDirectory = "maubot"` (systemd managed) | +| **Health checks** | Timer-based (5min + 10min) | Retain ops-base pattern | +| **User/group** | `maubot:maubot` | `maubot:maubot` + `matrix-appservices` supplementary | + +### Configuration Generation Hybrid Approach + +**Recommendation**: Combine mautrix-slack example config pattern with ops-base secrets injection + +**Steps**: +1. Run `maubot -c config.yaml -e` to generate example config (ensures structure completeness) +2. Python script merges structured overrides (like mautrix-slack) +3. Write config with placeholders to StateDirectory +4. Second step reads from `CREDENTIALS_DIRECTORY` and replaces placeholders +5. Final config written with proper permissions (0600) + +**Why hybrid**: +- Example config ensures YAML structure stays valid across maubot versions +- LoadCredential provides better security than storing secrets in Nix store +- Proven pattern from both source (ops-base) and target (mautrix-slack) + +### Database Decision + +**Recommendation**: SQLite (match ops-base) + +**Rationale**: +- Maubot workload is lightweight (bot state, plugin configs) +- ops-base SQLite deployment proven stable +- Simpler backup/restore (single file) +- Isolation from shared PostgreSQL (Forgejo, mautrix-slack use it) +- Less complex dependency chain +- Adequate for small team usage (<10 bot instances) + +**Path**: `/var/lib/maubot/bot.db` + +**Future**: Support PostgreSQL via config option if scaling needs emerge + +### Secrets Management + +**Recommendation**: Retain ops-base LoadCredential pattern + +**Secrets required**: +```yaml +# In secrets/secrets.yaml (add) +maubot-admin-password: "..." # Admin UI login +maubot-secret-key: "..." # Session signing key +# matrix-registration-token: "..." # Already exists, reuse for bot user creation +``` + +**systemd configuration**: +```nix +LoadCredential = [ + "admin-password:/run/secrets/maubot-admin-password" + "secret-key:/run/secrets/maubot-secret-key" + "registration-token:/run/secrets/matrix-registration-token" # Reused +]; +``` + +**Substitution in ExecStartPre** (Python script): +```python +# Read from $CREDENTIALS_DIRECTORY +admin_pw = Path(os.environ['CREDENTIALS_DIRECTORY'], 'admin-password').read_text().strip() +# Replace placeholders in config +config = config.replace('REPLACE_ADMIN_PASSWORD', admin_pw) +``` + +**Why not mautrix-slack pattern**: +- mautrix-slack gets tokens via interactive login (no pre-provisioning needed) +- Maubot requires secrets before service starts (admin UI, signing key) +- LoadCredential keeps secrets out of Nix store and config files + +### Health Monitoring + +**Recommendation**: Retain ops-base timer-based pattern + +**Implementation**: +- `maubot-health.service` (oneshot): Curl to `http://localhost:29316/_matrix/maubot/v1/version` every 5 minutes +- `maubot-health-restart.service` (oneshot): Check for failed health checks, restart if needed (every 10 minutes) +- `systemd.timers` for scheduling + +**Why retain**: +- Maubot provides explicit health endpoint (unlike mautrix-slack) +- ops-base pattern proven reliable +- mautrix-slack has no health monitoring (only log-based Socket Mode checks) +- Valuable for production stability (auto-recovery) + +### Directory Structure + +**Target layout**: +``` +/var/lib/maubot/ + ├── config/ + │ └── config.yaml # Generated runtime config + ├── plugins/ # Plugin storage (.mbp files) + ├── trash/ # Deleted plugins + └── bot.db # SQLite database +``` + +**Changes from ops-base**: +- Config in StateDirectory (not `/run/maubot/`) +- Logs via journal (remove `/var/log/maubot/`) +- Use `StateDirectory = "maubot"` (systemd automatic management) + +### Security Hardening + +**Apply from mautrix-slack**: +- `StateDirectory = "maubot"` +- `StateDirectoryMode = "0750"` +- `PrivateTmp = true` +- `ProtectSystem = "strict"` +- `ReadWritePaths = [ cfg.dataDir ]` +- `MemoryMax = "512M"` (match ops-base) +- Standard systemd hardening flags + +**Remove from ops-base**: +- `RuntimeDirectory` (use StateDirectory) +- `LogsDirectory` (use journal) +- Manual tmpfiles rules + +### Integration Points + +**hosts/ops-jrz1.nix additions**: +```nix +sops.secrets.maubot-admin-password = { mode = "0400"; }; +sops.secrets.maubot-secret-key = { mode = "0400"; }; + +services.dev-platform.maubot = { + enable = true; + port = 29316; # Management interface +}; +``` + +**modules/dev-services.nix additions**: +```nix +services.dev-platform.maubot = { + enable = mkOption { type = types.bool; default = false; }; + port = mkOption { type = types.port; default = 29316; }; +}; + +config = mkIf cfg.maubot.enable { + services.maubot = { + enable = true; + homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}"; + serverName = cfg.matrix.serverName; + port = cfg.maubot.port; + # ... map other options + }; +}; +``` + +### Alternatives Considered + +**Pure mautrix-slack pattern**: +- Rejected: Would require removing LoadCredential and storing secrets in config +- Less secure (secrets in Nix store or config files) +- More code rewrite from proven ops-base pattern + +**Keep ops-base pattern exactly**: +- Rejected: Inconsistent with ops-jrz1 conventions +- Manual directory management instead of StateDirectory +- File-based logging instead of journal +- Less integration with dev-platform namespace + +--- + +## Technical Context Summary + +**Language/Version**: Python 3.11 (maubot runtime) +**Primary Dependencies**: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite +**Storage**: SQLite at `/var/lib/maubot/bot.db` +**Testing**: Manual QA (automated tests future enhancement) +**Target Platform**: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49) +**Project Type**: Infrastructure service (NixOS module) +**Performance Goals**: <5 second Instagram content fetch (per SC-001), 99% uptime over 7 days (per SC-003) +**Constraints**: localhost-only management interface (SSH tunnel required), single Instagram bot instance initially +**Scale/Scope**: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002) + +--- + +## Platform Vision Alignment + +### Core Philosophy Adherence + +**Build It Right Over Time**: +- ✅ Extract proven maubot module from ops-base (avoid reinvention) +- ✅ Declarative NixOS module pattern +- ✅ Self-documenting via quickstart.md and inline comments +- ✅ Sustainable pattern (matches existing mautrix-slack infrastructure) + +**Presentable State First**: +- ✅ Working Instagram bot demonstrates value immediately +- ✅ Clear documentation (research.md, quickstart.md, contracts/) +- ✅ Professional deployment pattern (consistent with mautrix-slack) + +### Architecture Principles + +**Communication Layer**: +- ✅ Maubot extends Matrix functionality (bot framework) +- ✅ Instagram bot brings external content into Matrix (enriches communication) +- ✅ Aligns with Matrix-centric hub architecture + +**Deployment Philosophy**: +- ✅ NixOS-Native pattern (module + sops-nix secrets) +- ✅ Declarative and reproducible +- ✅ Built-in rollback (NixOS generations) +- ✅ Clear separation: infrastructure (maubot service) vs application (Instagram plugin) + +**Sustainability**: +- ✅ Small team focus (single bot instance initially, validate 3-instance capability) +- ✅ Quality over speed (comprehensive research before implementation) +- ✅ Proven patterns (extract from ops-base, not experimental) + +--- + +## Risk Assessment + +### Low Risk +- SQLite database (proven, simple) +- LoadCredential secrets (ops-base pattern working) +- Health monitoring (non-intrusive timers) +- StateDirectory approach (standard systemd) + +### Medium Risk +- conduwuit compatibility (ops-base uses continuwuity fork) + - **Mitigation**: Early testing of bot registration and Matrix connection +- Two-layer module pattern (new for maubot, proven with mautrix-slack) + - **Mitigation**: Follow exact mautrix-slack pattern +- Instagram scraping stability (yt-dlp depends on Instagram not changing) + - **Mitigation**: yt-dlp actively maintained, ops-base deployment proven + +### Requires Testing +- Registration token workflow with conduwuit (different from ops-base shared secret) +- Management interface localhost binding (security requirement) +- Instagram content fetching with current yt-dlp version +- Bot response in designated rooms only (room-based activation per FR-006) +- Auto-recovery after homeserver restart (SC-004) + +--- + +## Next Steps + +### Phase 1: Design & Contracts +1. Generate data-model.md with entities: + - Maubot Service, Bot Instance, Plugin, Bot Configuration, Admin Notification, Bot Database +2. Generate contracts/ with configuration schemas (if applicable) +3. Generate quickstart.md with deployment runbook including: + - Registration token setup + - Bot creation workflow + - Room subscription configuration + - Admin room access procedure +4. Update AGENTS.md with maubot, yt-dlp context + +### Phase 2: Implementation Planning +1. Extract maubot.nix from ops-base to ops-jrz1 +2. Adapt namespace and configuration patterns +3. Add sops secrets declarations +4. Create dev-platform wrapper in dev-services.nix +5. Test service startup and conduwuit connection +6. Deploy Instagram plugin +7. Validate SC-001 through SC-008 + +--- + +## References + +### Source Files Analyzed +- `/home/dan/proj/ops-base/vm-configs/modules/maubot.nix` (387 lines) +- `/home/dan/proj/ops-base/vm-configs/modules/continuwuity.nix` (413 lines) +- `/home/dan/proj/ops-base/docs/maubot-deployment-instructions.md` +- `/home/dan/proj/ops-base/docs/continuwuit-appservice-registration-guide.md` +- `/home/dan/proj/ops-jrz1/modules/mautrix-slack.nix` (current) +- `/home/dan/proj/ops-jrz1/modules/dev-services.nix` (current) +- `/home/dan/proj/ops-jrz1/docs/platform-vision.md` (architecture principles) +- `/home/dan/proj/sna/instagram_bot.py` (11,643 bytes) +- `/home/dan/proj/sna/sna-instagram-bot.mbp` (packaged plugin) + +### External Documentation +- Maubot official docs: https://docs.mau.fi/maubot/ +- Conduwuit appservice guide: https://conduwuit.puppyirl.gay/appservices.html +- yt-dlp Instagram extractor: https://github.com/yt-dlp/yt-dlp + +--- + +**Status**: Research complete. All technical unknowns resolved. Ready for Phase 1 design.