Add maubot spec artifacts (research, data-model, checklists)

Dan 2025-12-08 16:31:10 -08:00
parent 8826d62bcc
commit acfee9fea9
3 changed files with 1194 additions and 0 deletions


@ -0,0 +1,42 @@
# Specification Quality Checklist: Matrix Bot Framework (Maubot) Integration
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2025-10-26
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Notes
**All clarification questions resolved** (2025-10-26):
1. ✅ Instagram authentication approach: Community scraping methods (instaloader, yt-dlp)
2. ✅ Management interface network exposure: Localhost only (SSH tunnel for remote access)
3. ✅ Bot instance quantity planning: Single Instagram bot instance initially
**Specification is complete and ready for planning phase.**
All checklist items pass validation. No blocking issues identified.


@ -0,0 +1,625 @@
# Data Model: Maubot Integration
**Feature**: 003-maubot-integration
**Date**: 2025-10-26
**Status**: Phase 1 design
## Overview
This document defines the data structures, state machines, and relationships for the maubot integration feature. Since maubot is an infrastructure service (not an application with user-facing data), the focus is on service configuration, runtime state, and operational entities.
---
## Core Entities
### 1. Maubot Service
**Description**: The maubot framework service that manages bot instances and provides the web-based management interface.
**Attributes**:
- `homeserver_url`: string (URL) - Matrix homeserver endpoint (e.g., `http://127.0.0.1:8008`)
- `server_name`: string (domain) - Matrix server domain (e.g., `clarun.xyz`)
- `port`: integer - Management interface port (default: 29316)
- `database_uri`: string - SQLite database path (e.g., `sqlite:///var/lib/maubot/bot.db`)
- `admin_username`: string - Admin UI login username
- `admin_password_hash`: string (secret) - Hashed admin password
- `secret_key`: string (secret) - Session signing key
- `config_path`: string (path) - Runtime config location (`/var/lib/maubot/config/config.yaml`)
**Relationships**:
- Has many: Bot Instances (1:N)
- Has many: Plugins (1:N)
- Connects to: Matrix Homeserver (1:1)
**State Machine**: N/A (service-level, managed by systemd)
**Validation Rules**:
- `homeserver_url` MUST be IPv4 `127.0.0.1:PORT` (not localhost - conduwuit compatibility)
- `port` MUST NOT conflict with existing services (check: 8008 Matrix, 29319 Slack bridge, 3000 Forgejo)
- `admin_password_hash` MUST be bcrypt with cost >=12
- `secret_key` MUST be >=32 bytes random
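
The rules above could be checked before the service starts. The following is a minimal sketch, not the module's actual validation: the function name `validate_service_config` and the error strings are hypothetical, and the bcrypt cost check is left out because it needs the hash format, which this document does not show.

```python
from urllib.parse import urlparse

# Ports listed in the validation rules as already in use on the host.
RESERVED_PORTS = {8008: "Matrix", 29319: "Slack bridge", 3000: "Forgejo"}


def validate_service_config(homeserver_url: str, port: int, secret_key: bytes) -> list[str]:
    """Return a list of rule violations (an empty list means the config passes)."""
    errors = []
    parsed = urlparse(homeserver_url)
    # The rule requires the literal IPv4 loopback address, not the name "localhost".
    if parsed.hostname != "127.0.0.1" or parsed.port is None:
        errors.append("homeserver_url must be 127.0.0.1:PORT")
    if port in RESERVED_PORTS:
        errors.append(f"port {port} conflicts with {RESERVED_PORTS[port]}")
    if len(secret_key) < 32:
        errors.append("secret_key must be at least 32 bytes")
    return errors
```

Returning a list instead of raising on the first problem lets an operator see every violation in one pass.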
**Storage**:
- NixOS module configuration: `/home/dan/proj/ops-jrz1/modules/maubot.nix`
- Runtime config: `/var/lib/maubot/config/config.yaml`
- Secrets: `/run/secrets/maubot-*` (sops-nix decrypted)
---
### 2. Bot Instance
**Description**: Individual bot deployment with specific configuration, Matrix user account, and plugin assignment.
**Attributes**:
- `id`: string (slug) - Instance identifier (e.g., `instagram-bot-1`)
- `type`: string - Plugin ID (e.g., `sna.instagram`)
- `primary_user`: string (MXID) - Matrix user ID (e.g., `@instagram-bot:clarun.xyz`)
- `enabled`: boolean - Whether bot is active
- `config`: object (JSON) - Plugin-specific configuration
- For Instagram bot: `{"enabled": true, "max_file_size": 50000000, "room_subscriptions": ["!roomid1:clarun.xyz"]}`
- `access_token`: string (secret) - Matrix access token (ephemeral, stored in bot DB)
- `device_id`: string - Matrix device identifier
- `database_path`: string (optional) - Per-bot database if plugin requires (e.g., `/var/lib/maubot/plugins/instagram-bot-1.db`)
**Relationships**:
- Belongs to: Maubot Service (N:1)
- Uses: Plugin (N:1)
- Authenticated as: Matrix User (1:1)
- Subscribed to: Matrix Rooms (N:M via room_subscriptions config)
**State Machine**:
```
[created]
    ↓
[configured] ─→ disabled
    ↓              ↓
[enabled] ←───────┘
    ↓
[running] ←→ [stopped]
    ↓
[failed] → [restarting]
```
**States**:
- `created`: Instance exists in maubot DB but not yet configured
- `configured`: Config provided, Matrix user created, not yet enabled
- `enabled`: Marked as active in config
- `running`: Bot process active, connected to Matrix, responding to events
- `stopped`: Manually stopped via management UI
- `failed`: Encountered error (logged to maubot service journal)
- `restarting`: Auto-recovery in progress
**Validation Rules**:
- `primary_user` MUST match pattern `@[a-z0-9-]+:clarun.xyz`
- `type` MUST reference an uploaded Plugin
- `config.room_subscriptions` MUST be an array of valid Matrix room IDs (format: `!<opaque_id>:clarun.xyz`)
- `enabled=true` requires `access_token` to be set (bot authenticated)
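
As an illustration of these rules, here is a hedged sketch of an instance validator. The function name, the dictionary layout, and the exact room ID regex are assumptions made for the example; only the MXID pattern is taken directly from the rules above.

```python
import re

# Pattern from the validation rules; the room pattern is an assumed reading of "!...:clarun.xyz".
MXID_RE = re.compile(r"@[a-z0-9-]+:clarun\.xyz")
ROOM_ID_RE = re.compile(r"![^:]+:clarun\.xyz")


def validate_instance(instance: dict, uploaded_plugin_ids: set[str]) -> list[str]:
    """Return rule violations for a bot instance described as a plain dict."""
    errors = []
    if not MXID_RE.fullmatch(instance.get("primary_user", "")):
        errors.append("primary_user does not match @[a-z0-9-]+:clarun.xyz")
    if instance.get("type") not in uploaded_plugin_ids:
        errors.append("type does not reference an uploaded plugin")
    rooms = instance.get("config", {}).get("room_subscriptions", [])
    if not isinstance(rooms, list) or not all(
        isinstance(room, str) and ROOM_ID_RE.fullmatch(room) for room in rooms
    ):
        errors.append("room_subscriptions must be a list of valid room IDs")
    # An enabled bot must already be authenticated.
    if instance.get("enabled") and not instance.get("access_token"):
        errors.append("enabled=true requires access_token")
    return errors
```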
**Storage**:
- Instance metadata: Maubot SQLite DB (`/var/lib/maubot/bot.db` table: `instance`)
- Access tokens: Maubot SQLite DB (encrypted at rest)
- Plugin config: Maubot SQLite DB (JSON blob)
---
### 3. Plugin
**Description**: Packaged bot functionality (.mbp file) containing code, metadata, and dependencies.
**Attributes**:
- `id`: string - Plugin identifier (e.g., `sna.instagram`)
- `version`: string (semver) - Plugin version (e.g., `1.0.0`)
- `main_class`: string - Python class name (e.g., `InstagramBot`)
- `modules`: array[string] - Python module list (e.g., `["instagram_bot"]`)
- `dependencies`: array[string] - Python package dependencies (e.g., `["yt-dlp>=2023.1.6", "aiohttp"]`)
- `database`: boolean - Whether plugin requires dedicated database
- `config_schema`: object (JSON Schema) - Plugin configuration validation schema
- `upload_path`: string (path) - Storage location (e.g., `/var/lib/maubot/plugins/sna.instagram-v1.0.0.mbp`)
**Relationships**:
- Belongs to: Maubot Service (N:1)
- Used by: Bot Instances (1:N)
**State Machine**:
```
[uploaded]
    ↓
[validated] ─→ [rejected] (invalid metadata)
    ↓
[loaded] ←→ [disabled]
    ↓
[active] (used by >=1 running instance)
    ↓
[trashed] → [deleted]
```
**Validation Rules**:
- `id` MUST match pattern `[a-z][a-z0-9._-]+`
- `version` MUST be valid semver
- `main_class` MUST exist in provided modules
- `.mbp` file MUST be valid zip containing `maubot.yaml` + Python files
- `dependencies` MUST be available in nixpkgs (e.g., yt-dlp is available, instaloader is not)
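
The archive-level rules can be checked with the standard library alone. This sketch assumes a hypothetical helper named `check_mbp`; it verifies the ID pattern and the archive layout only, and does not parse `maubot.yaml` or confirm that `main_class` exists.

```python
import io
import re
import zipfile

PLUGIN_ID_RE = re.compile(r"[a-z][a-z0-9._-]+")  # pattern from the rules above


def check_mbp(data: bytes, plugin_id: str) -> list[str]:
    """Check a plugin ID and the raw bytes of an .mbp archive."""
    errors = []
    if not PLUGIN_ID_RE.fullmatch(plugin_id):
        errors.append("invalid plugin id")
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return errors + ["not a valid zip archive"]
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    if "maubot.yaml" not in names:
        errors.append("maubot.yaml missing from archive")
    if not any(name.endswith(".py") for name in names):
        errors.append("archive contains no Python files")
    return errors
```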
**Storage**:
- Active plugins: `/var/lib/maubot/plugins/`
- Trashed plugins: `/var/lib/maubot/trash/`
- Metadata: Maubot SQLite DB (table: `plugin`)
---
### 4. Bot Configuration
**Description**: Settings specific to bot instance including Matrix credentials, plugin settings, and room subscriptions.
**Attributes**:
- `instance_id`: string (foreign key) - References Bot Instance
- `room_subscriptions`: array[string] - List of Matrix room IDs where bot is active
- Example: `["!abc123:clarun.xyz", "!def456:clarun.xyz"]`
- `command_prefix`: string (optional) - Bot command trigger (e.g., `!instagram`, `!ig`)
- `enabled_features`: object - Feature flags for plugin
- For Instagram bot: `{"auto_fetch": true, "rate_limiting": true, "caching": false}`
- `rate_limit_config`: object - Rate limiting parameters
- Example: `{"max_requests_per_minute": 10, "burst_size": 3, "backoff_seconds": 30}`
- `error_notification_level`: string (enum) - Minimum severity for admin notifications
- Values: `DEBUG`, `INFO`, `WARN`, `ERROR`, `CRITICAL`
- Default: `ERROR` (per spec FR-013)
**Relationships**:
- Belongs to: Bot Instance (1:1)
- References: Matrix Rooms (N:M via room_subscriptions)
**Validation Rules**:
- `room_subscriptions` items MUST be valid Matrix room IDs
- `command_prefix` MUST NOT conflict with other bots (user responsibility)
- `error_notification_level` MUST be one of valid enum values
- `rate_limit_config.max_requests_per_minute` MUST be >0 and <=60
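
A short sketch of the enum and range checks, assuming the configuration arrives as a plain dictionary; `validate_bot_config` is a hypothetical name, and the `ERROR` default follows the attribute description above.

```python
VALID_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR", "CRITICAL")


def validate_bot_config(config: dict) -> list[str]:
    """Return violations of the notification-level and rate-limit rules."""
    errors = []
    level = config.get("error_notification_level", "ERROR")
    if level not in VALID_LEVELS:
        errors.append(f"unknown error_notification_level: {level!r}")
    rate = config.get("rate_limit_config", {}).get("max_requests_per_minute")
    # The rule is 0 < rate <= 60; a missing value is treated as "not configured".
    if rate is not None and not (0 < rate <= 60):
        errors.append("max_requests_per_minute must be >0 and <=60")
    return errors
```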
**Storage**:
- Stored in Bot Instance config JSON blob
- Editable via:
1. Maubot web UI (management interface)
2. Direct config file edit + bot restart (per FR-010)
---
### 5. Admin Notification
**Description**: ERROR and CRITICAL level bot notifications sent to Matrix homeserver admin room (shared with other platform notifications).
**Attributes**:
- `timestamp`: datetime (ISO 8601) - When notification was generated
- `source_instance`: string - Bot instance ID that triggered notification
- `severity`: string (enum) - Log level (`ERROR` or `CRITICAL`)
- `message`: string - Human-readable error description
- `context`: object (JSON) - Additional metadata
- `room_id`: string (optional) - Matrix room where error occurred
- `event_id`: string (optional) - Matrix event that triggered error
- `exception_type`: string (optional) - Python exception class
- `stack_trace`: string (optional) - Abbreviated stack trace (last 10 lines)
**Relationships**:
- Triggered by: Bot Instance (N:1)
- Sent to: Matrix Admin Room (N:1, shared room: defined in ops-jrz1 config)
**State Machine**: N/A (notifications are fire-and-forget events)
**Validation Rules**:
- `severity` MUST be `ERROR` or `CRITICAL` (DEBUG/INFO/WARN go to logs only per FR-013)
- `message` MUST be non-empty
- Matrix admin room MUST exist and bot MUST have send permission
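
To make the severity filter and the 10-line stack trace limit concrete, here is a sketch of how a notification payload might be assembled. The function `build_notification` and the returned dictionary layout are illustrative assumptions that mirror the attribute list above; sending the payload to the admin room is out of scope.

```python
import traceback
from datetime import datetime, timezone

NOTIFY_LEVELS = {"ERROR", "CRITICAL"}  # lower severities go to logs only


def build_notification(instance_id, severity, message, exc=None):
    """Return a notification dict, or None when the severity is log-only."""
    if severity not in NOTIFY_LEVELS:
        return None
    if not message:
        raise ValueError("message must be non-empty")
    context = {}
    if exc is not None:
        lines = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ).splitlines()
        context["exception_type"] = type(exc).__name__
        # Keep only the last 10 lines, as the attribute description specifies.
        context["stack_trace"] = "\n".join(lines[-10:])
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source_instance": instance_id,
        "severity": severity,
        "message": message,
        "context": context,
    }
```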
**Storage**:
- Not persisted (real-time notification)
- Logged to systemd journal: `journalctl -u maubot.service`
- Visible in maubot management dashboard (recent notifications)
---
### 6. Bot Database
**Description**: Per-instance isolated SQLite database for plugin state and data persistence.
**Attributes**:
- `instance_id`: string (foreign key) - References Bot Instance
- `database_path`: string (path) - SQLite file location (e.g., `/var/lib/maubot/plugins/instagram-bot-1.db`)
- `schema_version`: integer - Plugin-defined schema version
- `size_bytes`: integer - Database file size
- `last_accessed`: datetime - Last read/write timestamp
**Relationships**:
- Belongs to: Bot Instance (1:1, optional - only if plugin requires DB)
- Managed by: Plugin code (plugin-defined schema)
**State Machine**:
```
[initialized] (schema created)
    ↓
[active] (read/write operations)
    ↓
[migrating] (schema upgrade in progress)
    ↓
[active]
    ↓
[archived] (bot deleted, DB preserved)
```
**Validation Rules**:
- `database_path` MUST be within `/var/lib/maubot/plugins/` directory
- Schema migrations MUST be handled by plugin code (not maubot framework)
- Database MUST be owned by `maubot` user/group
**Storage**:
- Location: `/var/lib/maubot/plugins/<instance-id>.db`
- Backup: Manual (part of `/var/lib/maubot/` directory backup)
---
## Relationships Diagram
```
┌─────────────────────┐
│  Matrix Homeserver  │
│    (conduwuit)      │
└──────────┬──────────┘
           │ authenticates
┌──────────▼──────────┐
│   Maubot Service    │
│  ┌──────────────┐   │
│  │  Admin UI    │   │ ← admin login (sops-nix secrets)
│  │  :29316      │   │
│  └──────────────┘   │
│                     │
│  manages ↓          │
│                     │
│  ┌──────────────┐   │
│  │ Bot Instance │───┼──→ uses Plugin (.mbp)
│  │ (instagram)  │   │
│  └───┬──────────┘   │
│      │ has config   │
│      ↓              │
│  ┌──────────────┐   │
│  │  Bot Config  │   │
│  │  - rooms[]   │   │
│  │  - settings  │   │
│  └──────────────┘   │
│                     │
│  stores ↓           │
│                     │
│  ┌──────────────┐   │
│  │ Bot Database │   │ (optional, plugin-specific)
│  │  (SQLite)    │   │
│  └──────────────┘   │
└─────────────────────┘
           │ sends notifications
┌─────────────────────┐
│  Matrix Admin Room  │ (shared with platform)
└─────────────────────┘
```
---
## Configuration File Structures
### Maubot Service Config
**File**: `/var/lib/maubot/config/config.yaml`
**Structure**:
```yaml
database: "sqlite:///var/lib/maubot/bot.db"
server:
  hostname: 0.0.0.0
  port: 29316
admins:
  admin: <INJECTED_FROM_CREDENTIALS_DIRECTORY>  # Replaced at runtime
homeservers:
  clarun.xyz:
    url: http://127.0.0.1:8008
    secret: <INJECTED_REGISTRATION_TOKEN>  # Optional, for auto-registration
logging:
  level: INFO
  handlers:
    - type: journal  # Log to systemd journal
api_features:
  login: true
  plugin: true
  plugin_upload: true
  instance: true
  instance_database: true
  log: true
```
**Generation**:
1. Maubot example config generated via `maubot -c config.yaml -e`
2. Python script merges NixOS module overrides
3. Secrets injected from `$CREDENTIALS_DIRECTORY` (systemd LoadCredential)
4. Final config written to `/var/lib/maubot/config/config.yaml`
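
Step 2 (merging module overrides into the generated example config) amounts to a recursive dictionary merge. The sketch below assumes both configs have already been parsed into dictionaries; `deep_merge` is a hypothetical helper, not the script used in ops-base or ops-jrz1.

```python
import copy


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a new dict where overrides win, merging nested dicts key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Example: only the overridden keys change; untouched example keys survive.
example = {"server": {"hostname": "0.0.0.0", "port": 29316}, "logging": {"level": "DEBUG"}}
overrides = {"logging": {"level": "INFO"}, "homeservers": {"clarun.xyz": {"url": "http://127.0.0.1:8008"}}}
merged = deep_merge(example, overrides)
```

Merging into the example config, instead of writing the whole file from the module, keeps any keys a newer maubot release adds.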
---
### Bot Instance Config
**Stored in**: Maubot SQLite DB (not file-based)
**Access methods**:
1. Maubot web UI (http://localhost:29316/_matrix/maubot)
2. Direct database edit (advanced, not recommended)
3. File-based config edit + restart (for room subscriptions per FR-010)
**Example config** (Instagram bot):
```json
{
  "enabled": true,
  "max_file_size": 50000000,
  "room_subscriptions": [
    "!abc123def:clarun.xyz",
    "!xyz789ghi:clarun.xyz"
  ],
  "rate_limiting": {
    "enabled": true,
    "max_requests_per_minute": 10,
    "backoff_seconds": 30
  },
  "error_notification_level": "ERROR"
}
```
---
### Plugin Metadata
**File**: `maubot.yaml` (inside .mbp archive)
**Structure**:
```yaml
id: sna.instagram
version: 1.0.0
license: MIT
modules:
  - instagram_bot
main_class: InstagramBot
database: false  # Plugin doesn't use dedicated DB
config: true  # Plugin accepts configuration
config_schema:
  type: object
  properties:
    enabled:
      type: boolean
      default: true
    max_file_size:
      type: integer
      default: 50000000
    room_subscriptions:
      type: array
      items:
        type: string
        pattern: "^!.+:.+$"
dependencies:
  - yt-dlp>=2023.1.6
  - aiohttp
  - pillow
```
---
## State Persistence
### Service State
**Location**: `/var/lib/maubot/bot.db` (SQLite)
**Tables**:
- `instance` - Bot instance metadata
- `plugin` - Uploaded plugin metadata
- `client` - Matrix client credentials (access tokens)
- `log` - Recent bot activity logs
**Backup strategy**:
- Included in `/var/lib/maubot/` directory backup
- Rollback via NixOS generations (service config)
- Database can be wiped and rebuilt from scratch (bot re-registration required)
### Runtime State
**Location**: Memory (maubot service process)
**Contents**:
- Active bot instances (Python objects)
- Matrix client connections (aiohttp sessions)
- Event handlers (registered callbacks)
- Plugin instances (loaded Python classes)
**Recovery**:
- Automatic on maubot service restart
- Bot instances reconnect to Matrix
- Plugin state reloaded from DB (if applicable)
---
## Security Model
### Secrets Hierarchy
1. **Service-level secrets** (sops-nix encrypted):
- `maubot-admin-password` - Management UI login
- `maubot-secret-key` - Session signing
- `matrix-registration-token` - Bot user creation (reused from Matrix homeserver)
2. **Bot-level secrets** (stored in maubot DB):
- Matrix access tokens (per bot instance)
- Matrix device IDs
- Plugin-specific credentials (if any)
3. **Runtime secrets** (ephemeral):
- Active session tokens (management UI)
- Matrix sync tokens (E2EE keys if enabled)
### Permissions
**File permissions**:
```
/var/lib/maubot/ → drwxr-x--- maubot:maubot
/var/lib/maubot/config/ → drwx------ maubot:maubot
/var/lib/maubot/config/config.yaml → -rw------- maubot:maubot (contains secrets)
/var/lib/maubot/bot.db → -rw-r----- maubot:maubot
/var/lib/maubot/plugins/ → drwxr-xr-x maubot:maubot
/run/secrets/maubot-* → -r-------- maubot:maubot (0400)
```
**Network access**:
- Management interface: localhost:29316 only (SSH tunnel required for remote access per spec)
- Matrix homeserver: localhost:8008 (IPv4, conduwuit compatibility)
- No external network access (except Matrix federation via homeserver)
---
## Operational Entities
### Health Check State
**Attributes**:
- `last_check_timestamp`: datetime
- `service_status`: enum (`healthy`, `degraded`, `failed`)
- `maubot_version_endpoint`: boolean - `/_matrix/maubot/v1/version` accessible
- `active_instances_count`: integer
- `failed_instances`: array[string] - Instance IDs with errors
- `last_successful_message_timestamp`: datetime (per bot instance)
**Storage**: Systemd timer state + systemd journal logs
**Health indicators** (per spec SC-003):
- Service responds to HTTP health check (curl to version endpoint)
- Active instances count matches enabled instances count
- No ERROR/CRITICAL logs in last 5 minutes
- All enabled bots have recent Matrix sync activity (<10 minutes)
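
One way to combine these indicators into the `service_status` enum is sketched below. The mapping is an assumption made for illustration (an unreachable endpoint means `failed`, any other unmet indicator means `degraded`); the spec does not define it.

```python
from datetime import datetime, timedelta


def evaluate_health(version_ok, enabled_count, active_count,
                    last_error_at, last_sync_by_instance, now):
    """Map the four health indicators to healthy / degraded / failed."""
    if not version_ok:
        return "failed"  # HTTP health check did not respond
    degraded = active_count != enabled_count
    # Any ERROR/CRITICAL log entry within the last 5 minutes counts against health.
    if last_error_at is not None and now - last_error_at < timedelta(minutes=5):
        degraded = True
    # Every enabled bot needs Matrix sync activity within the last 10 minutes.
    if any(now - ts >= timedelta(minutes=10) for ts in last_sync_by_instance.values()):
        degraded = True
    return "degraded" if degraded else "healthy"
```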
---
## Data Flow Diagrams
### Instagram URL Processing Flow
```
1. User posts Instagram URL in Matrix room
2. Matrix homeserver distributes event to all clients
3. Bot instance receives event (if subscribed to that room)
4. Plugin regex matches Instagram URL pattern
5. Plugin calls yt-dlp extraction (async thread pool)
6. yt-dlp downloads media to temporary directory
7. Plugin uploads media to Matrix homeserver
8. Plugin sends Matrix message event with media attachment
9. Cleanup temporary files
10. Log extraction success/failure (severity-based notification if ERROR/CRITICAL)
```
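
Step 4 of the flow above (URL detection) might look like the following. The regular expression is an assumption: the plugin's real pattern is not shown in this document, and this one covers only post, reel, and IGTV paths, not stories.

```python
import re

# Assumed pattern, not the plugin's actual regex.
INSTAGRAM_URL_RE = re.compile(
    r"https?://(?:www\.)?instagram\.com/(?:p|reel|reels|tv)/[A-Za-z0-9_-]+/?"
)


def find_instagram_urls(body: str) -> list[str]:
    """Return every Instagram post/reel/IGTV URL found in a message body."""
    return INSTAGRAM_URL_RE.findall(body)
```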
### Bot Registration Flow
```
1. Admin accesses maubot web UI via SSH tunnel
2. Create new bot client (provide Matrix user ID)
3. Maubot attempts registration via conduwuit registration token
4. If successful: Access token stored in maubot DB
5. Create bot instance (select plugin, provide config)
6. Bot connects to Matrix homeserver
7. Bot joins configured rooms (from room_subscriptions)
8. Bot starts listening for events
```
---
## Validation Rules Summary
### Configuration Validation
- All Matrix room IDs MUST match pattern `!.+:.+`
- Homeserver URL MUST be `http://127.0.0.1:PORT` (IPv4, not localhost)
- Admin password MUST meet minimum strength (length >=16, bcrypt cost >=12)
- Plugin IDs MUST be globally unique within maubot instance
- File paths MUST be absolute and within permitted directories
### Runtime Validation
- Bot instances CANNOT start without valid Matrix access token
- Room subscriptions MUST reference existing rooms (checked at runtime, logged if invalid)
- Plugin dependencies MUST be available in NixOS environment
- Rate limiting MUST be enforced before external API calls (Instagram)
### Security Validation
- Secrets MUST NEVER appear in logs or config files (placeholders only)
- Management interface MUST be reachable from localhost only (binding `0.0.0.0` is acceptable inside a container, provided the port is not exposed externally)
- Database files MUST have restrictive permissions (0600 or 0640)
- ERROR/CRITICAL notifications MUST include sanitized context (no credentials in stack traces)
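
Two of these rules lend themselves to small helpers. This is a sketch under assumptions: the names `has_restrictive_mode` and `redact` are hypothetical, and redaction here is a plain substring replacement of known secret values, which is the simplest possible approach and not necessarily what the service should ship.

```python
import os
import stat

ALLOWED_DB_MODES = {0o600, 0o640}  # from the database permission rule above


def has_restrictive_mode(path: str) -> bool:
    """True when the file's permission bits are exactly 0600 or 0640."""
    return stat.S_IMODE(os.stat(path).st_mode) in ALLOWED_DB_MODES


def redact(text: str, secrets: list[str]) -> str:
    """Replace every known secret value in text before it is sent or logged."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, "[REDACTED]")
    return text
```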
---
## Migration Strategy
### From ops-base to ops-jrz1
**Data migration**: Not required (fresh deployment)
**Configuration migration**:
1. Extract maubot.nix module from ops-base
2. Adapt namespace: `services.matrix-vm.maubot` → `services.maubot`
3. Update homeserver URL: `continuwuity` → `conduwuit`
4. Remove registration_secrets (not supported by conduwuit)
5. Add registration token configuration
**Plugin migration**:
1. Copy Instagram bot .mbp file from ops-base: `/home/dan/proj/sna/sna-instagram-bot.mbp`
2. Upload to ops-jrz1 maubot via web UI or API
3. Create bot instance with room subscriptions
4. Test content fetching in designated rooms
**No database migration needed** (SQLite DB created fresh on ops-jrz1)
---
## Capacity Planning
### Single Instagram Bot Instance
**Estimated resource usage**:
- Memory: ~100MB (maubot service + bot instance + yt-dlp subprocess)
- Disk:
- Maubot DB: <10MB (metadata only)
- Plugins: ~1MB per .mbp file
- Temporary files: Up to 50MB (during media download, auto-cleanup)
- CPU: Burst during media extraction (yt-dlp), idle otherwise
- Network: <1GB/day (assuming <20 Instagram fetches/day at ~50MB each)
**Scale validation** (per SC-002):
- Maubot service supports 3+ concurrent instances without degradation
- Each additional bot: ~50MB memory, minimal CPU/network impact
- Shared resources: Maubot DB (SQLite supports concurrent reads), management UI
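
The estimates above reduce to simple arithmetic. The sketch below only restates the document's own figures; the helper name `memory_estimate_mb` is hypothetical.

```python
FETCHES_PER_DAY = 20       # upper bound assumed in the network estimate
MAX_FILE_MB = 50           # per-fetch size, matching the 50MB file limit
BASE_MEMORY_MB = 100       # maubot service plus the first bot instance
EXTRA_INSTANCE_MB = 50     # each additional bot

# Worst case: 20 fetches x 50 MB = 1000 MB, so the "<1GB/day" figure holds
# only while the fetch count stays below 20 per day.
daily_download_mb = FETCHES_PER_DAY * MAX_FILE_MB


def memory_estimate_mb(instances: int) -> int:
    """Estimated memory for the service running the given number of bots."""
    return BASE_MEMORY_MB + EXTRA_INSTANCE_MB * max(instances - 1, 0)
```

For the three concurrent instances named in SC-002 this gives 200MB, well under the 512M `MemoryMax` proposed in the research notes.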
---
**Status**: Data model complete. Ready for quickstart.md generation.

View file

@ -0,0 +1,527 @@
# Research Findings: Maubot Integration
**Feature**: 003-maubot-integration
**Date**: 2025-10-26
**Status**: Phase 0 complete
## Overview
Research conducted to resolve technical unknowns for extracting maubot from ops-base and deploying to ops-jrz1 with Instagram bot functionality.
---
## Decision 1: Maubot-Conduwuit Compatibility
### Decision
**YES - Maubot is compatible with conduwuit**, provided the bot registration method is changed
### Rationale
- ops-base successfully runs maubot 0.5.2+ on continuwuity (conduwuit fork) at matrix.talu.uno
- Over 10 production maubot instances confirmed working with conduwuit
- Maubot uses standard Matrix Client-Server API (homeserver-agnostic)
- ops-jrz1 conduwuit (0.5.0-rc.8) supports all required Matrix APIs
### Key Finding: Registration Method Differs
**ops-base pattern (continuwuity)**:
```yaml
registration_secrets:
  matrix.talu.uno:
    url: http://127.0.0.1:6167
    secret: REPLACE_REGISTRATION_SECRET  # Shared secret registration
```
**ops-jrz1 requirement (conduwuit)**:
- Conduwuit does NOT support `registration_shared_secret` like Synapse
- Must use **registration tokens** or **admin room commands** for bot user creation
### Recommended Approach
**Registration Token Method** (simpler, more secure):
1. Configure conduwuit with registration token (from sops-nix)
2. During bot client creation in maubot web UI, provide registration token
3. Bot registers via standard Matrix client registration API
**Alternative: Admin Room Commands**:
```
!admin users create-user maubot-bot-1
# Returns generated password
```
### Integration Pattern
- Remove `registration_secrets` section from maubot config
- Remove `registrationSecretFile` option from NixOS module
- Document registration token workflow in quickstart.md
### Compatibility Notes
- **Database**: SQLite works (no changes needed)
- **Network**: Use IPv4 `127.0.0.1:8008` (not `localhost` - conduwuit binds IPv4 only)
- **Encryption**: maubot 0.5.2+ supports E2EE with conduwuit
- **Appservice**: Maubot bots are regular users, not appservice users (no appservice registration needed)
### Known Issues (Resolved)
- maubot < 0.5.2 had bug causing excessive key uploads (fixed in 0.5.2+)
- Use latest stable maubot from nixpkgs
### References
- ops-base maubot.nix:387
- ops-base maubot-deployment-instructions.md
- ops-base conduwuit admin room discovery worklog
---
## Decision 2: Instagram Content Fetching
### Decision
**Use yt-dlp (primary) for Instagram content extraction**
### Rationale
- ops-base Instagram bot uses yt-dlp >=2023.1.6 (available in nixpkgs)
- Proven working implementation at `/home/dan/proj/sna/instagram_bot.py`
- Packaged as `sna-instagram-bot.mbp` and deployed successfully
- Source bot had instaloader fallback, but instaloader not in nixpkgs (yt-dlp-only mode in production)
### Implementation Pattern
**Extraction Architecture**:
```python
from maubot import Plugin, MessageEvent
from maubot.handlers import event
from mautrix.types import EventType


class InstagramBot(Plugin):  # Inherits from maubot.Plugin
    @event.on(EventType.ROOM_MESSAGE)
    async def handle_message(self, evt: MessageEvent) -> None:
        # 1. Detect Instagram URLs via regex
        # 2. Extract content with yt-dlp (async thread pool)
        # 3. Upload media to Matrix homeserver
        # 4. Send to room with metadata (caption, uploader, dimensions)
        ...
```
**Content Types Supported**:
- Posts (images)
- Reels (videos)
- IGTV (videos)
- Stories (if publicly accessible)
**File Handling**:
- Temporary directory for downloads (auto-cleanup)
- Max file size: 50MB (configurable)
- Supported formats: mp4, jpg, jpeg, png, webp
- MIME type detection for proper Matrix msgtype
**Metadata Extraction**:
- Title, description, uploader
- Dimensions (width x height)
- Duration (for videos)
- Posted as separate text message after media
### Rate Limiting Strategy
**Current State**: No rate limiting implemented in ops-base bot
**Risks**:
- Burst of URLs in high-traffic room could trigger Instagram rate limits
- No request tracking, queuing, or throttling
- Extraction failures logged but no retry logic
**Recommendations for 003-maubot-integration**:
1. Add per-room request tracking
2. Implement exponential backoff on extraction failures
3. Queue URLs and process with delays (e.g., 5 seconds between requests)
4. Add configuration for max requests/minute
5. Monitor extraction failure rates as health indicator
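
Recommendations 1, 2, and 4 could be combined in a small per-room limiter. This is a sketch of one possible design, not code from the ops-base bot: the class `RoomRateLimiter`, its defaults, and the sliding 60-second window are all assumptions, and queuing (recommendation 3) is left out.

```python
import time
from collections import defaultdict, deque


class RoomRateLimiter:
    """Sliding-window limit per room, with exponential backoff after failures."""

    def __init__(self, max_per_minute=10, base_backoff=30.0, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.base_backoff = base_backoff
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._requests = defaultdict(deque)
        self._failures = defaultdict(int)
        self._blocked_until = defaultdict(float)

    def allow(self, room_id: str) -> bool:
        now = self.clock()
        if now < self._blocked_until[room_id]:
            return False  # still backing off after a failure
        window = self._requests[room_id]
        while window and now - window[0] >= 60:
            window.popleft()  # drop requests older than one minute
        if len(window) >= self.max_per_minute:
            return False
        window.append(now)
        return True

    def record_failure(self, room_id: str) -> float:
        """Double the backoff on each consecutive failure; return the delay."""
        self._failures[room_id] += 1
        delay = self.base_backoff * 2 ** (self._failures[room_id] - 1)
        self._blocked_until[room_id] = self.clock() + delay
        return delay

    def record_success(self, room_id: str) -> None:
        self._failures[room_id] = 0
```

A plugin would call `allow(room_id)` before each extraction and then `record_failure` or `record_success` with the outcome.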
### Known Limitations
1. **Instagram API changes**: yt-dlp requires updates when Instagram changes interface
2. **Private content**: Cannot access private posts/stories (public only)
3. **Rate limiting exposure**: Heavy usage may cause temporary failures
4. **No retry logic**: Failed extractions not queued for later attempt
5. **File size limits**: 50MB hard limit, Matrix homeserver may have separate limits
6. **No caching**: Frequently shared URLs re-extracted every time
### Plugin Packaging
**Format**: `.mbp` archive (zip file)
**Structure**:
```
sna-instagram-bot.mbp:
  instagram_bot.py (11,643 bytes)
  maubot.yaml (plugin metadata)
  README.md (documentation)
```
**Metadata** (maubot.yaml):
```yaml
id: sna.instagram
version: 1.0.0
main_class: InstagramBot
modules: [instagram_bot]
```
**Creation**:
```bash
cd /path/to/plugin
zip -r instagram-bot.mbp instagram_bot.py maubot.yaml README.md
```
**Deployment Methods**:
1. **API upload** (automated):
```bash
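# The upload endpoint reads the raw zip as the request body, not a multipart form
curl -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/zip" \
--data-binary "@instagram-bot.mbp" \
"http://localhost:29316/_matrix/maubot/v1/plugins/upload"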
```
2. **Web UI** (manual): Upload via http://localhost:29316/_matrix/maubot (SSH tunnel)
### Source Files to Adapt
- Plugin source: `/home/dan/proj/sna/instagram_bot.py`
- Plugin package: `/home/dan/proj/sna/sna-instagram-bot.mbp`
- Deployment scripts: `/home/dan/proj/ops-base/scripts/*instagram-bot.sh`
### Alternatives Considered
**instaloader**:
- Rejected: Not available in nixpkgs
- ops-base bot had fallback support, but unused in production
**Official Instagram API**:
- Rejected: Requires Facebook developer approval (per spec clarifications)
- Community scraping approach acceptable for internal team use
---
## Decision 3: NixOS Module Adaptation Strategy
### Decision
**Two-layer module pattern** matching mautrix-slack architecture
### Rationale
- ops-jrz1 established pattern with mautrix-slack module
- Low-level module (`services.maubot`) provides full configuration surface
- High-level wrapper (`services.dev-platform.maubot`) simplifies common usage
- Consistent with existing infrastructure patterns
### Source Pattern: ops-base maubot.nix
**Module namespace**: `services.matrix-vm.maubot`
**Key characteristics**:
- Runtime config generation with placeholder substitution
- systemd `LoadCredential` for secrets injection
- Python script in `ExecStartPre` replaces placeholders
- SQLite database at `/var/lib/maubot/bot.db`
- Timer-based health monitoring (5min check + 10min auto-restart)
- Config template at `/etc/maubot/config.yaml` → runtime config at `/run/maubot/config.yaml`
**Secrets pattern**:
```nix
LoadCredential = [
  "admin-password:${cfg.adminPasswordFile}"
  "secret-key:${cfg.secretKeyFile}"
  "registration-secret:${cfg.registrationSecretFile}"  # REMOVE for conduwuit
];
```
### Target Pattern: ops-jrz1 Services
**mautrix-slack.nix pattern**:
- Module namespace: `services.mautrix-slack` (low-level)
- Wrapper: `services.dev-platform.slackBridge` in `modules/dev-services.nix`
- Config: Example config generation + YAML merging via Python
- Database: PostgreSQL via unix socket
- Secrets: No LoadCredential (tokens from interactive login)
- State: `/var/lib/mautrix_slack/config/config.yaml` (within StateDirectory)
**Adaptation decisions**:
| Aspect | ops-base | ops-jrz1 Target |
|--------|----------|-----------------|
| **Namespace** | `services.matrix-vm.maubot` | `services.maubot` + `services.dev-platform.maubot` |
| **Config location** | `/run/maubot/config.yaml` | `/var/lib/maubot/config/config.yaml` |
| **Config approach** | Template substitution | Example config + YAML merge + secret substitution |
| **Secrets** | LoadCredential + Python replacement | LoadCredential + Python replacement (retain ops-base pattern) |
| **Database** | SQLite `/var/lib/maubot/bot.db` | SQLite (same path) |
| **Logs** | File + journal | Journal only (StandardOutput) |
| **State** | Manual StateDirectory + tmpfiles | `StateDirectory = "maubot"` (systemd managed) |
| **Health checks** | Timer-based (5min + 10min) | Retain ops-base pattern |
| **User/group** | `maubot:maubot` | `maubot:maubot` + `matrix-appservices` supplementary |
### Configuration Generation Hybrid Approach
**Recommendation**: Combine mautrix-slack example config pattern with ops-base secrets injection
**Steps**:
1. Run `maubot -c config.yaml -e` to generate example config (ensures structure completeness)
2. Python script merges structured overrides (like mautrix-slack)
3. Write config with placeholders to StateDirectory
4. Second step reads from `CREDENTIALS_DIRECTORY` and replaces placeholders
5. Final config written with proper permissions (0600)
**Why hybrid**:
- Example config ensures YAML structure stays valid across maubot versions
- LoadCredential provides better security than storing secrets in Nix store
- Proven pattern from both source (ops-base) and target (mautrix-slack)
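
Steps 3 to 5 can be sketched as two small functions. The names `inject_secrets` and `write_private`, and the placeholder-to-credential mapping, are illustrative assumptions; the sketch treats the config as plain text so it needs no YAML library.

```python
import os
from pathlib import Path


def inject_secrets(template: str, credentials_dir: Path, placeholders: dict[str, str]) -> str:
    """Replace each placeholder with the content of its credential file."""
    rendered = template
    for placeholder, credential_name in placeholders.items():
        secret = (credentials_dir / credential_name).read_text().strip()
        rendered = rendered.replace(placeholder, secret)
    return rendered


def write_private(path: Path, content: str) -> None:
    """Write content so the file is never readable by group or others."""
    # Creating with mode 0o600 avoids a window where the file has looser bits.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(path, 0o600)  # also tighten a pre-existing file
```

In a systemd unit, `credentials_dir` would be `Path(os.environ["CREDENTIALS_DIRECTORY"])`.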
### Database Decision
**Recommendation**: SQLite (match ops-base)
**Rationale**:
- Maubot workload is lightweight (bot state, plugin configs)
- ops-base SQLite deployment proven stable
- Simpler backup/restore (single file)
- Isolation from shared PostgreSQL (Forgejo, mautrix-slack use it)
- Less complex dependency chain
- Adequate for small team usage (<10 bot instances)
**Path**: `/var/lib/maubot/bot.db`
**Future**: Support PostgreSQL via config option if scaling needs emerge
### Secrets Management
**Recommendation**: Retain ops-base LoadCredential pattern
**Secrets required**:
```yaml
# In secrets/secrets.yaml (add)
maubot-admin-password: "..." # Admin UI login
maubot-secret-key: "..." # Session signing key
# matrix-registration-token: "..." # Already exists, reuse for bot user creation
```
**systemd configuration**:
```nix
LoadCredential = [
  "admin-password:/run/secrets/maubot-admin-password"
  "secret-key:/run/secrets/maubot-secret-key"
  "registration-token:/run/secrets/matrix-registration-token"  # Reused
];
```
**Substitution in ExecStartPre** (Python script):
```python
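import os
from pathlib import Path

# Read from $CREDENTIALS_DIRECTORY
admin_pw = Path(os.environ['CREDENTIALS_DIRECTORY'], 'admin-password').read_text().strip()
# Replace placeholders in config
config_path = Path('/var/lib/maubot/config/config.yaml')
config_path.write_text(config_path.read_text().replace('REPLACE_ADMIN_PASSWORD', admin_pw))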
```
**Why not mautrix-slack pattern**:
- mautrix-slack gets tokens via interactive login (no pre-provisioning needed)
- Maubot requires secrets before service starts (admin UI, signing key)
- LoadCredential keeps secrets out of Nix store and config files
### Health Monitoring
**Recommendation**: Retain ops-base timer-based pattern
**Implementation**:
- `maubot-health.service` (oneshot): Curl to `http://localhost:29316/_matrix/maubot/v1/version` every 5 minutes
- `maubot-health-restart.service` (oneshot): Check for failed health checks, restart if needed (every 10 minutes)
- `systemd.timers` for scheduling
**Why retain**:
- Maubot provides explicit health endpoint (unlike mautrix-slack)
- ops-base pattern proven reliable
- mautrix-slack has no health monitoring (only log-based Socket Mode checks)
- Valuable for production stability (auto-recovery)
### Directory Structure
**Target layout**:
```
/var/lib/maubot/
├── config/
│ └── config.yaml # Generated runtime config
├── plugins/ # Plugin storage (.mbp files)
├── trash/ # Deleted plugins
└── bot.db # SQLite database
```
**Changes from ops-base**:
- Config in StateDirectory (not `/run/maubot/`)
- Logs via journal (remove `/var/log/maubot/`)
- Use `StateDirectory = "maubot"` (systemd automatic management)
### Security Hardening
**Apply from mautrix-slack**:
- `StateDirectory = "maubot"`
- `StateDirectoryMode = "0750"`
- `PrivateTmp = true`
- `ProtectSystem = "strict"`
- `ReadWritePaths = [ cfg.dataDir ]`
- `MemoryMax = "512M"` (match ops-base)
- Standard systemd hardening flags
**Remove from ops-base**:
- `RuntimeDirectory` (use StateDirectory)
- `LogsDirectory` (use journal)
- Manual tmpfiles rules
### Integration Points
**hosts/ops-jrz1.nix additions**:
```nix
sops.secrets.maubot-admin-password = { mode = "0400"; };
sops.secrets.maubot-secret-key = { mode = "0400"; };
services.dev-platform.maubot = {
enable = true;
port = 29316; # Management interface
};
```
**modules/dev-services.nix additions**:
```nix
# Option declarations belong under the module's `options` attribute
options.services.dev-platform.maubot = {
  enable = mkOption { type = types.bool; default = false; };
  port = mkOption { type = types.port; default = 29316; };
};
# `cfg` is assumed to be bound to config.services.dev-platform
config = mkIf cfg.maubot.enable {
  services.maubot = {
    enable = true;
    homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}";
    serverName = cfg.matrix.serverName;
    port = cfg.maubot.port;
    # ... map other options
  };
};
```
### Alternatives Considered
**Pure mautrix-slack pattern**:
- Rejected: Would require removing LoadCredential and storing secrets in config
- Less secure (secrets in Nix store or config files)
- More code rewrite from proven ops-base pattern
**Keep ops-base pattern exactly**:
- Rejected: Inconsistent with ops-jrz1 conventions
- Manual directory management instead of StateDirectory
- File-based logging instead of journal
- Less integration with dev-platform namespace
---
## Technical Context Summary
**Language/Version**: Python 3.11 (maubot runtime)
**Primary Dependencies**: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite
**Storage**: SQLite at `/var/lib/maubot/bot.db`
**Testing**: Manual QA (automated tests future enhancement)
**Target Platform**: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49)
**Project Type**: Infrastructure service (NixOS module)
**Performance Goals**: <5 second Instagram content fetch (per SC-001), 99% uptime over 7 days (per SC-003)
**Constraints**: localhost-only management interface (SSH tunnel required), single Instagram bot instance initially
**Scale/Scope**: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002)
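
To make the 99% uptime goal concrete, the arithmetic below works out the downtime budget for a 7-day window. Expressing the budget in failed health checks assumes uptime is measured by the 5-minute health timer; SC-003 does not state this.

```python
WINDOW_DAYS = 7
UPTIME_TARGET_PERCENT = 99
CHECK_INTERVAL_MINUTES = 5  # health check cadence used in this document

window_minutes = WINDOW_DAYS * 24 * 60                       # 10080 minutes in 7 days
downtime_percent = 100 - UPTIME_TARGET_PERCENT               # 1 percent may be down
allowed_downtime_minutes = window_minutes * downtime_percent / 100

checks_in_window = window_minutes // CHECK_INTERVAL_MINUTES  # 2016 checks
allowed_failed_checks = checks_in_window * downtime_percent // 100

print(f"{allowed_downtime_minutes} minutes of downtime allowed")
print(f"{allowed_failed_checks} of {checks_in_window} health checks may fail")
```

Under these assumptions the budget is about 100 minutes of downtime, or 20 failed checks out of 2016, across the 7-day validation window.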
---
## Platform Vision Alignment
### Core Philosophy Adherence
**Build It Right Over Time**:
- ✅ Extract proven maubot module from ops-base (avoid reinvention)
- ✅ Declarative NixOS module pattern
- ✅ Self-documenting via quickstart.md and inline comments
- ✅ Sustainable pattern (matches existing mautrix-slack infrastructure)
**Presentable State First**:
- ✅ Working Instagram bot demonstrates value immediately
- ✅ Clear documentation (research.md, quickstart.md, contracts/)
- ✅ Professional deployment pattern (consistent with mautrix-slack)
### Architecture Principles
**Communication Layer**:
- ✅ Maubot extends Matrix functionality (bot framework)
- ✅ Instagram bot brings external content into Matrix (enriches communication)
- ✅ Aligns with Matrix-centric hub architecture
**Deployment Philosophy**:
- ✅ NixOS-Native pattern (module + sops-nix secrets)
- ✅ Declarative and reproducible
- ✅ Built-in rollback (NixOS generations)
- ✅ Clear separation: infrastructure (maubot service) vs application (Instagram plugin)
**Sustainability**:
- ✅ Small team focus (single bot instance initially, validate 3-instance capability)
- ✅ Quality over speed (comprehensive research before implementation)
- ✅ Proven patterns (extract from ops-base, not experimental)
---
## Risk Assessment
### Low Risk
- SQLite database (proven, simple)
- LoadCredential secrets (ops-base pattern working)
- Health monitoring (non-intrusive timers)
- StateDirectory approach (standard systemd)
### Medium Risk
- conduwuit compatibility (ops-base uses the continuwuity fork)
  - **Mitigation**: Early testing of bot registration and Matrix connection
- Two-layer module pattern (new for maubot, proven with mautrix-slack)
  - **Mitigation**: Follow exact mautrix-slack pattern
- Instagram scraping stability (yt-dlp extraction can break whenever Instagram changes its site)
  - **Mitigation**: yt-dlp actively maintained, ops-base deployment proven
### Requires Testing
- Registration token workflow with conduwuit (different from ops-base shared secret)
- Management interface localhost binding (security requirement)
- Instagram content fetching with current yt-dlp version
- Bot response in designated rooms only (room-based activation per FR-006)
- Auto-recovery after homeserver restart (SC-004)
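
The localhost-binding item can be partly checked before deployment by validating the configured listen address. The sketch below assumes the address is available as a plain string; the function name is hypothetical, and a real test should also confirm from outside the host that port 29316 is unreachable.

```python
import ipaddress


def is_loopback_only(host: str) -> bool:
    """Return True if the listen address restricts the service to the local machine."""
    if host == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and ::1; 0.0.0.0 and public addresses return False.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name), so loopback cannot be guaranteed.
        return False
```

For example, `is_loopback_only("127.0.0.1")` passes while `is_loopback_only("0.0.0.0")` fails, which is the distinction the security requirement depends on.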
---
## Next Steps
### Phase 1: Design & Contracts
1. Generate data-model.md with entities:
   - Maubot Service, Bot Instance, Plugin, Bot Configuration, Admin Notification, Bot Database
2. Generate contracts/ with configuration schemas (if applicable)
3. Generate quickstart.md with deployment runbook including:
   - Registration token setup
   - Bot creation workflow
   - Room subscription configuration
   - Admin room access procedure
4. Update AGENTS.md with maubot, yt-dlp context
### Phase 2: Implementation Planning
1. Extract maubot.nix from ops-base to ops-jrz1
2. Adapt namespace and configuration patterns
3. Add sops secrets declarations
4. Create dev-platform wrapper in dev-services.nix
5. Test service startup and conduwuit connection
6. Deploy Instagram plugin
7. Validate SC-001 through SC-008
---
## References
### Source Files Analyzed
- `/home/dan/proj/ops-base/vm-configs/modules/maubot.nix` (387 lines)
- `/home/dan/proj/ops-base/vm-configs/modules/continuwuity.nix` (413 lines)
- `/home/dan/proj/ops-base/docs/maubot-deployment-instructions.md`
- `/home/dan/proj/ops-base/docs/continuwuit-appservice-registration-guide.md`
- `/home/dan/proj/ops-jrz1/modules/mautrix-slack.nix` (current)
- `/home/dan/proj/ops-jrz1/modules/dev-services.nix` (current)
- `/home/dan/proj/ops-jrz1/docs/platform-vision.md` (architecture principles)
- `/home/dan/proj/sna/instagram_bot.py` (11,643 bytes)
- `/home/dan/proj/sna/sna-instagram-bot.mbp` (packaged plugin)
### External Documentation
- Maubot official docs: https://docs.mau.fi/maubot/
- Conduwuit appservice guide: https://conduwuit.puppyirl.gay/appservices.html
- yt-dlp Instagram extractor: https://github.com/yt-dlp/yt-dlp
---
**Status**: Research complete. All technical unknowns resolved. Ready for Phase 1 design.