Research Findings: Maubot Integration
Feature: 003-maubot-integration | Date: 2025-10-26 | Status: Phase 0 complete
Overview
Research conducted to resolve technical unknowns for extracting maubot from ops-base and deploying to ops-jrz1 with Instagram bot functionality.
Decision 1: Maubot-Conduwuit Compatibility
Decision
Yes. Maubot is fully compatible with conduwuit, with one change: bot users must be registered differently than in ops-base.
Rationale
- ops-base successfully runs maubot 0.5.2+ on continuwuity (conduwuit fork) at matrix.talu.uno
- Over 10 production maubot instances confirmed working with conduwuit
- Maubot uses standard Matrix Client-Server API (homeserver-agnostic)
- ops-jrz1 conduwuit (0.5.0-rc.8) supports all required Matrix APIs
Key Finding: Registration Method Differs
ops-base pattern (continuwuity):
```yaml
registration_secrets:
  matrix.talu.uno:
    url: http://127.0.0.1:6167
    secret: REPLACE_REGISTRATION_SECRET  # Shared secret registration
```
ops-jrz1 requirement (conduwuit):
- Conduwuit does NOT support `registration_shared_secret` like Synapse
- Must use registration tokens or admin room commands for bot user creation
Recommended Approach
Registration Token Method (simpler, more secure):
- Configure conduwuit with registration token (from sops-nix)
- During bot client creation in maubot web UI, provide registration token
- Bot registers via standard Matrix client registration API
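At the HTTP level, "standard Matrix client registration" with a token is the Client-Server API's user-interactive authentication (UIA) flow on `/register`. The sketch below shows that exchange, useful for scripting or for debugging what maubot's web UI does. It assumes conduwuit advertises a single-stage `m.login.registration_token` flow; the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

TOKEN_STAGE = "m.login.registration_token"


def build_register_body(username: str, password: str,
                        token: Optional[str] = None,
                        session: Optional[str] = None) -> dict:
    """Build a /register body; the auth dict is added only once a UIA session exists."""
    body: dict = {"username": username, "password": password}
    if token is not None and session is not None:
        body["auth"] = {"type": TOKEN_STAGE, "token": token, "session": session}
    return body


def _post_json(url: str, body: dict) -> tuple[int, dict]:
    req = urllib.request.Request(
        url, data=json.dumps(body).encode(), method="POST",
        headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # UIA challenges come back as HTTP 401 with a JSON body
        return err.code, json.load(err)


def register_with_token(homeserver: str, username: str,
                        password: str, token: str) -> dict:
    url = f"{homeserver}/_matrix/client/v3/register"
    # Step 1: no auth yet, so the server replies 401 with a session id and flows
    status, resp = _post_json(url, build_register_body(username, password))
    if status == 200:
        return resp
    if status != 401 or "session" not in resp:
        raise RuntimeError(f"unexpected /register response {status}: {resp}")
    # Step 2: complete the registration-token stage within the same session
    status, resp = _post_json(url, build_register_body(
        username, password, token, resp["session"]))
    if status != 200:
        raise RuntimeError(f"registration failed {status}: {resp}")
    return resp
```

Usage would look like `register_with_token("http://127.0.0.1:8008", "maubot-bot-1", password, token)`, with the token read from the sops-managed secret rather than hard-coded.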
Alternative: Admin Room Commands:
```
!admin users create-user maubot-bot-1
# Returns generated password
```
Integration Pattern
- Remove `registration_secrets` section from maubot config
- Remove `registrationSecretFile` option from NixOS module
- Document registration token workflow in quickstart.md
Compatibility Notes
- Database: SQLite works (no changes needed)
- Network: Use IPv4 `127.0.0.1:8008` (not `localhost`; conduwuit binds IPv4 only)
- Encryption: maubot 0.5.2+ supports E2EE with conduwuit
- Appservice: Maubot bots are regular users, not appservice users (no appservice registration needed)
Known Issues (Resolved)
- maubot < 0.5.2 had bug causing excessive key uploads (fixed in 0.5.2+)
- Use latest stable maubot from nixpkgs
References
- ops-base maubot.nix:387
- ops-base maubot-deployment-instructions.md
- ops-base conduwuit admin room discovery worklog
Decision 2: Instagram Content Fetching
Decision
Use yt-dlp (primary) for Instagram content extraction
Rationale
- ops-base Instagram bot uses yt-dlp >=2023.1.6 (available in nixpkgs)
- Proven working implementation at `/home/dan/proj/sna/instagram_bot.py`
- Packaged as `sna-instagram-bot.mbp` and deployed successfully
- Source bot had an instaloader fallback, but instaloader is not in nixpkgs, so production runs in yt-dlp-only mode
Implementation Pattern
Extraction Architecture:
```python
from maubot import MessageEvent, Plugin
from maubot.handlers import event
from mautrix.types import EventType


class InstagramBot(Plugin):  # Inherits from maubot.Plugin
    @event.on(EventType.ROOM_MESSAGE)
    async def handle_message(self, evt: MessageEvent) -> None:
        # 1. Detect Instagram URLs via regex
        # 2. Extract content with yt-dlp (async thread pool)
        # 3. Upload media to Matrix homeserver
        # 4. Send to room with metadata (caption, uploader, dimensions)
        pass
```
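Steps 1 and 2 can be sketched as a URL detector plus a yt-dlp call pushed onto a worker thread, so a slow extraction does not block the bot's event loop. The regex is a hypothetical pattern for post, reel, IGTV, and story URLs, not the one in `instagram_bot.py`. `YoutubeDL(...).extract_info(url, download=True)` and the `outtmpl`/`quiet` options are standard yt-dlp embedding usage.

```python
import asyncio
import re

# Hypothetical pattern: /p/, /reel/, /reels/, /tv/ and /stories/<user>/ URLs
INSTAGRAM_URL_RE = re.compile(
    r"https?://(?:www\.)?instagram\.com/(?:p|reel|reels|tv|stories/[\w.]+)/[\w-]+/?",
    re.IGNORECASE,
)


def find_instagram_urls(text: str) -> list[str]:
    """Return Instagram URLs found in a message body, in order, without duplicates."""
    seen: dict[str, None] = {}
    for match in INSTAGRAM_URL_RE.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def _extract_blocking(url: str, out_dir: str) -> dict:
    import yt_dlp  # imported lazily so URL detection works without yt-dlp installed

    opts = {"outtmpl": f"{out_dir}/%(id)s.%(ext)s", "quiet": True}
    with yt_dlp.YoutubeDL(opts) as ydl:
        # Downloads the media into out_dir and returns yt-dlp's info dict
        return ydl.extract_info(url, download=True)


async def extract(url: str, out_dir: str) -> dict:
    # yt-dlp is synchronous; run it in a thread so other room events keep flowing
    return await asyncio.to_thread(_extract_blocking, url, out_dir)
```

Deduplicating per message means a URL pasted twice in one message is fetched once. Cross-message caching is a separate concern.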
Content Types Supported:
- Posts (images)
- Reels (videos)
- IGTV (videos)
- Stories (if publicly accessible)
File Handling:
- Temporary directory for downloads (auto-cleanup)
- Max file size: 50MB (configurable)
- Supported formats: mp4, jpg, jpeg, png, webp
- MIME type detection for proper Matrix msgtype
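Before upload, the file-handling rules come down to a small classification step. This sketch assumes an explicit extension table instead of Python's `mimetypes` module, whose results depend on the host's MIME database. It maps images to `m.image` and videos to `m.video`. The names are hypothetical, not taken from `instagram_bot.py`.

```python
from pathlib import Path

MAX_FILE_SIZE = 50 * 1024 * 1024  # 50MB default; configurable in the bot
MIME_BY_EXT = {
    ".mp4": "video/mp4",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def classify_media(path: Path, size: int, max_size: int = MAX_FILE_SIZE) -> tuple[str, str]:
    """Return (mime_type, Matrix msgtype) for a downloaded file, or raise ValueError."""
    mime = MIME_BY_EXT.get(path.suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported format: {path.suffix}")
    if size > max_size:
        raise ValueError(f"file too large: {size} bytes > {max_size}")
    # Every supported extension is either an image or a video
    msgtype = "m.image" if mime.startswith("image/") else "m.video"
    return mime, msgtype
```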
Metadata Extraction:
- Title, description, uploader
- Dimensions (width x height)
- Duration (for videos)
- Posted as separate text message after media
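The follow-up text message can be built directly from yt-dlp's info dict, which uses the keys `title`, `description`, `uploader`, `width`, `height`, and `duration`. This is a minimal sketch: the label wording and field order are hypothetical, and missing keys are skipped because Instagram items often lack some fields (images have no duration, for example).

```python
def format_metadata(info: dict) -> str:
    """Build the follow-up text message from a yt-dlp info dict, skipping missing keys."""
    lines = []
    if info.get("title"):
        lines.append(f"Title: {info['title']}")
    if info.get("uploader"):
        lines.append(f"Uploader: {info['uploader']}")
    if info.get("width") and info.get("height"):
        lines.append(f"Dimensions: {info['width']}x{info['height']}")
    if info.get("duration"):
        lines.append(f"Duration: {float(info['duration']):.0f}s")
    if info.get("description"):
        lines.append(info["description"].strip())  # caption text goes last
    return "\n".join(lines)
```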
Rate Limiting Strategy
Current State: No rate limiting implemented in ops-base bot
Risks:
- Burst of URLs in high-traffic room could trigger Instagram rate limits
- No request tracking, queuing, or throttling
- Extraction failures logged but no retry logic
Recommendations for 003-maubot-integration:
- Add per-room request tracking
- Implement exponential backoff on extraction failures
- Queue URLs and process with delays (e.g., 5 seconds between requests)
- Add configuration for max requests/minute
- Monitor extraction failure rates as health indicator
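The recommendations above can be combined into a small throttle that tracks requests per room, caps requests per minute, and backs off exponentially after failures. This is a minimal sketch, not the ops-base design. The class name and defaults (6 requests/minute, 5 s base backoff, 300 s cap) are assumptions. A fixed-delay queue would sit in front of it.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class ExtractionThrottle:
    """Per-room sliding-window limit plus global exponential backoff after failures."""

    def __init__(self, max_per_minute: int = 6, base_backoff: float = 5.0,
                 max_backoff: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_per_minute = max_per_minute
        self.base_backoff = base_backoff
        self.max_backoff = max_backoff
        self.clock = clock
        self.requests: dict[str, deque] = defaultdict(deque)
        self.failures = 0
        self.blocked_until = 0.0

    def allow(self, room_id: str) -> bool:
        now = self.clock()
        if now < self.blocked_until:
            return False  # still backing off after recent extraction failures
        window = self.requests[room_id]
        while window and now - window[0] >= 60:
            window.popleft()  # forget requests older than one minute
        if len(window) >= self.max_per_minute:
            return False
        window.append(now)
        return True

    def record_result(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            self.blocked_until = 0.0
            return
        self.failures += 1
        # 5s, 10s, 20s, ... capped at max_backoff
        delay = min(self.base_backoff * 2 ** (self.failures - 1), self.max_backoff)
        self.blocked_until = self.clock() + delay
```

In the plugin, `allow(evt.room_id)` would gate each URL, and `record_result()` would be called after each yt-dlp attempt. The failure counter also doubles as the health indicator mentioned above.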
Known Limitations
- Instagram API changes: yt-dlp requires updates when Instagram changes interface
- Private content: Cannot access private posts/stories (public only)
- Rate limiting exposure: Heavy usage may cause temporary failures
- No retry logic: Failed extractions not queued for later attempt
- File size limits: 50MB hard limit, Matrix homeserver may have separate limits
- No caching: Frequently shared URLs re-extracted every time
Plugin Packaging
Format: .mbp archive (zip file)
Structure:
```
sna-instagram-bot.mbp:
  instagram_bot.py   (11,643 bytes)
  maubot.yaml        (plugin metadata)
  README.md          (documentation)
```
Metadata (maubot.yaml):
```yaml
id: sna.instagram
version: 1.0.0
main_class: InstagramBot
modules: [instagram_bot]
```
Creation:
```bash
cd /path/to/plugin
zip -r instagram-bot.mbp instagram_bot.py maubot.yaml README.md
```
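The same archive can be built from a script or a Nix build step without depending on the `zip` binary. This sketch uses Python's `zipfile` to reproduce the command above. It writes each file at the archive root, which is where maubot looks for `maubot.yaml`. The function name is hypothetical.

```python
import zipfile
from pathlib import Path

PLUGIN_FILES = ["instagram_bot.py", "maubot.yaml", "README.md"]


def build_mbp(plugin_dir: Path, output: Path) -> list[str]:
    """Zip the plugin files into an .mbp archive and return the archive's member names."""
    missing = [name for name in PLUGIN_FILES if not (plugin_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing plugin files: {missing}")
    with zipfile.ZipFile(output, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in PLUGIN_FILES:
            # arcname keeps each file at the archive root instead of its full path
            zf.write(plugin_dir / name, arcname=name)
    with zipfile.ZipFile(output) as zf:
        return zf.namelist()
```

Failing early on a missing `maubot.yaml` gives a clearer error than an upload rejected by the server.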
Deployment Methods:
- API upload (automated):
  ```bash
  curl -X POST \
    -H "Authorization: Bearer $TOKEN" \
    -F "file=@instagram-bot.mbp" \
    "http://localhost:29316/_matrix/maubot/v1/plugins/upload"
  ```
- Web UI (manual): Upload via http://localhost:29316/_matrix/maubot (SSH tunnel)
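For scripted deployments, the upload request can also be built from Python. The curl example sends a multipart form. This sketch instead sends the raw `.mbp` bytes with `Content-Type: application/zip`, which is how maubot's `mbc upload` tool is understood to call this endpoint. Treat the body format as an assumption and check it against the deployed maubot version. The helper name is hypothetical.

```python
import urllib.request
from pathlib import Path


def build_upload_request(base_url: str, token: str, mbp_path: Path) -> urllib.request.Request:
    """Prepare a plugin upload to the maubot management API (send it with urlopen)."""
    url = f"{base_url.rstrip('/')}/_matrix/maubot/v1/plugins/upload"
    return urllib.request.Request(
        url,
        data=mbp_path.read_bytes(),  # raw archive as the request body (assumed format)
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/zip"},
        method="POST",
    )


# Usage, through the SSH tunnel:
# with urllib.request.urlopen(build_upload_request("http://localhost:29316", token,
#                                                  Path("instagram-bot.mbp"))) as resp:
#     print(resp.status)
```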
Source Files to Adapt
- Plugin source: `/home/dan/proj/sna/instagram_bot.py`
- Plugin package: `/home/dan/proj/sna/sna-instagram-bot.mbp`
- Deployment scripts: `/home/dan/proj/ops-base/scripts/*instagram-bot.sh`
Alternatives Considered
instaloader:
- Rejected: Not available in nixpkgs
- ops-base bot had fallback support, but unused in production
Official Instagram API:
- Rejected: Requires Facebook developer approval (per spec clarifications)
- Community scraping approach acceptable for internal team use
Decision 3: NixOS Module Adaptation Strategy
Decision
Two-layer module pattern matching mautrix-slack architecture
Rationale
- ops-jrz1 established this pattern with the mautrix-slack module
- Low-level module (`services.maubot`) provides full configuration surface
- High-level wrapper (`services.dev-platform.maubot`) simplifies common usage
- Consistent with existing infrastructure patterns
Source Pattern: ops-base maubot.nix
Module namespace: services.matrix-vm.maubot
Key characteristics:
- Runtime config generation with placeholder substitution
- systemd `LoadCredential` for secrets injection
- Python script in `ExecStartPre` replaces placeholders
- SQLite database at `/var/lib/maubot/bot.db`
- Timer-based health monitoring (5min check + 10min auto-restart)
- Config template at `/etc/maubot/config.yaml` → runtime config at `/run/maubot/config.yaml`
Secrets pattern:
```nix
LoadCredential = [
  "admin-password:${cfg.adminPasswordFile}"
  "secret-key:${cfg.secretKeyFile}"
  "registration-secret:${cfg.registrationSecretFile}"  # REMOVE for conduwuit
];
```
Target Pattern: ops-jrz1 Services
mautrix-slack.nix pattern:
- Module namespace: `services.mautrix-slack` (low-level)
- Wrapper: `services.dev-platform.slackBridge` in `modules/dev-services.nix`
- Config: Example config generation + YAML merging via Python
- Database: PostgreSQL via unix socket
- Secrets: No LoadCredential (tokens from interactive login)
- State: `/var/lib/mautrix_slack/config/config.yaml` (within StateDirectory)
Adaptation decisions:
| Aspect | ops-base | ops-jrz1 Target |
|---|---|---|
| Namespace | `services.matrix-vm.maubot` | `services.maubot` + `services.dev-platform.maubot` |
| Config location | `/run/maubot/config.yaml` | `/var/lib/maubot/config/config.yaml` |
| Config approach | Template substitution | Example config + YAML merge + secret substitution |
| Secrets | LoadCredential + Python replacement | LoadCredential + Python replacement (retain ops-base pattern) |
| Database | SQLite `/var/lib/maubot/bot.db` | SQLite (same path) |
| Logs | File + journal | Journal only (StandardOutput) |
| State | Manual StateDirectory + tmpfiles | `StateDirectory = "maubot"` (systemd managed) |
| Health checks | Timer-based (5min + 10min) | Retain ops-base pattern |
| User/group | maubot:maubot | maubot:maubot + matrix-appservices supplementary |
Configuration Generation Hybrid Approach
Recommendation: Combine mautrix-slack example config pattern with ops-base secrets injection
Steps:
- Run `maubot -c config.yaml -e` to generate example config (ensures structure completeness)
- Python script merges structured overrides (like mautrix-slack)
- Write config with placeholders to StateDirectory
- Second step reads from `CREDENTIALS_DIRECTORY` and replaces placeholders
- Final config written with proper permissions (0600)
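The "merge structured overrides" step can be sketched as a recursive dictionary merge. The sketch assumes the example config and the overrides have already been parsed into dicts (for instance with PyYAML's `yaml.safe_load`). Nested mappings merge key by key, while scalars and lists from the overrides replace the example's values outright. The function name and the config keys used in the usage note are hypothetical and should be checked against the example config maubot generates.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of base with overrides applied; nested dicts merge key by key."""
    merged = copy.deepcopy(base)  # never mutate the parsed example config
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace wholesale
    return merged
```

For example, `deep_merge(example, {"server": {"hostname": "127.0.0.1"}})` would keep every other `server` key from the example config while pinning the management interface to localhost. Keys added in a newer maubot version therefore survive without module changes.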
Why hybrid:
- Example config ensures YAML structure stays valid across maubot versions
- LoadCredential provides better security than storing secrets in Nix store
- Proven pattern from both source (ops-base) and target (mautrix-slack)
Database Decision
Recommendation: SQLite (match ops-base)
Rationale:
- Maubot workload is lightweight (bot state, plugin configs)
- ops-base SQLite deployment proven stable
- Simpler backup/restore (single file)
- Isolation from shared PostgreSQL (Forgejo, mautrix-slack use it)
- Less complex dependency chain
- Adequate for small team usage (<10 bot instances)
Path: /var/lib/maubot/bot.db
Future: Support PostgreSQL via config option if scaling needs emerge
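"Simpler backup/restore (single file)" holds only if the copy is consistent. A plain `cp` of `bot.db` while maubot is writing can capture a half-written state, whereas Python's `sqlite3` online backup API produces a consistent snapshot of a live database. This is a minimal sketch; the helper name and backup destination are hypothetical.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(src: Path, dest: Path) -> None:
    """Copy a live SQLite database consistently using the online backup API."""
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        source.backup(target)  # copies all pages, coordinating with concurrent writers
    finally:
        target.close()
        source.close()


# e.g. backup_sqlite(Path("/var/lib/maubot/bot.db"), Path("/var/backups/maubot/bot.db"))
```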
Secrets Management
Recommendation: Retain ops-base LoadCredential pattern
Secrets required:
```yaml
# In secrets/secrets.yaml (add)
maubot-admin-password: "..."  # Admin UI login
maubot-secret-key: "..."      # Session signing key
# matrix-registration-token: "..."  # Already exists, reuse for bot user creation
```
systemd configuration:
```nix
LoadCredential = [
  "admin-password:/run/secrets/maubot-admin-password"
  "secret-key:/run/secrets/maubot-secret-key"
  "registration-token:/run/secrets/matrix-registration-token"  # Reused
];
```
Substitution in ExecStartPre (Python script):
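```python
import os
from pathlib import Path
# Read from $CREDENTIALS_DIRECTORY (populated by systemd LoadCredential)
admin_pw = Path(os.environ['CREDENTIALS_DIRECTORY'], 'admin-password').read_text().strip()
# Replace placeholders in config
config_path = Path('/var/lib/maubot/config/config.yaml')
config = config_path.read_text().replace('REPLACE_ADMIN_PASSWORD', admin_pw)
config_path.write_text(config)
```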
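Generalized to every credential, the substitution step can also enforce the 0600 mode and fail loudly when a placeholder is missing, which catches template drift after a maubot upgrade. This is a minimal sketch: `REPLACE_SECRET_KEY` is a hypothetical placeholder name (only `REPLACE_ADMIN_PASSWORD` appears in the original pattern), and the helper names are illustrative.

```python
import os
from pathlib import Path

# Credential name (as in LoadCredential) -> placeholder in the generated config.
# REPLACE_SECRET_KEY is a hypothetical name.
PLACEHOLDERS = {
    "admin-password": "REPLACE_ADMIN_PASSWORD",
    "secret-key": "REPLACE_SECRET_KEY",
}


def substitute_secrets(config_text: str, creds_dir: Path) -> str:
    for cred_name, placeholder in PLACEHOLDERS.items():
        if placeholder not in config_text:
            raise ValueError(f"placeholder {placeholder} not found in config")
        secret = (creds_dir / cred_name).read_text().strip()
        config_text = config_text.replace(placeholder, secret)
    return config_text


def write_private(path: Path, text: str) -> None:
    """Write text with mode 0600, replacing the old file atomically."""
    tmp = path.with_name(path.name + ".tmp")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    os.chmod(tmp, 0o600)  # a stale temp file could have kept wider permissions
    os.replace(tmp, path)  # readers never see a partially written config
```

In `ExecStartPre` this would run as `write_private(config_path, substitute_secrets(config_path.read_text(), Path(os.environ["CREDENTIALS_DIRECTORY"])))`.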
Why not mautrix-slack pattern:
- mautrix-slack gets tokens via interactive login (no pre-provisioning needed)
- Maubot requires secrets before service starts (admin UI, signing key)
- LoadCredential keeps secrets out of Nix store and config files
Health Monitoring
Recommendation: Retain ops-base timer-based pattern
Implementation:
- `maubot-health.service` (oneshot): curl to `http://localhost:29316/_matrix/maubot/v1/version` every 5 minutes
- `maubot-health-restart.service` (oneshot): check for failed health checks, restart if needed (every 10 minutes)
- `systemd.timers` for scheduling
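The probe and the restart decision could be expressed in Python instead of a shell `curl` one-liner. This is a minimal sketch. The consecutive-failure threshold of 2 is an assumption, not the ops-base rule, and where the probe history is stored between timer runs is left open.

```python
import urllib.request

VERSION_URL = "http://localhost:29316/_matrix/maubot/v1/version"


def probe(url: str = VERSION_URL, timeout: float = 5.0) -> bool:
    """Return True if the management API answers the version endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return False


def should_restart(recent_results: list[bool], threshold: int = 2) -> bool:
    """Restart only after `threshold` consecutive failed probes, to ride out brief blips."""
    if len(recent_results) < threshold:
        return False
    return not any(recent_results[-threshold:])
```

Requiring consecutive failures keeps a single slow response during a homeserver restart from triggering a needless maubot restart.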
Why retain:
- Maubot provides explicit health endpoint (unlike mautrix-slack)
- ops-base pattern proven reliable
- mautrix-slack has no health monitoring (only log-based Socket Mode checks)
- Valuable for production stability (auto-recovery)
Directory Structure
Target layout:
```
/var/lib/maubot/
├── config/
│   └── config.yaml   # Generated runtime config
├── plugins/          # Plugin storage (.mbp files)
├── trash/            # Deleted plugins
└── bot.db            # SQLite database
```
Changes from ops-base:
- Config in StateDirectory (not `/run/maubot/`)
- Logs via journal (remove `/var/log/maubot/`)
- Use `StateDirectory = "maubot"` (systemd automatic management)
Security Hardening
Apply from mautrix-slack:
- `StateDirectory = "maubot"`
- `StateDirectoryMode = "0750"`
- `PrivateTmp = true`
- `ProtectSystem = "strict"`
- `ReadWritePaths = [ cfg.dataDir ]`
- `MemoryMax = "512M"` (match ops-base)
- Standard systemd hardening flags
Remove from ops-base:
- `RuntimeDirectory` (use StateDirectory)
- `LogsDirectory` (use journal)
- Manual tmpfiles rules
Integration Points
hosts/ops-jrz1.nix additions:
```nix
sops.secrets.maubot-admin-password = { mode = "0400"; };
sops.secrets.maubot-secret-key = { mode = "0400"; };

services.dev-platform.maubot = {
  enable = true;
  port = 29316;  # Management interface
};
```
modules/dev-services.nix additions:
```nix
# Fragment; assumes cfg = config.services.dev-platform. Option declarations need the
# explicit options. prefix because the module also sets config.
options.services.dev-platform.maubot = {
  enable = mkOption { type = types.bool; default = false; };
  port = mkOption { type = types.port; default = 29316; };
};
config = mkIf cfg.maubot.enable {
  services.maubot = {
    enable = true;
    homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}";
    serverName = cfg.matrix.serverName;
    port = cfg.maubot.port;
    # ... map other options
  };
};
```
Alternatives Considered
Pure mautrix-slack pattern:
- Rejected: Would require removing LoadCredential and storing secrets in config
- Less secure (secrets in Nix store or config files)
- More code rewrite from proven ops-base pattern
Keep ops-base pattern exactly:
- Rejected: Inconsistent with ops-jrz1 conventions
- Manual directory management instead of StateDirectory
- File-based logging instead of journal
- Less integration with dev-platform namespace
Technical Context Summary
Language/Version: Python 3.11 (maubot runtime)
Primary Dependencies: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite
Storage: SQLite at /var/lib/maubot/bot.db
Testing: Manual QA (automated tests future enhancement)
Target Platform: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49)
Project Type: Infrastructure service (NixOS module)
Performance Goals: <5 second Instagram content fetch (per SC-001), 99% uptime over 7 days (per SC-003)
Constraints: localhost-only management interface (SSH tunnel required), single Instagram bot instance initially
Scale/Scope: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002)
Platform Vision Alignment
Core Philosophy Adherence
Build It Right Over Time:
- ✅ Extract proven maubot module from ops-base (avoid reinvention)
- ✅ Declarative NixOS module pattern
- ✅ Self-documenting via quickstart.md and inline comments
- ✅ Sustainable pattern (matches existing mautrix-slack infrastructure)
Presentable State First:
- ✅ Working Instagram bot demonstrates value immediately
- ✅ Clear documentation (research.md, quickstart.md, contracts/)
- ✅ Professional deployment pattern (consistent with mautrix-slack)
Architecture Principles
Communication Layer:
- ✅ Maubot extends Matrix functionality (bot framework)
- ✅ Instagram bot brings external content into Matrix (enriches communication)
- ✅ Aligns with Matrix-centric hub architecture
Deployment Philosophy:
- ✅ NixOS-Native pattern (module + sops-nix secrets)
- ✅ Declarative and reproducible
- ✅ Built-in rollback (NixOS generations)
- ✅ Clear separation: infrastructure (maubot service) vs application (Instagram plugin)
Sustainability:
- ✅ Small team focus (single bot instance initially, validate 3-instance capability)
- ✅ Quality over speed (comprehensive research before implementation)
- ✅ Proven patterns (extract from ops-base, not experimental)
Risk Assessment
Low Risk
- SQLite database (proven, simple)
- LoadCredential secrets (ops-base pattern working)
- Health monitoring (non-intrusive timers)
- StateDirectory approach (standard systemd)
Medium Risk
- conduwuit compatibility (ops-base uses continuwuity fork)
- Mitigation: Early testing of bot registration and Matrix connection
- Two-layer module pattern (new for maubot, proven with mautrix-slack)
- Mitigation: Follow exact mautrix-slack pattern
- Instagram scraping stability (yt-dlp depends on Instagram not changing)
- Mitigation: yt-dlp actively maintained, ops-base deployment proven
Requires Testing
- Registration token workflow with conduwuit (different from ops-base shared secret)
- Management interface localhost binding (security requirement)
- Instagram content fetching with current yt-dlp version
- Bot response in designated rooms only (room-based activation per FR-006)
- Auto-recovery after homeserver restart (SC-004)
Next Steps
Phase 1: Design & Contracts
- Generate data-model.md with entities:
- Maubot Service, Bot Instance, Plugin, Bot Configuration, Admin Notification, Bot Database
- Generate contracts/ with configuration schemas (if applicable)
- Generate quickstart.md with deployment runbook including:
- Registration token setup
- Bot creation workflow
- Room subscription configuration
- Admin room access procedure
- Update AGENTS.md with maubot, yt-dlp context
Phase 2: Implementation Planning
- Extract maubot.nix from ops-base to ops-jrz1
- Adapt namespace and configuration patterns
- Add sops secrets declarations
- Create dev-platform wrapper in dev-services.nix
- Test service startup and conduwuit connection
- Deploy Instagram plugin
- Validate SC-001 through SC-008
References
Source Files Analyzed
- `/home/dan/proj/ops-base/vm-configs/modules/maubot.nix` (387 lines)
- `/home/dan/proj/ops-base/vm-configs/modules/continuwuity.nix` (413 lines)
- `/home/dan/proj/ops-base/docs/maubot-deployment-instructions.md`
- `/home/dan/proj/ops-base/docs/continuwuit-appservice-registration-guide.md`
- `/home/dan/proj/ops-jrz1/modules/mautrix-slack.nix` (current)
- `/home/dan/proj/ops-jrz1/modules/dev-services.nix` (current)
- `/home/dan/proj/ops-jrz1/docs/platform-vision.md` (architecture principles)
- `/home/dan/proj/sna/instagram_bot.py` (11,643 bytes)
- `/home/dan/proj/sna/sna-instagram-bot.mbp` (packaged plugin)
External Documentation
- Maubot official docs: https://docs.mau.fi/maubot/
- Conduwuit appservice guide: https://conduwuit.puppyirl.gay/appservices.html
- yt-dlp Instagram extractor: https://github.com/yt-dlp/yt-dlp
Status: Research complete. All technical unknowns resolved. Ready for Phase 1 design.