# Research Findings: Maubot Integration
**Feature**: 003-maubot-integration

**Date**: 2025-10-26

**Status**: Phase 0 complete
## Overview
Research conducted to resolve technical unknowns for extracting maubot from ops-base and deploying to ops-jrz1 with Instagram bot functionality.
---
## Decision 1: Maubot-Conduwuit Compatibility
### Decision
**YES - Maubot is compatible with conduwuit**, provided bot accounts are created without Synapse-style shared-secret registration
### Rationale
- ops-base successfully runs maubot 0.5.2+ on continuwuity (conduwuit fork) at matrix.talu.uno
- Over 10 production maubot instances confirmed working with conduwuit
- Maubot uses standard Matrix Client-Server API (homeserver-agnostic)
- ops-jrz1 conduwuit (0.5.0-rc.8) supports all required Matrix APIs
### Key Finding: Registration Method Differs
**ops-base pattern (continuwuity)**:
```yaml
registration_secrets:
    matrix.talu.uno:
        url: http://127.0.0.1:6167
        secret: REPLACE_REGISTRATION_SECRET  # Shared-secret registration
```
**ops-jrz1 requirement (conduwuit)**:
- Conduwuit does NOT support Synapse's `registration_shared_secret` mechanism
- Must use **registration tokens** or **admin room commands** for bot user creation
### Recommended Approach
**Registration Token Method** (simpler, more secure):
1. Configure conduwuit with registration token (from sops-nix)
2. During bot client creation in maubot web UI, provide registration token
3. Bot registers via standard Matrix client registration API
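Step 3 relies on the standard Matrix Client-Server registration endpoint with User-Interactive Authentication: a first request without auth is answered with HTTP 401 and a session ID, and the retry carries an `m.login.registration_token` auth stage. The sketch below shows that two-stage flow using only the Python standard library. It assumes conduwuit advertises the registration-token stage on `/_matrix/client/v3/register`; the helper names (`registration_body`, `make_request`, `register_bot`) are hypothetical and not part of maubot.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

REGISTER_PATH = "/_matrix/client/v3/register"


def registration_body(username: str, password: str,
                      token: Optional[str] = None, session: Optional[str] = None) -> dict:
    """Build a /register body; add token auth once the server has issued a UIA session."""
    body = {"username": username, "password": password}
    if token is not None and session is not None:
        body["auth"] = {"type": "m.login.registration_token", "token": token, "session": session}
    return body


def make_request(url: str, body: dict) -> urllib.request.Request:
    """JSON POST request; nothing is sent until it is passed to urlopen."""
    return urllib.request.Request(
        url, data=json.dumps(body).encode(), method="POST",
        headers={"Content-Type": "application/json"},
    )


def register_bot(homeserver: str, username: str, password: str, token: str) -> dict:
    url = homeserver.rstrip("/") + REGISTER_PATH
    try:
        # Stage 1: no auth; a token-protected server answers 401 with a session ID
        with urllib.request.urlopen(make_request(url, registration_body(username, password))) as resp:
            return json.load(resp)  # registration was open, account already created
    except urllib.error.HTTPError as err:
        if err.code != 401:
            raise
        session = json.load(err)["session"]
    # Stage 2: repeat the request with the registration token and session
    with urllib.request.urlopen(
        make_request(url, registration_body(username, password, token, session))
    ) as resp:
        return json.load(resp)  # contains user_id and, unless inhibited, access_token
```

Against ops-jrz1 this would be called with the IPv4 homeserver address, e.g. `register_bot("http://127.0.0.1:8008", "maubot-bot-1", password, token)`, where the password and token values come from the operator.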
**Alternative: Admin Room Commands**:
```
!admin users create-user maubot-bot-1
# Returns generated password
```
### Integration Pattern
- Remove `registration_secrets` section from maubot config
- Remove `registrationSecretFile` option from NixOS module
- Document registration token workflow in quickstart.md
### Compatibility Notes
- **Database**: SQLite works (no changes needed)
- **Network**: Use the IPv4 address `127.0.0.1:8008`, not `localhost`; conduwuit binds IPv4 only, and `localhost` may resolve to the IPv6 address `::1` first
- **Encryption**: maubot 0.5.2+ supports E2EE with conduwuit
- **Appservice**: Maubot bots are regular users, not appservice users (no appservice registration needed)
### Known Issues (Resolved)
- maubot < 0.5.2 had a bug causing excessive key uploads (fixed in 0.5.2+)
- Use the latest stable maubot from nixpkgs
### References
- ops-base maubot.nix:387
- ops-base maubot-deployment-instructions.md
- ops-base conduwuit admin room discovery worklog
---
## Decision 2: Instagram Content Fetching
### Decision
**Use yt-dlp (primary) for Instagram content extraction**
### Rationale
- ops-base Instagram bot uses yt-dlp >=2023.1.6 (available in nixpkgs)
- Proven working implementation at `/home/dan/proj/sna/instagram_bot.py`
- Packaged as `sna-instagram-bot.mbp` and deployed successfully
- The source bot has an instaloader fallback, but instaloader is not available in nixpkgs, so production runs in yt-dlp-only mode
### Implementation Pattern
**Extraction Architecture**:
```python
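from maubot import MessageEvent, Plugin
from maubot.handlers import event
from mautrix.types import EventType

class InstagramBot(Plugin):  # Inherits from maubot.Plugin
    @event.on(EventType.ROOM_MESSAGE)
    async def handle_message(self, evt: MessageEvent) -> None:
        # `evt`, not `event`, so the decorator module is not shadowed
        # 1. Detect Instagram URLs via regex
        # 2. Extract content with yt-dlp (async thread pool)
        # 3. Upload media to Matrix homeserver
        # 4. Send to room with metadata (caption, uploader, dimensions)
        ...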
```
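For step 1, a pattern along the following lines would cover the supported content types. The regex is a hypothetical illustration, not the one in the ops-base `instagram_bot.py`: it matches post (`/p/`), reel (`/reel/`, `/reels/`), IGTV (`/tv/`) and story (`/stories/<user>/<id>`) links with an optional query string, and de-duplicates URLs repeated in one message.

```python
import re

# Hypothetical pattern; the production bot's regex may differ.
INSTAGRAM_URL = re.compile(
    r"https?://(?:www\.)?instagram\.com/"
    r"(?:(?:p|reels?|tv)/[\w-]+|stories/[\w.]+/\d+)"  # post/reel/IGTV shortcode or story
    r"/?(?:\?\S*)?"  # optional trailing slash and query string
)


def find_instagram_urls(text: str) -> list[str]:
    """Return Instagram URLs in message order, without duplicates."""
    seen: dict[str, None] = {}
    for url in INSTAGRAM_URL.findall(text):  # no capturing groups, so full matches
        seen.setdefault(url)
    return list(seen)
```

Because the query-string part stops only at whitespace, trailing punctuation directly after a link (e.g. a comma) would be captured; a production pattern may want to trim it.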
**Content Types Supported**:
- Posts (images)
- Reels (videos)
- IGTV (videos)
- Stories (if publicly accessible)

**File Handling**:
- Temporary directory for downloads (auto-cleanup)
- Max file size: 50MB (configurable)
- Supported formats: mp4, jpg, jpeg, png, webp
- MIME type detection for proper Matrix msgtype
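The file-handling rules above (format allow-list, size cap, MIME-based msgtype) can be sketched as a single check run before upload. This assumes "50MB" means 50 MiB and maps extensions explicitly instead of relying on `mimetypes`, whose tables vary between platforms; `classify_media` and `MEDIA_TYPES` are hypothetical names.

```python
from pathlib import Path

# Assumption: "50MB" is read as 50 MiB; the real limit is configurable.
MAX_FILE_SIZE = 50 * 1024 * 1024

# Supported formats -> (MIME type, Matrix msgtype)
MEDIA_TYPES = {
    ".mp4": ("video/mp4", "m.video"),
    ".jpg": ("image/jpeg", "m.image"),
    ".jpeg": ("image/jpeg", "m.image"),
    ".png": ("image/png", "m.image"),
    ".webp": ("image/webp", "m.image"),
}


def classify_media(filename: str, size: int, max_size: int = MAX_FILE_SIZE) -> tuple[str, str]:
    """Return (mimetype, msgtype) for an allowed file, else raise ValueError."""
    ext = Path(filename).suffix.lower()
    if ext not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {filename}")
    if size > max_size:
        raise ValueError(f"{filename} is {size} bytes, over the {max_size}-byte limit")
    return MEDIA_TYPES[ext]
```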
**Metadata Extraction**:
- Title, description, uploader
- Dimensions (width x height)
- Duration (for videos)
- Posted as separate text message after media
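A minimal sketch of that follow-up text message, assuming the fields come from the info dict returned by yt-dlp's `YoutubeDL.extract_info` (`title`, `description`, `uploader`, `width`, `height`, and `duration` in seconds). The function name, labels and ordering are hypothetical; missing fields are simply skipped, so image posts produce no duration line.

```python
def format_metadata(info: dict) -> str:
    """Build the plain-text message posted after the media upload."""
    lines = []
    if info.get("title"):
        lines.append(f"Title: {info['title']}")
    if info.get("uploader"):
        lines.append(f"Uploader: {info['uploader']}")
    if info.get("width") and info.get("height"):
        lines.append(f"Dimensions: {info['width']}x{info['height']}")
    if info.get("duration"):  # videos only; yt-dlp reports seconds
        lines.append(f"Duration: {info['duration']:.0f}s")
    if info.get("description"):
        lines.append("")  # blank line before the caption
        lines.append(info["description"])
    return "\n".join(lines)
```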
### Rate Limiting Strategy
**Current State**: No rate limiting is implemented in the ops-base bot

**Risks**:
- Burst of URLs in high-traffic room could trigger Instagram rate limits
- No request tracking, queuing, or throttling
- Extraction failures logged but no retry logic

**Recommendations for 003-maubot-integration**:
1. Add per-room request tracking
2. Implement exponential backoff on extraction failures
3. Queue URLs and process with delays (e.g., 5 seconds between requests)
4. Add configuration for max requests/minute
5. Monitor extraction failure rates as health indicator
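Recommendations 1, 2 and 4 could be combined into a per-room sliding-window limiter plus an exponential backoff schedule. This is a sketch under assumed defaults (5 requests per 60 seconds per room, a 5-second base delay capped at 5 minutes); `RoomRateLimiter` and `backoff_delay` are hypothetical names, and the clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class RoomRateLimiter:
    """Allow at most `max_requests` extractions per room within a sliding `window` (seconds)."""

    def __init__(self, max_requests: int = 5, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._history: defaultdict[str, deque] = defaultdict(deque)

    def allow(self, room_id: str) -> bool:
        now = self.clock()
        history = self._history[room_id]
        # Forget requests that have aged out of the window
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_requests:
            return False  # caller should queue (recommendation 3) rather than extract now
        history.append(now)
        return True


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt, capped."""
    return min(cap, base * 2 ** attempt)
```

Counting rejected calls and failed attempts per room would also give the failure-rate signal from recommendation 5.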
### Known Limitations
1. **Instagram API changes**: yt-dlp requires updates when Instagram changes interface
2. **Private content**: Cannot access private posts/stories (public only)
3. **Rate limiting exposure**: Heavy usage may cause temporary failures
4. **No retry logic**: Failed extractions not queued for later attempt
5. **File size limits**: 50MB limit in the bot (configurable); the Matrix homeserver may enforce its own upload limit
6. **No caching**: Frequently shared URLs re-extracted every time
### Plugin Packaging
**Format**: `.mbp` archive (zip file)

**Structure**:
```
sna-instagram-bot.mbp:
  instagram_bot.py  (11,643 bytes)
  maubot.yaml       (plugin metadata)
  README.md         (documentation)
```
**Metadata** (maubot.yaml):
```yaml
id: sna.instagram
version: 1.0.0
main_class: InstagramBot
modules: [instagram_bot]
```
**Creation**:
```bash
cd /path/to/plugin
zip -r instagram-bot.mbp instagram_bot.py maubot.yaml README.md
```
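Because an `.mbp` is a plain zip archive, the same package can be built with Python's standard library, for example inside a Nix build or CI step. A sketch, assuming the three files listed above sit flat in the plugin directory; `build_mbp` is a hypothetical helper.

```python
import zipfile
from pathlib import Path

PLUGIN_FILES = ("instagram_bot.py", "maubot.yaml", "README.md")


def build_mbp(plugin_dir: Path, output: Path, files=PLUGIN_FILES) -> Path:
    """Zip the plugin files at the archive root, where maubot looks for maubot.yaml."""
    with zipfile.ZipFile(output, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in files:
            zf.write(plugin_dir / name, arcname=name)
    return output
```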
**Deployment Methods**:
1. **API upload** (automated):
```bash
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/zip" \
  --data-binary "@instagram-bot.mbp" \
  "http://localhost:29316/_matrix/maubot/v1/plugins/upload"
```
2. **Web UI** (manual): Upload via http://localhost:29316/_matrix/maubot (SSH tunnel)
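The API upload can also be scripted from Python, e.g. as part of the deployment scripts. This sketch assumes the endpoint accepts the raw archive bytes as the request body with `Content-Type: application/zip`, and that the token is a management API bearer token obtained beforehand; `build_upload_request` and `upload_plugin` are hypothetical helpers. It has to run through the SSH tunnel, since the management interface listens on localhost only.

```python
import json
import urllib.request
from pathlib import Path

UPLOAD_PATH = "/_matrix/maubot/v1/plugins/upload"


def build_upload_request(base_url: str, token: str, mbp_path: Path) -> urllib.request.Request:
    """POST the .mbp bytes with bearer auth (the request is built, not sent)."""
    return urllib.request.Request(
        base_url.rstrip("/") + UPLOAD_PATH,
        data=mbp_path.read_bytes(),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/zip"},
    )


def upload_plugin(base_url: str, token: str, mbp_path: Path) -> dict:
    """Send the upload and return the decoded JSON response."""
    with urllib.request.urlopen(build_upload_request(base_url, token, mbp_path)) as resp:
        return json.load(resp)
```

Typical use would be `upload_plugin("http://localhost:29316", token, Path("instagram-bot.mbp"))`.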
### Source Files to Adapt
- Plugin source: `/home/dan/proj/sna/instagram_bot.py`
- Plugin package: `/home/dan/proj/sna/sna-instagram-bot.mbp`
- Deployment scripts: `/home/dan/proj/ops-base/scripts/*instagram-bot.sh`
### Alternatives Considered
**instaloader**:
- Rejected: Not available in nixpkgs
- ops-base bot had fallback support, but it was unused in production

**Official Instagram API**:
- Rejected: Requires Facebook developer approval (per spec clarifications)
- Community scraping approach acceptable for internal team use
---
## Decision 3: NixOS Module Adaptation Strategy
### Decision
**Two-layer module pattern** matching mautrix-slack architecture
### Rationale
- ops-jrz1 established pattern with mautrix-slack module
- Low-level module (`services.maubot`) provides full configuration surface
- High-level wrapper (`services.dev-platform.maubot`) simplifies common usage
- Consistent with existing infrastructure patterns
### Source Pattern: ops-base maubot.nix
**Module namespace**: `services.matrix-vm.maubot`

**Key characteristics**:
- Runtime config generation with placeholder substitution
- systemd `LoadCredential` for secrets injection
- Python script in `ExecStartPre` replaces placeholders
- SQLite database at `/var/lib/maubot/bot.db`
- Timer-based health monitoring (5min check + 10min auto-restart)
- Config template at `/etc/maubot/config.yaml` → runtime config at `/run/maubot/config.yaml`

**Secrets pattern**:
```nix
LoadCredential = [
  "admin-password:${cfg.adminPasswordFile}"
  "secret-key:${cfg.secretKeyFile}"
  "registration-secret:${cfg.registrationSecretFile}" # REMOVE for conduwuit
];
```
### Target Pattern: ops-jrz1 Services
**mautrix-slack.nix pattern**:
- Module namespace: `services.mautrix-slack` (low-level)
- Wrapper: `services.dev-platform.slackBridge` in `modules/dev-services.nix`
- Config: Example config generation + YAML merging via Python
- Database: PostgreSQL via unix socket
- Secrets: No LoadCredential (tokens from interactive login)
- State: `/var/lib/mautrix_slack/config/config.yaml` (within StateDirectory)

**Adaptation decisions**:
| Aspect | ops-base | ops-jrz1 Target |
|--------|----------|-----------------|
| **Namespace** | `services.matrix-vm.maubot` | `services.maubot` + `services.dev-platform.maubot` |
| **Config location** | `/run/maubot/config.yaml` | `/var/lib/maubot/config/config.yaml` |
| **Config approach** | Template substitution | Example config + YAML merge + secret substitution |
| **Secrets** | LoadCredential + Python replacement | LoadCredential + Python replacement (retain ops-base pattern) |
| **Database** | SQLite `/var/lib/maubot/bot.db` | SQLite (same path) |
| **Logs** | File + journal | Journal only (StandardOutput) |
| **State** | Manual StateDirectory + tmpfiles | `StateDirectory = "maubot"` (systemd managed) |
| **Health checks** | Timer-based (5min + 10min) | Retain ops-base pattern |
| **User/group** | `maubot:maubot` | `maubot:maubot` + `matrix-appservices` supplementary group |
### Configuration Generation Hybrid Approach
**Recommendation**: Combine mautrix-slack example config pattern with ops-base secrets injection

**Steps**:
1. Run `maubot -c config.yaml -e` to generate example config (ensures structure completeness)
2. Python script merges structured overrides (like mautrix-slack)
3. Write config with placeholders to StateDirectory
4. Second step reads from `CREDENTIALS_DIRECTORY` and replaces placeholders
5. Final config written with proper permissions (0600)
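Steps 2 to 4 reduce to a recursive dictionary merge followed by plain string substitution. A sketch under stated assumptions: YAML parsing and dumping are left out so the example stays standard-library only, and the function names are hypothetical.

```python
def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied recursively; base is not mutated."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def substitute_placeholders(text: str, secrets: dict[str, str]) -> str:
    """Replace each placeholder (e.g. REPLACE_ADMIN_PASSWORD) with its secret value."""
    for placeholder, value in secrets.items():
        text = text.replace(placeholder, value)
    return text
```

For example, `deep_merge(example, {"server": {"hostname": "127.0.0.1"}, "admins": {"admin": "REPLACE_ADMIN_PASSWORD"}})` would keep every other key from the example config and leave the placeholder for the credential step; the `server.hostname` and `admins` keys are assumed from maubot's example config, not taken from ops-base. Keeping substitution as a separate text pass means secrets never enter the Nix-generated intermediate file.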
**Why hybrid**:
- Example config ensures YAML structure stays valid across maubot versions
- LoadCredential provides better security than storing secrets in Nix store
- Proven pattern from both source (ops-base) and target (mautrix-slack)
### Database Decision
**Recommendation**: SQLite (match ops-base)

**Rationale**:
- Maubot workload is lightweight (bot state, plugin configs)
- ops-base SQLite deployment proven stable
- Simpler backup/restore (single file)
- Isolation from the shared PostgreSQL instance used by Forgejo and mautrix-slack
- Less complex dependency chain
- Adequate for small team usage (<10 bot instances)

**Path**: `/var/lib/maubot/bot.db`
**Future**: Support PostgreSQL via config option if scaling needs emerge
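The single-file backup advantage still needs care while maubot is running, because copying `bot.db` during a write can capture an inconsistent file. One option is SQLite's online backup API, exposed in Python as `sqlite3.Connection.backup`. This is a sketch; `backup_sqlite` is a hypothetical helper and the backup destination is up to the operator.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def backup_sqlite(src: Path, dest: Path) -> None:
    """Write a consistent snapshot of the live database at src to dest."""
    # closing() ensures both connections are closed; sqlite3's own context manager only commits
    with closing(sqlite3.connect(src)) as source, closing(sqlite3.connect(dest)) as target:
        source.backup(target)
```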
### Secrets Management
**Recommendation**: Retain ops-base LoadCredential pattern

**Secrets required**:
```yaml
# In secrets/secrets.yaml (add)
maubot-admin-password: "..." # Admin UI login
maubot-secret-key: "..." # Session signing key
# matrix-registration-token: "..." # Already exists, reuse for bot user creation
```
**systemd configuration**:
```nix
LoadCredential = [
  "admin-password:/run/secrets/maubot-admin-password"
  "secret-key:/run/secrets/maubot-secret-key"
  "registration-token:/run/secrets/matrix-registration-token" # Reused
];
```
**Substitution in ExecStartPre** (Python script):
```python
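import os
from pathlib import Path

# LoadCredential= exposes each secret as a file under $CREDENTIALS_DIRECTORY
cred_dir = Path(os.environ["CREDENTIALS_DIRECTORY"])
admin_pw = (cred_dir / "admin-password").read_text().strip()
# Replace placeholders in the generated config, then restrict it to the owner (0600)
config_path = Path("/var/lib/maubot/config/config.yaml")
config_path.write_text(config_path.read_text().replace("REPLACE_ADMIN_PASSWORD", admin_pw))
config_path.chmod(0o600)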
```
**Why not mautrix-slack pattern**:
- mautrix-slack gets tokens via interactive login (no pre-provisioning needed)
- Maubot requires secrets before service starts (admin UI, signing key)
- LoadCredential keeps secrets out of Nix store and config files
### Health Monitoring
**Recommendation**: Retain ops-base timer-based pattern

**Implementation**:
- `maubot-health.service` (oneshot): curl request to `http://localhost:29316/_matrix/maubot/v1/version`, run every 5 minutes
- `maubot-health-restart.service` (oneshot): Check for failed health checks, restart if needed (every 10 minutes)
- `systemd.timers` for scheduling

**Why retain**:
- Maubot exposes an HTTP endpoint (`/_matrix/maubot/v1/version`) that works as a liveness check (unlike mautrix-slack)
- ops-base pattern proven reliable
- mautrix-slack has no health monitoring (only log-based Socket Mode checks)
- Valuable for production stability (auto-recovery)
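The two-timer design can be expressed as a probe plus a restart decision. This sketch assumes the probe treats any HTTP 200 from the version endpoint as healthy and that a restart is warranted after two consecutive failed probes (the 10-minute timer covering two 5-minute checks). How ops-base persists probe results between timer runs is not reproduced here, and both function names are hypothetical.

```python
import urllib.request

VERSION_URL = "http://localhost:29316/_matrix/maubot/v1/version"


def probe(url: str = VERSION_URL, timeout: float = 5.0) -> bool:
    """True if the management API answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError/HTTPError, refused connections and timeouts
        return False


def should_restart(history: list[bool], threshold: int = 2) -> bool:
    """Restart only when the last `threshold` probes all failed."""
    recent = history[-threshold:]
    return len(recent) == threshold and not any(recent)
```

Requiring consecutive failures avoids restarting maubot on a single slow response.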
### Directory Structure
**Target layout**:
```
/var/lib/maubot/
├── config/
│ └── config.yaml # Generated runtime config
├── plugins/ # Plugin storage (.mbp files)
├── trash/ # Deleted plugins
└── bot.db # SQLite database
```
**Changes from ops-base**:
- Config in StateDirectory (not `/run/maubot/`)
- Logs via journal (remove `/var/log/maubot/`)
- Use `StateDirectory = "maubot"` (systemd automatic management)
### Security Hardening
**Apply from mautrix-slack**:
- `StateDirectory = "maubot"`
- `StateDirectoryMode = "0750"`
- `PrivateTmp = true`
- `ProtectSystem = "strict"`
- `ReadWritePaths = [ cfg.dataDir ]`
- `MemoryMax = "512M"` (match ops-base)
- Standard systemd hardening flags

**Remove from ops-base**:
- `RuntimeDirectory` (use StateDirectory)
- `LogsDirectory` (use journal)
- Manual tmpfiles rules
### Integration Points
**hosts/ops-jrz1.nix additions**:
```nix
sops.secrets.maubot-admin-password = { mode = "0400"; };
sops.secrets.maubot-secret-key = { mode = "0400"; };
services.dev-platform.maubot = {
  enable = true;
  port = 29316; # Management interface
};
```
**modules/dev-services.nix additions**:
```nix
# Merge into the module's existing options/config; cfg = config.services.dev-platform
options.services.dev-platform.maubot = {
  enable = mkOption { type = types.bool; default = false; };
  port = mkOption { type = types.port; default = 29316; };
};
config = mkIf cfg.maubot.enable {
  services.maubot = {
    enable = true;
    homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}";
    serverName = cfg.matrix.serverName;
    port = cfg.maubot.port;
    # ... map other options
  };
};
```
### Alternatives Considered
**Pure mautrix-slack pattern**:
- Rejected: Would require removing LoadCredential and storing secrets in config
- Less secure (secrets in Nix store or config files)
- More code rewrite from proven ops-base pattern

**Keep ops-base pattern exactly**:
- Rejected: Inconsistent with ops-jrz1 conventions
- Manual directory management instead of StateDirectory
- File-based logging instead of journal
- Less integration with dev-platform namespace
---
## Technical Context Summary
**Language/Version**: Python 3.11 (maubot runtime)

**Primary Dependencies**: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite

**Storage**: SQLite at `/var/lib/maubot/bot.db`

**Testing**: Manual QA (automated tests future enhancement)

**Target Platform**: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49)

**Project Type**: Infrastructure service (NixOS module)

**Performance Goals**: <5 second Instagram content fetch (per SC-001), 99% uptime over 7 days (per SC-003)

**Constraints**: localhost-only management interface (SSH tunnel required), single Instagram bot instance initially

**Scale/Scope**: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002)
---
## Platform Vision Alignment
### Core Philosophy Adherence
**Build It Right Over Time**:
- Extract proven maubot module from ops-base (avoid reinvention)
- Declarative NixOS module pattern
- Self-documenting via quickstart.md and inline comments
- Sustainable pattern (matches existing mautrix-slack infrastructure)

**Presentable State First**:
- Working Instagram bot demonstrates value immediately
- Clear documentation (research.md, quickstart.md, contracts/)
- Professional deployment pattern (consistent with mautrix-slack)
### Architecture Principles
**Communication Layer**:
- Maubot extends Matrix functionality (bot framework)
- Instagram bot brings external content into Matrix (enriches communication)
- Aligns with Matrix-centric hub architecture

**Deployment Philosophy**:
- NixOS-Native pattern (module + sops-nix secrets)
- Declarative and reproducible
- Built-in rollback (NixOS generations)
- Clear separation: infrastructure (maubot service) vs application (Instagram plugin)

**Sustainability**:
- Small team focus (single bot instance initially, validate 3-instance capability)
- Quality over speed (comprehensive research before implementation)
- Proven patterns (extract from ops-base, not experimental)
---
## Risk Assessment
### Low Risk
- SQLite database (proven, simple)
- LoadCredential secrets (ops-base pattern working)
- Health monitoring (non-intrusive timers)
- StateDirectory approach (standard systemd)
### Medium Risk
- conduwuit compatibility (ops-base uses the continuwuity fork)
  - **Mitigation**: Early testing of bot registration and Matrix connection
- Two-layer module pattern (new for maubot, proven with mautrix-slack)
  - **Mitigation**: Follow the mautrix-slack pattern exactly
- Instagram scraping stability (yt-dlp depends on Instagram not changing)
  - **Mitigation**: yt-dlp is actively maintained, and the ops-base deployment is proven
### Requires Testing
- Registration token workflow with conduwuit (different from ops-base shared secret)
- Management interface localhost binding (security requirement)
- Instagram content fetching with current yt-dlp version
- Bot response in designated rooms only (room-based activation per FR-006)
- Auto-recovery after homeserver restart (SC-004)
---
## Next Steps
### Phase 1: Design & Contracts
1. Generate data-model.md with entities:
   - Maubot Service, Bot Instance, Plugin, Bot Configuration, Admin Notification, Bot Database
2. Generate contracts/ with configuration schemas (if applicable)
3. Generate quickstart.md with a deployment runbook including:
   - Registration token setup
   - Bot creation workflow
   - Room subscription configuration
   - Admin room access procedure
4. Update AGENTS.md with maubot, yt-dlp context
### Phase 2: Implementation Planning
1. Extract maubot.nix from ops-base to ops-jrz1
2. Adapt namespace and configuration patterns
3. Add sops secrets declarations
4. Create dev-platform wrapper in dev-services.nix
5. Test service startup and conduwuit connection
6. Deploy Instagram plugin
7. Validate SC-001 through SC-008
---
## References
### Source Files Analyzed
- `/home/dan/proj/ops-base/vm-configs/modules/maubot.nix` (387 lines)
- `/home/dan/proj/ops-base/vm-configs/modules/continuwuity.nix` (413 lines)
- `/home/dan/proj/ops-base/docs/maubot-deployment-instructions.md`
- `/home/dan/proj/ops-base/docs/continuwuit-appservice-registration-guide.md`
- `/home/dan/proj/ops-jrz1/modules/mautrix-slack.nix` (current)
- `/home/dan/proj/ops-jrz1/modules/dev-services.nix` (current)
- `/home/dan/proj/ops-jrz1/docs/platform-vision.md` (architecture principles)
- `/home/dan/proj/sna/instagram_bot.py` (11,643 bytes)
- `/home/dan/proj/sna/sna-instagram-bot.mbp` (packaged plugin)
### External Documentation
- Maubot official docs: https://docs.mau.fi/maubot/
- Conduwuit appservice guide: https://conduwuit.puppyirl.gay/appservices.html
- yt-dlp Instagram extractor: https://github.com/yt-dlp/yt-dlp
---
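**Status**: Research complete. All Phase 0 technical unknowns are resolved; items that still need validation are listed under "Requires Testing" in the Risk Assessment. Ready for Phase 1 design.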