# Research Findings: Maubot Integration

**Feature**: 003-maubot-integration
**Date**: 2025-10-26
**Status**: Phase 0 complete

## Overview

Research conducted to resolve technical unknowns for extracting maubot from ops-base and deploying to ops-jrz1 with Instagram bot functionality.

---

## Decision 1: Maubot-Conduwuit Compatibility

### Decision
**YES - Maubot is compatible with conduwuit**, provided the bot registration method is changed

### Rationale
- ops-base successfully runs maubot 0.5.2+ on continuwuity (conduwuit fork) at matrix.talu.uno
- Over 10 production maubot instances confirmed working with conduwuit
- Maubot uses the standard Matrix Client-Server API (homeserver-agnostic)
- ops-jrz1 conduwuit (0.5.0-rc.8) supports all required Matrix APIs

### Key Finding: Registration Method Differs
**ops-base pattern (continuwuity)**:
```yaml
registration_secrets:
  matrix.talu.uno:
    url: http://127.0.0.1:6167
    secret: REPLACE_REGISTRATION_SECRET  # Shared secret registration
```

**ops-jrz1 requirement (conduwuit)**:
- Conduwuit does NOT support `registration_shared_secret` like Synapse
- Must use **registration tokens** or **admin room commands** for bot user creation

### Recommended Approach
**Registration Token Method** (simpler, more secure):
1. Configure conduwuit with registration token (from sops-nix)
2. During bot client creation in maubot web UI, provide registration token
3. Bot registers via standard Matrix client registration API

**Alternative: Admin Room Commands**:
```
!admin users create-user maubot-bot-1
# Returns generated password
```

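
The registration token method above maps onto the Matrix User-Interactive Authentication flow with the `m.login.registration_token` stage. The following is a minimal sketch of the request bodies only, assuming the standard `POST /_matrix/client/v3/register` endpoint; the function name `build_register_body` and the example values are hypothetical, and the exact behavior of conduwuit on ops-jrz1 still needs to be confirmed by testing.

```python
import json


def build_register_body(username, password, token, session=None):
    """Build the JSON body for POST /_matrix/client/v3/register.

    The first request is sent without `auth`; the homeserver is expected to
    answer 401 with a session ID, which is echoed back on the second request
    together with the registration token.
    """
    body = {"username": username, "password": password}
    if session is not None:
        body["auth"] = {
            "type": "m.login.registration_token",
            "token": token,
            "session": session,
        }
    return body


# Hypothetical values for illustration only.
first = build_register_body("maubot-bot-1", "example-password", "example-token")
second = build_register_body(
    "maubot-bot-1", "example-password", "example-token", session="session-from-401"
)
print(json.dumps(second, indent=2))
```

Keeping the payload construction separate from the HTTP call makes the two-step flow easy to check before pointing it at a live homeserver.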
### Integration Pattern
- Remove `registration_secrets` section from maubot config
- Remove `registrationSecretFile` option from NixOS module
- Document registration token workflow in quickstart.md

### Compatibility Notes
- **Database**: SQLite works (no changes needed)
- **Network**: Use IPv4 `127.0.0.1:8008` (not `localhost` - conduwuit binds IPv4 only)
- **Encryption**: maubot 0.5.2+ supports E2EE with conduwuit
- **Appservice**: Maubot bots are regular users, not appservice users (no appservice registration needed)

### Known Issues (Resolved)
- maubot < 0.5.2 had a bug causing excessive key uploads (fixed in 0.5.2+)
- Use latest stable maubot from nixpkgs

### References
- ops-base maubot.nix:387
- ops-base maubot-deployment-instructions.md
- ops-base conduwuit admin room discovery worklog

---

## Decision 2: Instagram Content Fetching

### Decision
**Use yt-dlp (primary) for Instagram content extraction**

### Rationale
- ops-base Instagram bot uses yt-dlp >=2023.1.6 (available in nixpkgs)
- Proven working implementation at `/home/dan/proj/sna/instagram_bot.py`
- Packaged as `sna-instagram-bot.mbp` and deployed successfully
- The source bot has an instaloader fallback, but instaloader is not in nixpkgs, so production runs in yt-dlp-only mode

### Implementation Pattern

**Extraction Architecture**:
```python
from maubot import Plugin, MessageEvent
from maubot.handlers import event
from mautrix.types import EventType


class InstagramBot(Plugin):  # Inherits from maubot.Plugin

    @event.on(EventType.ROOM_MESSAGE)
    async def handle_message(self, evt: MessageEvent) -> None:
        # 1. Detect Instagram URLs via regex
        # 2. Extract content with yt-dlp (async thread pool)
        # 3. Upload media to Matrix homeserver
        # 4. Send to room with metadata (caption, uploader, dimensions)
        ...
```

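
Step 1 of the handler (URL detection) could look roughly like the sketch below. The pattern and the helper name `find_instagram_urls` are assumptions for illustration, not the regex used in the ops-base bot; it covers the post, reel, IGTV, and story link shapes listed under the supported content types.

```python
import re

# Hypothetical pattern: post (/p/), reel (/reel/, /reels/), IGTV (/tv/)
# and story (/stories/<user>/<id>) links on instagram.com.
INSTAGRAM_URL = re.compile(
    r"https?://(?:www\.)?instagram\.com/"
    r"(?:(?:p|reel|reels|tv)/[\w-]+|stories/[\w.]+/\d+)/?",
    re.IGNORECASE,
)


def find_instagram_urls(body: str) -> list[str]:
    """Return every Instagram link found in a message body, in order."""
    # All groups are non-capturing, so findall() yields the full matches.
    return INSTAGRAM_URL.findall(body)


if __name__ == "__main__":
    message = "check https://www.instagram.com/reel/Cabc123_-/ out"
    print(find_instagram_urls(message))
```

Matching only known path shapes keeps the bot from handing profile pages or unrelated links to yt-dlp.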
**Content Types Supported**:
- Posts (images)
- Reels (videos)
- IGTV (videos)
- Stories (if publicly accessible)

**File Handling**:
- Temporary directory for downloads (auto-cleanup)
- Max file size: 50MB (configurable)
- Supported formats: mp4, jpg, jpeg, png, webp
- MIME type detection for proper Matrix msgtype

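
The file-handling rules above (size cap, supported formats, msgtype selection) can be sketched as one small check. This is an illustration under assumptions: the names `classify_media`, `MIME_TYPES`, and `MAX_FILE_SIZE` are hypothetical, and the extension table stands in for whatever MIME detection the real plugin performs.

```python
from pathlib import Path

MAX_FILE_SIZE = 50 * 1024 * 1024  # 50MB default from the list above

# Extension -> MIME type for the supported formats.
MIME_TYPES = {
    ".mp4": "video/mp4",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def classify_media(filename: str, size: int, max_size: int = MAX_FILE_SIZE):
    """Return (msgtype, mimetype), or None if the file should be skipped."""
    mimetype = MIME_TYPES.get(Path(filename).suffix.lower())
    if mimetype is None or size > max_size:
        return None
    # Matrix uses m.video for video files and m.image for still images.
    msgtype = "m.video" if mimetype.startswith("video/") else "m.image"
    return msgtype, mimetype


if __name__ == "__main__":
    print(classify_media("clip.mp4", 1_000_000))
```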
**Metadata Extraction**:
- Title, description, uploader
- Dimensions (width x height)
- Duration (for videos)
- Posted as separate text message after media

### Rate Limiting Strategy

**Current State**: No rate limiting implemented in ops-base bot

**Risks**:
- Burst of URLs in high-traffic room could trigger Instagram rate limits
- No request tracking, queuing, or throttling
- Extraction failures logged but no retry logic

**Recommendations for 003-maubot-integration**:
1. Add per-room request tracking
2. Implement exponential backoff on extraction failures
3. Queue URLs and process with delays (e.g., 5 seconds between requests)
4. Add configuration for max requests/minute
5. Monitor extraction failure rates as health indicator

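
Recommendations 1, 2, and 4 could be prototyped along the following lines. This is a sketch, not the planned implementation: `RoomRateLimiter` and `backoff_delay` are hypothetical names, and the defaults (5 requests per 60 seconds, 5 second base delay, 300 second cap) are illustrative values that would come from plugin configuration.

```python
import time
from collections import defaultdict, deque


class RoomRateLimiter:
    """Sliding-window limiter: at most `max_requests` per room per `window` seconds."""

    def __init__(self, max_requests=5, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._history = defaultdict(deque)  # room_id -> request timestamps

    def allow(self, room_id: str) -> bool:
        now = self.clock()
        history = self._history[room_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_requests:
            return False
        history.append(now)
        return True


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))
```

Tracking per room rather than globally means one busy room cannot starve the others, while the capped backoff bounds how long a failing URL is retried.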
### Known Limitations

1. **Instagram API changes**: yt-dlp requires updates when Instagram changes its interface
2. **Private content**: Cannot access private posts/stories (public only)
3. **Rate limiting exposure**: Heavy usage may cause temporary failures
4. **No retry logic**: Failed extractions are not queued for a later attempt
5. **File size limits**: 50MB default limit in the bot (configurable); the Matrix homeserver may enforce its own upload limit
6. **No caching**: Frequently shared URLs are re-extracted every time

### Plugin Packaging

**Format**: `.mbp` archive (zip file)

**Structure**:
```
sna-instagram-bot.mbp:
  instagram_bot.py   (11,643 bytes)
  maubot.yaml        (plugin metadata)
  README.md          (documentation)
```

**Metadata** (maubot.yaml):
```yaml
id: sna.instagram
version: 1.0.0
main_class: InstagramBot
modules: [instagram_bot]
```

**Creation**:
```bash
cd /path/to/plugin
zip -r instagram-bot.mbp instagram_bot.py maubot.yaml README.md
```

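
Since an `.mbp` is a plain zip archive, the same package can also be built from Python, which may be convenient inside a deployment script. A minimal sketch using only the standard library; the helper name `build_mbp` is hypothetical and the file list mirrors the structure shown above.

```python
import zipfile
from pathlib import Path


def build_mbp(plugin_dir, output, files=("instagram_bot.py", "maubot.yaml", "README.md")):
    """Zip the listed plugin files into an .mbp archive and return its path."""
    plugin_dir = Path(plugin_dir)
    output = Path(output)
    with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in files:
            # arcname keeps each file at the archive root, as in the listing above.
            archive.write(plugin_dir / name, arcname=name)
    return output
```

Usage would be along the lines of `build_mbp("/path/to/plugin", "instagram-bot.mbp")`, which fails with `FileNotFoundError` if any listed file is missing.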
**Deployment Methods**:
1. **API upload** (automated):
   ```bash
   curl -X POST \
     -H "Authorization: Bearer $TOKEN" \
     -F "file=@instagram-bot.mbp" \
     "http://localhost:29316/_matrix/maubot/v1/plugins/upload"
   ```

2. **Web UI** (manual): Upload via http://localhost:29316/_matrix/maubot (SSH tunnel)

### Source Files to Adapt
- Plugin source: `/home/dan/proj/sna/instagram_bot.py`
- Plugin package: `/home/dan/proj/sna/sna-instagram-bot.mbp`
- Deployment scripts: `/home/dan/proj/ops-base/scripts/*instagram-bot.sh`

### Alternatives Considered

**instaloader**:
- Rejected: Not available in nixpkgs
- ops-base bot had fallback support, but unused in production

**Official Instagram API**:
- Rejected: Requires Facebook developer approval (per spec clarifications)
- Community scraping approach acceptable for internal team use

---

## Decision 3: NixOS Module Adaptation Strategy

### Decision
**Two-layer module pattern** matching mautrix-slack architecture

### Rationale
- ops-jrz1 established pattern with mautrix-slack module
- Low-level module (`services.maubot`) provides full configuration surface
- High-level wrapper (`services.dev-platform.maubot`) simplifies common usage
- Consistent with existing infrastructure patterns

### Source Pattern: ops-base maubot.nix

**Module namespace**: `services.matrix-vm.maubot`

**Key characteristics**:
- Runtime config generation with placeholder substitution
- systemd `LoadCredential` for secrets injection
- Python script in `ExecStartPre` replaces placeholders
- SQLite database at `/var/lib/maubot/bot.db`
- Timer-based health monitoring (5min check + 10min auto-restart)
- Config template at `/etc/maubot/config.yaml` → runtime config at `/run/maubot/config.yaml`

**Secrets pattern**:
```nix
LoadCredential = [
  "admin-password:${cfg.adminPasswordFile}"
  "secret-key:${cfg.secretKeyFile}"
  "registration-secret:${cfg.registrationSecretFile}"  # REMOVE for conduwuit
];
```

### Target Pattern: ops-jrz1 Services

**mautrix-slack.nix pattern**:
- Module namespace: `services.mautrix-slack` (low-level)
- Wrapper: `services.dev-platform.slackBridge` in `modules/dev-services.nix`
- Config: Example config generation + YAML merging via Python
- Database: PostgreSQL via unix socket
- Secrets: No LoadCredential (tokens from interactive login)
- State: `/var/lib/mautrix_slack/config/config.yaml` (within StateDirectory)

**Adaptation decisions**:

| Aspect | ops-base | ops-jrz1 Target |
|--------|----------|-----------------|
| **Namespace** | `services.matrix-vm.maubot` | `services.maubot` + `services.dev-platform.maubot` |
| **Config location** | `/run/maubot/config.yaml` | `/var/lib/maubot/config/config.yaml` |
| **Config approach** | Template substitution | Example config + YAML merge + secret substitution |
| **Secrets** | LoadCredential + Python replacement | LoadCredential + Python replacement (retain ops-base pattern) |
| **Database** | SQLite `/var/lib/maubot/bot.db` | SQLite (same path) |
| **Logs** | File + journal | Journal only (StandardOutput) |
| **State** | Manual StateDirectory + tmpfiles | `StateDirectory = "maubot"` (systemd managed) |
| **Health checks** | Timer-based (5min + 10min) | Retain ops-base pattern |
| **User/group** | `maubot:maubot` | `maubot:maubot` + `matrix-appservices` supplementary |

### Configuration Generation Hybrid Approach

**Recommendation**: Combine mautrix-slack example config pattern with ops-base secrets injection

**Steps**:
1. Run `maubot -c config.yaml -e` to generate example config (ensures structure completeness)
2. Python script merges structured overrides (like mautrix-slack)
3. Write config with placeholders to StateDirectory
4. Second step reads from `CREDENTIALS_DIRECTORY` and replaces placeholders
5. Final config written with proper permissions (0600)

**Why hybrid**:
- Example config ensures YAML structure stays valid across maubot versions
- LoadCredential provides better security than storing secrets in Nix store
- Proven pattern from both source (ops-base) and target (mautrix-slack)

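
Steps 2, 4, and 5 of the hybrid approach can be sketched with the standard library alone. This is an outline under assumptions: the helper names (`deep_merge`, `substitute_secrets`, `write_private`) are hypothetical, and loading or dumping the YAML itself is left out because it needs a YAML library that the real `ExecStartPre` script would provide.

```python
import os
from pathlib import Path


def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into a copy of `base` (step 2)."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def substitute_secrets(text: str, creds_dir, mapping: dict) -> str:
    """Replace each placeholder with the content of its credential file (step 4)."""
    for placeholder, cred_name in mapping.items():
        secret = Path(creds_dir, cred_name).read_text().strip()
        text = text.replace(placeholder, secret)
    return text


def write_private(path, text: str) -> None:
    """Write the final config with mode 0600 (step 5)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    # Also tighten the mode if the file already existed with looser permissions.
    os.chmod(path, 0o600)
```

In the service, `creds_dir` would be `os.environ["CREDENTIALS_DIRECTORY"]` and the mapping would pair placeholders such as `REPLACE_ADMIN_PASSWORD` with credential names such as `admin-password`.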
### Database Decision

**Recommendation**: SQLite (match ops-base)

**Rationale**:
- Maubot workload is lightweight (bot state, plugin configs)
- ops-base SQLite deployment proven stable
- Simpler backup/restore (single file)
- Isolation from shared PostgreSQL (Forgejo, mautrix-slack use it)
- Less complex dependency chain
- Adequate for small team usage (<10 bot instances)

**Path**: `/var/lib/maubot/bot.db`

**Future**: Support PostgreSQL via config option if scaling needs emerge

### Secrets Management

**Recommendation**: Retain ops-base LoadCredential pattern

**Secrets required**:
```yaml
# In secrets/secrets.yaml (add)
maubot-admin-password: "..."  # Admin UI login
maubot-secret-key: "..."      # Session signing key
# matrix-registration-token: "..."  # Already exists, reuse for bot user creation
```

**systemd configuration**:
```nix
LoadCredential = [
  "admin-password:/run/secrets/maubot-admin-password"
  "secret-key:/run/secrets/maubot-secret-key"
  "registration-token:/run/secrets/matrix-registration-token"  # Reused
];
```

**Substitution in ExecStartPre** (Python script):
```python
import os
from pathlib import Path

# Read from $CREDENTIALS_DIRECTORY
admin_pw = Path(os.environ['CREDENTIALS_DIRECTORY'], 'admin-password').read_text().strip()
# Replace placeholders in config (`config` holds the template text read earlier)
config = config.replace('REPLACE_ADMIN_PASSWORD', admin_pw)
```

**Why not mautrix-slack pattern**:
- mautrix-slack gets tokens via interactive login (no pre-provisioning needed)
- Maubot requires secrets before service starts (admin UI, signing key)
- LoadCredential keeps secrets out of Nix store and config files

### Health Monitoring

**Recommendation**: Retain ops-base timer-based pattern

**Implementation**:
- `maubot-health.service` (oneshot): Curl to `http://localhost:29316/_matrix/maubot/v1/version` every 5 minutes
- `maubot-health-restart.service` (oneshot): Check for failed health checks, restart if needed (every 10 minutes)
- `systemd.timers` for scheduling

**Why retain**:
- Maubot provides explicit health endpoint (unlike mautrix-slack)
- ops-base pattern proven reliable
- mautrix-slack has no health monitoring (only log-based Socket Mode checks)
- Valuable for production stability (auto-recovery)

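
The logic behind the two timers can be illustrated in Python, although the ops-base units use curl. The sketch below makes two assumptions that should be verified: that the version endpoint answers HTTP 200 without an access token, and that a restart is warranted after two consecutive failed checks (matching two 5-minute checks per 10-minute restart window). The names `is_healthy` and `should_restart` are hypothetical.

```python
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:29316/_matrix/maubot/v1/version"


def is_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as unhealthy.
        return False


def should_restart(recent_results: list[bool], threshold: int = 2) -> bool:
    """Restart only after `threshold` consecutive failed checks."""
    if len(recent_results) < threshold:
        return False
    return not any(recent_results[-threshold:])
```

Requiring consecutive failures avoids restarting maubot on a single slow response.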
### Directory Structure

**Target layout**:
```
/var/lib/maubot/
├── config/
│   └── config.yaml    # Generated runtime config
├── plugins/           # Plugin storage (.mbp files)
├── trash/             # Deleted plugins
└── bot.db             # SQLite database
```

**Changes from ops-base**:
- Config in StateDirectory (not `/run/maubot/`)
- Logs via journal (remove `/var/log/maubot/`)
- Use `StateDirectory = "maubot"` (systemd automatic management)

### Security Hardening

**Apply from mautrix-slack**:
- `StateDirectory = "maubot"`
- `StateDirectoryMode = "0750"`
- `PrivateTmp = true`
- `ProtectSystem = "strict"`
- `ReadWritePaths = [ cfg.dataDir ]`
- `MemoryMax = "512M"` (match ops-base)
- Standard systemd hardening flags

**Remove from ops-base**:
- `RuntimeDirectory` (use StateDirectory)
- `LogsDirectory` (use journal)
- Manual tmpfiles rules

### Integration Points

**hosts/ops-jrz1.nix additions**:
```nix
sops.secrets.maubot-admin-password = { mode = "0400"; };
sops.secrets.maubot-secret-key = { mode = "0400"; };

services.dev-platform.maubot = {
  enable = true;
  port = 29316;  # Management interface
};
```

**modules/dev-services.nix additions**:
```nix
# Option declarations live under `options`
options.services.dev-platform.maubot = {
  enable = mkOption { type = types.bool; default = false; };
  port = mkOption { type = types.port; default = 29316; };
};

config = mkIf cfg.maubot.enable {
  services.maubot = {
    enable = true;
    homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}";
    serverName = cfg.matrix.serverName;
    port = cfg.maubot.port;
    # ... map other options
  };
};
```

### Alternatives Considered

**Pure mautrix-slack pattern**:
- Rejected: Would require removing LoadCredential and storing secrets in config
- Less secure (secrets in Nix store or config files)
- More code rewrite from proven ops-base pattern

**Keep ops-base pattern exactly**:
- Rejected: Inconsistent with ops-jrz1 conventions
- Manual directory management instead of StateDirectory
- File-based logging instead of journal
- Less integration with dev-platform namespace

---

## Technical Context Summary

**Language/Version**: Python 3.11 (maubot runtime)
**Primary Dependencies**: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite
**Storage**: SQLite at `/var/lib/maubot/bot.db`
**Testing**: Manual QA (automated tests future enhancement)
**Target Platform**: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49)
**Project Type**: Infrastructure service (NixOS module)
**Performance Goals**: <5 second Instagram content fetch (per SC-001), 99% uptime over 7 days (per SC-003)
**Constraints**: localhost-only management interface (SSH tunnel required), single Instagram bot instance initially
**Scale/Scope**: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002)

---

## Platform Vision Alignment

### Core Philosophy Adherence

**Build It Right Over Time**:
- ✅ Extract proven maubot module from ops-base (avoid reinvention)
- ✅ Declarative NixOS module pattern
- ✅ Self-documenting via quickstart.md and inline comments
- ✅ Sustainable pattern (matches existing mautrix-slack infrastructure)

**Presentable State First**:
- ✅ Working Instagram bot demonstrates value immediately
- ✅ Clear documentation (research.md, quickstart.md, contracts/)
- ✅ Professional deployment pattern (consistent with mautrix-slack)

### Architecture Principles

**Communication Layer**:
- ✅ Maubot extends Matrix functionality (bot framework)
- ✅ Instagram bot brings external content into Matrix (enriches communication)
- ✅ Aligns with Matrix-centric hub architecture

**Deployment Philosophy**:
- ✅ NixOS-Native pattern (module + sops-nix secrets)
- ✅ Declarative and reproducible
- ✅ Built-in rollback (NixOS generations)
- ✅ Clear separation: infrastructure (maubot service) vs application (Instagram plugin)

**Sustainability**:
- ✅ Small team focus (single bot instance initially, validate 3-instance capability)
- ✅ Quality over speed (comprehensive research before implementation)
- ✅ Proven patterns (extract from ops-base, not experimental)

---

## Risk Assessment

### Low Risk
- SQLite database (proven, simple)
- LoadCredential secrets (ops-base pattern working)
- Health monitoring (non-intrusive timers)
- StateDirectory approach (standard systemd)

### Medium Risk
- conduwuit compatibility (ops-base uses continuwuity fork)
  - **Mitigation**: Early testing of bot registration and Matrix connection
- Two-layer module pattern (new for maubot, proven with mautrix-slack)
  - **Mitigation**: Follow exact mautrix-slack pattern
- Instagram scraping stability (yt-dlp depends on Instagram not changing)
  - **Mitigation**: yt-dlp actively maintained, ops-base deployment proven

### Requires Testing
- Registration token workflow with conduwuit (different from ops-base shared secret)
- Management interface localhost binding (security requirement)
- Instagram content fetching with current yt-dlp version
- Bot response in designated rooms only (room-based activation per FR-006)
- Auto-recovery after homeserver restart (SC-004)

---

## Next Steps

### Phase 1: Design & Contracts
1. Generate data-model.md with entities:
   - Maubot Service, Bot Instance, Plugin, Bot Configuration, Admin Notification, Bot Database
2. Generate contracts/ with configuration schemas (if applicable)
3. Generate quickstart.md with deployment runbook including:
   - Registration token setup
   - Bot creation workflow
   - Room subscription configuration
   - Admin room access procedure
4. Update AGENTS.md with maubot, yt-dlp context

### Phase 2: Implementation Planning
1. Extract maubot.nix from ops-base to ops-jrz1
2. Adapt namespace and configuration patterns
3. Add sops secrets declarations
4. Create dev-platform wrapper in dev-services.nix
5. Test service startup and conduwuit connection
6. Deploy Instagram plugin
7. Validate SC-001 through SC-008

---

## References

### Source Files Analyzed
- `/home/dan/proj/ops-base/vm-configs/modules/maubot.nix` (387 lines)
- `/home/dan/proj/ops-base/vm-configs/modules/continuwuity.nix` (413 lines)
- `/home/dan/proj/ops-base/docs/maubot-deployment-instructions.md`
- `/home/dan/proj/ops-base/docs/continuwuit-appservice-registration-guide.md`
- `/home/dan/proj/ops-jrz1/modules/mautrix-slack.nix` (current)
- `/home/dan/proj/ops-jrz1/modules/dev-services.nix` (current)
- `/home/dan/proj/ops-jrz1/docs/platform-vision.md` (architecture principles)
- `/home/dan/proj/sna/instagram_bot.py` (11,643 bytes)
- `/home/dan/proj/sna/sna-instagram-bot.mbp` (packaged plugin)

### External Documentation
- Maubot official docs: https://docs.mau.fi/maubot/
- Conduwuit appservice guide: https://conduwuit.puppyirl.gay/appservices.html
- yt-dlp Instagram extractor: https://github.com/yt-dlp/yt-dlp

---

**Status**: Research complete. All technical unknowns resolved. Ready for Phase 1 design.