# Implementation Plan: Maubot Integration

**Branch**: `003-maubot-integration` | **Date**: 2025-10-26 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/003-maubot-integration/spec.md`

## Summary

Extract the maubot bot framework from ops-base and deploy it to ops-jrz1 with the Instagram bot plugin. Primary approach: adapt the proven ops-base maubot.nix module to ops-jrz1 patterns (conduwuit homeserver, sops-nix secrets, dev-platform wrapper), using registration-token auth instead of a shared secret. Instagram content is fetched via yt-dlp (community-maintained scraper). Initial deployment validates a single instance; the architecture supports 3+ concurrent instances.

## Technical Context

**Language/Version**: Python 3.11 (maubot runtime environment)
**Primary Dependencies**: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite, sops-nix
**Storage**: SQLite `/var/lib/maubot/bot.db` (service state), per-bot databases (plugin-specific)
**Testing**: Manual QA on production VPS (no staging environment), 7-day validation period
**Target Platform**: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49, x86_64-linux)
**Project Type**: Infrastructure service (NixOS module)
**Performance Goals**: <5 second Instagram content fetch (SC-001), 99% uptime over 7 days (SC-003), <2 second management UI load (SC-007)
**Constraints**: Localhost-only management interface (SSH tunnel required), single Instagram bot instance initially, conduwuit registration-token auth (no shared secret)
**Scale/Scope**: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002), small-team usage (<20 Instagram fetches/day)

## Constitution Check

*GATE: Must pass before Phase 0 research.
Re-check after Phase 1 design.*

### Principle I: Declarative Infrastructure ✅ PASS

**Compliance**:
- All maubot configuration defined in NixOS modules (maubot.nix, dev-services.nix)
- No imperative modifications required (service managed via nixos-rebuild)
- Configuration changes deployed declaratively
- Rollback via NixOS generations

**Evidence**:
- Module adaptation documented in research.md (ops-base → ops-jrz1 pattern)
- Secrets via sops-nix (declarative encryption)
- Runtime config generated from NixOS module options

### Principle II: Security First ✅ PASS

**Compliance**:
- All secrets encrypted via sops-nix (maubot-admin-password, maubot-secret-key, registration-token)
- Runtime secrets in /run/secrets/ (tmpfs, ephemeral)
- No secrets in Nix store or configuration files (LoadCredential pattern)
- Management interface localhost-only (SSH tunnel required per FR-003)

**Evidence**:
- Secrets management pattern documented in data-model.md
- File permissions: 0400 for secrets, 0600 for config with credentials
- Pre-commit hooks scan for secret leaks (inherited from platform)

### Principle III: Presentable State Over Speed ✅ PASS

**Compliance**:
- Comprehensive specification (spec.md with 16 functional requirements, 4 user stories)
- Complete documentation suite (research.md, data-model.md, quickstart.md)
- 7-day validation period required before announcement (per constitution)
- Success criteria measurable and testable (SC-001 through SC-008)

**Evidence**:
- Spec clarification session resolved all ambiguities (5 questions answered)
- Quickstart.md provides deployment runbook with troubleshooting
- Testing checklist in quickstart.md validates all success criteria

### Principle IV: Quality Over Quick Wins ✅ PASS

**Compliance**:
- Extracted proven pattern from ops-base (391-line maubot.nix module in production)
- Research phase documented alternatives (yt-dlp vs instaloader, SQLite vs PostgreSQL)
- Follows established ops-jrz1 patterns (mautrix-slack module structure,
  sops-nix secrets)
- Spec-kit workflow followed (specify → clarify → plan → tasks → implement)

**Evidence**:
- Research.md documents 3 major technical decisions with rationale
- Module adaptation strategy preserves ops-base proven components
- Constitution check validates pattern consistency

**Gate Status**: ✅ ALL CHECKS PASS - Proceed to implementation

## Project Structure

### Documentation (this feature)

```text
specs/003-maubot-integration/
├── spec.md              # Feature specification (✅ complete)
├── plan.md              # This file (✅ complete)
├── research.md          # Phase 0 output (✅ complete)
├── data-model.md        # Phase 1 output (✅ complete)
├── quickstart.md        # Phase 1 output (✅ complete)
├── checklists/
│   └── requirements.md  # Quality validation (✅ complete)
└── tasks.md             # Phase 2 output (/speckit.tasks - pending)
```

### Source Code (repository root)

**Structure Decision**: Infrastructure service (NixOS module); no application source code.

```text
/home/dan/proj/ops-jrz1/
├── modules/
│   ├── maubot.nix                # Low-level maubot service module (to create)
│   ├── dev-services.nix          # High-level wrapper (to update)
│   ├── mautrix-slack.nix         # Reference pattern (existing)
│   └── matrix-continuwuity.nix   # Matrix homeserver (existing)
├── hosts/
│   └── ops-jrz1.nix              # VPS configuration (to update: enable maubot)
├── secrets/
│   └── secrets.yaml              # Encrypted secrets (to update: add maubot secrets)
├── specs/
│   └── 003-maubot-integration/   # This feature directory
└── docs/
    ├── platform-vision.md        # North star document (reference)
    ├── CLAUDE.md                 # Development guidelines (to update)
    └── worklogs/                 # Session logs (to create after deployment)
```

**External source files** (to copy/adapt):

```text
/home/dan/proj/ops-base/
└── vm-configs/modules/
    └── maubot.nix                # Source module (391 lines, proven in production)

/home/dan/proj/sna/
├── instagram_bot.py              # Instagram bot source (11,643 bytes)
└── sna-instagram-bot.mbp         # Packaged plugin (ready to upload)
```

**Runtime state** (on VPS after deployment):

```text
/var/lib/maubot/
├── config/
│   └── config.yaml               # Generated runtime config
├── plugins/
│   └── sna.instagram-v1.0.0.mbp  # Uploaded plugin
├── bot.db                        # SQLite database (service state)
└── trash/                        # Deleted plugins

/run/secrets/                     # sops-nix decrypted secrets (tmpfs)
├── maubot-admin-password
├── maubot-secret-key
└── matrix-registration-token
```

## Deployment Strategy

**Context**: ops-jrz1 is a live production server running critical services (Matrix homeserver, Slack bridge, PostgreSQL, Forgejo, nginx). Deployment must be incremental, with a validation checkpoint after each phase.

### Live Server Risk Assessment

**Critical Services** (must remain operational):
- conduwuit Matrix homeserver (port 8008): all Matrix functionality
- mautrix-slack (port 29319): ~50 Slack channels syncing bidirectionally
- PostgreSQL (port 5432): bridge database (172KB, critical state)
- Forgejo (git.clarun.xyz): code hosting
- nginx (port 443): TLS termination for all public services

**New Service** (isolated):
- maubot (port 29316, localhost-only): new SQLite database, different port, no appservice registration

### Incremental Deployment Approach

Deploy in 4 phases, with a git commit after each phase as a rollback point:

**Phase 1: Module Files (No-Op Deployment)**
- Add modules/maubot.nix (adapted from ops-base)
- Add the services.dev-platform.maubot wrapper to modules/dev-services.nix (options + config)
- **Do NOT enable**: services.dev-platform.maubot.enable remains unset
- Deploy → Verify no services changed → Git commit
- **Rollback**: nixos-rebuild switch --rollback OR git revert

**Phase 2: Secrets (Preparation)**
- Add maubot-admin-password and maubot-secret-key to secrets/secrets.yaml
- Add sops.secrets declarations to hosts/ops-jrz1.nix
- **Still disabled**: services.dev-platform.maubot.enable remains unset
- Deploy → Verify secrets decrypt to /run/secrets/ → Git commit
- **Rollback**: nixos-rebuild switch --rollback OR git revert

**Phase 3: Service Start (Module Only)**
- Enable in hosts/ops-jrz1.nix: services.dev-platform.maubot.enable = true
- Deploy → Verify maubot.service starts → Verify existing services healthy → Git commit
- **Rollback**: Set enable = false + redeploy OR nixos-rebuild switch --rollback

**Phase 4: Bot Deployment (Manual, Reversible)**
- SSH tunnel to management UI (localhost:29316)
- Create bot Matrix user via registration token
- Upload Instagram plugin (.mbp file)
- Create bot instance (test in a private room first)
- **Rollback**: Delete bot instance via web UI (no code changes to revert)

### Validation Checkpoints

After each phase deployment:

```bash
# 1. Verify existing services are still healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'

# 2. Check for errors in the last 5 minutes (excluding maubot)
ssh root@45.77.205.49 'journalctl --since "5 minutes ago" | grep -E "ERR|CRIT|FTL" | grep -v maubot'

# 3. Test the Slack bridge (post in Slack, verify the message appears in Matrix)

# Phase-specific validations are documented in tasks.md
```

### Rollback Procedures

**NixOS Generation Rollback** (fastest):

```bash
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
```

**Git Revert** (if committed):

```bash
git revert HEAD
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
```

**Service Disable** (Phase 3 specific):

```nix
# In hosts/ops-jrz1.nix
services.dev-platform.maubot.enable = false;
# Then redeploy
```

### Risk Mitigation

**Known risks from the mautrix-slack deployment** (2025-10-26):
1. IPv4 vs localhost: Always use 127.0.0.1 (not localhost) in homeserverUrl, since `localhost` can resolve to the IPv6 address `::1`
2. Conduwuit database corruption: Have the database wipe procedure ready (low risk: fresh maubot install)
3. Port conflicts: Maubot uses 29316 (unique, no conflicts expected)

**Blast radius containment**:
- Phase 1 fail → Nix evaluation/build errors only, no runtime impact
- Phase 2 fail → Secrets issue, no services affected
- Phase 3 fail → Maubot won't start, but Matrix/Slack/Forgejo unaffected (different ports, databases)
- Phase 4 fail → Bot instance only, delete via UI

### Success Criteria Per Phase

- **Phase 1**: Build succeeds; nixos-rebuild output lists no units being stopped, started, or restarted
- **Phase 2**: /run/secrets/maubot-* files exist with mode 0400; existing services healthy
- **Phase 3**: `systemctl status maubot.service` shows "active (running)"; management UI accessible via SSH tunnel
- **Phase 4**: Bot responds to an Instagram URL in <5 seconds (SC-001)

### Update/Upgrade Procedure (State-Preserving)

After initial deployment, future updates must preserve runtime state in `/var/lib/maubot/`:
- `bot.db`: service state (bot instances, plugin configurations)
- `plugins/`: uploaded .mbp files
- `config/config.yaml`: generated runtime config

**Typical update scenarios**:

**Scenario 1: Module Configuration Change** (e.g., change port, add new option)

```bash
# 1. Edit modules/dev-services.nix or hosts/ops-jrz1.nix

# 2. Deploy
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost

# 3. Verify the service restarted cleanly
ssh root@45.77.205.49 'systemctl status maubot.service'
ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'

# 4. Verify bot instances are still running (check management UI)
#    StateDirectory persists across service restarts
```

**Scenario 2: Maubot Version Upgrade** (nixpkgs update)

```bash
# 1. Update flake.lock or nixpkgs input
nix flake update

# 2. Review the maubot changelog for breaking changes
#    Check: https://github.com/maubot/maubot/releases

# 3. Build first as a test
nixos-rebuild build --flake .#ops-jrz1

# 4. If the build succeeds, deploy
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost

# 5. Monitor the service restart
ssh root@45.77.205.49 'journalctl -u maubot.service -f'

# 6. Verify bot instances reconnected (check Matrix room for bot presence)
```

**Scenario 3: Plugin Update** (new Instagram bot version)

```bash
# Manual via web UI:
# 1. Upload the new .mbp file (Plugins tab → Upload)
# 2. Maubot detects the version change
# 3. Restart affected bot instances (Instances tab → Stop → Start)
# 4. Test in a private room before production use
# No nixos-rebuild needed: plugins are runtime state
```

**Scenario 4: Add New Bot Instance** (e.g., second Instagram bot or new bot type)

```bash
# Manual via web UI:
# 1. Create bot Matrix user (via registration token)
# 2. Upload plugin if new type (Plugins tab)
# 3. Create bot instance (Instances tab → Add instance)
# 4. Configure and enable
# No nixos-rebuild needed: bot instances are runtime state
```

**State Preservation Guarantees**:
- NixOS StateDirectory (`/var/lib/maubot/`) persists across:
  - Service restarts (systemctl restart maubot.service)
  - System reboots
  - Module configuration changes
  - Maubot version upgrades (unless the database schema is incompatible)
- State is only lost if:
  - The directory is explicitly deleted manually
  - The service definition changes the StateDirectory path (the service then starts against a new, empty directory)
  - A major maubot version ships an incompatible schema (rare, documented in release notes)

**Rollback with State**:

```bash
# NixOS generation rollback preserves StateDirectory
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
# Bot instances resume with previous configuration
# Database and plugins unchanged
# Caveat: if a newer maubot already migrated bot.db, the older version
# may not accept the new schema; restore from backup in that case
```

**When to wipe database** (rare, destructive):

```bash
# Only if:
#   1. Database corruption detected
#   2. Major version migration requires a clean slate (check release notes)
#   3. Testing a fresh deployment

# Stop first so the backup captures a consistent SQLite database:
ssh root@45.77.205.49 'systemctl stop maubot.service'

# Backup:
ssh root@45.77.205.49 'tar czf /root/maubot-backup-$(date +%Y%m%d).tar.gz /var/lib/maubot/'

# Wipe (also removes any SQLite WAL/shared-memory side files):
ssh root@45.77.205.49 'rm -f /var/lib/maubot/bot.db /var/lib/maubot/bot.db-wal /var/lib/maubot/bot.db-shm'
ssh root@45.77.205.49 'systemctl start maubot.service'

# Reconfigure all bot instances via web UI
```

## Complexity Tracking

**No violations**: all constitution principles are satisfied. This feature follows established patterns:
- Declarative infrastructure (NixOS modules)
- Security first (sops-nix encrypted secrets)
- Presentable state (comprehensive spec, 7-day validation)
- Quality over speed (extract proven ops-base module, document alternatives)

**No simpler alternatives rejected**: the chosen approach is the simplest that meets requirements while maintaining quality standards.
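
The declarative and security patterns listed above meet in a small amount of Nix. The fragment below is a hypothetical sketch, not the adapted ops-base module: it folds the Phase 2 secret declarations, the Phase 3 enable flag, and the systemd settings behind the state-preservation and no-secrets-in-store claims into one place. The `sopsFile` path and the credential IDs `admin-password` and `secret-key` are assumptions, and the `systemd.services.maubot` settings only approximate what modules/maubot.nix would set.

```nix
# Sketch only, not the adapted ops-base module. Assumed (hypothetical):
# the sopsFile path and the credential IDs "admin-password"/"secret-key".
{ config, lib, ... }:
let
  cfg = config.services.dev-platform.maubot;
in
{
  # Phase 2: sops-nix decrypts each secret at activation to
  # /run/secrets/<name> (tmpfs), owned by root with mode 0400.
  sops.secrets."maubot-admin-password" = {
    sopsFile = ../secrets/secrets.yaml;
    mode = "0400";
  };
  sops.secrets."maubot-secret-key" = {
    sopsFile = ../secrets/secrets.yaml;
    mode = "0400";
  };

  # Phase 3: the one line that actually starts maubot.service.
  services.dev-platform.maubot.enable = true;

  # Roughly what the low-level module would contribute once enabled:
  # persistent state under /var/lib/maubot, and secrets handed to the unit
  # via LoadCredential (exposed in $CREDENTIALS_DIRECTORY at runtime).
  systemd.services.maubot.serviceConfig = lib.mkIf cfg.enable {
    StateDirectory = "maubot";
    LoadCredential = [
      "admin-password:${config.sops.secrets."maubot-admin-password".path}"
      "secret-key:${config.sops.secrets."maubot-secret-key".path}"
    ];
  };
}
```

In the real layout the `sops.secrets` and `enable` lines would sit in hosts/ops-jrz1.nix, while the `serviceConfig` block would live inside the low-level module. Because the credentials are referenced by their `/run/secrets/...` path, only the path, never the plaintext, appears in the evaluated configuration.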