Implementation Plan: Maubot Integration
Branch: 003-maubot-integration | Date: 2025-10-26 | Spec: spec.md
Input: Feature specification from /specs/003-maubot-integration/spec.md
Summary
Extract the maubot bot framework from ops-base and deploy it to ops-jrz1 with the Instagram bot plugin. Primary approach: adapt the proven ops-base maubot.nix module to ops-jrz1 patterns (conduwuit homeserver, sops-nix secrets, dev-platform wrapper), using registration token auth instead of a shared secret. Instagram content is fetched via yt-dlp (community scraping). The initial deployment validates a single instance; the architecture supports 3+ concurrent instances.
Technical Context
Language/Version: Python 3.11 (maubot runtime environment)
Primary Dependencies: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite, sops-nix
Storage: SQLite /var/lib/maubot/bot.db (service state), per-bot databases (plugin-specific)
Testing: Manual QA on production VPS (no staging environment), 7-day validation period
Target Platform: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49, x86_64-linux)
Project Type: Infrastructure service (NixOS module)
Performance Goals: <5 second Instagram content fetch (SC-001), 99% uptime over 7 days (SC-003), <2 second management UI load (SC-007)
Constraints: Localhost-only management interface (SSH tunnel required), single Instagram bot instance initially, conduwuit registration token auth (no shared secret)
Scale/Scope: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002), small team usage (<20 Instagram fetches/day)
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
Principle I: Declarative Infrastructure ✅ PASS
Compliance:
- All maubot configuration defined in NixOS modules (maubot.nix, dev-services.nix)
- No imperative modifications required (service managed via nixos-rebuild)
- Configuration changes deployed declaratively
- Rollback via NixOS generations
Evidence:
- Module adaptation documented in research.md (ops-base → ops-jrz1 pattern)
- Secrets via sops-nix (declarative encryption)
- Runtime config generated from NixOS module options
Principle II: Security First ✅ PASS
Compliance:
- All secrets encrypted via sops-nix (maubot-admin-password, maubot-secret-key, registration-token)
- Runtime secrets in /run/secrets/ (tmpfs, ephemeral)
- No secrets in Nix store or configuration files (LoadCredential pattern)
- Management interface localhost-only (SSH tunnel required per FR-003)
Evidence:
- Secrets management pattern documented in data-model.md
- File permissions: 0400 for secrets, 0600 for config with credentials
- Pre-commit hooks scan for secret leaks (inherited from platform)
Principle III: Presentable State Over Speed ✅ PASS
Compliance:
- Comprehensive specification (spec.md with 16 functional requirements, 4 user stories)
- Complete documentation suite (research.md, data-model.md, quickstart.md)
- 7-day validation period required before announcement (per constitution)
- Success criteria measurable and testable (SC-001 through SC-008)
Evidence:
- Spec clarification session resolved all ambiguities (5 questions answered)
- Quickstart.md provides deployment runbook with troubleshooting
- Testing checklist in quickstart.md validates all success criteria
Principle IV: Quality Over Quick Wins ✅ PASS
Compliance:
- Extracted proven pattern from ops-base (391-line maubot.nix module in production)
- Research phase documented alternatives (yt-dlp vs instaloader, SQLite vs PostgreSQL)
- Follows established ops-jrz1 patterns (mautrix-slack module structure, sops-nix secrets)
- Spec-kit workflow followed (specify → clarify → plan → tasks → implement)
Evidence:
- Research.md documents 3 major technical decisions with rationale
- Module adaptation strategy preserves ops-base proven components
- Constitution check validates pattern consistency
Gate Status: ✅ ALL CHECKS PASS - Proceed to implementation
Project Structure
Documentation (this feature)
specs/003-maubot-integration/
├── spec.md # Feature specification (✅ complete)
├── plan.md # This file (✅ complete)
├── research.md # Phase 0 output (✅ complete)
├── data-model.md # Phase 1 output (✅ complete)
├── quickstart.md # Phase 1 output (✅ complete)
├── checklists/
│ └── requirements.md # Quality validation (✅ complete)
└── tasks.md # Phase 2 output (/speckit.tasks - pending)
Source Code (repository root)
Structure Decision: Infrastructure service (NixOS module) - no application source code
/home/dan/proj/ops-jrz1/
├── modules/
│ ├── maubot.nix # Low-level maubot service module (to create)
│ ├── dev-services.nix # High-level wrapper (to update)
│ ├── mautrix-slack.nix # Reference pattern (existing)
│ └── matrix-continuwuity.nix # Matrix homeserver (existing)
├── hosts/
│ └── ops-jrz1.nix # VPS configuration (to update: enable maubot)
├── secrets/
│ └── secrets.yaml # Encrypted secrets (to update: add maubot secrets)
├── specs/
│ └── 003-maubot-integration/ # This feature directory
└── docs/
  ├── platform-vision.md # North star document (reference)
  ├── CLAUDE.md # Development guidelines (to update)
  └── worklogs/ # Session logs (to create after deployment)
External source files (to copy/adapt):
/home/dan/proj/ops-base/
└── vm-configs/modules/
  └── maubot.nix # Source module (391 lines, proven in production)
/home/dan/proj/sna/
├── instagram_bot.py # Instagram bot source (11,643 bytes)
└── sna-instagram-bot.mbp # Packaged plugin (ready to upload)
Runtime state (on VPS after deployment):
/var/lib/maubot/
├── config/
│ └── config.yaml # Generated runtime config
├── plugins/
│ └── sna.instagram-v1.0.0.mbp # Uploaded plugin
├── bot.db # SQLite database (service state)
└── trash/ # Deleted plugins
/run/secrets/ # sops-nix decrypted secrets (tmpfs)
├── maubot-admin-password
├── maubot-secret-key
└── matrix-registration-token
Deployment Strategy
Context: ops-jrz1 is a live production server with critical services (Matrix homeserver, Slack bridge, PostgreSQL, Forgejo, nginx). Deployment must be incremental with validation checkpoints.
Live Server Risk Assessment
Critical Services (must remain operational):
- conduwuit Matrix homeserver (8008) - All Matrix functionality
- mautrix-slack (29319) - ~50 Slack channels syncing bidirectionally
- PostgreSQL (5432) - Bridge database (172KB, critical state)
- Forgejo (git.clarun.xyz) - Code hosting
- nginx (443) - TLS termination for all public services
New Service (isolated):
- maubot (29316, localhost-only) - New SQLite database, different port, no appservice registration
Incremental Deployment Approach
Deploy in 4 phases with git commits as rollback points:
Phase 1: Module Files (No-Op Deployment)
- Add modules/maubot.nix (adapted from ops-base)
- Add services.dev-platform.maubot wrapper to modules/dev-services.nix (options + config)
- Do NOT enable: services.dev-platform.maubot.enable remains unset
- Deploy → Verify no services changed → Git commit
- Rollback: nixos-rebuild switch --rollback OR git revert
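One way to check the "no services changed" condition is to snapshot the running services before and after the deploy and compare the two lists. The sketch below assumes it runs on the VPS; the function names snapshot_units and diff_units and the /tmp paths are illustrative, not part of the existing tooling.

```shell
# Capture the sorted names of all running services.
snapshot_units() {
  systemctl list-units --type=service --state=running --no-legend --plain \
    | awk '{ print $1 }' | sort
}

# Compare two snapshot files (both must be sorted).
# "- name" = running before but not after, "+ name" = newly running.
diff_units() {
  comm -23 "$1" "$2" | sed 's/^/- /'
  comm -13 "$1" "$2" | sed 's/^/+ /'
}

# Usage (on the VPS):
#   snapshot_units > /tmp/units.before
#   ... deploy ...
#   snapshot_units > /tmp/units.after
#   diff_units /tmp/units.before /tmp/units.after   # empty output = no change
```

Empty output from diff_units means the set of running services is unchanged. It does not prove that no unit was restarted in between, so the journal check from the validation checkpoints is still worth running.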
Phase 2: Secrets (Preparation)
- Add maubot-admin-password, maubot-secret-key to secrets/secrets.yaml
- Add sops.secrets declarations to hosts/ops-jrz1.nix
- Still disabled: services.dev-platform.maubot.enable remains unset
- Deploy → Verify secrets decrypt to /run/secrets/ → Git commit
- Rollback: nixos-rebuild switch --rollback OR git revert
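The Phase 2 check (secrets decrypt to /run/secrets/ with restrictive permissions) can be scripted. This is a minimal sketch assuming GNU stat, as shipped on NixOS; check_secret_modes is a hypothetical helper name, and the expected mode 0400 is the one stated in the Security First evidence.

```shell
# Hypothetical helper: verify that each named secret exists under a
# directory and has mode 0400. Prints one line per problem and returns
# non-zero if any problem was found.
check_secret_modes() {
  dir="$1"
  shift
  rc=0
  for name in "$@"; do
    path="$dir/$name"
    if [ ! -f "$path" ]; then
      echo "missing: $name"
      rc=1
      continue
    fi
    # -L follows symlinks, in case the entry is a link to the real file.
    mode=$(stat -L -c '%a' "$path")
    if [ "$mode" != "400" ]; then
      echo "bad mode $mode: $name"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (on the VPS):
#   check_secret_modes /run/secrets maubot-admin-password maubot-secret-key
```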
Phase 3: Service Start (Module Only)
- Enable in hosts/ops-jrz1.nix: services.dev-platform.maubot.enable = true
- Deploy → Verify maubot.service starts → Verify existing services healthy → Git commit
- Rollback: Set enable = false + redeploy OR nixos-rebuild switch --rollback
Phase 4: Bot Deployment (Manual, Reversible)
- SSH tunnel to management UI (localhost:29316)
- Create bot Matrix user via registration token
- Upload Instagram plugin (.mbp file)
- Create bot instance (test in private room first)
- Rollback: Delete bot instance via web UI (no code changes to revert)
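The tunnel step and a basic reachability probe might look like the following. This is a sketch under assumptions: the API path /_matrix/maubot/v1/instances and the treatment of HTTP 401 as healthy (the service is up and demanding authentication) should be verified against the deployed maubot version, and classify_status and probe_maubot are hypothetical helpers.

```shell
# Open the tunnel in a separate terminal (-N: forward only, no remote command):
#   ssh -N -L 29316:127.0.0.1:29316 root@45.77.205.49

# Hypothetical helper: map an HTTP status code from the management API
# to a health verdict. 401 counts as healthy under the assumption that
# an unauthenticated request reaching maubot is rejected, not dropped.
classify_status() {
  case "$1" in
    200|401) echo healthy ;;
    000)     echo unreachable ;;   # curl prints 000 when no response arrived
    *)       echo "unexpected:$1" ;;
  esac
}

# Probe through the tunnel; the API path is an assumption.
probe_maubot() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    http://127.0.0.1:29316/_matrix/maubot/v1/instances)
  classify_status "$code"
}
```

With the tunnel open, probe_maubot printing healthy indicates that the management interface answers on the forwarded port; unreachable usually means the tunnel or the service is down.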
Validation Checkpoints
After each phase deployment:
# 1. Verify existing services still healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
# 2. Check for errors in last 5 minutes (excluding maubot)
ssh root@45.77.205.49 'journalctl --since "5 minutes ago" | grep -E "ERR|CRIT|FTL" | grep -v maubot'
# 3. Test Slack bridge (post in Slack, verify appears in Matrix)
# Phase-specific validations documented in tasks.md
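The checkpoint commands can be wrapped so that a failure is hard to miss. The sketch below is one possible shape, not existing tooling: failed_units and filter_regressions are hypothetical names, and RUN is an assumed override hook so the functions can be exercised without SSH.

```shell
# Command prefix used to reach the VPS. Set RUN to an empty string to
# run the checks on the current machine instead.
RUN=${RUN-"ssh root@45.77.205.49"}

# Print the name of every given unit that is not active.
failed_units() {
  for unit in "$@"; do
    # </dev/null keeps ssh from consuming the caller's stdin.
    $RUN systemctl is-active --quiet "$unit" </dev/null || echo "$unit"
  done
}

# Read log lines on stdin; keep error-level lines not mentioning maubot.
filter_regressions() {
  grep -E 'ERR|CRIT|FTL' | grep -v maubot
}

# Usage:
#   failed_units matrix-continuwuity mautrix-slack forgejo postgresql nginx
#   ssh root@45.77.205.49 'journalctl --since "5 minutes ago"' | filter_regressions
```

Both functions print nothing when the checkpoint passes, so any output after a deploy is a reason to stop before the next phase.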
Rollback Procedures
NixOS Generation Rollback (fastest):
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
Git Revert (if committed):
git revert HEAD
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
Service Disable (Phase 3 specific):
# In hosts/ops-jrz1.nix
services.dev-platform.maubot.enable = false; # Then redeploy
Risk Mitigation
Known risks from mautrix-slack deployment (2025-10-26):
- IPv4 vs localhost: Always use 127.0.0.1 (not localhost) in homeserverUrl, since localhost can resolve to the IPv6 address ::1 while the service listens on IPv4 only
- Conduwuit database corruption: Have database wipe procedure ready (low risk - fresh maubot install)
- Port conflicts: Maubot uses 29316 (unique, no conflicts expected)
Blast radius containment:
- Phase 1 fail → Nix syntax errors only, no runtime impact
- Phase 2 fail → Secrets issue, no services affected
- Phase 3 fail → Maubot won't start, but Matrix/Slack/Forgejo unaffected (different ports, databases)
- Phase 4 fail → Bot instance only, delete via UI
Success Criteria Per Phase
- Phase 1: Build succeeds and nixos-rebuild switch restarts no existing services
- Phase 2: /run/secrets/maubot-* files exist with mode 0400, existing services healthy
- Phase 3: systemctl status maubot.service shows "active (running)", management UI accessible via SSH tunnel
- Phase 4: Bot responds to Instagram URL in <5 seconds (SC-001)
Update/Upgrade Procedure (State-Preserving)
After initial deployment, future updates must preserve runtime state in /var/lib/maubot/:
- bot.db - Service state (bot instances, plugin configurations)
- plugins/ - Uploaded .mbp files
- config/config.yaml - Generated runtime config
Typical update scenarios:
Scenario 1: Module Configuration Change (e.g., change port, add new option)
# 1. Edit modules/dev-services.nix or hosts/ops-jrz1.nix
# 2. Deploy
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
# 3. Verify service restarted cleanly
ssh root@45.77.205.49 'systemctl status maubot.service'
ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
# 4. Verify bot instances still running (check management UI)
# StateDirectory persists across service restarts
Scenario 2: Maubot Version Upgrade (nixpkgs update)
# 1. Update flake.lock or nixpkgs input
nix flake update
# 2. Review maubot changelog for breaking changes
# Check: https://github.com/maubot/maubot/releases
# 3. Deploy with build test first
nixos-rebuild build --flake .#ops-jrz1
# 4. If build succeeds, deploy
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
# 5. Monitor service restart
ssh root@45.77.205.49 'journalctl -u maubot.service -f'
# 6. Verify bot instances reconnected (check Matrix room for bot presence)
Scenario 3: Plugin Update (new Instagram bot version)
# Manual via web UI:
# 1. Upload new .mbp file (Plugins tab → Upload)
# 2. Maubot detects version change
# 3. Restart affected bot instances (Instances tab → Stop → Start)
# 4. Test in private room before production use
# No nixos-rebuild needed - plugin is runtime state
Scenario 4: Add New Bot Instance (e.g., second Instagram bot or new bot type)
# Manual via web UI:
# 1. Create bot Matrix user (via registration token)
# 2. Upload plugin if new type (Plugins tab)
# 3. Create bot instance (Instances tab → Add instance)
# 4. Configure and enable
# No nixos-rebuild needed - bot instances are runtime state
State Preservation Guarantees:
- NixOS StateDirectory (/var/lib/maubot/) persists across:
  - Service restarts (systemctl restart maubot.service)
  - System reboots
  - Module configuration changes
  - Maubot version upgrades (unless the database schema is incompatible)
- State is lost or becomes unusable only if:
  - /var/lib/maubot/ is deleted manually
  - The service definition changes the StateDirectory path (the old directory stays on disk but is no longer used)
  - A major maubot version ships an incompatible schema (rare, documented in release notes)
Rollback with State:
# NixOS generation rollback preserves StateDirectory
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
# Bot instances resume with previous configuration
# Database and plugins unchanged
When to wipe database (rare, destructive):
# Only if:
# 1. Database corruption detected
# 2. Major version migration requires clean slate (check release notes)
# 3. Testing fresh deployment
# Stop the service first so the backup captures a consistent SQLite file:
ssh root@45.77.205.49 'systemctl stop maubot.service'
# Backup:
ssh root@45.77.205.49 'tar czf /root/maubot-backup-$(date +%Y%m%d).tar.gz /var/lib/maubot/'
# Wipe:
ssh root@45.77.205.49 'rm -f /var/lib/maubot/bot.db'
ssh root@45.77.205.49 'systemctl start maubot.service'
# Reconfigure all bot instances via web UI
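The backup step can be made slightly safer by confirming that the archive is readable before anything is deleted. A minimal sketch, meant to run on the VPS and ideally with maubot.service stopped; backup_state is a hypothetical helper, and the archive name follows the pattern used in the backup command.

```shell
# Hypothetical helper: archive a state directory into dest_dir, confirm
# the archive can be listed, and print its path. Returns non-zero on failure.
backup_state() {
  src="$1"
  dest_dir="$2"
  archive="$dest_dir/maubot-backup-$(date +%Y%m%d).tar.gz"
  # -C stores paths relative to the parent directory, e.g. maubot/bot.db
  tar czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  tar tzf "$archive" >/dev/null || return 1
  echo "$archive"
}

# Usage (on the VPS):
#   backup_state /var/lib/maubot /root && rm -f /var/lib/maubot/bot.db
```

Chaining the wipe with && means the database is removed only when the archive was written and could be listed.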
Complexity Tracking
No violations - All constitution principles satisfied.
This feature follows established patterns:
- Declarative infrastructure (NixOS modules)
- Security first (sops-nix encrypted secrets)
- Presentable state (comprehensive spec, 7-day validation)
- Quality over speed (extract proven ops-base module, document alternatives)
No simpler alternatives rejected - Chosen approach is the simplest that meets requirements while maintaining quality standards.