# Implementation Plan: Maubot Integration

**Branch**: `003-maubot-integration` | **Date**: 2025-10-26 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/003-maubot-integration/spec.md`

## Summary

Extract the maubot bot framework from ops-base and deploy it to ops-jrz1 with the Instagram bot plugin. Primary approach: adapt the proven ops-base maubot.nix module to ops-jrz1 patterns (conduwuit homeserver, sops-nix secrets, dev-platform wrapper), using registration token auth instead of a shared secret. Instagram content is fetched via yt-dlp (community-maintained scraping). The initial deployment validates a single instance; the architecture supports 3+ concurrent instances.

## Technical Context

**Language/Version**: Python 3.11 (maubot runtime environment)
**Primary Dependencies**: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite, sops-nix
**Storage**: SQLite `/var/lib/maubot/bot.db` (service state), per-bot databases (plugin-specific)
**Testing**: Manual QA on production VPS (no staging environment), 7-day validation period
**Target Platform**: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49, x86_64-linux)
**Project Type**: Infrastructure service (NixOS module)
**Performance Goals**: <5 second Instagram content fetch (SC-001), 99% uptime over 7 days (SC-003), <2 second management UI load (SC-007)
**Constraints**: Localhost-only management interface (SSH tunnel required), single Instagram bot instance initially, conduwuit registration token auth (no shared secret)
**Scale/Scope**: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002), small team usage (<20 Instagram fetches/day)

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

### Principle I: Declarative Infrastructure ✅ PASS

**Compliance**:
- All maubot configuration defined in NixOS modules (maubot.nix, dev-services.nix)
- No imperative modifications required (service managed via nixos-rebuild)
- Configuration changes deployed declaratively
- Rollback via NixOS generations

**Evidence**:
- Module adaptation documented in research.md (ops-base → ops-jrz1 pattern)
- Secrets via sops-nix (declarative encryption)
- Runtime config generated from NixOS module options

### Principle II: Security First ✅ PASS

**Compliance**:
- All secrets encrypted via sops-nix (maubot-admin-password, maubot-secret-key, registration-token)
- Runtime secrets in /run/secrets/ (tmpfs, ephemeral)
- No secrets in Nix store or configuration files (LoadCredential pattern)
- Management interface localhost-only (SSH tunnel required per FR-003)

**Evidence**:
- Secrets management pattern documented in data-model.md
- File permissions: 0400 for secrets, 0600 for config with credentials
- Pre-commit hooks scan for secret leaks (inherited from platform)

### Principle III: Presentable State Over Speed ✅ PASS

**Compliance**:
- Comprehensive specification (spec.md with 16 functional requirements, 4 user stories)
- Complete documentation suite (research.md, data-model.md, quickstart.md)
- 7-day validation period required before announcement (per constitution)
- Success criteria measurable and testable (SC-001 through SC-008)

**Evidence**:
- Spec clarification session resolved all ambiguities (5 questions answered)
- Quickstart.md provides deployment runbook with troubleshooting
- Testing checklist in quickstart.md validates all success criteria

### Principle IV: Quality Over Quick Wins ✅ PASS

**Compliance**:
- Extracted proven pattern from ops-base (391-line maubot.nix module in production)
- Research phase documented alternatives (yt-dlp vs instaloader, SQLite vs PostgreSQL)
- Follows established ops-jrz1 patterns (mautrix-slack module structure, sops-nix secrets)
- Spec-kit workflow followed (specify → clarify → plan → tasks → implement)

**Evidence**:
- Research.md documents 3 major technical decisions with rationale
- Module adaptation strategy preserves ops-base proven components
- Constitution check validates pattern consistency

**Gate Status**: ✅ ALL CHECKS PASS - Proceed to implementation

## Project Structure

### Documentation (this feature)

```text
specs/003-maubot-integration/
├── spec.md              # Feature specification (✅ complete)
├── plan.md              # This file (✅ complete)
├── research.md          # Phase 0 output (✅ complete)
├── data-model.md        # Phase 1 output (✅ complete)
├── quickstart.md        # Phase 1 output (✅ complete)
├── checklists/
│   └── requirements.md  # Quality validation (✅ complete)
└── tasks.md             # Phase 2 output (/speckit.tasks - pending)
```

### Source Code (repository root)

**Structure Decision**: Infrastructure service (NixOS module) - no application source code

```text
/home/dan/proj/ops-jrz1/
├── modules/
│   ├── maubot.nix                 # Low-level maubot service module (to create)
│   ├── dev-services.nix           # High-level wrapper (to update)
│   ├── mautrix-slack.nix          # Reference pattern (existing)
│   └── matrix-continuwuity.nix    # Matrix homeserver (existing)
├── hosts/
│   └── ops-jrz1.nix               # VPS configuration (to update: enable maubot)
├── secrets/
│   └── secrets.yaml               # Encrypted secrets (to update: add maubot secrets)
├── specs/
│   └── 003-maubot-integration/    # This feature directory
└── docs/
    ├── platform-vision.md         # North star document (reference)
    ├── CLAUDE.md                  # Development guidelines (to update)
    └── worklogs/                  # Session logs (to create after deployment)
```

**External source files** (to copy/adapt):
```text
/home/dan/proj/ops-base/
└── vm-configs/modules/
    └── maubot.nix              # Source module (391 lines, proven in production)

/home/dan/proj/sna/
├── instagram_bot.py            # Instagram bot source (11,643 bytes)
└── sna-instagram-bot.mbp       # Packaged plugin (ready to upload)
```

**Runtime state** (on VPS after deployment):
```text
/var/lib/maubot/
├── config/
│   └── config.yaml               # Generated runtime config
├── plugins/
│   └── sna.instagram-v1.0.0.mbp  # Uploaded plugin
├── bot.db                        # SQLite database (service state)
└── trash/                        # Deleted plugins

/run/secrets/                     # sops-nix decrypted secrets (tmpfs)
├── maubot-admin-password
├── maubot-secret-key
└── matrix-registration-token
```

## Deployment Strategy

**Context**: ops-jrz1 is a live production server with critical services (Matrix homeserver, Slack bridge, PostgreSQL, Forgejo, nginx). Deployment must be incremental with validation checkpoints.

### Live Server Risk Assessment

**Critical Services** (must remain operational):
- conduwuit Matrix homeserver (8008) - All Matrix functionality
- mautrix-slack (29319) - ~50 Slack channels syncing bidirectionally
- PostgreSQL (5432) - Bridge database (172KB, critical state)
- Forgejo (git.clarun.xyz) - Code hosting
- nginx (443) - TLS termination for all public services

**New Service** (isolated):
- maubot (29316, localhost-only) - New SQLite database, different port, no appservice registration

### Incremental Deployment Approach

Deploy in 4 phases with git commits as rollback points:

**Phase 1: Module Files (No-Op Deployment)**
- Add modules/maubot.nix (adapted from ops-base)
- Add services.dev-platform.maubot wrapper to modules/dev-services.nix (options + config)
- **Do NOT enable**: services.dev-platform.maubot.enable remains unset
- Deploy → Verify no services changed → Git commit
- **Rollback**: nixos-rebuild switch --rollback OR git revert

**Phase 2: Secrets (Preparation)**
- Add maubot-admin-password, maubot-secret-key to secrets/secrets.yaml
- Add sops.secrets declarations to hosts/ops-jrz1.nix
- **Still disabled**: services.dev-platform.maubot.enable remains unset
- Deploy → Verify secrets decrypt to /run/secrets/ → Git commit
- **Rollback**: nixos-rebuild switch --rollback OR git revert

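The Phase 2 verification step ("secrets decrypt to /run/secrets/") could be checked with something like the sketch below. `check_secret_mode` is a hypothetical helper, not part of sops-nix, and it assumes GNU coreutils `stat` is available on the VPS.

```bash
# Sketch only: check_secret_mode is a hypothetical helper (an assumption),
# not something provided by sops-nix or this repository.
# It reports whether a file exists and has the expected octal mode.
check_secret_mode() {
  local path="$1" want="${2:-400}"
  if [ ! -e "$path" ]; then
    echo "MISSING $path"
    return 1
  fi
  local mode
  # GNU stat: %a prints the octal permission bits; -L follows symlinks
  mode="$(stat -L -c '%a' "$path")"
  if [ "$mode" = "$want" ]; then
    echo "OK $path"
  else
    echo "BAD_MODE $path $mode"
    return 1
  fi
}
```

Run on the VPS, a loop such as `for f in /run/secrets/maubot-*; do check_secret_mode "$f"; done` would print one line per secret and flag anything that is missing or not mode 0400.
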
**Phase 3: Service Start (Module Only)**
- Enable in hosts/ops-jrz1.nix: services.dev-platform.maubot.enable = true
- Deploy → Verify maubot.service starts → Verify existing services healthy → Git commit
- **Rollback**: Set enable = false + redeploy OR nixos-rebuild switch --rollback

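Beyond `systemctl status`, a Phase 3 check could probe the management API over HTTP. The sketch below rests on two assumptions that should be confirmed against the deployed service: that an unauthenticated request answers 401 when maubot is up (so 401 counts as healthy), and that the API is served under `/_matrix/maubot/v1`. Both function names are hypothetical.

```bash
# maubot_health_from_status maps an HTTP status code to a verdict.
# Assumption: an unauthenticated request to the management API answers 401
# when maubot is running, so 401 is treated as healthy alongside 200.
maubot_health_from_status() {
  case "$1" in
    200|401) echo "healthy" ;;
    000)     echo "unreachable" ;;  # curl prints 000 when no HTTP response arrived
    *)       echo "unhealthy" ;;
  esac
}

# probe_maubot is a hypothetical wrapper; the default URL path is an assumption.
probe_maubot() {
  local url="${1:-http://127.0.0.1:29316/_matrix/maubot/v1/instances}"
  local code
  # -w '%{http_code}' prints only the status code; the body is discarded
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")"
  maubot_health_from_status "$code"
}
```

Calling `probe_maubot` on the VPS (or through the SSH tunnel) prints a single word, which keeps it easy to use in a post-deploy check.
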
**Phase 4: Bot Deployment (Manual, Reversible)**
- SSH tunnel to management UI (localhost:29316)
- Create bot Matrix user via registration token
- Upload Instagram plugin (.mbp file)
- Create bot instance (test in private room first)
- **Rollback**: Delete bot instance via web UI (no code changes to revert)

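The SSH tunnel in the first Phase 4 step might look like the following sketch. `tunnel_command` is a hypothetical helper that only prints the `ssh` invocation so it can be reviewed before use; the host and port are the ones given in this plan.

```bash
# tunnel_command prints (does not run) an ssh local port forward.
# Hypothetical helper; defaults are the VPS address and maubot port from this plan.
tunnel_command() {
  local host="${1:-root@45.77.205.49}" port="${2:-29316}"
  # -N: do not run a remote command
  # -L: forward local <port> to 127.0.0.1:<port> as seen from the VPS
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$port" "$port" "$host"
}
```

Running the printed command in one terminal keeps the tunnel open; the management UI should then be reachable in a local browser on port 29316 (the exact UI path depends on the generated maubot config).
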
### Validation Checkpoints

After each phase deployment:
```bash
# 1. Verify existing services still healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'

# 2. Check for errors in last 5 minutes (excluding maubot)
ssh root@45.77.205.49 'journalctl --since "5 minutes ago" | grep -E "ERR|CRIT|FTL" | grep -v maubot'

# 3. Test Slack bridge (post in Slack, verify appears in Matrix)

# Phase-specific validations documented in tasks.md
```

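For a checkpoint that fails loudly instead of relying on reading `systemctl status` output, one option is to summarize `systemctl is-active`, which prints one state per unit in the order given. The sketch below is an illustration only; `report_inactive` and the `UNITS` array are hypothetical names, with the unit list taken from the check above.

```bash
# Units from the validation checkpoint above, in a fixed order.
UNITS=(matrix-continuwuity mautrix-slack forgejo postgresql nginx)

# report_inactive reads one state per line on stdin (the output format of
# `systemctl is-active unit1 unit2 ...`) and prints every unit that is not
# "active". It returns non-zero if any unit is not active.
report_inactive() {
  local i=0 state rc=0
  while IFS= read -r state; do
    if [ "$state" != "active" ]; then
      echo "${UNITS[$i]}: $state"
      rc=1
    fi
    i=$((i + 1))
  done
  return "$rc"
}
```

A possible invocation from the deploy machine is `ssh root@45.77.205.49 "systemctl is-active ${UNITS[*]}" | report_inactive`, which prints nothing and returns 0 when all five services are up.
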
### Rollback Procedures

**NixOS Generation Rollback** (fastest):
```bash
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
```

**Git Revert** (if committed):
```bash
git revert HEAD
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
```

**Service Disable** (Phase 3 specific):
```nix
# In hosts/ops-jrz1.nix
services.dev-platform.maubot.enable = false; # Then redeploy
```

### Risk Mitigation

**Known risks from mautrix-slack deployment** (2025-10-26):
1. IPv4 vs localhost: Always use 127.0.0.1 (not localhost) in homeserverUrl
2. Conduwuit database corruption: Have database wipe procedure ready (low risk - fresh maubot install)
3. Port conflicts: Maubot uses 29316 (unique, no conflicts expected)

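The first risk (127.0.0.1 vs localhost) is easy to reintroduce by hand, so a small lint could catch it before deployment. The sketch below is a hypothetical helper, not part of the module; it only inspects the URL string and does not test connectivity.

```bash
# check_homeserver_url is a hypothetical lint for the homeserverUrl value.
# It accepts URLs that address the homeserver via 127.0.0.1 and rejects
# "localhost", which can resolve to an address the homeserver is not
# listening on.
check_homeserver_url() {
  case "$1" in
    http://127.0.0.1:*|https://127.0.0.1:*)
      echo "ok"
      ;;
    *://localhost*)
      echo "use 127.0.0.1 instead of localhost"
      return 1
      ;;
    *)
      echo "unexpected host"
      return 1
      ;;
  esac
}
```

For example, `check_homeserver_url http://127.0.0.1:8008` prints `ok`, while the same URL with `localhost` is rejected with a non-zero exit status.
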
**Blast radius containment**:
- Phase 1 fail → Nix syntax errors only, no runtime impact
- Phase 2 fail → Secrets issue, no services affected
- Phase 3 fail → Maubot won't start, but Matrix/Slack/Forgejo unaffected (different ports, databases)
- Phase 4 fail → Bot instance only, delete via UI

### Success Criteria Per Phase

- **Phase 1**: Build succeeds, nixos-rebuild reports "no services changed"
- **Phase 2**: /run/secrets/maubot-* files exist with mode 0400, existing services healthy
- **Phase 3**: systemctl status maubot.service shows "active (running)", management UI accessible via SSH tunnel
- **Phase 4**: Bot responds to Instagram URL in <5 seconds (SC-001)

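The Phase 4 criterion is a latency budget, which can be approximated from the shell. The sketch below times an arbitrary command and compares the result to a budget in milliseconds. Both helpers are hypothetical, `date +%s%N` assumes GNU date, and timing yt-dlp alone is only a proxy for the bot's end-to-end response time.

```bash
# elapsed_ms runs the given command and prints its wall-clock duration in
# milliseconds. Output of the command itself is discarded.
elapsed_ms() {
  local start end
  start="$(date +%s%N)"   # GNU date: nanoseconds since the epoch
  "$@" >/dev/null 2>&1
  end="$(date +%s%N)"
  echo $(( (end - start) / 1000000 ))
}

# within_budget succeeds when the measured value is strictly below the budget.
within_budget() {
  local measured_ms="$1" budget_ms="$2"
  [ "$measured_ms" -lt "$budget_ms" ]
}
```

A possible use is `ms="$(elapsed_ms yt-dlp --simulate "$URL")"` followed by `within_budget "$ms" 5000`, where `$URL` is a test Instagram link chosen by the operator.
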
### Update/Upgrade Procedure (State-Preserving)

After initial deployment, future updates must preserve runtime state in `/var/lib/maubot/`:
- `bot.db` - Service state (bot instances, plugin configurations)
- `plugins/` - Uploaded .mbp files
- `config/config.yaml` - Generated runtime config

**Typical update scenarios**:

**Scenario 1: Module Configuration Change** (e.g., change port, add new option)
```bash
# 1. Edit modules/dev-services.nix or hosts/ops-jrz1.nix
# 2. Deploy
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost

# 3. Verify service restarted cleanly
ssh root@45.77.205.49 'systemctl status maubot.service'
ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'

# 4. Verify bot instances still running (check management UI)
# StateDirectory persists across service restarts
```

**Scenario 2: Maubot Version Upgrade** (nixpkgs update)
```bash
# 1. Update flake.lock or nixpkgs input
nix flake update

# 2. Review maubot changelog for breaking changes
# Check: https://github.com/maubot/maubot/releases

# 3. Deploy with build test first
nixos-rebuild build --flake .#ops-jrz1

# 4. If build succeeds, deploy
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost

# 5. Monitor service restart
ssh root@45.77.205.49 'journalctl -u maubot.service -f'

# 6. Verify bot instances reconnected (check Matrix room for bot presence)
```

**Scenario 3: Plugin Update** (new Instagram bot version)
```bash
# Manual via web UI:
# 1. Upload new .mbp file (Plugins tab → Upload)
# 2. Maubot detects version change
# 3. Restart affected bot instances (Instances tab → Stop → Start)
# 4. Test in private room before production use

# No nixos-rebuild needed - plugin is runtime state
```

**Scenario 4: Add New Bot Instance** (e.g., second Instagram bot or new bot type)
```bash
# Manual via web UI:
# 1. Create bot Matrix user (via registration token)
# 2. Upload plugin if new type (Plugins tab)
# 3. Create bot instance (Instances tab → Add instance)
# 4. Configure and enable

# No nixos-rebuild needed - bot instances are runtime state
```

**State Preservation Guarantees**:
- NixOS StateDirectory (`/var/lib/maubot/`) persists across:
  - Service restarts (systemctl restart maubot.service)
  - System reboots
  - Module configuration changes
  - Maubot version upgrades (unless the database schema is incompatible)
- State in StateDirectory is lost only if:
  - It is explicitly deleted manually
  - The service definition changes the StateDirectory path (the service then starts with an empty directory)
  - A major maubot version ships an incompatible schema (rare, documented in release notes)

**Rollback with State**:
```bash
# NixOS generation rollback preserves StateDirectory
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'

# Bot instances resume with previous configuration
# Database and plugins unchanged
```

**When to wipe database** (rare, destructive):
```bash
# Only if:
# 1. Database corruption detected
# 2. Major version migration requires clean slate (check release notes)
# 3. Testing fresh deployment

# Stop the service first so the backup captures a consistent database:
ssh root@45.77.205.49 'systemctl stop maubot.service'

# Backup:
ssh root@45.77.205.49 'tar czf /root/maubot-backup-$(date +%Y%m%d).tar.gz /var/lib/maubot/'

# Wipe:
ssh root@45.77.205.49 'rm -f /var/lib/maubot/bot.db'
ssh root@45.77.205.49 'systemctl start maubot.service'

# Reconfigure all bot instances via web UI
```

## Complexity Tracking

**No violations** - All constitution principles satisfied.

This feature follows established patterns:
- Declarative infrastructure (NixOS modules)
- Security first (sops-nix encrypted secrets)
- Presentable state (comprehensive spec, 7-day validation)
- Quality over quick wins (extract proven ops-base module, document alternatives)

**No simpler alternatives rejected** - Chosen approach is the simplest that meets requirements while maintaining quality standards.