ops-jrz1/specs/003-maubot-integration/quickstart.md
Dan 8826d62bcc Add maubot integration and infrastructure updates
- maubot.nix: Declarative bot framework with plugin deployment
- backup.nix: Local backup service for Matrix/bridge data
- sna-instagram-bot: Instagram content bridge plugin
- beads: Issue tracking workflow integrated
- spec 004: Browser-based dev environment design
- nixpkgs bump: Oct 22 → Dec 2
- Fix maubot health check (401 = healthy)
2025-12-08 15:55:12 -08:00

# Quickstart: Maubot Integration Deployment
**Feature**: 003-maubot-integration
**Target**: ops-jrz1 VPS (45.77.205.49)
**Estimated time**: 2-3 hours
## Prerequisites
- [x] ops-jrz1 VPS operational with conduwuit Matrix homeserver
- [x] SSH access to VPS as root
- [x] sops-nix configured with server SSH host key
- [x] Local machine with Nix/NixOS
- [ ] Instagram bot .mbp file available (`/home/dan/proj/sna/sna-instagram-bot.mbp`)
---
## Phase 0: Secrets Preparation
### 1. Generate Maubot Secrets
```bash
# Generate admin password (32 random bytes, base64-encoded to 44 characters)
MAUBOT_ADMIN_PW=$(openssl rand -base64 32)
# Generate secret key (48 bytes base64-encoded)
MAUBOT_SECRET=$(openssl rand -base64 48)
echo "Admin Password: $MAUBOT_ADMIN_PW"
echo "Secret Key: $MAUBOT_SECRET"
```
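Base64 turns every 3 input bytes into 4 output characters, so the byte count passed to `openssl rand -base64` is not the length of the resulting string. A minimal sketch of that arithmetic, using a helper name (`b64_len`) invented here for illustration:

```bash
# b64_len is a hypothetical helper, not part of openssl or maubot:
# length of padded base64 output for N input bytes.
b64_len() {
  echo $(( ( ($1 + 2) / 3 ) * 4 ))
}

b64_len 32   # admin password: prints 44
b64_len 48   # secret key: prints 64
```

This is handy if a downstream field has a maximum length and you need to pick a byte count that fits.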
### 2. Add Secrets to sops-nix
```bash
cd /home/dan/proj/ops-jrz1
# Edit encrypted secrets
sops secrets/secrets.yaml
```
Add these entries:
```yaml
maubot-admin-password: "<paste MAUBOT_ADMIN_PW>"
maubot-secret-key: "<paste MAUBOT_SECRET>"
# matrix-registration-token already exists - reuse for bot creation
```
### 3. Declare Secrets in NixOS Config
Edit `hosts/ops-jrz1.nix`:
```nix
sops.secrets.maubot-admin-password = { mode = "0400"; };
sops.secrets.maubot-secret-key = { mode = "0400"; };
```
---
## Phase 1: Module Extraction and Adaptation
### 1. Extract maubot.nix from ops-base
```bash
cd /home/dan/proj/ops-jrz1
# Copy module from ops-base
cp /home/dan/proj/ops-base/vm-configs/modules/maubot.nix \
modules/maubot.nix
```
### 2. Adapt Module Namespace
Edit `modules/maubot.nix`:
**Change module namespace**:
```nix
# From:
options.services.matrix-vm.maubot = { ... };
# To:
options.services.maubot = { ... };
```
**Update homeserver URL**:
```nix
# From:
homeserverUrl = mkOption {
default = "http://127.0.0.1:6167"; # ops-base continuwuity port
};
# To:
homeserverUrl = mkOption {
default = "http://127.0.0.1:8008"; # ops-jrz1 conduwuit port
};
```
**Remove registration_secrets** (conduwuit doesn't support this):
```nix
# REMOVE this section from config generation (around line 140-150):
# registration_secrets:
# ${cfg.serverName}:
# url: ${cfg.homeserverUrl}
# secret: REPLACE_REGISTRATION_SECRET
```
**Update StateDirectory** (move from /run to /var/lib):
```nix
# Change config path from:
/run/maubot/config.yaml
# To:
/var/lib/maubot/config/config.yaml
```
### 3. Add dev-platform Wrapper
Edit `modules/dev-services.nix`:
Add options section:
```nix
options.services.dev-platform.maubot = {
enable = mkEnableOption "maubot bot framework";
port = mkOption {
type = types.port;
default = 29316;
description = "Management interface port";
};
};
```
Add config section:
```nix
config = mkIf cfg.maubot.enable {
services.maubot = {
enable = true;
homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}";
serverName = cfg.matrix.serverName;
port = cfg.maubot.port;
adminPasswordFile = config.sops.secrets.maubot-admin-password.path;
secretKeyFile = config.sops.secrets.maubot-secret-key.path;
};
};
```
---
## Phase 2: Incremental Deployment (Live Server)
⚠️ **IMPORTANT**: ops-jrz1 is a live production server with critical services:
- conduwuit Matrix homeserver - All Matrix functionality
- mautrix-slack bridge - ~50 Slack channels syncing
- PostgreSQL, Forgejo, nginx - Core infrastructure
Deploy incrementally with validation checkpoints. Each phase creates a git commit as a rollback point.
---
### Phase 2.1: Module Files Only (No-Op Deployment)
**Goal**: Add maubot module without starting any services
**Steps**:
1. Verify `services.dev-platform.maubot.enable` is NOT set in `hosts/ops-jrz1.nix`
2. Deploy:
```bash
cd /home/dan/proj/ops-jrz1
nixos-rebuild switch --flake .#ops-jrz1 \
--target-host root@45.77.205.49 \
--build-host localhost
```
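The same deploy command is repeated in every phase below. One way to avoid retyping it is a small wrapper with a dry-run switch; this is only a sketch, and the names `deploy`, `DRY_RUN`, `FLAKE_HOST`, and `TARGET` are made up for this example:

```bash
# Hypothetical wrapper around the deploy command used throughout this guide.
FLAKE_HOST="ops-jrz1"
TARGET="root@45.77.205.49"

deploy() {
  # Build the argument list once so the dry-run output matches what would run.
  set -- nixos-rebuild switch --flake ".#${FLAKE_HOST}" \
    --target-host "$TARGET" --build-host localhost
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"        # print the command instead of running it
  else
    "$@"
  fi
}

# DRY_RUN=1 deploy   # prints the command without touching the server
```

Run `DRY_RUN=1 deploy` first to confirm the flake attribute and target host, then `deploy` from the repository root.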
**Validation**:
```bash
# The deploy should restart no services (or only unrelated ones); confirm the existing units were left alone
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
# Expected: Both active (running), no recent restarts
```
**Git checkpoint**:
```bash
git add modules/maubot.nix modules/dev-services.nix
git commit -m "Add maubot module files (service disabled)"
```
**Rollback if needed**:
```bash
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
```
---
### Phase 2.2: Secrets Preparation
**Goal**: Add secrets without starting service
**Steps**:
1. Verify `services.dev-platform.maubot.enable` is still NOT set
2. Deploy (secrets added in Phase 0 and Phase 1 config):
```bash
nixos-rebuild switch --flake .#ops-jrz1 \
--target-host root@45.77.205.49 \
--build-host localhost
```
**Validation**:
```bash
# Verify secrets decrypted
ssh root@45.77.205.49 'ls -la /run/secrets/maubot-*'
# Expected:
# -r-------- 1 root root ... /run/secrets/maubot-admin-password
# -r-------- 1 root root ... /run/secrets/maubot-secret-key
# Verify existing services healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
```
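Reading five `systemctl status` outputs by eye is easy to get wrong on a live server. A possible alternative is a helper that prints one line per unit and fails if any unit is not active. This is a sketch: `check_units` and the `REMOTE` variable are names invented for this example, and it relies only on `systemctl is-active`.

```bash
# check_units (hypothetical helper): report the state of each unit and
# return non-zero if any of them is not "active".
# REMOTE is an assumed variable holding the command prefix used to reach
# the host, e.g. "ssh root@45.77.205.49"; leave it empty to check locally.
check_units() {
  local failed=0
  local unit state
  for unit in "$@"; do
    # is-active exits non-zero for inactive units, so ignore its status
    state=$(${REMOTE:-} systemctl is-active "$unit" 2>/dev/null) || true
    printf '%-28s %s\n' "$unit" "${state:-unknown}"
    [ "$state" = "active" ] || failed=1
  done
  return "$failed"
}
```

For example: `REMOTE="ssh root@45.77.205.49" check_units matrix-continuwuity mautrix-slack forgejo postgresql nginx`. The exit status makes it usable as a gate before each git checkpoint.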
**Git checkpoint**:
```bash
git add hosts/ops-jrz1.nix secrets/secrets.yaml
git commit -m "Add maubot secrets (service not enabled)"
```
---
### Phase 2.3: Enable Maubot Service
**Goal**: Start maubot service, verify isolation from existing services
**Steps**:
1. Enable in `hosts/ops-jrz1.nix`:
```nix
services.dev-platform.maubot = {
enable = true;
port = 29316;
};
```
2. Deploy:
```bash
nixos-rebuild switch --flake .#ops-jrz1 \
--target-host root@45.77.205.49 \
--build-host localhost
```
**Validation**:
```bash
# 1. Verify maubot service started
ssh root@45.77.205.49 'systemctl status maubot.service'
# Expected: active (running)
# 2. Check logs for errors
ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
# Look for: "Starting maubot on port 29316", "Connected to homeserver"
# No ERROR or CRITICAL messages
# 3. Verify existing services still healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
# 4. Test Slack bridge (critical validation)
# Post message in Slack → verify appears in Matrix within 5 seconds
# 5. Test management UI access
ssh -L 29316:localhost:29316 root@45.77.205.49
# In browser: http://localhost:29316/_matrix/maubot
# Should load login page
```
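The "no ERROR or CRITICAL messages" check in step 2 can be made mechanical by counting matching log lines. A minimal sketch, assuming the log levels appear literally as `ERROR` and `CRITICAL` in the journal output (the helper name `count_log_errors` is made up here):

```bash
# count_log_errors (hypothetical helper): count lines on stdin that
# mention ERROR or CRITICAL. Prints 0 when nothing matches.
count_log_errors() {
  # grep -c exits 1 when the count is 0, so mask that status
  grep -cE 'ERROR|CRITICAL' || true
}
```

Usage: `ssh root@45.77.205.49 'journalctl -u maubot.service -n 50' | count_log_errors`. Anything other than `0` is worth reading in full before moving on.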
**Git checkpoint**:
```bash
git add hosts/ops-jrz1.nix
git commit -m "Enable maubot service (no bots deployed yet)"
```
**Rollback if needed**:
```bash
# Option 1: NixOS generation rollback (fastest)
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
# Option 2: Disable service (if you want to keep other changes)
# Edit hosts/ops-jrz1.nix: services.dev-platform.maubot.enable = false
# Then redeploy
```
---
### Rollback Procedures
**If ANY deployment phase fails or breaks existing services**:
1. **Immediate rollback** (restores last working state):
```bash
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
```
2. **Verify services restored**:
```bash
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
# Test Slack bridge: post message, verify in Matrix
```
3. **Investigate issue** before retrying:
```bash
# Check what changed
ssh root@45.77.205.49 'journalctl --since "10 minutes ago" | grep -E "ERR|CRIT|FTL"'
# Review deployment logs
ssh root@45.77.205.49 'journalctl -u nixos-rebuild -n 100'
```
**Git-based rollback** (if committed but want to revert):
```bash
git log --oneline -5 # Find commit to revert
git revert <commit-hash>
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
```
---
### Phase 2.4: Deployment Success Criteria
Before proceeding to bot configuration, verify:
- [ ] maubot.service is active (running)
- [ ] Management UI loads at http://localhost:29316/_matrix/maubot (via SSH tunnel)
- [ ] No errors in maubot service logs
- [ ] All existing services healthy (Matrix, Slack bridge, Forgejo, PostgreSQL, nginx)
- [ ] Slack bridge functional (test message flow Slack ↔ Matrix)
- [ ] Phase 2.3 git commit created
If all criteria pass, proceed to Phase 3 (Bot Registration). Otherwise, rollback and investigate.
---
## Phase 3: Bot Registration and Configuration
### 1. Access Management Interface
```bash
# Create SSH tunnel
ssh -L 29316:localhost:29316 root@45.77.205.49
# In browser:
# Navigate to: http://localhost:29316/_matrix/maubot
```
### 2. Login to Maubot
- Username: `admin`
- Password: `<from sops secrets>`
### 3. Create Bot Matrix User
**Option A: Registration Token** (recommended):
1. Configure conduwuit registration token (if not already set)
2. In Maubot UI: Clients → Add client
3. Enter Matrix user ID: `@instagram-bot:clarun.xyz`
4. Select "Register" and provide registration token
5. Bot user created automatically
**Option B: Admin Room Commands**:
1. Access Matrix homeserver admin room
2. Run: `!admin users create-user instagram-bot`
3. Copy generated password
4. In Maubot UI: Create client with username/password
### 4. Upload Instagram Plugin
```bash
# Copy plugin to VPS
scp /home/dan/proj/sna/sna-instagram-bot.mbp \
root@45.77.205.49:/tmp/
# Or upload via web UI:
# - Plugins tab → Upload
# - Select sna-instagram-bot.mbp
```
### 5. Create Bot Instance
In Maubot UI:
1. Instances tab → Add instance
2. **ID**: `instagram-bot-1`
3. **Type**: `sna.instagram`
4. **Primary user**: Select `@instagram-bot:clarun.xyz`
5. **Enabled**: ✓
6. **Config**:
```json
{
"enabled": true,
"max_file_size": 50000000,
"room_subscriptions": []
}
```
7. Save
### 6. Configure Room Subscriptions
**Get Matrix room ID**:
```bash
# In Element or Matrix client:
# Room Settings → Advanced → Internal Room ID
# Example: !abc123def:clarun.xyz
```
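A common mistake at this step is pasting a room alias (which starts with `#`) instead of the internal room ID (which starts with `!`). A small guard, sketched here with a made-up helper name, can catch that before the value goes into the bot config. It only checks the leading sigil and does not confirm that the room exists.

```bash
# looks_like_room_id (hypothetical helper): succeed only if the argument
# starts with "!" followed by at least one more character.
looks_like_room_id() {
  case "$1" in
    '!'?*) return 0 ;;
    *)     return 1 ;;
  esac
}

looks_like_room_id '!abc123def:clarun.xyz' && echo "room ID"
looks_like_room_id '#general:clarun.xyz'   || echo "not a room ID (alias?)"
```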
**Add to bot config** (per FR-010):
Edit bot instance config in Maubot UI:
```json
{
"enabled": true,
"max_file_size": 50000000,
"room_subscriptions": [
"!abc123def:clarun.xyz"
]
}
```
**Restart bot instance**: Stop → Start in Maubot UI
---
## Phase 4: Testing
### 1. Invite Bot to Test Room
In Matrix client:
```
/invite @instagram-bot:clarun.xyz
```
### 2. Test Instagram URL Fetching
Post in the room:
```
https://www.instagram.com/p/EXAMPLE123/
```
**Expected behavior**:
- Bot responds within 5 seconds (SC-001)
- Image/video appears in room
- Caption and metadata posted as text message
### 3. Test Room Subscription Enforcement
Post Instagram URL in a room NOT in `room_subscriptions`:
**Expected behavior**:
- Bot ignores URL (no response)
### 4. Monitor Logs
```bash
ssh root@45.77.205.49 'journalctl -u maubot.service -f --since "5 minutes ago"'
# Check for:
# - Instagram URL detection
# - yt-dlp extraction
# - Matrix upload
# - Any ERROR/CRITICAL logs
```
---
## Phase 5: Health Monitoring
### 1. Verify Health Check Timer
```bash
ssh root@45.77.205.49 'systemctl list-timers | grep maubot'
# Expected:
# maubot-health.timer (runs every 5 minutes)
# maubot-health-restart.timer (runs every 10 minutes)
```
### 2. Manual Health Check
```bash
ssh root@45.77.205.49 'curl -s http://localhost:29316/_matrix/maubot/v1/version | jq .'
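# If the request is unauthenticated, maubot may answer HTTP 401 instead; the health
# check counts 401 as healthy (per the commit note "Fix maubot health check (401 = healthy)").
# Expected output for a successful request: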
# {
# "version": "0.5.2",
# "server": "maubot"
# }
```
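The commit note for this change says the health check treats HTTP 401 as healthy, presumably because a 401 still proves the maubot HTTP server is up and answering. A sketch of that classification follows; the helper names `is_healthy_code` and `probe_maubot` are invented for this example, and treating 200 as healthy alongside 401 is an assumption, not something taken from the actual health check script.

```bash
# is_healthy_code (hypothetical): 200 and 401 both mean the server answered.
is_healthy_code() {
  case "$1" in
    200|401) return 0 ;;
    *)       return 1 ;;
  esac
}

# probe_maubot (hypothetical): print only the HTTP status code of the
# version endpoint. curl prints 000 when it cannot connect at all.
probe_maubot() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
    "http://localhost:${1:-29316}/_matrix/maubot/v1/version"
}
```

On the VPS, `is_healthy_code "$(probe_maubot)" && echo healthy` gives a one-line answer without needing an access token.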
### 3. Check Bot Instance Status
In Maubot UI:
- Instances tab
- Verify `instagram-bot-1` shows green "Running" status
- Check "Last Sync" timestamp (should be less than 10 minutes ago)
---
## Troubleshooting
### Bot Not Responding to Instagram URLs
**Check**:
1. Room ID is in `room_subscriptions` config
2. Bot has joined the room (`/invite @instagram-bot:clarun.xyz`)
3. URL is public Instagram post (not private/story)
4. Logs show URL detection: `journalctl -u maubot.service | grep -i instagram`
**Fix**:
- Update room_subscriptions config
- Restart bot instance in Maubot UI
### Service Won't Start
**Check**:
```bash
ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
```
**Common issues**:
- Port 29316 already in use → Check `ss -tlnp | grep 29316`
- Database permissions → Check `/var/lib/maubot/` ownership
- Secrets not decrypted → Check `/run/secrets/maubot-*` exists
### Bot Can't Connect to Matrix
**Check**:
1. conduwuit is running: `systemctl status matrix-continuwuity`
2. Homeserver URL is correct: `http://127.0.0.1:8008` (IPv4)
3. Bot Matrix user exists and has valid access token
**Fix**:
- Recreate bot client in Maubot UI
- Check Matrix homeserver logs: `journalctl -u matrix-continuwuity | grep instagram`
### Instagram Content Fetch Fails
**Check logs**:
```bash
ssh root@45.77.205.49 'journalctl -u maubot.service | grep -A 10 "yt-dlp"'
```
**Common issues**:
- Instagram rate limiting (429 error) → Wait 30 minutes, reduce request frequency
- Private post → Can't fetch (expected behavior)
- yt-dlp outdated → Update nixpkgs, redeploy
---
## Rollback Procedure
If deployment fails:
```bash
# List NixOS generations
ssh root@45.77.205.49 'nixos-rebuild list-generations'
# Rollback to previous generation
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
# Verify services restored
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
```
---
## Success Criteria Validation
Verify all success criteria before marking feature complete:
- [ ] **SC-001**: Instagram bot responds within 5 seconds
- [ ] **SC-002**: System supports 3 concurrent bot instances (test by creating 2 more instances)
- [ ] **SC-003**: Service maintains 99% uptime over 7 days
- [ ] **SC-004**: Auto-recovery within 2 minutes after restart
- [ ] **SC-005**: New bot deployment completes in <10 minutes
- [ ] **SC-006**: 95% success rate for public Instagram URLs
- [ ] **SC-007**: Management interface loads in <2 seconds
- [ ] **SC-008**: Server reboot without data loss (test with `reboot`)
**Testing period**: 7 days operational before merging to main (per constitution Principle III)
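To make SC-003 concrete, it helps to convert the percentage into a downtime budget for the 7-day window. A quick sketch of the arithmetic (the function name `downtime_budget_min` is made up for this example):

```bash
# downtime_budget_min (hypothetical helper): minutes of downtime allowed
# over DAYS days at an uptime target of PCT percent.
downtime_budget_min() {
  awk -v days="$1" -v pct="$2" \
    'BEGIN { printf "%.1f\n", days * 24 * 60 * (100 - pct) / 100 }'
}

downtime_budget_min 7 99   # prints 100.8
```

So 99% over 7 days leaves roughly 100 minutes in total, which every restart and redeploy during the testing period counts against.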
---
## Post-Deployment
### 1. Update Documentation
```bash
# Update CLAUDE.md with maubot commands
# Example section to add:
### Maubot Management
- Management UI: http://localhost:29316/_matrix/maubot (via SSH tunnel)
- Bot registration: Use conduwuit registration token
- Room subscriptions: Edit config JSON, restart instance
- Logs: journalctl -u maubot.service -f
```
### 2. Commit and Tag
```bash
git add modules/maubot.nix modules/dev-services.nix hosts/ops-jrz1.nix
git commit -m "Add maubot bot framework with Instagram bot

- Extract and adapt maubot.nix from ops-base
- Configure for conduwuit (registration token auth)
- Deploy Instagram bot with room-based activation
- Add health monitoring timers

Implements feature 003-maubot-integration
"
git tag -a v0.3.0 -m "Release v0.3.0: Maubot Integration

Features:
- Maubot bot framework service
- Instagram content fetcher bot
- Room-based bot activation
- Management web interface (localhost only)
- Health monitoring and auto-recovery

Success criteria validated (SC-001 through SC-008)
Constitution compliance verified
"
git push origin main --tags
```
### 3. Create Worklog
Document the deployment session:
```bash
# Create the worklog file and open it for editing
"${EDITOR:-vi}" docs/worklogs/2025-10-26-maubot-deployment.org
```
---
## Reference Files
**Module locations**:
- `/home/dan/proj/ops-jrz1/modules/maubot.nix` (service module)
- `/home/dan/proj/ops-jrz1/modules/dev-services.nix` (high-level wrapper)
**Secrets**:
- `/home/dan/proj/ops-jrz1/secrets/secrets.yaml` (encrypted)
- `/run/secrets/maubot-*` (runtime, on VPS)
**Runtime state** (on VPS):
- `/var/lib/maubot/bot.db` (SQLite database)
- `/var/lib/maubot/config/config.yaml` (generated config)
- `/var/lib/maubot/plugins/` (uploaded .mbp files)
**Source reference**:
- ops-base module: `/home/dan/proj/ops-base/vm-configs/modules/maubot.nix`
- Instagram plugin: `/home/dan/proj/sna/sna-instagram-bot.mbp`
- ops-base docs: `/home/dan/proj/ops-base/docs/maubot-*.md`
---
**Deployment time estimate**: 2-3 hours (including testing and validation)
**Status**: Ready for Phase 2 (implementation)