ops-jrz1/specs/003-maubot-integration/quickstart.md
Dan 8826d62bcc Add maubot integration and infrastructure updates
- maubot.nix: Declarative bot framework with plugin deployment
- backup.nix: Local backup service for Matrix/bridge data
- sna-instagram-bot: Instagram content bridge plugin
- beads: Issue tracking workflow integrated
- spec 004: Browser-based dev environment design
- nixpkgs bump: Oct 22 → Dec 2
- Fix maubot health check (401 = healthy)
2025-12-08 15:55:12 -08:00

Quickstart: Maubot Integration Deployment

Feature: 003-maubot-integration
Target: ops-jrz1 VPS (45.77.205.49)
Estimated time: 2-3 hours

Prerequisites

  • ops-jrz1 VPS operational with conduwuit Matrix homeserver
  • SSH access to VPS as root
  • sops-nix configured with server SSH host key
  • Local machine with Nix/NixOS
  • Instagram bot .mbp file available (/home/dan/proj/sna/sna-instagram-bot.mbp)

Phase 0: Secrets Preparation

1. Generate Maubot Secrets

# Generate admin password (32 random bytes, base64-encoded to 44 characters)
MAUBOT_ADMIN_PW=$(openssl rand -base64 32)

# Generate secret key (48 bytes base64-encoded)
MAUBOT_SECRET=$(openssl rand -base64 48)

echo "Admin Password: $MAUBOT_ADMIN_PW"
echo "Secret Key: $MAUBOT_SECRET"
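
Note that `openssl rand -base64 N` takes a byte count, so the printed string is longer than N characters. A minimal sketch of the arithmetic, useful when a password field has a length limit (the helper name `b64_len` is illustrative and not part of any tool):

```shell
# Length in characters of the base64 encoding of N bytes
# (4 output characters for every 3 input bytes, rounded up).
b64_len() {
  echo $(( ( ($1 + 2) / 3 ) * 4 ))
}

b64_len 32   # admin password: 44 characters
b64_len 48   # secret key: 64 characters
```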

2. Add Secrets to sops-nix

cd /home/dan/proj/ops-jrz1

# Edit encrypted secrets
sops secrets/secrets.yaml

Add these entries:

maubot-admin-password: "<paste MAUBOT_ADMIN_PW>"
maubot-secret-key: "<paste MAUBOT_SECRET>"
# matrix-registration-token already exists - reuse for bot creation
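
Before moving on, it may be worth confirming that all three keys are present in the decrypted file. The sketch below assumes the keys are top-level entries written as `name: value`, as in the entries above; the function name is hypothetical.

```shell
# Print any expected secret key that is missing from YAML read on stdin.
# Assumes top-level keys written as "name: value".
missing_secret_keys() (
  yaml=$(cat)
  for key in maubot-admin-password maubot-secret-key matrix-registration-token; do
    printf '%s\n' "$yaml" | grep -q "^${key}:" || echo "$key"
  done
)
```

A possible invocation is `sops -d secrets/secrets.yaml | missing_secret_keys`; no output means every key was found.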

3. Declare Secrets in NixOS Config

Edit hosts/ops-jrz1.nix:

sops.secrets.maubot-admin-password = { mode = "0400"; };
sops.secrets.maubot-secret-key = { mode = "0400"; };

Phase 1: Module Extraction and Adaptation

1. Extract maubot.nix from ops-base

cd /home/dan/proj/ops-jrz1

# Copy module from ops-base
cp /home/dan/proj/ops-base/vm-configs/modules/maubot.nix \
   modules/maubot.nix

2. Adapt Module Namespace

Edit modules/maubot.nix:

Change module namespace:

# From:
options.services.matrix-vm.maubot = { ... };

# To:
options.services.maubot = { ... };

Update homeserver URL:

# From:
homeserverUrl = mkOption {
  default = "http://127.0.0.1:6167";  # ops-base continuwuity port
};

# To:
homeserverUrl = mkOption {
  default = "http://127.0.0.1:8008";  # ops-jrz1 conduwuit port
};

Remove registration_secrets (conduwuit doesn't support this):

# REMOVE this section from config generation (around line 140-150):
# registration_secrets:
#   ${cfg.serverName}:
#     url: ${cfg.homeserverUrl}
#     secret: REPLACE_REGISTRATION_SECRET

Update StateDirectory (move from /run to /var/lib):

# Change config path from:
/run/maubot/config.yaml

# To:
/var/lib/maubot/config/config.yaml
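
The four adaptations above are easy to miss in a long module. As a rough check, the sketch below searches the adapted file for the old ops-base values listed in this section. The function name is hypothetical, and a match inside a comment is still reported, so treat the output as a prompt to look rather than as a verdict.

```shell
# List leftover ops-base settings in an adapted module file.
# Prints one line per pattern still present; prints nothing when none are found.
leftover_ops_base_settings() (
  file=$1
  for pattern in 'services.matrix-vm.maubot' '127.0.0.1:6167' \
                 'registration_secrets' '/run/maubot/config.yaml'; do
    if grep -qF -- "$pattern" "$file"; then
      echo "still present: $pattern"
    fi
  done
)
```

Usage: `leftover_ops_base_settings modules/maubot.nix`.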

3. Add dev-platform Wrapper

Edit modules/dev-services.nix:

Add options section:

options.services.dev-platform.maubot = {
  enable = mkEnableOption "maubot bot framework";

  port = mkOption {
    type = types.port;
    default = 29316;
    description = "Management interface port";
  };
};

Add config section:

config = mkIf cfg.maubot.enable {
  services.maubot = {
    enable = true;
    homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}";
    serverName = cfg.matrix.serverName;
    port = cfg.maubot.port;

    adminPasswordFile = config.sops.secrets.maubot-admin-password.path;
    secretKeyFile = config.sops.secrets.maubot-secret-key.path;
  };
};

Phase 2: Incremental Deployment (Live Server)

⚠️ IMPORTANT: ops-jrz1 is a live production server with critical services:

  • conduwuit Matrix homeserver - All Matrix functionality
  • mautrix-slack bridge - ~50 Slack channels syncing
  • PostgreSQL, Forgejo, nginx - Core infrastructure

Deploy incrementally with validation checkpoints. Each phase creates a git commit as a rollback point.


Phase 2.1: Module Files Only (No-Op Deployment)

Goal: Add maubot module without starting any services

Steps:

  1. Verify services.dev-platform.maubot.enable is NOT set in hosts/ops-jrz1.nix

  2. Deploy:

cd /home/dan/proj/ops-jrz1
nixos-rebuild switch --flake .#ops-jrz1 \
  --target-host root@45.77.205.49 \
  --build-host localhost
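
The same deploy command recurs in Phases 2.1 through 2.3. One option is to wrap it in a small function so each phase runs an identical command; this is only a convenience sketch, and the `deploy` name and `DRY_RUN` variable are assumptions rather than anything defined in the repository.

```shell
# Deployment wrapper; flake, host, and flags are copied from the command above.
# With DRY_RUN=1 it prints the command instead of running it.
deploy() {
  set -- nixos-rebuild switch --flake .#ops-jrz1 \
    --target-host root@45.77.205.49 \
    --build-host localhost
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 deploy` first shows exactly what will be executed against the live server.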

Validation:

# The rebuild output should report "no services changed" or only unrelated restarts.
# Then confirm the critical services are still up:
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
# Expected: Both active (running), no recent restarts
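
For a scriptable version of this checkpoint, `systemctl is-active` prints one state per unit, which is easier to test than the full `status` output. A minimal sketch (the `all_active` helper is hypothetical):

```shell
# Succeed only if stdin has at least one line and every line is exactly "active".
# Intended for the output of `systemctl is-active UNIT...`.
all_active() (
  count=0
  while IFS= read -r state; do
    [ "$state" = "active" ] || exit 1
    count=$((count + 1))
  done
  [ "$count" -gt 0 ]
)
```

Usage: `ssh root@45.77.205.49 'systemctl is-active matrix-continuwuity mautrix-slack' | all_active && echo "services OK"`. This does not cover the "no recent restarts" part of the expectation, which still needs a look at `systemctl status`.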

Git checkpoint:

git add modules/maubot.nix modules/dev-services.nix
git commit -m "Add maubot module files (service disabled)"

Rollback if needed:

ssh root@45.77.205.49 'nixos-rebuild switch --rollback'

Phase 2.2: Secrets Preparation

Goal: Add secrets without starting service

Steps:

  1. Verify services.dev-platform.maubot.enable is still NOT set

  2. Deploy (secrets added in Phase 0 and Phase 1 config):

nixos-rebuild switch --flake .#ops-jrz1 \
  --target-host root@45.77.205.49 \
  --build-host localhost

Validation:

# Verify secrets decrypted
ssh root@45.77.205.49 'ls -la /run/secrets/maubot-*'
# Expected:
# -r-------- 1 root root ... /run/secrets/maubot-admin-password
# -r-------- 1 root root ... /run/secrets/maubot-secret-key

# Verify existing services healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
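
The permission check can also be automated. The sketch below assumes GNU `stat` on the VPS and the ownership shown in the expected output above (mode 0400, owner root); adjust it if the secrets are declared with a different owner. The helper name is hypothetical.

```shell
# Read "MODE OWNER PATH" lines (as printed by GNU `stat -c '%a %U %n'`)
# and report any secret that is not mode 400 and owned by root.
check_secret_perms() (
  status=0
  while read -r mode owner path; do
    if [ "$mode" != "400" ] || [ "$owner" != "root" ]; then
      echo "unexpected permissions: $mode $owner $path"
      status=1
    fi
  done
  exit "$status"
)
```

Usage: `ssh root@45.77.205.49 "stat -c '%a %U %n' /run/secrets/maubot-*" | check_secret_perms`.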

Git checkpoint:

git add hosts/ops-jrz1.nix secrets/secrets.yaml
git commit -m "Add maubot secrets (service not enabled)"

Phase 2.3: Enable Maubot Service

Goal: Start maubot service, verify isolation from existing services

Steps:

  1. Enable in hosts/ops-jrz1.nix:
services.dev-platform.maubot = {
  enable = true;
  port = 29316;
};
  2. Deploy:
nixos-rebuild switch --flake .#ops-jrz1 \
  --target-host root@45.77.205.49 \
  --build-host localhost

Validation:

# 1. Verify maubot service started
ssh root@45.77.205.49 'systemctl status maubot.service'
# Expected: active (running)

# 2. Check logs for errors
ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
# Look for: "Starting maubot on port 29316", "Connected to homeserver"
# No ERROR or CRITICAL messages

# 3. Verify existing services still healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'

# 4. Test Slack bridge (critical validation)
# Post message in Slack → verify appears in Matrix within 5 seconds

# 5. Test management UI access
ssh -L 29316:localhost:29316 root@45.77.205.49
# In browser: http://localhost:29316/_matrix/maubot
# Should load login page

Git checkpoint:

git add hosts/ops-jrz1.nix
git commit -m "Enable maubot service (no bots deployed yet)"

Rollback if needed:

# Option 1: NixOS generation rollback (fastest)
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'

# Option 2: Disable service (if you want to keep other changes)
# Edit hosts/ops-jrz1.nix: services.dev-platform.maubot.enable = false
# Then redeploy

Rollback Procedures

If ANY deployment phase fails or breaks existing services:

  1. Immediate rollback (restores last working state):
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
  2. Verify services restored:
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
# Test Slack bridge: post message, verify in Matrix
  3. Investigate issue before retrying:
# Check what changed
ssh root@45.77.205.49 'journalctl --since "10 minutes ago" | grep -E "ERR|CRIT|FTL"'

# Review deployment logs
ssh root@45.77.205.49 'journalctl -u nixos-rebuild -n 100'

Git-based rollback (if the change is already committed and you want to revert it):

git log --oneline -5  # Find commit to revert
git revert <commit-hash>
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost

Phase 2.4: Deployment Success Criteria

Before proceeding to bot configuration, verify:

  • maubot.service is active (running)
  • Management UI loads at http://localhost:29316/_matrix/maubot (via SSH tunnel)
  • No errors in maubot service logs
  • All existing services healthy (Matrix, Slack bridge, Forgejo, PostgreSQL, nginx)
  • Slack bridge functional (test message flow Slack ↔ Matrix)
  • Phase 2.3 git commit created

If all criteria pass, proceed to Phase 3 (Bot Registration). Otherwise, roll back and investigate.


Phase 3: Bot Registration and Configuration

1. Access Management Interface

# Create SSH tunnel
ssh -L 29316:localhost:29316 root@45.77.205.49

# In browser:
# Navigate to: http://localhost:29316/_matrix/maubot

2. Login to Maubot

  • Username: admin
  • Password: <from sops secrets>

3. Create Bot Matrix User

Option A: Registration Token (recommended):

  1. Configure conduwuit registration token (if not already set)
  2. In Maubot UI: Clients → Add client
  3. Enter Matrix user ID: @instagram-bot:clarun.xyz
  4. Select "Register" and provide registration token
  5. Bot user created automatically

Option B: Admin Room Commands:

  1. Access Matrix homeserver admin room
  2. Run: !admin users create-user instagram-bot
  3. Copy generated password
  4. In Maubot UI: Create client with username/password

4. Upload Instagram Plugin

# Copy plugin to VPS
scp /home/dan/proj/sna/sna-instagram-bot.mbp \
    root@45.77.205.49:/tmp/

# Or upload via web UI:
# - Plugins tab → Upload
# - Select sna-instagram-bot.mbp

5. Create Bot Instance

In Maubot UI:

  1. Instances tab → Add instance
  2. ID: instagram-bot-1
  3. Type: sna.instagram
  4. Primary user: Select @instagram-bot:clarun.xyz
  5. Enabled: ✓
  6. Config:
{
  "enabled": true,
  "max_file_size": 50000000,
  "room_subscriptions": []
}
  7. Save

6. Configure Room Subscriptions

Get Matrix room ID:

# In Element or Matrix client:
# Room Settings → Advanced → Internal Room ID
# Example: !abc123def:clarun.xyz

Add to bot config (per FR-010):

Edit bot instance config in Maubot UI:

{
  "enabled": true,
  "max_file_size": 50000000,
  "room_subscriptions": [
    "!abc123def:clarun.xyz"
  ]
}

Restart bot instance: Stop → Start in Maubot UI
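
A common mistake is pasting a room alias (`#room:server`) instead of the internal room ID. The sketch below is a rough shape check only, assuming the `!id:server` form shown in the example above; it is not a full validation of Matrix identifiers, and the function name is hypothetical.

```shell
# Rough shape check for a room ID: "!" + opaque ID + ":" + server name.
# Rejects aliases (#...) and user IDs (@...).
is_room_id() {
  case $1 in
    '!'?*:?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `is_room_id '!abc123def:clarun.xyz' && echo "looks like a room ID"`.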


Phase 4: Testing

1. Invite Bot to Test Room

In Matrix client:

/invite @instagram-bot:clarun.xyz

2. Test Instagram URL Fetching

Post in the room:

https://www.instagram.com/p/EXAMPLE123/

Expected behavior:

  • Bot responds within 5 seconds (SC-001)
  • Image/video appears in room
  • Caption and metadata posted as text message

3. Test Room Subscription Enforcement

Post an Instagram URL in a room that is NOT in room_subscriptions:

Expected behavior:

  • Bot ignores URL (no response)
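
The behaviour being tested is a simple membership check: the room an event arrives in is compared against the configured list. The plugin's actual implementation is not shown in this guide, so the following is only an illustration of the expected logic, with a hypothetical function name.

```shell
# Succeed if ROOM (first argument) appears among the remaining arguments.
room_is_subscribed() (
  room=$1
  shift
  for subscribed in "$@"; do
    [ "$subscribed" = "$room" ] && exit 0
  done
  exit 1
)
```

For example, `room_is_subscribed '!other:clarun.xyz' '!abc123def:clarun.xyz'` fails, which corresponds to the bot staying silent in this test.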

4. Monitor Logs

ssh root@45.77.205.49 'journalctl -u maubot.service -f --since "5 minutes ago"'

# Check for:
# - Instagram URL detection
# - yt-dlp extraction
# - Matrix upload
# - Any ERROR/CRITICAL logs
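
Watching a live log is fine during a test session, but a count is easier to use in a script. A minimal sketch, assuming maubot writes the level names ERROR and CRITICAL literally into its log lines (the helper name is hypothetical):

```shell
# Count log lines on stdin that mention ERROR or CRITICAL.
# `|| true` keeps the exit status zero when grep finds no matches.
count_log_errors() {
  grep -cE 'ERROR|CRITICAL' || true
}
```

Usage: `ssh root@45.77.205.49 'journalctl -u maubot.service --since "5 minutes ago" --no-pager' | count_log_errors`.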

Phase 5: Health Monitoring

1. Verify Health Check Timer

ssh root@45.77.205.49 'systemctl list-timers | grep maubot'

# Expected:
# maubot-health.timer (runs every 5 minutes)
# maubot-health-restart.timer (runs every 10 minutes)
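
To check for both timers without reading the listing by eye, something like the following could be used. The timer names are taken from the expected output above; the function name is hypothetical.

```shell
# Print whichever expected maubot timers are absent from
# `systemctl list-timers` output read on stdin.
missing_maubot_timers() (
  listing=$(cat)
  for timer in maubot-health.timer maubot-health-restart.timer; do
    printf '%s\n' "$listing" | grep -qF -- "$timer" || echo "$timer"
  done
)
```

Usage: `ssh root@45.77.205.49 'systemctl list-timers --all --no-pager' | missing_maubot_timers`; no output means both timers are registered.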

2. Manual Health Check

ssh root@45.77.205.49 'curl -s http://localhost:29316/_matrix/maubot/v1/version | jq .'

# Expected output:
# {
#   "version": "0.5.2",
#   "server": "maubot"
# }
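
The commit summary at the top of this page mentions "Fix maubot health check (401 = healthy)", which suggests the endpoint may answer an unauthenticated request with 401 rather than the JSON above. Under that assumption, a status-code check is more robust than parsing the body. A minimal sketch (the helper name is hypothetical):

```shell
# Treat HTTP 200 and 401 as "service is up": a 401 means maubot answered
# but the request carried no access token.
is_healthy_status() {
  case $1 in
    200|401) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage on the VPS: `code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:29316/_matrix/maubot/v1/version); is_healthy_status "$code" && echo healthy`. When nothing is listening, curl reports `000`, which this check treats as unhealthy.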

3. Check Bot Instance Status

In Maubot UI:

  • Instances tab
  • Verify instagram-bot-1 shows green "Running" status
  • Check "Last Sync" timestamp (should be less than 10 minutes old)

Troubleshooting

Bot Not Responding to Instagram URLs

Check:

  1. Room ID is in room_subscriptions config
  2. Bot has joined the room (/invite @instagram-bot:clarun.xyz)
  3. URL is a public Instagram post (not private/story)
  4. Logs show URL detection: journalctl -u maubot.service | grep -i instagram

Fix:

  • Update room_subscriptions config
  • Restart bot instance in Maubot UI

Service Won't Start

Check:

ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'

Common issues:

  • Port 29316 already in use → Check ss -tlnp | grep 29316
  • Database permissions → Check /var/lib/maubot/ ownership
  • Secrets not decrypted → Check /run/secrets/maubot-* exists
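
A plain `grep 29316` can also match a PID or a peer port. The sketch below checks only the local-address column of listening sockets; it assumes the usual `ss -tln` column layout (State, Recv-Q, Send-Q, Local Address:Port, ...), and the function name is hypothetical.

```shell
# Succeed if `ss -tln` output on stdin shows a listener on the given port.
port_is_listening() (
  port=$1
  awk -v port="$port" '
    $1 == "LISTEN" { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit found ? 0 : 1 }
  '
)
```

Usage: `ssh root@45.77.205.49 'ss -tln' | port_is_listening 29316 && echo "port 29316 is in use"`.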

Bot Can't Connect to Matrix

Check:

  1. conduwuit is running: systemctl status matrix-continuwuity
  2. Homeserver URL is correct: http://127.0.0.1:8008 (IPv4)
  3. Bot Matrix user exists and has valid access token

Fix:

  • Recreate bot client in Maubot UI
  • Check Matrix homeserver logs: journalctl -u matrix-continuwuity | grep instagram

Instagram Content Fetch Fails

Check logs:

ssh root@45.77.205.49 'journalctl -u maubot.service | grep -A 10 "yt-dlp"'

Common issues:

  • Instagram rate limiting (429 error) → Wait 30 minutes, reduce request frequency
  • Private post → Can't fetch (expected behavior)
  • yt-dlp outdated → Update nixpkgs, redeploy

Rollback Procedure

If deployment fails:

# List NixOS generations
ssh root@45.77.205.49 'nixos-rebuild list-generations'

# Rollback to previous generation
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'

# Verify services restored
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'

Success Criteria Validation

Verify all success criteria before marking feature complete:

  • SC-001: Instagram bot responds within 5 seconds
  • SC-002: System supports 3 concurrent bot instances (test by creating 2 more instances)
  • SC-003: Service maintains 99% uptime over 7 days
  • SC-004: Auto-recovery within 2 minutes after restart
  • SC-005: New bot deployment completes in <10 minutes
  • SC-006: 95% success rate for public Instagram URLs
  • SC-007: Management interface loads in <2 seconds
  • SC-008: Server reboot without data loss (test with reboot)

Testing period: 7 days operational before merging to main (per constitution Principle III)
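
To make SC-003 concrete, the uptime target can be converted into a downtime budget for the testing period. A small sketch of the arithmetic (the helper name is illustrative):

```shell
# Whole minutes of downtime allowed over DAYS days at PERCENT uptime.
allowed_downtime_minutes() {
  echo $(( $1 * 24 * 60 * (100 - $2) / 100 ))
}

allowed_downtime_minutes 7 99   # 100 (1% of 10080 minutes, rounded down)
```

In other words, 99% uptime over 7 days leaves roughly 100 minutes of total downtime, including any restarts caused by redeploys.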


Post-Deployment

1. Update Documentation

# Update CLAUDE.md with maubot commands
# Example section to add:

### Maubot Management
- Management UI: http://localhost:29316/_matrix/maubot (via SSH tunnel)
- Bot registration: Use conduwuit registration token
- Room subscriptions: Edit config JSON, restart instance
- Logs: journalctl -u maubot.service -f

2. Commit and Tag

git add modules/maubot.nix modules/dev-services.nix hosts/ops-jrz1.nix
git commit -m "Add maubot bot framework with Instagram bot

- Extract and adapt maubot.nix from ops-base
- Configure for conduwuit (registration token auth)
- Deploy Instagram bot with room-based activation
- Add health monitoring timers

Implements feature 003-maubot-integration
"

git tag -a v0.3.0 -m "Release v0.3.0: Maubot Integration

Features:
- Maubot bot framework service
- Instagram content fetcher bot
- Room-based bot activation
- Management web interface (localhost only)
- Health monitoring and auto-recovery

Success criteria validated (SC-001 through SC-008)
Constitution compliance verified
"

git push origin main --tags

3. Create Worklog

Document the deployment session:

# Create worklog
docs/worklogs/2025-10-26-maubot-deployment.org

Reference Files

Module locations:

  • /home/dan/proj/ops-jrz1/modules/maubot.nix (service module)
  • /home/dan/proj/ops-jrz1/modules/dev-services.nix (high-level wrapper)

Secrets:

  • /home/dan/proj/ops-jrz1/secrets/secrets.yaml (encrypted)
  • /run/secrets/maubot-* (runtime, on VPS)

Runtime state (on VPS):

  • /var/lib/maubot/bot.db (SQLite database)
  • /var/lib/maubot/config/config.yaml (generated config)
  • /var/lib/maubot/plugins/ (uploaded .mbp files)

Source reference:

  • ops-base module: /home/dan/proj/ops-base/vm-configs/modules/maubot.nix
  • Instagram plugin: /home/dan/proj/sna/sna-instagram-bot.mbp
  • ops-base docs: /home/dan/proj/ops-base/docs/maubot-*.md

Deployment time estimate: 2-3 hours (including testing and validation)
Status: Ready for Phase 2 (implementation)