ops-jrz1/specs/003-maubot-integration/tasks.md
Dan 8826d62bcc Add maubot integration and infrastructure updates
- maubot.nix: Declarative bot framework with plugin deployment
- backup.nix: Local backup service for Matrix/bridge data
- sna-instagram-bot: Instagram content bridge plugin
- beads: Issue tracking workflow integrated
- spec 004: Browser-based dev environment design
- nixpkgs bump: Oct 22 → Dec 2
- Fix maubot health check (401 = healthy)
2025-12-08 15:55:12 -08:00

Implementation Tasks: Maubot Integration

Feature: 003-maubot-integration
Branch: 003-maubot-integration
Target: ops-jrz1 VPS (45.77.205.49)
Estimated Duration: 2-3 hours deployment + 7 days validation

Task Summary

  • Total Tasks: 56 (updated for incremental deployment strategy)
  • Setup Phase: 4 tasks
  • Foundational Phase: 6 tasks
  • User Story 1 (P1): 28 tasks - Instagram content sharing (MVP)
    • Infrastructure: 3 tasks (T011-T013)
    • Phase 1 deployment: 4 tasks (T013a-d)
    • Phase 2 deployment: 4 tasks (T013e-h)
    • Phase 3 deployment: 7 tasks (T014-T017c)
    • Phase 4 bot config: 6 tasks (T018-T023)
    • Testing: 4 tasks (T024-T027)
  • User Story 2 (P2): 6 tasks - Management interface
  • User Story 3 (P2): 5 tasks - Service reliability
  • User Story 4 (P3): 4 tasks - Additional bot deployment
  • Polish Phase: 3 tasks

MVP Scope: User Story 1 (28 tasks) - validates core value proposition with incremental deployment


Phase 1: Setup (Project Initialization)

Goal: Prepare development environment and extract source modules from ops-base

  • T001 Create feature branch 003-maubot-integration from main
  • T002 Copy maubot.nix module from /home/dan/proj/ops-base/vm-configs/modules/maubot.nix to modules/maubot.nix
  • T003 Copy Instagram bot plugin from /home/dan/proj/sna/sna-instagram-bot.mbp to local working directory
  • T004 Generate maubot secrets (admin password 32 chars, secret key 48 bytes) using openssl rand -base64
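
For T004, a minimal Python sketch that uses the standard secrets module instead of openssl, assuming the sizes above: 24 random bytes encode to exactly 32 base64 characters (what `openssl rand -base64 24` would print), and the secret key encodes 48 random bytes. The function name is illustrative; the dictionary keys reuse the secret names from T010.

```python
import base64
import secrets


def generate_maubot_secrets() -> dict[str, str]:
    """Return freshly generated maubot secrets (illustrative helper for T004)."""
    # 24 random bytes -> exactly 32 base64 characters, like `openssl rand -base64 24`
    admin_password = base64.b64encode(secrets.token_bytes(24)).decode("ascii")
    # 48 random bytes -> 64 base64 characters, like `openssl rand -base64 48`
    secret_key = base64.b64encode(secrets.token_bytes(48)).decode("ascii")
    return {
        "maubot-admin-password": admin_password,
        "maubot-secret-key": secret_key,
    }
```

Call generate_maubot_secrets() and paste the values straight into sops secrets/secrets.yaml (T010) rather than writing them to a plaintext file.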

Checkpoint: Source files ready for adaptation


Phase 2: Foundational (Blocking Prerequisites)

Goal: Adapt maubot module for ops-jrz1 and configure secrets

Independent Test: Deploy adapted module and verify service starts without errors

Module Adaptation

  • T005 Update module namespace from services.matrix-vm.maubot to services.maubot in modules/maubot.nix
  • T006 Update homeserver URL from http://127.0.0.1:6167 to http://127.0.0.1:8008 in modules/maubot.nix
  • T007 Remove registration_secrets section from config generation in modules/maubot.nix (around lines 140-150; conduwuit doesn't support shared-secret registration)
  • T008 Change config path from /run/maubot/config.yaml to /var/lib/maubot/config/config.yaml in modules/maubot.nix
  • T009 Remove the registration-secret entry from LoadCredential (keep admin-password and secret-key only) in the systemd service section of modules/maubot.nix
  • T010 [P] Add maubot secrets to secrets/secrets.yaml (maubot-admin-password, maubot-secret-key) using sops secrets/secrets.yaml
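
To confirm that T005-T009 landed, a small Python check can scan modules/maubot.nix for leftovers from the ops-base version. The marker strings come from the tasks above; the helper names are hypothetical, and a plain substring search only catches the literal old values, not every semantic mistake.

```python
from pathlib import Path

# Strings that should no longer appear after T005-T009 (taken from the task list).
STALE_MARKERS = {
    "services.matrix-vm.maubot": "T005: namespace should be services.maubot",
    "127.0.0.1:6167": "T006: homeserver should be http://127.0.0.1:8008",
    "registration_secrets": "T007: conduwuit has no shared-secret registration",
    "/run/maubot/config.yaml": "T008: config belongs in /var/lib/maubot/config/config.yaml",
    "registration-secret": "T009: only admin-password and secret-key credentials remain",
}


def find_stale_markers(module_text: str) -> list[str]:
    """Return one message per leftover marker found in the module source."""
    return [msg for marker, msg in STALE_MARKERS.items() if marker in module_text]


def check_module(path: str = "modules/maubot.nix") -> list[str]:
    """Read the adapted module; an empty list means no leftovers were found."""
    return find_stale_markers(Path(path).read_text())
```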

Checkpoint: Module adapted for conduwuit, secrets encrypted


Phase 3: User Story 1 - Instagram Content Sharing to Matrix (Priority: P1)

Goal: Deploy maubot service with Instagram bot and validate content fetching

Independent Test: Post Instagram URL in enabled Matrix room and verify bot responds with image/video/caption within 5 seconds

Why MVP: Core value proposition - brings Instagram content into team communication and validates that the integration works

Infrastructure Deployment

  • T011 [US1] Add sops secret declarations to hosts/ops-jrz1.nix (sops.secrets.maubot-admin-password, sops.secrets.maubot-secret-key)
  • T012 [US1] Create dev-platform wrapper options in modules/dev-services.nix (services.dev-platform.maubot with enable and port options)
  • T013 [US1] Add dev-platform config block in modules/dev-services.nix (maps to services.maubot with homeserverUrl, serverName, port, secret paths)

Service Deployment - Phase 1: Module Files

  • T013a [US1] Deploy Phase 1 to VPS (modules added, service disabled) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
  • T013b [US1] Verify Phase 1: Check nixos-rebuild output reports "no services changed" or only unrelated service restarts
  • T013c [US1] Verify existing services healthy: ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
  • T013d [US1] Git commit Phase 1 with message "Add maubot module files (service disabled)"
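
T013c, T013g, and T017a repeat the same health check. A hedged Python sketch: instead of reading `systemctl status` output by eye, it runs `systemctl is-active` over SSH, which prints one state per unit in the order given, and reports every unit that is not `active`. The host and unit names come from T013c; the function names are illustrative.

```python
import subprocess

HOST = "root@45.77.205.49"
UNITS = ["matrix-continuwuity", "mautrix-slack", "forgejo", "postgresql", "nginx"]


def unhealthy_units(is_active_output: str, units: list[str]) -> list[str]:
    """Pair each unit with its reported state and return the ones not 'active'."""
    states = is_active_output.strip().splitlines()
    # A missing line (e.g. truncated output) counts as unhealthy.
    states += ["unknown"] * (len(units) - len(states))
    return [unit for unit, state in zip(units, states) if state.strip() != "active"]


def check_existing_services(host: str = HOST, units: list[str] = UNITS) -> list[str]:
    # is-active exits non-zero when any unit is inactive, so check=True is not used.
    result = subprocess.run(
        ["ssh", host, "systemctl", "is-active", *units],
        capture_output=True,
        text=True,
    )
    return unhealthy_units(result.stdout, units)
```

An empty list from check_existing_services() means all five services reported active.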

Service Deployment - Phase 2: Secrets

  • T013e [US1] Deploy Phase 2 to VPS (secrets from T010 and T011 in place, service still disabled) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
  • T013f [US1] Verify Phase 2: Check secrets decrypted via ssh root@45.77.205.49 'ls -la /run/secrets/maubot-*' (expect 0400 permissions)
  • T013g [US1] Verify existing services healthy (same command as T013c)
  • T013h [US1] Git commit Phase 2 with message "Add maubot secrets (service not enabled)"
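
The T013f check can also be scripted in Python and run on the VPS itself. This sketch assumes the decrypted files are /run/secrets/maubot-admin-password and /run/secrets/maubot-secret-key (the names from T010 under the path from T013f); it checks existence and permission bits only, not ownership. The helper names are illustrative.

```python
import os
import stat
from typing import Optional

EXPECTED_SECRETS = ("maubot-admin-password", "maubot-secret-key")


def mode_problem(mode: int) -> Optional[str]:
    """Return the permission bits as octal text if they are not 0400, else None."""
    perms = stat.S_IMODE(mode)
    return None if perms == 0o400 else oct(perms)


def check_secret_files(directory: str = "/run/secrets") -> dict[str, str]:
    """Map each problem secret path to 'missing' or its unexpected mode."""
    problems = {}
    for name in EXPECTED_SECRETS:
        path = os.path.join(directory, name)
        try:
            st = os.stat(path)  # os.stat follows symlinks
        except FileNotFoundError:
            problems[path] = "missing"
            continue
        problem = mode_problem(st.st_mode)
        if problem is not None:
            problems[path] = problem
    return problems
```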

Service Deployment - Phase 3: Enable Service

  • T014 [US1] Enable maubot service in hosts/ops-jrz1.nix (services.dev-platform.maubot.enable = true, port = 29316)
  • T015 [US1] Deploy Phase 3 to VPS (enable maubot service) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
  • T016 [US1] Verify service status via ssh root@45.77.205.49 'systemctl status maubot.service' (expect active running)
  • T017 [US1] Check logs for errors via ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
  • T017a [US1] Verify existing services still healthy after maubot deployment (same command as T013c)
  • T017b [US1] Test Slack bridge functionality (post a message in Slack, verify it appears in Matrix within 5 seconds)
  • T017c [US1] Git commit Phase 3 with message "Enable maubot service (no bots deployed yet)"
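
T017 (and later T038) boils down to scanning journal output for error lines. A hedged Python sketch: the keyword pattern (error, traceback, critical) is an assumption that extends the `grep -i error` used in T038, and it can flag harmless lines that merely mention one of those words. The helper names are illustrative.

```python
import re
import subprocess

# Case-insensitive, like `grep -i error`; the extra keywords are an assumption.
ERROR_PATTERN = re.compile(r"error|traceback|critical", re.IGNORECASE)


def error_lines(journal_text: str) -> list[str]:
    """Return the journal lines that match the error pattern."""
    return [line for line in journal_text.splitlines() if ERROR_PATTERN.search(line)]


def recent_maubot_errors(host: str = "root@45.77.205.49", lines: int = 50) -> list[str]:
    # Same query as T017, with --no-pager so the output is plain text.
    result = subprocess.run(
        ["ssh", host, "journalctl", "-u", "maubot.service", "-n", str(lines), "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return error_lines(result.stdout)
```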

Bot Configuration - Phase 4: Manual Deployment

  • T018 [US1] Create SSH tunnel to management interface: ssh -L 29316:localhost:29316 root@45.77.205.49
  • T019 [US1] Log in to the maubot web UI at http://localhost:29316/_matrix/maubot (username: admin, password from sops secrets)
  • T020 [US1] Create bot Matrix user @instagram-bot:clarun.xyz via conduwuit registration token (Clients tab → Add client → Register)
  • T021 [US1] Upload Instagram plugin sna-instagram-bot.mbp via web UI (Plugins tab → Upload)
  • T022 [US1] Create bot instance instagram-bot-1 (type: sna.instagram, primary_user: @instagram-bot:clarun.xyz, config: {"enabled": true, "max_file_size": 50000000, "room_subscriptions": []})
  • T023 [US1] Invite bot to test Matrix room via /invite @instagram-bot:clarun.xyz
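
The instance config from T022 is plain JSON, so it can be generated rather than typed. A minimal Python sketch: the keys and values are copied from T022, while how the sna.instagram plugin interprets them is not specified here. The function name is illustrative.

```python
import json

MAX_FILE_SIZE = 50_000_000  # bytes, from T022 (50 MB decimal, about 47.7 MiB)


def initial_instance_config() -> str:
    """Return the T022 instance config as pretty-printed JSON for the web UI."""
    config = {
        "enabled": True,
        "max_file_size": MAX_FILE_SIZE,
        "room_subscriptions": [],  # stays empty until T024 adds the test room
    }
    return json.dumps(config, indent=2)
```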

Testing & Validation

  • T024 [US1] Add test room ID to bot config room_subscriptions in maubot web UI
  • T025 [US1] Restart bot instance (Stop → Start in web UI)
  • T026 [US1] Post public Instagram URL in test room and verify bot responds within 5 seconds with image/video/caption (SC-001)
  • T027 [US1] Post Instagram URL in non-subscribed room and verify bot ignores it (FR-006 enforcement)
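
The behavior that T024-T027 exercise can be modeled as a single predicate: respond only when the instance is enabled, the room is listed in room_subscriptions, and the message contains an Instagram URL. This is a hypothetical model for writing test cases, not the sna-instagram-bot plugin's code; the URL pattern (post, reel, and tv paths) and the room IDs in the test are assumptions.

```python
import re

# Assumed URL shapes; the real plugin's matcher may differ.
INSTAGRAM_URL = re.compile(r"https?://(?:www\.)?instagram\.com/(?:p|reel|tv)/[\w-]+")


def should_respond(config: dict, room_id: str, body: str) -> bool:
    """True only for Instagram links posted in a subscribed room (FR-006)."""
    if not config.get("enabled", False):
        return False
    if room_id not in config.get("room_subscriptions", []):
        return False
    return INSTAGRAM_URL.search(body) is not None
```

T026 corresponds to the first test case below (subscribed room, valid URL) and T027 to the second (same URL, unsubscribed room).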

Acceptance Criteria:

  • Bot responds to Instagram URLs in subscribed rooms only
  • Content fetched within 5 seconds (SC-001)
  • Images, videos, and captions displayed correctly
  • Bot ignores URLs in non-subscribed rooms

MVP Checkpoint: Core functionality working - Instagram content visible in Matrix


Phase 4: User Story 2 - Bot Management Interface (Priority: P2)

Goal: Validate management interface functionality for bot lifecycle operations

Independent Test: Access management UI, create/stop/restart bot instance, view logs and status

Why this priority: Essential for operations, but the bot works without admin features initially

Management Interface Validation

  • T028 [US2] Access management dashboard via SSH tunnel and verify all bot instances listed with status (Instances tab)
  • T029 [US2] Test plugin upload via web UI (upload test .mbp file, verify it appears in the plugins list)
  • T030 [US2] Test bot instance creation via web UI (create test instance, verify it appears online in Matrix within 30 seconds)
  • T031 [US2] Test bot configuration edit (edit room_subscriptions via config JSON, restart instance, verify bot responds only in new rooms)
  • T032 [US2] Test bot stop/start via web UI (click Stop, verify bot goes offline; click Start, verify it reconnects)
  • T033 [US2] View bot logs in UI and verify error messages display with timestamps and severity levels
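
Several checks above (T030's 30-second window, T032's offline and online transitions) follow a "poll until true or time out" pattern. A generic Python sketch, assuming you supply the check function yourself, for example one that asks a Matrix client whether @instagram-bot:clarun.xyz is online; that wiring is not shown. The clock and sleep parameters are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True (-> True) or timeout seconds pass (-> False)."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```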

Acceptance Criteria:

  • Dashboard displays all bot instances with status
  • Plugin upload succeeds and validates
  • Bot lifecycle operations (create/stop/start) work via UI
  • Configuration changes take effect after restart
  • Logs visible with proper formatting

Phase 5: User Story 3 - Bot Framework Service Reliability (Priority: P2)

Goal: Validate auto-start, auto-recovery, and failure handling

Independent Test: Reboot server and verify maubot service and all bot instances resume automatically

Why this priority: Critical for production reliability, but can be validated after basic functionality is proven

Reliability Testing

  • T034 [US3] Test server reboot recovery (ssh root@45.77.205.49 'reboot', wait 2 minutes, verify service auto-starts via systemctl status maubot)
  • T035 [US3] Test Matrix homeserver restart handling (restart matrix-continuwuity service, verify bot reconnects automatically without manual intervention)
  • T036 [US3] Verify health check timers active (ssh root@45.77.205.49 'systemctl list-timers | grep maubot', expect maubot-health.timer and maubot-health-restart.timer)
  • T037 [US3] Test manual health check (curl http://localhost:29316/_matrix/maubot/v1/version, verify JSON response with version field)
  • T038 [US3] Monitor 7-day uptime for SC-003 validation (99% uptime target, check periodically: uptime -p, journalctl -u maubot | grep -i error)
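
A sketch of the health logic behind T036-T037, to be run on the VPS or through the T018 tunnel. The URL is the endpoint from T037; treating HTTP 401 as healthy follows this commit's "Fix maubot health check (401 = healthy)" note, on the reading that an authentication challenge still proves the server is answering. The function names are illustrative, and this is not the module's actual timer script.

```python
import json
import urllib.error
import urllib.request

VERSION_URL = "http://localhost:29316/_matrix/maubot/v1/version"


def is_healthy(status: int, body: bytes) -> bool:
    """Classify one HTTP response from the version endpoint."""
    if status == 401:
        return True  # server answered; it only wants credentials
    if status == 200:
        try:
            data = json.loads(body)
        except ValueError:
            return False
        return isinstance(data, dict) and "version" in data
    return False


def probe(url: str = VERSION_URL, timeout: float = 5.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status, resp.read())
    except urllib.error.HTTPError as err:  # non-2xx responses, including 401
        return is_healthy(err.code, err.read())
    except OSError:  # connection refused, timeout, DNS failure
        return False
```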

Acceptance Criteria:

  • Service auto-starts on server boot within 2 minutes
  • Bot instances reconnect after Matrix homeserver restart
  • Health timers operational
  • 99% uptime achieved over 7-day period
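
The 99% target translates into a concrete downtime budget: a 7-day window is 10,080 minutes, so 1% allows about 100.8 minutes (roughly 1 h 41 min) of downtime. The same arithmetic in Python, with illustrative function names:

```python
def downtime_budget_minutes(days: float, target: float) -> float:
    """Maximum downtime (minutes) that still meets an uptime target over `days`."""
    window_minutes = days * 24 * 60
    return window_minutes * (1.0 - target)


def uptime_ratio(downtime_minutes: float, days: float) -> float:
    """Fraction of the window the service was up, given total downtime."""
    window_minutes = days * 24 * 60
    return 1.0 - downtime_minutes / window_minutes
```

With SC-004's 2-minute recovery bound, the budget covers many restarts; a single prolonged outage is the realistic way to miss SC-003.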

Phase 6: User Story 4 - Additional Bot Deployment (Priority: P3)

Goal: Demonstrate platform extensibility by deploying a second bot type

Independent Test: Deploy echo bot or reaction bot from maubot plugin repository and verify independent operation

Why this priority: Future-proofs the investment; not required to deliver the initial Instagram bot value

Extensibility Validation

  • T039 [US4] Download additional maubot plugin from community repository (e.g., echo bot, reaction bot)
  • T040 [US4] Upload second plugin via management UI and verify validation succeeds
  • T041 [US4] Create second bot instance using new plugin and verify it appears in the dashboard with type, status, and resource usage
  • T042 [US4] Test SC-002 multi-instance validation (run 3 concurrent bot instances, verify no performance degradation)

Acceptance Criteria:

  • Multiple plugin types supported
  • Dashboard shows all bots with clear differentiation
  • 3+ concurrent instances run without degradation (SC-002)
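
This section does not define "degradation" for SC-002, so any check needs a threshold of its own. A hedged Python sketch, assuming you record response times (seconds from posting a URL to the bot's reply) with one instance and again with three: it flags degradation if the median grows by more than a hypothetical 20%, or if any reply exceeds SC-001's 5 seconds.

```python
from statistics import median

DEGRADATION_LIMIT = 1.20  # hypothetical: median may grow at most 20%
SC001_SECONDS = 5.0       # response-time bound from SC-001


def degraded(baseline: list[float], loaded: list[float]) -> bool:
    """Compare response times with 1 instance (baseline) vs 3 instances (loaded)."""
    if median(loaded) > median(baseline) * DEGRADATION_LIMIT:
        return True
    return max(loaded) > SC001_SECONDS
```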

Phase 7: Polish & Cross-Cutting Concerns

Goal: Complete documentation and prepare for merge

Documentation

  • T043 Update CLAUDE.md with maubot management commands (service status, logs, SSH tunnel, room subscription workflow)
  • T044 Create deployment worklog in docs/worklogs/2025-10-26-maubot-deployment.org documenting session
  • T045 Commit changes and tag release v0.3.0 (message: "Add maubot bot framework with Instagram bot - Implements 003-maubot-integration")

Final Checkpoint: All documentation complete, ready for 7-day validation period


Dependencies & Execution Order

User Story Dependencies

Phase 1 (Setup)
  ↓
Phase 2 (Foundational) ← BLOCKING for all user stories
  ↓
├─→ User Story 1 (P1) ← MVP, no dependencies on other user stories
├─→ User Story 2 (P2) ← depends on US1 (needs running bot to manage)
├─→ User Story 3 (P2) ← depends on US1 (needs service deployed to test reliability)
└─→ User Story 4 (P3) ← depends on US2 (needs management UI working)
  ↓
Phase 7 (Polish) ← depends on all user stories complete
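
The dependency graph can be checked mechanically. A Python sketch using the standard library's graphlib (Python 3.9+), with edges transcribed from the diagram; it groups phases into batches whose members have no unmet dependencies. Batching is conservative: a real schedule could start US4 as soon as US2 finishes, without waiting for US3.

```python
from graphlib import TopologicalSorter

# Each key lists the phases it depends on, transcribed from the diagram.
DEPENDENCIES = {
    "Setup": set(),
    "Foundational": {"Setup"},
    "US1": {"Foundational"},
    "US2": {"US1"},
    "US3": {"US1"},
    "US4": {"US2"},
    "Polish": {"US1", "US2", "US3", "US4"},
}


def execution_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group phases into batches whose members can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For this graph the batches come out as Setup, Foundational, US1, then US2 and US3 together, then US4, then Polish.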

Critical Path

  1. Setup (T001-T004)
  2. Foundational (T005-T010) - MUST complete before user stories
  3. User Story 1 (T011-T027) - MVP - Deploy first, validate before continuing
  4. Validate MVP success before proceeding to US2/US3/US4
  5. User Stories 2 and 3 can proceed in parallel after US1 validates; User Story 4 waits for US2
  6. Polish (T043-T045) after all user stories complete

Parallel Execution Opportunities

Phase 2 (Foundational)

Parallel:

  • T010 can run in parallel with T005-T009 (secrets vs module editing, different files)

Phase 3 (User Story 1)

Parallel:

  • T011 can run in parallel with T012-T013 (hosts/ops-jrz1.nix vs modules/dev-services.nix); T012 and T013 both edit modules/dev-services.nix, so do them together
  • After T015 deploys: T016, T017 can run in parallel (both read-only checks)

Sequential:

  • T014 depends on T011, T012, T013 (needs config in place) and on T013a-T013h (Phases 1 and 2 deployed and verified)
  • T015 depends on T014 (deployment needs config)
  • T018-T027 must run sequentially (UI workflow dependencies)

Phase 4-6 (User Stories 2, 3, 4)

Parallel after US1:

  • US2 tasks (T028-T033) can run in parallel with US3 tasks (T034-T038) if US1 validates
  • US4 tasks (T039-T042) should wait for US2 to confirm management UI working

Implementation Strategy

MVP-First Approach

Week 1: Focus exclusively on User Story 1 (T001-T027)

  • Goal: Working Instagram bot responding to URLs in designated rooms
  • Success: Can demo "post Instagram URL → see content in Matrix"
  • Decision point: If MVP fails, stop and reassess before continuing

Week 2: Expand to User Stories 2 & 3 (T028-T038) in parallel

  • Goal: Operational management and reliability validated
  • Success: Admins can manage bots via UI, service survives restarts

Week 3: Add extensibility (User Story 4) if needed (T039-T042)

  • Goal: Prove multi-bot capability
  • Success: 3 concurrent bot instances running

Week 4+: 7-day validation period

  • Monitor uptime (SC-003: 99% target)
  • Monitor Instagram fetch success rate (SC-006: 95% target)
  • Collect user feedback
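
A small Python sketch for the end-of-period check against the two numeric targets above. It assumes you tally fetch attempts and successes yourself (for example in the worklog) and compute uptime separately; the metric names and helpers are illustrative.

```python
TARGETS = {"SC-003 uptime": 0.99, "SC-006 fetch success": 0.95}


def success_rate(successes: int, attempts: int) -> float:
    """Fraction of Instagram fetches that succeeded."""
    if attempts == 0:
        raise ValueError("no fetch attempts recorded")
    return successes / attempts


def failing_criteria(measured: dict[str, float], targets: dict[str, float] = TARGETS) -> list[str]:
    """Return the criteria below target; a missing measurement counts as failing."""
    return [name for name, target in targets.items() if measured.get(name, 0.0) < target]
```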

Incremental Delivery

Each user story delivers independently testable value:

  • US1: Instagram content in Matrix (core value)
  • US2: Self-service bot management (operational efficiency)
  • US3: Production reliability (reduces maintenance burden)
  • US4: Platform extensibility (future-proofing)

The work can stop after any user story and still leave a working system.


Testing Strategy

Manual QA (no automated tests per plan.md):

  • Each user story has "Independent Test" criteria
  • Acceptance scenarios from spec.md validated manually
  • Success criteria (SC-001 through SC-008) checked via quickstart.md checklist

Validation Period:

  • 7 days operational before merging to main (per constitution Principle III)
  • Monitor metrics: uptime, response time, fetch success rate
  • Document issues in worklog

Risk Mitigation

High-risk tasks:

  • T007: Removing registration_secrets (conduwuit incompatibility) - carefully test bot registration after change
  • T015: First deployment with the maubot service enabled on ops-jrz1 - have rollback ready via nixos-rebuild switch --rollback
  • T020: Bot user registration (new auth pattern) - document exact steps in worklog for repeatability

Rollback points:

  • After T010: Can roll back before deployment if module adaptation fails
  • After T015: NixOS generation rollback if service won't start
  • After T027: Can remove bot and redeploy if issues found
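
A Python sketch that keeps the forward and rollback commands side by side so neither has to be retyped under pressure. The deploy arguments are copied from T013a/T015; running the rollback on the VPS over ssh is an assumption about how the host is reached, and `nixos-rebuild switch --rollback` switches the system back to the previous generation. The helper names are illustrative.

```python
import subprocess

HOST = "root@45.77.205.49"


def deploy_cmd(host: str = HOST) -> list[str]:
    # Same command as T013a/T015: build locally, activate on the VPS.
    return ["nixos-rebuild", "switch", "--flake", ".#ops-jrz1",
            "--target-host", host, "--build-host", "localhost"]


def rollback_cmd(host: str = HOST) -> list[str]:
    # Runs on the VPS itself, switching back to the previous system generation.
    return ["ssh", host, "nixos-rebuild", "switch", "--rollback"]


def run(cmd: list[str]) -> None:
    """Execute a command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```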

Success Metrics

Per User Story:

  • US1: Bot responds to Instagram URLs within 5 seconds (SC-001)
  • US2: Management UI loads within 2 seconds (SC-007)
  • US3: 99% uptime over 7 days (SC-003), auto-recovery within 2 minutes (SC-004)
  • US4: 3 concurrent instances without degradation (SC-002)

Overall:

  • All 8 success criteria validated (SC-001 through SC-008)
  • Constitution check passes (all 4 principles compliant)
  • 7-day stability period completed without critical issues
  • Documentation complete (spec, plan, quickstart, worklog, CLAUDE.md updated)

Estimated Timeline:

  • MVP (US1): 2-3 hours deployment + testing
  • Full Feature (US1-4): 1 week implementation + 1 week validation
  • Production Ready: 2 weeks total (including 7-day stability period)

Next Command: /speckit.implement to begin execution (start with T001)