Implementation Tasks: Maubot Integration
Feature: 003-maubot-integration
Branch: 003-maubot-integration
Target: ops-jrz1 VPS (45.77.205.49)
Estimated Duration: 2-3 hours deployment + 7 days validation
Task Summary
- Total Tasks: 56 (updated for incremental deployment strategy)
- Setup Phase: 4 tasks
- Foundational Phase: 6 tasks
- User Story 1 (P1): 28 tasks - Instagram content sharing (MVP)
  - Infrastructure: 3 tasks (T011-T013)
  - Phase 1 deployment: 4 tasks (T013a-d)
  - Phase 2 deployment: 4 tasks (T013e-h)
  - Phase 3 deployment: 7 tasks (T014-T017c)
  - Phase 4 bot config: 6 tasks (T018-T023)
  - Testing: 4 tasks (T024-T027)
- User Story 2 (P2): 6 tasks - Management interface
- User Story 3 (P2): 5 tasks - Service reliability
- User Story 4 (P3): 4 tasks - Additional bot deployment
- Polish Phase: 3 tasks
MVP Scope: User Story 1 (28 tasks) - validates core value proposition with incremental deployment
Phase 1: Setup (Project Initialization)
Goal: Prepare development environment and extract source modules from ops-base
- T001 Create feature branch 003-maubot-integration from main
- T002 Copy maubot.nix module from /home/dan/proj/ops-base/vm-configs/modules/maubot.nix to modules/maubot.nix
- T003 Copy Instagram bot plugin from /home/dan/proj/sna/sna-instagram-bot.mbp to local working directory
- T004 Generate maubot secrets (admin password 32 chars, secret key 48 bytes) using openssl rand -base64
Checkpoint: Source files ready for adaptation
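T004 could be carried out roughly as follows. This is a minimal sketch: it assumes the 32-character admin password comes from 24 random bytes (base64 turns every 3 bytes into 4 characters) and that the 48-byte secret key is kept in its base64 form. The variable names are illustrative.

```shell
# Sketch for T004; variable names are illustrative, not taken from the module.
# 24 random bytes encode to exactly 32 base64 characters.
admin_password="$(openssl rand -base64 24)"
# 48 random bytes encode to 64 base64 characters (a single output line).
secret_key="$(openssl rand -base64 48)"

# Print lengths only, so the secrets never land in terminal scrollback.
echo "admin password length: ${#admin_password}"
echo "secret key length: ${#secret_key}"
```

The two values can then be entered into secrets/secrets.yaml with sops in T010.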
Phase 2: Foundational (Blocking Prerequisites)
Goal: Adapt maubot module for ops-jrz1 and configure secrets
Independent Test: Deploy adapted module and verify service starts without errors
Module Adaptation
- T005 Update module namespace from services.matrix-vm.maubot to services.maubot in modules/maubot.nix
- T006 Update homeserver URL from http://127.0.0.1:6167 to http://127.0.0.1:8008 in modules/maubot.nix
- T007 Remove registration_secrets section from config generation in modules/maubot.nix (lines ~140-150; conduwuit doesn't support shared-secret registration)
- T008 Change config path from /run/maubot/config.yaml to /var/lib/maubot/config/config.yaml in modules/maubot.nix
- T009 Remove the registration-secret LoadCredential entry (keep admin-password and secret-key only) in the systemd service section of modules/maubot.nix
- T010 [P] Add maubot secrets to secrets/secrets.yaml (maubot-admin-password, maubot-secret-key) using sops secrets/secrets.yaml
Checkpoint: Module adapted for conduwuit, secrets encrypted
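Before deploying, the edits from T005-T008 can be spot-checked by searching the module for the values they are meant to remove. The helper below is a hypothetical sketch, not part of the repo. It only checks that the old strings are gone and does not confirm that the new values are correct.

```shell
# Sketch: flag values that T005-T008 are meant to remove from the module.
# check_module_adapted is a hypothetical helper, not part of the repo.
check_module_adapted() {
  module="$1"
  status=0
  for stale in \
    'services.matrix-vm.maubot' \
    '127.0.0.1:6167' \
    'registration_secrets' \
    '/run/maubot/config.yaml'
  do
    # -F matches the string literally, so dots are not treated as wildcards.
    if grep -qF -- "$stale" "$module"; then
      echo "stale reference remains: $stale"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_module_adapted modules/maubot.nix && echo "module adapted"`.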
Phase 3: User Story 1 - Instagram Content Sharing to Matrix (Priority: P1)
Goal: Deploy maubot service with Instagram bot and validate content fetching
Independent Test: Post Instagram URL in enabled Matrix room and verify bot responds with image/video/caption within 5 seconds
Why MVP: Core value proposition - brings Instagram content into team communication, validates integration works
Infrastructure Deployment
- T011 [US1] Add sops secret declarations to hosts/ops-jrz1.nix (sops.secrets.maubot-admin-password, sops.secrets.maubot-secret-key)
- T012 [US1] Create dev-platform wrapper options in modules/dev-services.nix (services.dev-platform.maubot with enable and port options)
- T013 [US1] Add dev-platform config block in modules/dev-services.nix (maps to services.maubot with homeserverUrl, serverName, port, secret paths)
Service Deployment - Phase 1: Module Files
- T013a [US1] Deploy Phase 1 to VPS (modules added, service disabled) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
- T013b [US1] Verify Phase 1: Check nixos-rebuild output reports "no services changed" or only unrelated service restarts
- T013c [US1] Verify existing services healthy: ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
- T013d [US1] Git commit Phase 1 with message "Add maubot module files (service disabled)"
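The T013c health check is reused in T013g and T017a, so it may be worth scripting. A minimal sketch, assuming the five unit names listed in T013c; `REMOTE` and `check_existing_services` are hypothetical names introduced here.

```shell
# Sketch of the T013c check. REMOTE is a hypothetical override hook: it
# defaults to the ssh prefix used in the tasks and can be replaced for dry runs.
REMOTE="${REMOTE:-ssh root@45.77.205.49}"
SERVICES="matrix-continuwuity mautrix-slack forgejo postgresql nginx"

check_existing_services() {
  failed=0
  for svc in $SERVICES; do
    # `systemctl is-active` prints the unit state (active, inactive, failed, ...).
    state="$($REMOTE systemctl is-active "$svc" 2>/dev/null || true)"
    if [ "$state" = "active" ]; then
      echo "ok   $svc"
    else
      echo "FAIL $svc (${state:-unknown})"
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `check_existing_services` after each deployment phase returns non-zero if any listed unit is not active, which makes a regression easy to spot before committing.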
Service Deployment - Phase 2: Secrets
- T013e [US1] Deploy Phase 2 to VPS (secrets from T010 and T011 in place, service still disabled) using nixos-rebuild switch with the same flags as T013a
- T013f [US1] Verify Phase 2: Check secrets decrypted via ssh root@45.77.205.49 'ls -la /run/secrets/maubot-*' (expect 0400 permissions)
- T013g [US1] Verify existing services healthy (same command as T013c)
- T013h [US1] Git commit Phase 2 with message "Add maubot secrets (service not enabled)"
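The permission check in T013f can be made explicit instead of reading `ls -la` output by eye. The helper below is a hypothetical sketch meant to run on the VPS; it assumes GNU `stat`, as provided by coreutils.

```shell
# Sketch of the T013f check; check_secret_modes is a hypothetical helper.
# Uses GNU stat (-c), as shipped with coreutils.
check_secret_modes() {
  status=0
  for f in "$@"; do
    # -L follows symlinks so the mode of the real file is reported.
    mode="$(stat -L -c '%a' "$f" 2>/dev/null || true)"
    if [ "$mode" = "400" ]; then
      echo "ok   $f"
    else
      echo "FAIL $f (mode ${mode:-missing}, expected 400)"
      status=1
    fi
  done
  return "$status"
}
```

Usage on the VPS: `check_secret_modes /run/secrets/maubot-*`.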
Service Deployment - Phase 3: Enable Service
- T014 [US1] Enable maubot service in hosts/ops-jrz1.nix (services.dev-platform.maubot.enable = true, port = 29316)
- T015 [US1] Deploy Phase 3 to VPS (enable maubot service) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
- T016 [US1] Verify service status via ssh root@45.77.205.49 'systemctl status maubot.service' (expect active running)
- T017 [US1] Check logs for errors via ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
- T017a [US1] Verify existing services still healthy after maubot deployment (same command as T013c)
- T017b [US1] Test Slack bridge functionality (post message in Slack, verify appears in Matrix within 5 seconds)
- T017c [US1] Git commit Phase 3 with message "Enable maubot service (no bots deployed yet)"
Bot Configuration - Phase 4: Manual Deployment
- T018 [US1] Create SSH tunnel to management interface: ssh -L 29316:localhost:29316 root@45.77.205.49
- T019 [US1] Login to maubot web UI at http://localhost:29316/_matrix/maubot (username: admin, password from sops secrets)
- T020 [US1] Create bot Matrix user @instagram-bot:clarun.xyz via conduwuit registration token (Clients tab → Add client → Register)
- T021 [US1] Upload Instagram plugin sna-instagram-bot.mbp via web UI (Plugins tab → Upload)
- T022 [US1] Create bot instance instagram-bot-1 (type: sna.instagram, primary_user: @instagram-bot:clarun.xyz, config: {"enabled": true, "max_file_size": 50000000, "room_subscriptions": []})
- T023 [US1] Invite bot to test Matrix room via /invite @instagram-bot:clarun.xyz
Testing & Validation
- T024 [US1] Add test room ID to bot config room_subscriptions in maubot web UI
- T025 [US1] Restart bot instance (Stop → Start in web UI)
- T026 [US1] Post public Instagram URL in test room and verify bot responds within 5 seconds with image/video/caption (SC-001)
- T027 [US1] Post Instagram URL in non-subscribed room and verify bot ignores it (FR-006 enforcement)
Acceptance Criteria:
- ✅ Bot responds to Instagram URLs in subscribed rooms only
- ✅ Content fetched within 5 seconds (SC-001)
- ✅ Images, videos, and captions displayed correctly
- ✅ Bot ignores URLs in non-subscribed rooms
MVP Checkpoint: Core functionality working - Instagram content visible in Matrix
Phase 4: User Story 2 - Bot Management Interface (Priority: P2)
Goal: Validate management interface functionality for bot lifecycle operations
Independent Test: Access management UI, create/stop/restart bot instance, view logs and status
Why this priority: Essential for operations but bot works without admin features initially
Management Interface Validation
- T028 [US2] Access management dashboard via SSH tunnel and verify all bot instances listed with status (instances tab)
- T029 [US2] Test plugin upload via web UI (upload test .mbp file, verify appears in plugins list)
- T030 [US2] Test bot instance creation via web UI (create test instance, verify appears online in Matrix within 30 seconds)
- T031 [US2] Test bot configuration edit (edit room_subscriptions via config JSON, restart instance, verify bot responds only in new rooms)
- T032 [US2] Test bot stop/start via web UI (click Stop button, verify bot goes offline, click Start, verify reconnects)
- T033 [US2] View bot logs in UI and verify error messages display with timestamps and severity levels
Acceptance Criteria:
- ✅ Dashboard displays all bot instances with status
- ✅ Plugin upload succeeds and validates
- ✅ Bot lifecycle operations (create/stop/start) work via UI
- ✅ Configuration changes take effect after restart
- ✅ Logs visible with proper formatting
Phase 5: User Story 3 - Bot Framework Service Reliability (Priority: P2)
Goal: Validate auto-start, auto-recovery, and failure handling
Independent Test: Reboot server and verify maubot service and all bot instances resume automatically
Why this priority: Critical for production reliability but can be validated after basic functionality proven
Reliability Testing
- T034 [US3] Test server reboot recovery (ssh root@45.77.205.49 'reboot', wait 2 minutes, verify service auto-starts via systemctl status maubot)
- T035 [US3] Test Matrix homeserver restart handling (restart matrix-continuwuity service, verify bot reconnects automatically without manual intervention)
- T036 [US3] Verify health check timers active (ssh root@45.77.205.49 'systemctl list-timers | grep maubot', expect maubot-health.timer and maubot-health-restart.timer)
- T037 [US3] Test manual health check (curl http://localhost:29316/_matrix/maubot/v1/version, verify JSON response with version field)
- T038 [US3] Monitor 7-day uptime for SC-003 validation (99% uptime target; check periodically with uptime -p for host uptime and journalctl -u maubot | grep -i error for service errors)
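The manual check in T037 can be wrapped so that the HTTP status is interpreted consistently. This is a sketch under stated assumptions: treating 401 as "up" is an assumption (an unauthenticated request may be rejected even though the service is running), and `classify_http_status` and `probe_maubot` are hypothetical names.

```shell
# Sketch of a scripted version of T037. Treating 401 as "up" is an assumption.
classify_http_status() {
  case "$1" in
    200) echo "healthy" ;;
    401) echo "up (authentication required)" ;;
    000) echo "unreachable" ;;
    *)   echo "unhealthy (HTTP $1)" ;;
  esac
}

probe_maubot() {
  # -w '%{http_code}' prints only the status code; curl reports 000 when no
  # HTTP response was received.
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    "http://localhost:29316/_matrix/maubot/v1/version" || true)"
  classify_http_status "$code"
}
```

Run `probe_maubot` on the VPS, or locally while the SSH tunnel from T018 is open.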
Acceptance Criteria:
- ✅ Service auto-starts on server boot within 2 minutes
- ✅ Bot instances reconnect after Matrix homeserver restart
- ✅ Health timers operational
- ✅ 99% uptime achieved over 7-day period
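The 99% target translates into a concrete downtime budget for the 7-day window. The arithmetic can be sketched as follows; `downtime_budget_minutes` is a hypothetical helper.

```shell
# downtime_budget_minutes DAYS UPTIME_PERCENT
# Integer arithmetic, so the result is rounded down to whole minutes.
downtime_budget_minutes() {
  days="$1"
  uptime_percent="$2"
  total_minutes=$((days * 24 * 60))
  echo $(( total_minutes * (100 - uptime_percent) / 100 ))
}

# 7 days = 10080 minutes; 1% of that is 100.8, printed as 100.
downtime_budget_minutes 7 99
```

In other words, SC-003 tolerates roughly 100 minutes of maubot downtime across the validation week.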
Phase 6: User Story 4 - Additional Bot Deployment (Priority: P3)
Goal: Demonstrate platform extensibility by deploying a second bot type
Independent Test: Deploy echo bot or reaction bot from maubot plugin repository and verify independent operation
Why this priority: Future-proofs investment, not required for initial Instagram bot value
Extensibility Validation
- T039 [US4] Download additional maubot plugin from community repository (e.g., echo bot, reaction bot)
- T040 [US4] Upload second plugin via management UI and verify validation succeeds
- T041 [US4] Create second bot instance using new plugin and verify appears in dashboard with type, status, and resource usage
- T042 [US4] Test SC-002 multi-instance validation (run 3 concurrent bot instances, verify no performance degradation)
Acceptance Criteria:
- ✅ Multiple plugin types supported
- ✅ Dashboard shows all bots with clear differentiation
- ✅ 3+ concurrent instances run without degradation (SC-002)
Phase 7: Polish & Cross-Cutting Concerns
Goal: Complete documentation and prepare for merge
Documentation
- T043 Update CLAUDE.md with maubot management commands (service status, logs, SSH tunnel, room subscription workflow)
- T044 Create deployment worklog in docs/worklogs/2025-10-26-maubot-deployment.org documenting session
- T045 Commit changes and tag release v0.3.0 (message: "Add maubot bot framework with Instagram bot - Implements 003-maubot-integration")
Final Checkpoint: All documentation complete, ready for 7-day validation period
Dependencies & Execution Order
User Story Dependencies
Phase 1 (Setup)
↓
Phase 2 (Foundational) ← BLOCKING for all user stories
↓
├─→ User Story 1 (P1) ← MVP, no dependencies
├─→ User Story 2 (P2) ← depends on US1 (needs running bot to manage)
├─→ User Story 3 (P2) ← depends on US1 (needs service deployed to test reliability)
└─→ User Story 4 (P3) ← depends on US2 (needs management UI working)
↓
Phase 7 (Polish) ← depends on all user stories complete
Critical Path
- Setup (T001-T004)
- Foundational (T005-T010) - MUST complete before user stories
- User Story 1 (T011-T027) - MVP - Deploy first, validate before continuing
- Validate MVP success before proceeding to US2/US3/US4
- User Stories 2 and 3 can proceed in parallel after US1 validates; User Story 4 waits for US2
- Polish (T043-T045) after all user stories complete
Parallel Execution Opportunities
Phase 2 (Foundational)
Parallel:
- T010 can run in parallel with T005-T009 (secrets vs module editing, different files)
Phase 3 (User Story 1)
Parallel:
- T011 can run in parallel with T012-T013 (different files: hosts/ops-jrz1.nix vs modules/dev-services.nix); T012 and T013 edit the same file, so run them in order
- After T015 deploys: T016, T017 can run in parallel (both read-only checks)
Sequential:
- T014 depends on T011, T012, T013 (needs config in place)
- T015 depends on T014 (deployment needs config)
- T018-T027 must run sequentially (UI workflow dependencies)
Phase 4-6 (User Stories 2, 3, 4)
Parallel after US1:
- US2 tasks (T028-T033) can run in parallel with US3 tasks (T034-T038) if US1 validates
- US4 tasks (T039-T042) should wait for US2 to confirm management UI working
Implementation Strategy
MVP-First Approach
Week 1: Focus exclusively on the MVP: Setup, Foundational, and User Story 1 (T001-T027)
- Goal: Working Instagram bot responding to URLs in designated rooms
- Success: Can demo "post Instagram URL → see content in Matrix"
- Decision point: If MVP fails, stop and reassess before continuing
Week 2: Expand to User Stories 2 & 3 (T028-T038) in parallel
- Goal: Operational management and reliability validated
- Success: Admins can manage bots via UI, service survives restarts
Week 3: Add extensibility (User Story 4) if needed (T039-T042)
- Goal: Prove multi-bot capability
- Success: 3 concurrent bot instances running
Week 4+: 7-day validation period
- Monitor uptime (SC-003: 99% target)
- Monitor Instagram fetch success rate (SC-006: 95% target)
- Collect user feedback
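A percentage target such as SC-006 can be checked from raw counts without floating point. The sketch below assumes fetch attempts and successes are counted by hand or from logs; how they are collected is not specified here, and `meets_target` is a hypothetical helper.

```shell
# meets_target SUCCESSES TOTAL TARGET_PERCENT
# Compares successes/total against the target without floating point:
# successes/total >= target/100  <=>  successes*100 >= target*total.
meets_target() {
  [ "$2" -gt 0 ] && [ $(( $1 * 100 )) -ge $(( $3 * $2 )) ]
}
```

Usage: `meets_target 96 100 95 && echo "SC-006 met"`. The function returns non-zero when the total is zero, so an empty sample never counts as a pass.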
Incremental Delivery
Each user story delivers independently testable value:
- US1: Instagram content in Matrix (core value)
- US2: Self-service bot management (operational efficiency)
- US3: Production reliability (reduces maintenance burden)
- US4: Platform extensibility (future-proofing)
You can stop after any user story and still have a working system.
Testing Strategy
Manual QA (no automated tests per plan.md):
- Each user story has "Independent Test" criteria
- Acceptance scenarios from spec.md validated manually
- Success criteria (SC-001 through SC-008) checked via quickstart.md checklist
Validation Period:
- 7 days operational before merging to main (per constitution Principle III)
- Monitor metrics: uptime, response time, fetch success rate
- Document issues in worklog
Risk Mitigation
High-risk tasks:
- T007: Removing registration_secrets (conduwuit incompatibility) - carefully test bot registration after change
- T015: Initial deployment (first time on ops-jrz1) - have rollback ready via nixos-rebuild switch --rollback
- T020: Bot user registration (new auth pattern) - document exact steps in worklog for repeatability
Rollback points:
- After T010: Can roll back before deployment if module adaptation fails
- After T015: NixOS generation rollback if service won't start
- After T027: Can remove bot and redeploy if issues found
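The rollback point after T015 can be sketched as a single guard: keep the new generation if maubot is active, otherwise roll back. `REMOTE` and `rollback_if_maubot_down` are hypothetical names. The sketch assumes the service has had time to start before the check runs.

```shell
# Sketch of the "After T015" rollback point. REMOTE is a hypothetical override
# hook that defaults to the ssh prefix used throughout the tasks.
REMOTE="${REMOTE:-ssh root@45.77.205.49}"

rollback_if_maubot_down() {
  # --quiet suppresses output; the exit status is 0 only when the unit is active.
  if $REMOTE systemctl is-active --quiet maubot.service; then
    echo "maubot active, keeping current generation"
  else
    echo "maubot not active, rolling back"
    $REMOTE nixos-rebuild switch --rollback
  fi
}
```

Review the journalctl output from T017 before rolling back, since the logs from the failed start are the main evidence for what went wrong.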
Success Metrics
Per User Story:
- US1: Bot responds to Instagram URLs within 5 seconds (SC-001)
- US2: Management UI loads within 2 seconds (SC-007)
- US3: 99% uptime over 7 days (SC-003), auto-recovery within 2 minutes (SC-004)
- US4: 3 concurrent instances without degradation (SC-002)
Overall:
- All 8 success criteria validated (SC-001 through SC-008)
- Constitution check passes (all 4 principles compliant)
- 7-day stability period completed without critical issues
- Documentation complete (spec, plan, quickstart, worklog, CLAUDE.md updated)
Estimated Timeline:
- MVP (US1): 2-3 hours deployment + testing
- Full Feature (US1-4): 1 week implementation + 1 week validation
- Production Ready: 2 weeks total (including 7-day stability period)
Next Command: /speckit.implement to begin execution (start with T001)