Add maubot integration and infrastructure updates

- maubot.nix: Declarative bot framework with plugin deployment
- backup.nix: Local backup service for Matrix/bridge data
- sna-instagram-bot: Instagram content bridge plugin
- beads: Issue tracking workflow integrated
- spec 004: Browser-based dev environment design
- nixpkgs bump: Oct 22 → Dec 2
- Fix maubot health check (401 = healthy)
Dan 2025-12-08 15:55:12 -08:00
parent f25a8b06ef
commit 8826d62bcc
26 changed files with 3685 additions and 8 deletions

29
.beads/.gitignore vendored Normal file

@@ -0,0 +1,29 @@
# SQLite databases
*.db
*.db?*
*.db-journal
*.db-wal
*.db-shm
# Daemon runtime files
daemon.lock
daemon.log
daemon.pid
bd.sock
# Legacy database files
db.sqlite
bd.db
# Merge artifacts (temporary files from 3-way merge)
beads.base.jsonl
beads.base.meta.json
beads.left.jsonl
beads.left.meta.json
beads.right.jsonl
beads.right.meta.json
# Keep JSONL exports and config (source of truth for git)
!issues.jsonl
!metadata.json
!config.json

1
.beads/.local_version Normal file

@@ -0,0 +1 @@
0.29.0

81
.beads/README.md Normal file

@@ -0,0 +1,81 @@
# Beads - AI-Native Issue Tracking
Welcome to Beads! This repository uses **Beads** for issue tracking - a modern, AI-native tool designed to live directly in your codebase alongside your code.
## What is Beads?
Beads is issue tracking that lives in your repo, making it perfect for AI coding agents and developers who want their issues close to their code. No web UI required - everything works through the CLI and integrates seamlessly with git.
**Learn more:** [github.com/steveyegge/beads](https://github.com/steveyegge/beads)
## Quick Start
### Essential Commands
```bash
# Create new issues
bd create "Add user authentication"
# View all issues
bd list
# View issue details
bd show <issue-id>
# Update issue status
bd update <issue-id> --status in_progress
bd close <issue-id>
# Sync with git remote
bd sync
```
### Working with Issues
Issues in Beads are:
- **Git-native**: Stored in `.beads/issues.jsonl` and synced like code
- **AI-friendly**: CLI-first design works perfectly with AI coding agents
- **Branch-aware**: Issues can follow your branch workflow
- **Always in sync**: Auto-syncs with your commits
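Because each line of `.beads/issues.jsonl` is a standalone JSON object, the file can be inspected without `bd`. The following is a minimal sketch, not part of Beads itself: the helper names `load_issues` and `count_by_status` are hypothetical, and the sample records are trimmed copies of entries from this repository's export.

```python
import json
from collections import Counter


def load_issues(lines):
    """Parse JSONL lines into a list of issue dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def count_by_status(issues):
    """Tally issues by their "status" field."""
    return Counter(issue["status"] for issue in issues)


# Trimmed sample records; a real run would pass an open file handle instead:
#   with open(".beads/issues.jsonl") as f:
#       issues = load_issues(f)
SAMPLE = [
    '{"id":"ops-jrz1-00e","title":"Upgrade NixOS from 24.05 to 24.11","status":"open","priority":3}',
    '{"id":"ops-jrz1-03o","title":"Upgrade mautrix-slack to v25.11","status":"closed","priority":2}',
    '{"id":"ops-jrz1-3so","title":"Browser-based dev environment with opencode","status":"open","priority":1}',
]

issues = load_issues(SAMPLE)
summary = count_by_status(issues)
print(dict(summary))
```

Reading the export directly is only suitable for reporting; writes should still go through `bd` so the database and the JSONL file stay consistent.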
## Why Beads?
✨ **AI-Native Design**
- Built specifically for AI-assisted development workflows
- CLI-first interface works seamlessly with AI coding agents
- No context switching to web UIs
🚀 **Developer Focused**
- Issues live in your repo, right next to your code
- Works offline, syncs when you push
- Fast, lightweight, and stays out of your way
🔧 **Git Integration**
- Automatic sync with git commits
- Branch-aware issue tracking
- Intelligent JSONL merge resolution
## Get Started with Beads
Try Beads in your own projects:
```bash
# Install Beads
curl -sSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash
# Initialize in your repo
bd init
# Create your first issue
bd create "Try out Beads"
```
## Learn More
- **Documentation**: [github.com/steveyegge/beads/docs](https://github.com/steveyegge/beads/tree/main/docs)
- **Quick Start Guide**: Run `bd quickstart`
- **Examples**: [github.com/steveyegge/beads/examples](https://github.com/steveyegge/beads/tree/main/examples)
---
*Beads: Issue tracking that moves at the speed of thought* ⚡

62
.beads/config.yaml Normal file

@@ -0,0 +1,62 @@
# Beads Configuration File
# This file configures default behavior for all bd commands in this repository
# All settings can also be set via environment variables (BD_* prefix)
# or overridden with command-line flags
# Issue prefix for this repository (used by bd init)
# If not set, bd init will auto-detect from directory name
# Example: issue-prefix: "myproject" creates issues like "myproject-1", "myproject-2", etc.
# issue-prefix: ""
# Use no-db mode: load from JSONL, no SQLite, write back after each command
# When true, bd will use .beads/issues.jsonl as the source of truth
# instead of SQLite database
# no-db: false
# Disable daemon for RPC communication (forces direct database access)
# no-daemon: false
# Disable auto-flush of database to JSONL after mutations
# no-auto-flush: false
# Disable auto-import from JSONL when it's newer than database
# no-auto-import: false
# Enable JSON output by default
# json: false
# Default actor for audit trails (overridden by BD_ACTOR or --actor)
# actor: ""
# Path to database (overridden by BEADS_DB or --db)
# db: ""
# Auto-start daemon if not running (can also use BEADS_AUTO_START_DAEMON)
# auto-start-daemon: true
# Debounce interval for auto-flush (can also use BEADS_FLUSH_DEBOUNCE)
# flush-debounce: "5s"
# Git branch for beads commits (bd sync will commit to this branch)
# IMPORTANT: Set this for team projects so all clones use the same sync branch.
# This setting persists across clones (unlike database config which is gitignored).
# Can also use BEADS_SYNC_BRANCH env var for local override.
# If not set, bd sync will require you to run 'bd config set sync.branch <branch>'.
# sync-branch: "beads-sync"
# Multi-repo configuration (experimental - bd-307)
# Allows hydrating from multiple repositories and routing writes to the correct JSONL
# repos:
# primary: "." # Primary repo (where this database lives)
# additional: # Additional repos to hydrate from (read-only)
# - ~/beads-planning # Personal planning repo
# - ~/work-planning # Work planning repo
# Integration settings (access with 'bd config get/set')
# These are stored in the database, not in this file:
# - jira.url
# - jira.project
# - linear.url
# - linear.api-key
# - github.org
# - github.repo

39
.beads/issues.jsonl Normal file

@@ -0,0 +1,39 @@
{"id":"ops-jrz1-00e","title":"Upgrade NixOS from 24.05 to 24.11","description":"Running NixOS 24.05.20241230 (Uakari). Current stable is 24.11. May be missing security patches. Low priority as no known critical CVEs, but should plan upgrade.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-04T21:03:22.760228514-08:00","updated_at":"2025-12-04T21:04:35.805980055-08:00","comments":[{"id":1,"issue_id":"ops-jrz1-00e","author":"dan","text":"Analysis Findings:\n1. Version Mismatch: Local flake.nix is pinned to 'nixos-24.05', but the dev environment reports '25.11' (Unstable), indicating state divergence.\n2. Upstream Bugs: Blocking issues in mautrix-slack (ops-jrz1-blh) and maubot (sync failure) are present in the current unstable revision (2025-12-02).\n3. Recommendation: Upgrade platform to NixOS 24.11 (Stable) to align environment, ensure stability, and pull fresh upstream fixes.","created_at":"2025-12-08T23:54:57Z"}]}
{"id":"ops-jrz1-03o","title":"Upgrade mautrix-slack to v25.11","description":"Upgrade is just flake update + deploy. Current deployed: v0.2.3+dev.unknown (Oct 13). Flake lock: v25.10 (Oct 22). Latest nixpkgs-unstable: v25.11. Run: nix flake update nixpkgs-unstable \u0026\u0026 deploy. May fix edit panic (ops-jrz1-qxr).","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T18:24:18.332067067-08:00","updated_at":"2025-12-05T19:07:09.156981447-08:00","closed_at":"2025-12-05T19:07:09.156981447-08:00"}
{"id":"ops-jrz1-3ca","title":"Persist opencode state/cache across restarts","description":"opencode may store index/cache in ~/.cache or other dirs not covered by current bind mounts. AI context could be lost on container restart. Verify and add mounts.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:30.90315778-08:00","updated_at":"2025-12-05T15:32:30.90315778-08:00","dependencies":[{"issue_id":"ops-jrz1-3ca","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.247361009-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-3fd","title":"Deploy and test single-user instance (Phase 1)","description":"Deploy one container for testing. Validate: WebSocket, extensions, terminal, opencode, memory usage. Access via SSH tunnel initially.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.783260036-08:00","updated_at":"2025-12-05T17:16:54.783260036-08:00","dependencies":[{"issue_id":"ops-jrz1-3fd","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.400677984-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-3fd","depends_on_id":"ops-jrz1-5oe","type":"blocks","created_at":"2025-12-05T17:17:38.708397909-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-3fd","depends_on_id":"ops-jrz1-av0","type":"blocks","created_at":"2025-12-05T17:17:38.721665448-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-3fd","depends_on_id":"ops-jrz1-9gd","type":"blocks","created_at":"2025-12-05T17:17:38.737824478-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-3so","title":"Browser-based dev environment with opencode","description":"Epic: Provide VS Code in browser via code-server with opencode AI integration.\n\nKey decisions:\n- code-server in Podman containers (rootless)\n- opencode CLI + VS Code extension pre-installed\n- Subdomain routing (dan.code.clarun.xyz)\n- Custom container image\n- Target users: non-programmers, testers, learners\n\nDesign doc: specs/004-browser-dev-environment/design.md\n\nMigrated from ops-jrz1-ndl","status":"open","priority":1,"issue_type":"epic","created_at":"2025-12-05T17:04:36.709352529-08:00","updated_at":"2025-12-05T17:04:36.709352529-08:00"}
{"id":"ops-jrz1-3x4","title":"Add maubot SDK and deploy script to container image","description":"Container image needs:\n- Python 3.11 + maubot SDK\n- deploy.sh script (zip → .mbp → curl to maubot API)\n- maubot API reachable from container (host network or port forward)\n\nPart of learner onboarding for bot development.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-06T12:18:06.841708662-08:00","updated_at":"2025-12-06T12:18:06.841708662-08:00","dependencies":[{"issue_id":"ops-jrz1-3x4","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-06T12:18:16.085519885-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-3x4","depends_on_id":"ops-jrz1-d58","type":"blocks","created_at":"2025-12-06T12:18:16.110944935-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-45v","title":"Matrix/Slack identity mismatch: dan vs vlad","description":"Matrix user @dan:clarun.xyz is linked to Slack user 'vlad'. Messages appear as vlad in Slack but dan in Element. Cosmetic confusion. Options: rename Matrix display name, or re-login bridge with different Slack account.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T19:38:19.899555475-08:00","updated_at":"2025-12-05T19:38:19.899555475-08:00"}
{"id":"ops-jrz1-46y","title":"Write onboarding documentation","description":"Critical for non-programmers. Cover: login, opencode usage, Git setup (PAT workflow), resource limits, security hygiene. Keep concise.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:41.586544583-08:00","updated_at":"2025-12-05T15:32:41.586544583-08:00","dependencies":[{"issue_id":"ops-jrz1-46y","depends_on_id":"ops-jrz1-7j4","type":"blocks","created_at":"2025-12-05T15:33:25.328712413-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-46y","depends_on_id":"ops-jrz1-wj2","type":"blocks","created_at":"2025-12-05T15:33:25.351559821-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-46y","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.401868669-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-4jm","title":"Smoke test Matrix server (conduwuit)","description":"Verify Matrix homeserver is healthy: check /_matrix/client/versions endpoint, test registration, verify federation status (disabled). Quick health check after deployments.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T18:09:47.220765063-08:00","updated_at":"2025-12-05T18:19:33.059734881-08:00","closed_at":"2025-12-05T18:19:33.059734881-08:00"}
{"id":"ops-jrz1-5fk","title":"Smoke test Maubot service","description":"Verify Maubot is healthy: check management UI accessible via SSH tunnel, verify bot instances running, test plugin functionality. Quick health check after deployments.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T18:09:47.33773092-08:00","updated_at":"2025-12-05T18:19:33.061388913-08:00","closed_at":"2025-12-05T18:19:33.061388913-08:00"}
{"id":"ops-jrz1-5ki","title":"Set up programmatic QA test user for bridge testing","description":"","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T20:17:04.312571398-08:00","updated_at":"2025-12-05T20:17:04.312571398-08:00"}
{"id":"ops-jrz1-5oe","title":"Create NixOS module for code-server containers","description":"Module to manage per-user Podman containers, nginx routing, secrets. Use virtualisation.oci-containers. Generate systemd units.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.656121092-08:00","updated_at":"2025-12-05T17:16:54.656121092-08:00","dependencies":[{"issue_id":"ops-jrz1-5oe","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.386278268-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-5oe","depends_on_id":"ops-jrz1-d58","type":"blocks","created_at":"2025-12-05T17:17:38.694752468-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-6of","title":"AI cost/rate limiting per user","description":"One user could drain API credits with runaway script. Need rate limiting per user, either via proxy middleware or opencode config. Track usage.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:30.772304538-08:00","updated_at":"2025-12-05T17:42:42.773613559-08:00","closed_at":"2025-12-05T17:42:42.773613559-08:00","dependencies":[{"issue_id":"ops-jrz1-6of","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.206816868-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-6of","depends_on_id":"ops-jrz1-wj2","type":"blocks","created_at":"2025-12-05T17:17:38.658742196-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-7j4","title":"Git credential strategy for non-programmers","description":"Non-programmers can't manage SSH keys. Pre-configure git-credential-store or provide simple PAT workflow with docs. Store in persistent home with 600 perms.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:19.673999683-08:00","updated_at":"2025-12-05T17:38:54.788694408-08:00","closed_at":"2025-12-05T17:38:54.788694408-08:00","dependencies":[{"issue_id":"ops-jrz1-7j4","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.139749437-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-88o","title":"Implement backup strategy for VPS","description":"No backups configured. Critical data: Matrix DB (622M), PostgreSQL (161M), Forgejo (2.5M), maubot (320K). No recovery path if disk fails. Need automated backups with off-site storage.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-04T22:55:25.546850172-08:00","updated_at":"2025-12-05T00:56:27.720623612-08:00","closed_at":"2025-12-05T00:56:27.720623612-08:00"}
{"id":"ops-jrz1-9gd","title":"Upgrade VPS RAM for dev environments","description":"Current: 2GB. Need 4-8GB for multiple code-server containers. Coordinate with Vultr, plan maintenance window.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.267689439-08:00","updated_at":"2025-12-05T17:16:54.267689439-08:00","dependencies":[{"issue_id":"ops-jrz1-9gd","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.331146543-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-av0","title":"Configure wildcard DNS and ACME cert","description":"Set up *.code.clarun.xyz DNS record and wildcard SSL cert via ACME. Depends on subdomain routing decision (kg0).","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.387356964-08:00","updated_at":"2025-12-05T17:16:54.387356964-08:00","dependencies":[{"issue_id":"ops-jrz1-av0","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.34918436-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-av0","depends_on_id":"ops-jrz1-kg0","type":"blocks","created_at":"2025-12-05T17:17:38.676800677-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-bhk","title":"Add disk quotas for user workspaces","description":"User could fill host disk via /var/lib/vscode/\u003cuser\u003e/. Add per-directory quotas or monitoring/alerting on disk usage.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:41.199417226-08:00","updated_at":"2025-12-05T15:32:41.199417226-08:00","dependencies":[{"issue_id":"ops-jrz1-bhk","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.309592029-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-blh","title":"mautrix-slack edit panic persists in v25.11","description":"mautrix-slack panic on rapid message edits (race condition)\n\n**Root cause**: Edit event arrives before original message is stored in DB. ConvertEdit accesses nil metadata.\n\n**Location**: handleslack.go:575 - has TODO comment: 'this can panic?'\n\n**Reproduction**: Edit a Slack message within ~1 second of sending\n\n**Upstream status**: \n- v25.11 is latest (we're on it)\n- Known to devs (TODO in code)\n- No open issue filed yet\n\n**Stack trace**:\ngo.mau.fi/mautrix-slack/pkg/connector.(*SlackMessage).ConvertEdit\n handleslack.go:575\nmaunium.net/go/mautrix/bridgev2.(*Portal).handleRemoteEdit\n portal.go:2838","status":"open","priority":2,"issue_type":"bug","created_at":"2025-12-05T19:40:33.255395189-08:00","updated_at":"2025-12-05T23:05:05.344825241-08:00","comments":[{"id":2,"issue_id":"ops-jrz1-blh","author":"dan","text":"Confirmed panic exists in nixpkgs-unstable from 2025-12-02. Fix will be addressed via platform upgrade (see ops-jrz1-00e).","created_at":"2025-12-08T23:54:57Z"}]}
{"id":"ops-jrz1-d58","title":"Build custom code-server container image","description":"Dockerfile with: code-server, opencode CLI, opencode VS Code extension (Open VSX), Python, Node, Git. Push to registry or build locally.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.507577308-08:00","updated_at":"2025-12-05T17:16:54.507577308-08:00","dependencies":[{"issue_id":"ops-jrz1-d58","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.369590207-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-dhj","title":"Port forwarding strategy for user apps","description":"When user runs app on localhost:3000, how do they view it? code-server has /proxy/\u003cport\u003e but URL is confusing for learners. Need clear UX or docs.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:30.649292743-08:00","updated_at":"2025-12-05T17:41:01.486505687-08:00","closed_at":"2025-12-05T17:41:01.486505687-08:00","dependencies":[{"issue_id":"ops-jrz1-dhj","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.175857247-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-dt9","title":"Increase container RAM limits (2GB too tight)","description":"2GB hard limit will OOM with code-server + opencode + LSP + user app. Gemini/GPT recommend 3-4GB per container or add swap. Need to size server appropriately.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:19.400171408-08:00","updated_at":"2025-12-05T17:38:54.770433169-08:00","closed_at":"2025-12-05T17:38:54.770433169-08:00","dependencies":[{"issue_id":"ops-jrz1-dt9","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.066130377-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-dux","title":"Container isolation: maubot API access only","description":"Security design for learner containers:\n\n**Container CAN access**:\n- maubot API (:29316) for plugin deploy\n- Matrix rooms via bot (through maubot)\n- Slack via bridge (through Matrix)\n\n**Container CANNOT access**:\n- Host filesystem\n- Other containers\n- PostgreSQL directly\n- Matrix homeserver directly\n- sops secrets\n\nImplementation: Podman network config, no --privileged, limited port exposure.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-06T12:18:16.212646624-08:00","updated_at":"2025-12-06T12:18:16.212646624-08:00","dependencies":[{"issue_id":"ops-jrz1-dux","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-06T12:18:21.627621772-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-ezf","title":"Maubot plugin dev workflow for learners","description":"Design frictionless dev workflow for Python/Go learners building maubot plugins.\n\n**Requirements**:\n- No SSH tunnel setup for learners\n- Fast feedback loop (edit → see bot respond)\n- Circuit breakers (allowed_rooms, rate limits)\n- Test channel: #vlads-pad (Slack) ↔ Matrix\n\n**Options being considered**:\n1. Git-push deploy: push to repo → CI builds .mbp → deploys to maubot\n2. Code-server containers: browser IDE on VPS, deploy script talks to maubot locally\n3. Hybrid: code-server + git workflow\n\n**Related**: ops-jrz1-3so (browser-dev-environment epic)","status":"open","priority":2,"issue_type":"feature","created_at":"2025-12-06T01:36:26.529372206-08:00","updated_at":"2025-12-06T01:36:26.529372206-08:00","dependencies":[{"issue_id":"ops-jrz1-ezf","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-06T12:18:06.743837766-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-gci","title":"Enable fail2ban for SSH brute force protection","description":"SSH brute force attempts generate log noise but don't pose security risk (key-only auth). fail2ban would help but is low priority. Deferred pending RFC on SSH log management strategy.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-04T21:03:22.651495544-08:00","updated_at":"2025-12-04T22:55:13.805471391-08:00","dependencies":[{"issue_id":"ops-jrz1-gci","depends_on_id":"ops-jrz1-nir","type":"blocks","created_at":"2025-12-04T22:56:14.777377818-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-glk","title":"VS Code extension policy (security)","description":"Extensions can run arbitrary code. Decide: allow arbitrary installs, or curate/restrict? For non-programmers, pre-install safe set and optionally disable marketplace.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:41.463030936-08:00","updated_at":"2025-12-05T15:32:41.463030936-08:00","dependencies":[{"issue_id":"ops-jrz1-glk","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.372120465-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-i8i","title":"Enable mautrix-slack relay mode for bot bridging","description":"","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-06T19:09:42.087506995-08:00","updated_at":"2025-12-06T19:09:47.612545472-08:00","closed_at":"2025-12-06T19:09:47.612545472-08:00"}
{"id":"ops-jrz1-iok","title":"Instagram bot missing base-config.yaml","description":"Plugin was missing base-config.yaml required by maubot Config class. Fixed in commit 4b9481d.","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-12-06T13:02:10.103730128-08:00","updated_at":"2025-12-06T13:02:15.055396318-08:00","closed_at":"2025-12-06T13:02:15.055396318-08:00"}
{"id":"ops-jrz1-jit","title":"Logging and monitoring for dev environments","description":"No observability plan. Need: container CPU/mem metrics, nginx logs, disk usage monitoring, alert on repeated 401s or resource exhaustion.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:41.318448038-08:00","updated_at":"2025-12-05T15:32:41.318448038-08:00","dependencies":[{"issue_id":"ops-jrz1-jit","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.343610481-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-kg0","title":"Switch to subdomain routing (dan.code.clarun.xyz)","description":"Path-based routing (/code/dan/) is fragile. Extensions assume root path, cookies scope incorrectly, PWA breaks. Switch to wildcard subdomains for cleaner isolation.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-05T15:32:19.283887085-08:00","updated_at":"2025-12-05T17:23:11.983564455-08:00","closed_at":"2025-12-05T17:23:11.983564455-08:00","dependencies":[{"issue_id":"ops-jrz1-kg0","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.043217984-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-kia","title":"Container reset mechanism (keep workspace)","description":"If user breaks their environment, need simple way to wipe container and restore default image while preserving /workspace. Script or admin command.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:31.045592689-08:00","updated_at":"2025-12-05T15:32:31.045592689-08:00","dependencies":[{"issue_id":"ops-jrz1-kia","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.275530016-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-ndl","title":"Browser-based dev environment (code-server)","description":"Explore setting up browser-based development:\n\nOptions:\n- code-server / openvscode-server - VS Code in browser\n- ttyd / wetty - terminal in browser \n- PWA install to home screen for native app feel\n\nCould combine with Tailscale for secure access without exposing ports.\n\nRef: ops-dev thin client brainstorm session","notes":"Design doc created: specs/004-browser-dev-environment/design.md - covers architecture, tech choices, resource planning, security model, rollout phases","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-04T15:08:02.406274744-08:00","updated_at":"2025-12-05T17:05:52.872944892-08:00","closed_at":"2025-12-05T17:05:52.872944892-08:00"}
{"id":"ops-jrz1-nir","title":"RFC: SSH log noise reduction strategy","description":"Research showed 99.8% of SSH logs are scanner noise (9000 failed attempts/day). Options: (1) Change SSH port - simple, ~99% reduction (2) journald filter - surgical but complex (3) LogLevel ERROR - loses successful login audit trail (4) fail2ban - bans IPs, partial reduction. Orch consensus: Gemini opposed LogLevel ERROR due to losing audit trail, GPT supported. Need RFC to decide approach. See posture review from Dec 2025 session.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-04T22:55:13.990334935-08:00","updated_at":"2025-12-04T22:55:13.990334935-08:00"}
{"id":"ops-jrz1-nvx","title":"Slack bot architecture: Matrix-first approach","description":"**Decision**: Use Matrix as primary platform for Slack bot development.\n\n**Architecture**: Bots run as maubot plugins (or Matrix bots), communicate to Slack via mautrix-slack bridge.\n\n**Rationale**:\n- Existing infrastructure (maubot deployed, bridge working)\n- Single platform to manage\n- Bots work with Matrix users too\n- Avoid Socket Mode contention (only one xapp- connection allowed)\n\n**Trade-offs accepted**:\n- Bridge dependency (edit panic bug exists)\n- Extra latency through bridge hop\n- Limited to bridged channels\n\n**Alternative considered (Option B - direct Slack API)**:\n- Could use xoxb- token for outbound-only (REST)\n- Would need new Slack app for full Socket Mode independence\n- Deferred for now\n\n**Credentials available**:\n- slack-oauth-token (xoxb-) - shareable for REST calls if needed\n- slack-app-token (xapp-) - reserved for bridge Socket Mode\n\n**Status**: DECIDED - staying with Matrix-first","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-05T23:12:22.011872713-08:00","updated_at":"2025-12-05T23:12:28.329467732-08:00","closed_at":"2025-12-05T23:12:28.329467732-08:00"}
{"id":"ops-jrz1-qxr","title":"mautrix-slack message edit panic (upstream bug)","description":"Bridge upgraded to v25.11. Need to verify if edit panic is fixed by testing a Slack message edit. Watch logs: journalctl -u mautrix-slack -f | grep -E 'ERR|panic|edit'","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-12-05T18:22:38.18203834-08:00","updated_at":"2025-12-05T19:36:00.556011621-08:00","closed_at":"2025-12-05T19:36:00.556011621-08:00","dependencies":[{"issue_id":"ops-jrz1-qxr","depends_on_id":"ops-jrz1-03o","type":"blocks","created_at":"2025-12-05T18:24:23.259399275-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-u0w","title":"Security review of running server","description":"","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-04T21:03:22.420507724-08:00","updated_at":"2025-12-04T21:04:31.989886731-08:00","closed_at":"2025-12-04T21:04:31.989886731-08:00"}
{"id":"ops-jrz1-wj2","title":"Design API key provisioning strategy","description":"opencode needs API keys (OpenAI, Anthropic). Options: 1) Shared key with proxy + rate limiting, 2) Per-user keys in sops-nix. Need to prevent key exposure and enable usage tracking.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-05T15:32:19.526073243-08:00","updated_at":"2025-12-05T17:25:10.534718515-08:00","closed_at":"2025-12-05T17:25:10.534718515-08:00","dependencies":[{"issue_id":"ops-jrz1-wj2","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.103332379-08:00","created_by":"daemon"}]}
{"id":"ops-jrz1-xz1","title":"Fix maubot admin UI exposed to internet (port 29316)","description":"Maubot admin UI on port 29316 is publicly accessible (returns 401 but API surface exposed). Firewall explicitly allows this port. Risk: brute force on admin password, direct exploit of any maubot vulnerabilities. Fix: bind to 127.0.0.1 only, remove from firewall, access via SSH tunnel.","status":"closed","priority":1,"issue_type":"bug","created_at":"2025-12-04T21:03:22.531676543-08:00","updated_at":"2025-12-04T22:35:24.162735368-08:00","closed_at":"2025-12-04T22:35:24.162735368-08:00"}
{"id":"ops-jrz1-zvh","title":"Fix maubot health check (failing every 5 min)","description":"Health check at /_matrix/maubot/v1/version returns 401 (auth required). Check script doesn't provide auth token. Spamming error logs every 5 minutes.","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-12-04T22:55:25.755541054-08:00","updated_at":"2025-12-05T02:00:19.284410671-08:00","closed_at":"2025-12-05T02:00:19.284410671-08:00"}
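
Note on ops-jrz1-zvh and the "401 = healthy" fix in the commit message: an unauthenticated request to `/_matrix/maubot/v1/version` returning 401 still proves that maubot is up and answering. The actual check lives in the deployed configuration and is not shown in this part of the diff, so the following is only a sketch of the idea. The function names are hypothetical, and treating 200 as healthy is an added assumption (an authenticated request would presumably return it).

```python
import urllib.error
import urllib.request

# 401 comes from the commit message ("401 = healthy"); 200 is an assumption.
HEALTHY_STATUSES = frozenset({200, 401})


def is_healthy(status_code):
    """Return True if the status code shows the service answered as expected."""
    return status_code in HEALTHY_STATUSES


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: the service is unreachable.
        return None


# Classification only; no network access happens here.
for code in (200, 401, 500, None):
    print(code, "healthy" if is_healthy(code) else "unhealthy")
```

On the host, `is_healthy(probe("http://127.0.0.1:29316/_matrix/maubot/v1/version"))` would combine the two; the loopback address follows the fix described in ops-jrz1-xz1.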

4
.beads/metadata.json Normal file

@@ -0,0 +1,4 @@
{
"database": "beads.db",
"jsonl_export": "issues.jsonl"
}

1
.gitignore vendored

@@ -48,6 +48,7 @@ venv/
# Spec-kit framework (auto-updated by framework)
.claude/commands/speckit.*.md
.codex/
.specify/memory/
.specify/scripts/
.specify/templates/

48
AGENTS.md Normal file

@@ -0,0 +1,48 @@
# Beads Issue Tracking
**Session start**: Run `bd ready` to see available work.
## Commands
- `bd ready` - Issues with no blockers
- `bd show <id>` - Issue details
- `bd update <id> --status=in_progress` - Claim work
- `bd close <id>` - Complete work
- `bd create --title="..." --type=task|bug|feature` - New issue
- `bd dep add <issue> <depends-on>` - Add dependency
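To make "no blockers" concrete, here is a rough approximation of the `bd ready` rule, written against the record shape in `.beads/issues.jsonl`. It is a sketch under stated assumptions, not bd's actual implementation: only dependencies of type `blocks` are treated as blocking, `parent-child` links are ignored, only `open` issues are candidates, and the function name `ready_issues` is hypothetical.

```python
def ready_issues(issues):
    """Return ids of open issues whose "blocks" dependencies are all closed."""
    status = {issue["id"]: issue["status"] for issue in issues}
    ready = []
    for issue in issues:
        if issue["status"] != "open":
            continue
        blockers = [
            dep["depends_on_id"]
            for dep in issue.get("dependencies", [])
            if dep["type"] == "blocks"
        ]
        # A blocker missing from the export counts as unresolved (assumption).
        if all(status.get(blocker) == "closed" for blocker in blockers):
            ready.append(issue["id"])
    return ready


# Trimmed records modeled on this repository's issues.
issues = [
    {"id": "ops-jrz1-kg0", "status": "closed"},
    {"id": "ops-jrz1-d58", "status": "open", "dependencies": [
        {"depends_on_id": "ops-jrz1-3so", "type": "parent-child"}]},
    {"id": "ops-jrz1-5oe", "status": "open", "dependencies": [
        {"depends_on_id": "ops-jrz1-d58", "type": "blocks"}]},
    {"id": "ops-jrz1-av0", "status": "open", "dependencies": [
        {"depends_on_id": "ops-jrz1-kg0", "type": "blocks"}]},
]

print(ready_issues(issues))
```

In this sample `ops-jrz1-5oe` is held back because `ops-jrz1-d58` is still open, while `ops-jrz1-av0` is ready because its only blocker, `ops-jrz1-kg0`, is closed.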
## Session End
Before finishing: `git status`, `git add`, `git commit`. This is an ephemeral branch - merge to main locally.
# Repository Guidelines
## Project Structure & Module Organization
- `configuration.nix` holds shared system defaults; adjust service toggles in host overlays instead of editing it directly.
- `hosts/ops-jrz1.nix` and `hosts/ops-jrz1-vm.nix` override environment-specific networking, secrets, and hardware details; mirror changes across both when possible.
- `modules/` contains composable NixOS modules (`matrix-continuwuity.nix`, `mautrix-*.nix`, `security/*`); keep new modules kebab-cased and expose options via `lib.mkOption`.
- `scripts/` provides sanitization utilities. Stage external imports under `staging/`, run `./scripts/sanitize-files.sh SRC staging/modules`, then promote files into `modules/` once validation passes.
- `specs/` and `docs/` capture design intent and runbooks; update the relevant spec when changing feature scope.
## Build, Test, and Development Commands
- `nix flake check` validates module wiring, options, and formatting before review.
- `nix build .#nixosConfigurations.ops-jrz1` produces the deployable system closure; use this to catch evaluation regressions.
- `nixos-rebuild switch --flake .#ops-jrz1 --target-host root@ops-jrz1` deploys to the VPS; replace the target host when testing elsewhere.
- `./scripts/validate-sanitization.sh modules/` ensures redacted content before commit; rerun after manual edits to sanitized files.
## Coding Style & Naming Conventions
- Prefer two-space indentation in Nix files; align attribute sets and option blocks for readability.
- Use `lowerCamelCase` for option names, kebab-case for file names, and leave explanatory comments above non-obvious logic paths only.
- Format Nix with `nix fmt` (nixpkgs-fmt) or equivalent before committing to keep diffs minimal.
## Testing Guidelines
- Treat `nix flake check` as the minimum gate; add targeted VM tests in `hosts/ops-jrz1-vm.nix` when introducing new services.
- Name ad-hoc verification scripts under `scripts/local-*` and avoid committing transient debug helpers.
- Capture manual verification steps in `docs/worklogs/` immediately after deploys for traceability.
## Commit & Pull Request Guidelines
- Follow the existing Git log style: single-line, capitalized summaries in ~70 characters (e.g., `Tighten bridge secret validation`).
- Reference related specs or worklogs in the body, and list `nix flake check` (and any VM smoke tests) under a short "Validation" block.
- PRs should link the tracked task, summarize scope, highlight sanitization steps, and mention any secrets or infra touchpoints reviewers must provision.
## Security & Secrets Handling
- Never commit decrypted material; use `sops secrets/secrets.yaml` for edits and confirm `git status` shows only encrypted blobs.
- Replace real domains, IPs, and tokens with repository-safe placeholders. When importing upstream configs, run the sanitize and validate scripts before staging changes.

View file

@ -98,6 +98,21 @@ ssh root@45.77.205.49 'sudo -u postgres psql mautrix_slack -c "\dt"'
ssh root@45.77.205.49 'sudo -u postgres pg_dump mautrix_slack' > backup.sql
```
### SSH Tunnels
```bash
# Maubot web UI (admin interface for managing bot instances)
ssh -L 29316:localhost:29316 root@45.77.205.49
# Then access: http://localhost:29316
# Login: admin / (password from secrets/secrets.yaml)
# Matrix homeserver (for debugging)
ssh -L 8008:localhost:8008 root@45.77.205.49
# Then access: http://localhost:8008
# Keep tunnel open in background
ssh -fN -L 29316:localhost:29316 root@45.77.205.49
```
## Code Style
- Nix 2.x, NixOS 24.05+, Bash 5.x: Follow standard conventions
- NixOS modules: Use nixpkgs module pattern (options, config, mkIf)
@ -199,6 +214,7 @@ git branch -d 003-feature-name
- Tag releases for deployment milestones
## Recent Changes
- 003-maubot-integration: Added maubot bot framework module with declarative plugin deployment
- 001-extract-matrix-platform: Added Nix 2.x, NixOS 24.05+, Bash 5.x (for scripts)
- 002-slack-bridge-integration: Deployed mautrix-slack bridge with Socket Mode (2025-10-26)
- Phase 0-1: Research and design complete
@ -458,4 +474,4 @@ postgresql.service
- Fresh database recommended after conduwuit version upgrades
- Debug logging currently enabled on conduwuit
<!-- MANUAL ADDITIONS END -->
<!-- MANUAL ADDITIONS END -->

View file

@ -34,11 +34,11 @@
},
"nixpkgs-unstable": {
"locked": {
"lastModified": 1761114652,
"narHash": "sha256-f/QCJM/YhrV/lavyCVz8iU3rlZun6d+dAiC3H+CDle4=",
"lastModified": 1764667669,
"narHash": "sha256-7WUCZfmqLAssbDqwg9cUDAXrSoXN79eEEq17qhTNM/Y=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "01f116e4df6a15f4ccdffb1bcd41096869fb385c",
"rev": "418468ac9527e799809c900eda37cbff999199b6",
"type": "github"
},
"original": {

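The two `lastModified` values in this lock entry are Unix timestamps. As a quick cross-check against the "Oct 22 → Dec 2" bump named in the commit message, the sketch below converts both to UTC dates. The helper name `lock_date` is made up for illustration.

```python
from datetime import datetime, timezone

# lastModified values from the old and new nixpkgs-unstable lock entries
OLD_LAST_MODIFIED = 1761114652
NEW_LAST_MODIFIED = 1764667669


def lock_date(timestamp: int) -> str:
    """Format a flake.lock lastModified timestamp as a UTC calendar date."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")


print(lock_date(OLD_LAST_MODIFIED))  # 2025-10-22
print(lock_date(NEW_LAST_MODIFIED))  # 2025-12-02
```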
View file

@ -3,12 +3,16 @@
{ config, pkgs, pkgs-unstable, lib, ... }:
{
# Disable built-in NixOS maubot module to use our sops-nix enhanced version
disabledModules = [ "services/matrix/maubot.nix" ];
imports = [
# Import all modules (same as production)
../modules/matrix-continuwuity.nix
../modules/mautrix-slack.nix
../modules/mautrix-whatsapp.nix
../modules/mautrix-gmessages.nix
../modules/maubot.nix
../modules/dev-services.nix
../modules/security/fail2ban.nix
../modules/security/ssh-hardening.nix
@ -74,5 +78,11 @@
allowedTCPPorts = [ 22 80 443 8008 3000 ];
};
# Dummy filesystem for VM evaluation
fileSystems."/" = {
device = "/dev/vda1";
fsType = "ext4";
};
system.stateVersion = "24.05";
}

View file

@ -4,6 +4,9 @@
# ops-jrz1 production VPS configuration
# Imports extracted Matrix modules from ops-base
# Disable built-in NixOS maubot module to use our sops-nix enhanced version
disabledModules = [ "services/matrix/maubot.nix" ];
imports = [
# Hardware configuration
../hardware-configuration.nix
@ -11,10 +14,12 @@
# Matrix platform modules
../modules/matrix-continuwuity.nix
../modules/mautrix-slack.nix
../modules/maubot.nix
../modules/dev-services.nix
../modules/security/fail2ban.nix
../modules/security/ssh-hardening.nix
../modules/matrix-secrets
../modules/backup.nix
];
# System configuration
@ -35,6 +40,16 @@
mode = "0444";
};
sops.secrets.maubot-admin-password = {
# Maubot management interface admin password
mode = "0400";
};
sops.secrets.maubot-secret-key = {
# Maubot session secret key
mode = "0400";
};
# Matrix homeserver configuration
# NOTE: Disabled in favor of dev-platform.matrix which provides integrated
# bridge coordination and systemd credential-based secrets management
@ -68,7 +83,16 @@
workspace = "chochacho";
port = 29319;
};
maubot = {
enable = true;
port = 29316;
plugins = [ ../modules/plugins/sna-instagram-bot.mbp ];
};
};
# Local backup service (Phase 1: manual trigger)
services.backup.enable = true;
system.stateVersion = "24.05";
}

108
modules/backup.nix Normal file

@ -0,0 +1,108 @@
# Local backup service for PostgreSQL and Maubot
# Phase 1: Manual trigger via `systemctl start backup`
# Phase 2: Enable timer for daily automation
{ config, pkgs, lib, ... }:
with lib;
let
cfg = config.services.backup;
in
{
options.services.backup = {
enable = mkEnableOption "local backup service";
location = mkOption {
type = types.str;
default = "/var/backup";
description = "Backup storage directory";
};
retention = mkOption {
type = types.int;
default = 4;
description = "Days to retain backups";
};
};
config = mkIf cfg.enable {
# Ensure backup directory exists
systemd.tmpfiles.rules = [
"d ${cfg.location} 0750 root root -"
];
# Backup service (oneshot, manual trigger)
systemd.services.backup = {
description = "Local backup service";
after = [ "postgresql.service" ];
requires = [ "postgresql.service" ];
serviceConfig = {
Type = "oneshot";
User = "root";
# Low priority - don't impact running services
IOSchedulingClass = "idle";
Nice = 19;
};
path = [
config.services.postgresql.package # pg_dumpall
pkgs.gzip
pkgs.sqlite
pkgs.util-linux # runuser
pkgs.coreutils
pkgs.findutils
];
script = ''
set -euo pipefail
DATE=$(date +%Y-%m-%d)
BASE="${cfg.location}"
TMP="$BASE/.incomplete-$DATE"
DEST="$BASE/$DATE"
# Skip if today's backup exists
if [ -d "$DEST" ]; then
echo "Backup already exists: $DEST"
exit 0
fi
# Clean up any previous incomplete attempts
rm -rf "$BASE"/.incomplete-*
mkdir -p "$TMP"
# PostgreSQL (hot, consistent via MVCC)
echo "Backing up PostgreSQL..."
runuser -u postgres -- pg_dumpall | gzip > "$TMP/postgres.sql.gz"
gzip -t "$TMP/postgres.sql.gz"
# Maubot SQLite (consistent via .backup API)
if [ -f /var/lib/maubot/bot.db ]; then
echo "Backing up Maubot..."
sqlite3 /var/lib/maubot/bot.db ".backup '$TMP/maubot.db'"
else
echo "Maubot DB not found, skipping"
fi
# Atomic publish
mv "$TMP" "$DEST"
# Prune old backups (keep ${toString cfg.retention} days)
find "$BASE" -mindepth 1 -maxdepth 1 -type d -mtime +${toString cfg.retention} -exec rm -rf {} +
echo "Backup complete: $DEST"
ls -lh "$DEST"
'';
};
# Timer (disabled by default, enable for Phase 2)
# systemd.timers.backup = {
# wantedBy = [ "timers.target" ];
# timerConfig = {
# OnCalendar = "daily";
# Persistent = true;
# };
# };
};
}
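
The backup script stages its output in a hidden `.incomplete-DATE` directory and only renames it to the dated directory once every dump has succeeded. The Python sketch below illustrates that stage-then-rename pattern and the retention pruning under simplified assumptions: file contents are passed in as bytes instead of being produced by `pg_dumpall` or `sqlite3`, and the names `publish_backup` and `prune` are hypothetical.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Dict, List, Optional


def publish_backup(base: Path, date: str, files: Dict[str, bytes]) -> Path:
    """Write files into a hidden staging dir, then rename it into place."""
    dest = base / date
    if dest.is_dir():
        return dest  # today's backup already exists, nothing to do
    # Drop leftovers from earlier runs that died before the rename.
    for stale in base.glob(".incomplete-*"):
        shutil.rmtree(stale)
    tmp = base / f".incomplete-{date}"
    tmp.mkdir(parents=True)
    for name, data in files.items():
        (tmp / name).write_bytes(data)
    # rename() is atomic when tmp and dest share a filesystem, so readers
    # see either no backup for this date or a complete one.
    os.rename(tmp, dest)
    return dest


def prune(base: Path, retention_days: int, now: Optional[float] = None) -> List[str]:
    """Remove directories older than the retention window.

    Approximates `find -mtime +N`: age is counted in whole days and must
    exceed N.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in sorted(base.iterdir()):
        if not entry.is_dir():
            continue
        age_days = int((now - entry.stat().st_mtime) // 86400)
        if age_days > retention_days:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Because age is truncated to whole days before the comparison, a retention of 4 keeps a directory until it is at least five days old, which matches how `find -mtime +4` behaves.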

View file

@ -75,6 +75,26 @@ in
description = "Slack bridge port";
};
};
maubot = {
enable = mkOption {
type = types.bool;
default = false;
description = "Enable Maubot bot framework";
};
port = mkOption {
type = types.port;
default = 29316;
description = "Maubot management interface port";
};
plugins = mkOption {
type = types.listOf types.path;
default = [];
description = "Maubot plugins to deploy";
};
};
};
config = mkIf cfg.enable {
@ -217,8 +237,30 @@ in
};
bridge.permissions = {
"${cfg.matrix.serverName}" = "user";
"${cfg.matrix.serverName}" = "admin";
};
encryption.enable = false;
# Enable relay mode so non-logged-in Matrix users (like bots)
# can send messages to Slack via a logged-in relay account
extraConfig = {
bridge.relay = {
enabled = true;
admin_only = false; # Allow room admins to set relay
};
};
};
# Maubot bot framework (using custom module with sops-nix integration)
services.maubot = mkIf cfg.maubot.enable {
enable = true;
homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}";
serverName = cfg.matrix.serverName;
port = cfg.maubot.port;
adminPasswordFile = "/run/secrets/maubot-admin-password";
secretKeyFile = "/run/secrets/maubot-secret-key";
plugins = cfg.maubot.plugins;
};
# Basic Nginx reverse proxy

View file

@ -20,7 +20,7 @@ let
allow_federation = ${boolToString cfg.enableFederation}
database_backend = "rocksdb"
database_path = "${cfg.dataDir}/db/"
log = "info,continuwuity=debug"
log = "info"
${optionalString cfg.enableFederation ''
trusted_servers = ["matrix.org"]
''}

393
modules/maubot.nix Normal file

@ -0,0 +1,393 @@
# Maubot Matrix bot framework module
# Plugin-based Matrix bot system following established infrastructure patterns
{ config, pkgs, lib, ... }:
with lib;
let
cfg = config.services.maubot;
# Python environment with maubot and Instagram bot dependencies
maubotEnv = pkgs.python3.withPackages (ps: with ps; [
maubot
yt-dlp
# instaloader # Not available in nixpkgs, fallback to yt-dlp only
aiohttp
pillow
]);
in
{
options.services.maubot = {
enable = mkEnableOption "Maubot Matrix bot framework";
homeserverUrl = mkOption {
type = types.str;
default = "http://127.0.0.1:8008";
description = "Matrix homeserver URL for bot connections";
};
serverName = mkOption {
type = types.str;
default = "matrix.talu.uno";
description = "Matrix server name for bot users";
};
port = mkOption {
type = types.port;
default = 29316;
description = "Port for Maubot management interface";
};
adminUser = mkOption {
type = types.str;
default = "admin";
description = "Admin username for Maubot management interface";
};
adminPasswordFile = mkOption {
type = types.nullOr types.path;
default = null;
description = "Path to file containing the admin password; read at service start via systemd LoadCredential";
};
secretKeyFile = mkOption {
type = types.nullOr types.path;
default = null;
description = "Path to file containing Maubot secret key for sessions";
};
registrationSecretFile = mkOption {
type = types.nullOr types.path;
default = null;
description = "Path to file containing Matrix homeserver registration secret";
};
database = mkOption {
type = types.str;
default = "sqlite:/var/lib/maubot/bot.db";
description = "Database connection string (sqlite:/path/to/db or postgresql://...)";
};
logLevel = mkOption {
type = types.str;
default = "INFO";
description = "Log level (DEBUG, INFO, WARNING, ERROR)";
};
enableEncryption = mkOption {
type = types.bool;
default = true;
description = "Enable end-to-end encryption support for bots";
};
publicUrl = mkOption {
type = types.str;
default = "http://localhost:29316";
description = "Public URL where Maubot management interface is accessible";
};
plugins = mkOption {
type = types.listOf types.path;
default = [];
description = "List of maubot plugin .mbp files to deploy";
};
};
config = mkIf cfg.enable {
# User and group
users.users.maubot = {
isSystemUser = true;
group = "maubot";
home = "/var/lib/maubot";
createHome = true;
};
users.groups.maubot = {};
# Configuration file generation
environment.etc."maubot/config.yaml" = {
text = ''
# Maubot configuration - generated by NixOS
# Database configuration
database: "${cfg.database}"
# Server configuration
server:
hostname: 127.0.0.1
port: ${toString cfg.port}
public_url: ${cfg.publicUrl}
# Admin users for management interface
admins:
${cfg.adminUser}: ${if cfg.adminPasswordFile != null then "REPLACE_ADMIN_PASSWORD" else "changeme-set-password"}
# Bot configuration
api_features:
login: true
plugin: true
plugin_upload: true
instance: true
instance_database: true
log: true
# Logging configuration
logging:
version: 1
formatters:
precise:
format: '[%(levelname)s@%(name)s] %(message)s'
handlers:
console:
class: logging.StreamHandler
formatter: precise
file:
class: logging.handlers.RotatingFileHandler
formatter: precise
filename: /var/log/maubot/maubot.log
maxBytes: 52428800
backupCount: 10
loggers:
maubot:
level: ${cfg.logLevel}
mau:
level: ${cfg.logLevel}
aiohttp:
level: WARNING
root:
level: WARNING
handlers: [console, file]
# Plugin directories - using flat keys as expected by maubot
plugin_directories.upload: /var/lib/maubot/plugins
plugin_directories.load:
- /var/lib/maubot/plugins
plugin_directories.trash: /var/lib/maubot/trash
# Plugin databases configuration
plugin_databases:
sqlite: /var/lib/maubot/plugins
postgres: null
postgres_max_conns_per_plugin: 3
postgres_opts: {}
# Crypto configuration
crypto:
allow: ${if cfg.enableEncryption then "true" else "false"}
allow_level: warn
# Secret key for sessions
secret_key: ${if cfg.secretKeyFile != null then "REPLACE_SECRET_KEY" else "insecure-default-change-me"}
'';
user = "maubot";
group = "maubot";
mode = "0440";
};
# Systemd service with hardening
systemd.services.maubot = {
description = "Maubot Matrix bot framework";
after = [ "network.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "simple";
User = "maubot";
Group = "maubot";
WorkingDirectory = "/var/lib/maubot";
# Use StateDirectory for runtime data
# RuntimeDirectory removed to avoid race condition with manual creation
StateDirectory = "maubot";
LogsDirectory = "maubot";
# LoadCredential directives for secure secret injection
LoadCredential =
(optional (cfg.adminPasswordFile != null) "admin-password:${cfg.adminPasswordFile}") ++
(optional (cfg.secretKeyFile != null) "secret-key:${cfg.secretKeyFile}");
# Pre-start script to generate runtime config with secrets
ExecStartPre =
if (cfg.adminPasswordFile != null || cfg.secretKeyFile != null) then
[
(pkgs.writeShellScript "maubot-prepare-config" ''
set -e
# Ensure config directory exists
${pkgs.coreutils}/bin/mkdir -p /var/lib/maubot/config
# Use text substitution to preserve YAML structure while injecting secrets
${pkgs.python3.withPackages (ps: [ ps.pyyaml ])}/bin/python3 << 'EOF'
import os
import re
# Read base configuration as text
with open('/etc/maubot/config.yaml', 'r') as f:
config_text = f.read()
# Read secrets from CREDENTIALS_DIRECTORY if available
creds_dir = os.environ.get('CREDENTIALS_DIRECTORY')
if creds_dir:
# Replace admin password placeholder
admin_password_file = os.path.join(creds_dir, 'admin-password')
if os.path.exists(admin_password_file):
with open(admin_password_file, 'r') as f:
admin_password = f.read().strip()
config_text = config_text.replace('REPLACE_ADMIN_PASSWORD', admin_password)
# Replace secret key placeholder
secret_key_file = os.path.join(creds_dir, 'secret-key')
if os.path.exists(secret_key_file):
with open(secret_key_file, 'r') as f:
secret_key = f.read().strip()
config_text = config_text.replace('REPLACE_SECRET_KEY', secret_key)
# Write runtime config with restrictive permissions
os.umask(0o077) # Ensure only owner can read
with open('/var/lib/maubot/config/config.yaml', 'w') as f:
f.write(config_text)
EOF
'')
]
else
[
(pkgs.writeShellScript "maubot-prepare-config-simple" ''
${pkgs.coreutils}/bin/mkdir -p /var/lib/maubot/config
${pkgs.coreutils}/bin/cp /etc/maubot/config.yaml /var/lib/maubot/config/config.yaml
'')
];
# Start Maubot with runtime config
ExecStart = "${maubotEnv}/bin/maubot -c /var/lib/maubot/config/config.yaml";
# Restart policy
Restart = "always";
RestartSec = 10;
# Security hardening following established patterns
NoNewPrivileges = true;
ProtectSystem = "strict";
ProtectHome = true;
# PrivateTmp left disabled; note it only isolates /tmp and /var/tmp, not /run/maubot
PrivateTmp = false;
PrivateDevices = true;
ProtectKernelTunables = true;
ProtectKernelModules = true;
ProtectControlGroups = true;
# Allow writing to data, log, and runtime directories
ReadWritePaths = [
"/var/lib/maubot"
"/var/log/maubot"
"/run/maubot"
];
# Network restrictions
RestrictAddressFamilies = [ "AF_INET" "AF_INET6" "AF_UNIX" ];
# System calls - Python application needs broader access
SystemCallArchitectures = "native";
SystemCallFilter = [
"@system-service"
"@network-io"
"@file-system"
"~@privileged"
];
# Resource limits
MemoryMax = "512M";
CPUWeight = 50; # Lower priority than Matrix server
IOWeight = 50;
# Process security
UMask = "0027";
LockPersonality = true;
RestrictRealtime = true;
RestrictSUIDSGID = true;
RemoveIPC = true;
# Logging
StandardOutput = "journal";
StandardError = "journal";
SyslogIdentifier = "maubot";
};
};
# Directory permissions
systemd.tmpfiles.rules = [
"d /var/lib/maubot 0755 maubot maubot -"
"d /var/lib/maubot/plugins 0755 maubot maubot -"
"d /var/lib/maubot/trash 0755 maubot maubot -"
"d /var/log/maubot 0755 maubot maubot -"
"d /run/maubot 0700 maubot maubot -"
] ++ (map (plugin:
"L+ /var/lib/maubot/plugins/${baseNameOf plugin} - - - - ${plugin}"
) cfg.plugins);
# Health check service
systemd.services.maubot-health = {
description = "Maubot health check";
after = [ "maubot.service" ];
serviceConfig = {
Type = "oneshot";
User = "nobody";
Group = "nogroup";
ExecStart = pkgs.writeShellScript "maubot-health" ''
# Check if Maubot management interface is responding
# Note: All maubot endpoints require auth, so 401 is expected and healthy
HTTP_CODE=$(${pkgs.curl}/bin/curl -s -o /dev/null -w "%{http_code}" "http://localhost:${toString cfg.port}/_matrix/maubot/v1/login" 2>/dev/null)
if [ "$HTTP_CODE" = "401" ] || [ "$HTTP_CODE" = "200" ]; then
echo "Maubot health check: OK (HTTP $HTTP_CODE)"
exit 0
else
echo "Maubot health check: FAILED (HTTP $HTTP_CODE)"
exit 1
fi
'';
StandardOutput = "journal";
StandardError = "journal";
};
};
systemd.timers.maubot-health = {
description = "Maubot health check timer";
wantedBy = [ "timers.target" ];
timerConfig = {
OnCalendar = "*:0/5"; # Every 5 minutes
Persistent = true;
};
};
# Health check failure handling - restart service if health check fails consistently
systemd.services.maubot-health-restart = {
description = "Restart Maubot on health check failure";
serviceConfig = {
Type = "oneshot";
ExecStart = pkgs.writeShellScript "maubot-health-restart" ''
# Check if maubot health service failed recently
if systemctl is-failed maubot-health.service >/dev/null 2>&1; then
echo "Maubot health check failed, restarting maubot service"
systemctl restart maubot.service
# Reset health check failure state
systemctl reset-failed maubot-health.service
fi
'';
User = "root";
StandardOutput = "journal";
StandardError = "journal";
};
};
systemd.timers.maubot-health-restart = {
description = "Monitor Maubot health check failures and restart if needed";
wantedBy = [ "timers.target" ];
timerConfig = {
OnCalendar = "*:2/10"; # Every 10 minutes, offset from health check
Persistent = true;
};
};
# Maubot management interface only accessible via SSH tunnel (localhost:29316)
# Do NOT expose to internet - admin UI has no rate limiting
};
}
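
The two timers in this module use the systemd calendar expressions `*:0/5` and `*:2/10`. The sketch below enumerates the minutes of each hour at which they fire, to show how the restart check is offset from the health check. It assumes the usual reading of `start/step` in a calendar expression; the helper `fire_minutes` is hypothetical.

```python
def fire_minutes(start: int, step: int) -> list:
    """Minutes of each hour at which `*:start/step` fires."""
    return list(range(start, 60, step))


health_checks = fire_minutes(0, 5)    # OnCalendar = "*:0/5"
restart_checks = fire_minutes(2, 10)  # OnCalendar = "*:2/10"

# For each restart check, find the most recent health check before it.
latest_health = {
    r: max(h for h in health_checks if h <= r) for r in restart_checks
}

print(health_checks)   # [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
print(restart_checks)  # [2, 12, 22, 32, 42, 52]
print(latest_health)   # {2: 0, 12: 10, 22: 20, 32: 30, 42: 40, 52: 50}
```

Under this reading, every restart check runs two minutes after a health check. Since `systemctl is-failed` should reflect only the latest run of the oneshot unit, a health check that fails at :05 and passes at :10 would leave nothing for the restart check at :12 to act on.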

View file

@ -0,0 +1,9 @@
enabled: true
max_file_size: 50000000
supported_formats:
- mp4
- jpg
- jpeg
- png
- webp
allowed_rooms: []
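
Two of these defaults act as gates in the plugin code in this commit: an empty `allowed_rooms` list disables the bot in every room, and media larger than `max_file_size` bytes is rejected. A minimal sketch of both checks follows. The function names and the example room IDs are hypothetical, and the real plugin reads these values from its maubot config object.

```python
from typing import Sequence

DEFAULT_MAX_FILE_SIZE = 50_000_000  # matches max_file_size in the defaults above


def room_allowed(room_id: str, allowed_rooms: Sequence[str]) -> bool:
    """An empty list means the bot is disabled everywhere."""
    return bool(allowed_rooms) and room_id in allowed_rooms


def within_size_limit(payload: bytes, max_file_size: int = DEFAULT_MAX_FILE_SIZE) -> bool:
    """Media is accepted only when it does not exceed the limit."""
    return len(payload) <= max_file_size


# Hypothetical room IDs, for illustration only.
print(room_allowed("!abc:example.org", []))
print(room_allowed("!abc:example.org", ["!abc:example.org"]))
print(within_size_limit(b"\x00" * 1024))
```

With the shipped default of `allowed_rooms: []`, the plugin stays inert until an operator lists at least one room.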

View file

@ -0,0 +1,553 @@
import re
import asyncio
import tempfile
import os
from typing import Optional, Tuple
from urllib.parse import urlparse
from mautrix.types import EventType, MediaMessageEventContent, MessageType, TextMessageEventContent, Format
from mautrix.util.config import BaseProxyConfig
from maubot import Plugin, MessageEvent
from maubot.handlers import event
import aiohttp
import yt_dlp
try:
import instaloader
HAS_INSTALOADER = True
except ImportError:
HAS_INSTALOADER = False
class Config(BaseProxyConfig):
def do_update(self, helper):
helper.copy("enabled")
helper.copy("max_file_size")
helper.copy("supported_formats")
helper.copy("allowed_rooms")
class SocialMediaBot(Plugin):
"""
Maubot plugin for automatic social media content extraction.
Detects Instagram and TikTok URLs in Matrix messages and automatically
extracts and uploads the media content to the room.
"""
@classmethod
def get_config_class(cls):
return Config
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# Set up logging levels
self.log.info("SocialMediaBot initialized")
if self.config is not None:
self.log.debug(f"Configuration: enabled={self.config['enabled']}, "
f"max_file_size={self.config['max_file_size']}, "
f"allowed_rooms={self.config['allowed_rooms']}")
# Compile URL patterns for Instagram and TikTok
self.instagram_pattern = re.compile(
r'https?://(?:www\.)?instagram\.com/(?:p|reel|stories)/[\w-]+/?'
)
self.tiktok_pattern = re.compile(
r'https?://(?:www\.)?(?:tiktok\.com|vm\.tiktok\.com)/(?:@[\w.-]+/video/\d+|[\w-]+)/?'
)
# Combined pattern for finding any social media URL
self.combined_pattern = re.compile(
r'https?://(?:www\.)?(?:instagram\.com/(?:p|reel|stories)/[\w-]+|'
r'(?:tiktok\.com/@[\w.-]+/video/\d+|vm\.tiktok\.com/[\w-]+))/?'
)
@event.on(EventType.ROOM_MESSAGE)
async def handle_message(self, event: MessageEvent) -> None:
"""
Process incoming Matrix room messages for social media URLs.
Implements:
- FR-021: Only processes messages from allowed_rooms
- FR-022: Silently ignores non-allowed rooms with debug logging
- FR-014: Ignores own messages to prevent loops
- FR-015: Processes only first URL by text position
"""
# FR-021/FR-022: Check allowed_rooms filter (early return with debug logging)
self.log.info(f"Received message in room {event.room_id}")
if self.config is None:
self.log.info(f"Config not loaded, ignoring message in {event.room_id}")
return
allowed_rooms = self.config.get("allowed_rooms", [])
self.log.info(f"allowed_rooms config: {allowed_rooms}")
if not allowed_rooms:
self.log.debug(f"Bot disabled (empty allowed_rooms list), ignoring message in {event.room_id}")
return
if event.room_id not in allowed_rooms:
self.log.debug(f"Room {event.room_id} not in allowed_rooms, ignoring message")
return
# FR-014: Ignore messages sent by the bot itself (prevent loops)
if event.sender == self.client.mxid:
self.log.debug(f"Ignoring own message from {event.sender}")
return
# Check if message has body content
if not hasattr(event.content, 'body') or not event.content.body:
return
message_text = event.content.body
# FR-015: Find first URL by text position (left-to-right scan, platform-agnostic)
url, platform = self._find_first_url(message_text)
if url:
self.log.info(f"Processing first URL from message: {url} (platform: {platform})")
await self.process_url(event, url, platform)
def _find_first_url(self, text: str) -> Tuple[Optional[str], Optional[str]]:
"""
Find the first social media URL in text by position (left-to-right).
Returns:
Tuple of (url, platform) or (None, None) if no URL found
Implements FR-015: First-URL detection by text position, platform-agnostic
"""
# Find all matches with their positions
instagram_matches = [(m.group(), m.start(), 'instagram')
for m in self.instagram_pattern.finditer(text)]
tiktok_matches = [(m.group(), m.start(), 'tiktok')
for m in self.tiktok_pattern.finditer(text)]
# Combine and sort by position
all_matches = instagram_matches + tiktok_matches
if not all_matches:
return None, None
all_matches.sort(key=lambda x: x[1]) # Sort by position
# Log any additional URLs at debug level (FR-015)
if len(all_matches) > 1:
extra_urls = [m[0] for m in all_matches[1:]]
self.log.debug(f"Found {len(all_matches)} URLs, ignoring extras: {extra_urls}")
first_url, _, platform = all_matches[0]
return first_url, platform
async def process_url(self, event: MessageEvent, url: str, platform: str) -> None:
"""
Process a single social media URL with progressive status message editing.
Implements:
- FR-011: Progressive message editing from "Processing..." to final result
- FR-012: Inline media embedding with fallback to separate message
- FR-013: Error messages via status message editing
Args:
event: Matrix message event
url: Single URL string to process
platform: Platform identifier ("instagram" or "tiktok")
"""
platform_name = platform.title()
platform_emoji = "📸" if platform == "instagram" else "🎵"
# FR-011: Send initial "Processing..." message and store event_id
try:
status_msg = await event.respond(f"🔍 Processing {platform_name} content...")
status_event_id = status_msg.event_id
except Exception as e:
self.log.error(f"Failed to send initial status message: {e}")
return
try:
# Extract content
content_info = await self.extract_content_with_ytdlp(url)
if content_info:
# Log successful extraction
self.log.info(f"Extracted {platform} content: type={content_info.get('type')}, "
f"size={len(content_info.get('content', []))}, "
f"dimensions={content_info.get('width')}x{content_info.get('height')}, "
f"duration={content_info.get('duration')}")
# Upload media and edit status message with result
content_info['platform'] = platform
content_info['platform_emoji'] = platform_emoji
await self.upload_media_and_edit_message(event.room_id, status_event_id, content_info)
else:
# FR-013: Edit status message to show error
error_msg = "❌ Failed to extract content: Content unavailable or private"
await self.edit_status_message(event.room_id, status_event_id, error_msg)
self.log.warning(f"Failed to extract content from {url}")
except Exception as e:
# FR-013: Edit status message with error description
self.log.exception(f"Error processing {platform} URL {url}")
error_msg = f"❌ Failed to extract content: {str(e)}"
await self.edit_status_message(event.room_id, status_event_id, error_msg)
async def extract_content_with_ytdlp(self, url: str) -> Optional[dict]:
"""
Extract media content from social media URL using yt-dlp with fallback.
Implements graceful degradation (Constitution III):
- Primary: yt-dlp (works for both Instagram and TikTok)
- Fallback: instaloader (Instagram only)
- Specific error handling for private content, deleted posts, rate limits
Args:
url: Social media URL string
Returns:
dict with content data or None if extraction fails
Raises:
Exception with specific error messages for known failure cases
"""
# Try primary extraction with yt-dlp
try:
return await self.extract_with_ytdlp(url)
except Exception as e:
error_str = str(e).lower()
# Check for specific error cases
if any(keyword in error_str for keyword in ['private', 'login', 'auth', 'permission']):
self.log.warning(f"yt-dlp: Private content or authentication required for {url}")
raise Exception("Private content or authentication required")
elif any(keyword in error_str for keyword in ['not found', '404', 'deleted', 'unavailable']):
self.log.warning(f"yt-dlp: Content deleted or unavailable for {url}")
raise Exception("Content deleted or unavailable")
elif any(keyword in error_str for keyword in ['rate limit', 'too many requests', '429']):
self.log.warning(f"yt-dlp: Rate limited for {url}")
raise Exception("Rate limit exceeded - please try again later")
else:
self.log.warning(f"yt-dlp extraction failed for {url}: {e}")
# For Instagram URLs, try instaloader as fallback
if 'instagram.com' in url and HAS_INSTALOADER:
try:
return await self.extract_with_instaloader(url)
except Exception as e:
error_str = str(e).lower()
# Check for specific error cases in fallback
if any(keyword in error_str for keyword in ['private', 'login', 'auth']):
raise Exception("Private content or authentication required")
elif any(keyword in error_str for keyword in ['not found', '404', 'deleted']):
raise Exception("Content deleted or unavailable")
elif any(keyword in error_str for keyword in ['rate limit', 'too many']):
raise Exception("Rate limit exceeded - please try again later")
else:
self.log.warning(f"instaloader extraction failed for {url}: {e}")
return None
async def extract_with_ytdlp(self, url: str) -> Optional[dict]:
"""
Extract content using yt-dlp library.
Implements:
- Constitution II: Async-first via run_in_executor
- Constitution IV: Temporary file cleanup via context managers
"""
def _extract():
# Constitution IV: Context manager ensures cleanup
with tempfile.TemporaryDirectory() as temp_dir:
ydl_opts = {
'outtmpl': os.path.join(temp_dir, '%(title)s.%(ext)s'),
'writeinfojson': True,
'writethumbnail': True,
'quiet': True,
'no_warnings': True,
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
info = ydl.extract_info(url, download=True)
# Find downloaded files
files = os.listdir(temp_dir)
media_file = None
thumbnail_file = None
# Prioritize video files over images
video_file = None
image_file = None
for file in files:
full_path = os.path.join(temp_dir, file)
if file.endswith('.mp4'):
video_file = full_path
elif file.endswith(('.jpg', '.jpeg', '.png')) and 'thumbnail' not in file.lower():
image_file = full_path
elif 'thumbnail' in file.lower() or file.endswith('.webp'):
thumbnail_file = full_path
# Prefer video over image
media_file = video_file or image_file
if media_file and os.path.exists(media_file):
# Read file content into memory
with open(media_file, 'rb') as f:
content = f.read()
thumbnail_content = None
if thumbnail_file and os.path.exists(thumbnail_file):
with open(thumbnail_file, 'rb') as f:
thumbnail_content = f.read()
return {
'type': 'video' if media_file.endswith('.mp4') else 'image',
'content': content,
'thumbnail': thumbnail_content,
'filename': os.path.basename(media_file),
'title': info.get('title', 'Social Media Content'),
'description': info.get('description', ''),
'uploader': info.get('uploader', ''),
'width': info.get('width'),
'height': info.get('height'),
'duration': info.get('duration'),
}
return None
# Constitution II: Run blocking operation in thread pool (non-blocking)
loop = asyncio.get_running_loop()
return await loop.run_in_executor(None, _extract)
async def extract_with_instaloader(self, url: str) -> Optional[dict]:
"""
Extract Instagram content using instaloader library (fallback).
Only called for Instagram URLs when yt-dlp fails.
"""
if not HAS_INSTALOADER:
return None
def _extract():
with tempfile.TemporaryDirectory() as temp_dir:
loader = instaloader.Instaloader(
download_videos=True,
download_comments=False,
save_metadata=False,
download_geotags=False,
quiet=True,
)
# Extract shortcode from URL
shortcode_match = re.search(r'/(?:p|reel)/([^/]+)', url)
if not shortcode_match:
return None
shortcode = shortcode_match.group(1)
try:
post = instaloader.Post.from_shortcode(loader.context, shortcode)
loader.download_post(post, temp_dir)
# Find downloaded files
files = os.listdir(temp_dir)
media_file = None
for file in files:
if file.endswith(('.mp4', '.jpg', '.jpeg')):
media_file = os.path.join(temp_dir, file)
break
if media_file and os.path.exists(media_file):
with open(media_file, 'rb') as f:
content = f.read()
return {
'type': 'video' if media_file.endswith('.mp4') else 'image',
'content': content,
'filename': os.path.basename(media_file),
'title': f"Post by @{post.owner_username}",
'description': post.caption or '',
'uploader': post.owner_username,
}
except Exception as e:
self.log.error(f"Instaloader error: {e}")
return None
return None
# Run in thread pool
loop = asyncio.get_running_loop()
return await loop.run_in_executor(None, _extract)
async def edit_status_message(self, room_id: str, event_id: str, new_content: str,
media_uri: Optional[str] = None) -> None:
"""
Edit a previously sent status message with updated content.
Implements FR-011, FR-012: Progressive message editing
Args:
room_id: Matrix room ID where message was sent
event_id: Event ID of message to edit
new_content: Updated message content (text, formatted with markdown)
media_uri: Optional mxc:// URI for inline media embedding
"""
try:
# Create edited message content
content = TextMessageEventContent(
msgtype=MessageType.TEXT,
body=new_content,
format=Format.HTML,
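# Convert **bold** pairs to <strong>...</strong>; chained str.replace would turn every ** into an opening tag
formatted_body=re.sub(r'\*\*(.+?)\*\*', r'<strong>\1</strong>', new_content).replace('\n', '<br>'),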
)
# Set up the edit relationship
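content["m.new_content"] = {
"msgtype": "m.text",
"body": new_content,
# Carry the HTML rendering into the replacement so edited messages keep their formatting
"format": "org.matrix.custom.html",
"formatted_body": content.formatted_body,
}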
content["m.relates_to"] = {
"rel_type": "m.replace",
"event_id": event_id,
}
# FR-012: Attempt inline media embedding if media_uri provided
# Note: This is a SHOULD - fallback handled in upload_media_and_edit_message
if media_uri:
self.log.debug("Attempting inline media embedding in edited message")
# Media embedding in edits depends on client/bridge support
# For now, we'll use separate media message as fallback
await self.client.send_message_event(room_id, EventType.ROOM_MESSAGE, content)
except Exception as e:
self.log.error(f"Failed to edit status message: {e}")
# Don't raise - graceful degradation
async def upload_media_and_edit_message(self, room_id: str, event_id: str,
content_info: dict) -> None:
"""
Upload media to Matrix and edit status message to show result.
Implements:
- FR-008: Upload extracted content to Matrix room
- FR-009: Preserve video metadata (dimensions, duration, thumbnail)
- FR-010: Preserve content metadata (title, caption, uploader)
- FR-012: SHOULD inline embed, MAY use separate message
- FR-016: Enforce file size and format limits
Args:
room_id: Matrix room ID
event_id: Status message event ID to edit
content_info: Content dictionary from extraction
"""
try:
filename = content_info['filename']
content_bytes = content_info['content']
# FR-016: Validate file size (50MB max by default)
max_file_size = self.config.get("max_file_size", 50000000) if self.config else 50000000
if len(content_bytes) > max_file_size:
error_msg = f"❌ File too large: {len(content_bytes) / 1000000:.1f}MB (max: {max_file_size / 1000000}MB)"
await self.edit_status_message(room_id, event_id, error_msg)
self.log.warning(f"File size {len(content_bytes)} exceeds limit {max_file_size}")
return
# FR-016: Validate file format
supported_formats = self.config.get("supported_formats", ["mp4", "jpg", "jpeg", "png", "webp"]) if self.config else ["mp4", "jpg", "jpeg", "png", "webp"]
file_ext = filename.rsplit('.', 1)[-1].lower() if '.' in filename else ''
if file_ext not in supported_formats:
error_msg = f"❌ Unsupported format: .{file_ext} (supported: {', '.join(supported_formats)})"
await self.edit_status_message(room_id, event_id, error_msg)
self.log.warning(f"File format .{file_ext} not in supported formats {supported_formats}")
return
# Determine MIME type from filename extension
if filename.endswith('.mp4'):
mime_type = 'video/mp4'
msgtype = MessageType.VIDEO
elif filename.endswith(('.jpg', '.jpeg')):
mime_type = 'image/jpeg'
msgtype = MessageType.IMAGE
elif filename.endswith('.png'):
mime_type = 'image/png'
msgtype = MessageType.IMAGE
elif filename.endswith('.webp'):
mime_type = 'image/webp'
msgtype = MessageType.IMAGE
else:
mime_type = 'application/octet-stream'
msgtype = MessageType.FILE
# Upload binary content to Matrix homeserver (gets mxc:// URI)
media_uri = await self.client.upload_media(
content_info['content'],
mime_type=mime_type,
filename=filename,
)
# FR-012: Attempt inline embedding (SHOULD), fall back to separate message (MAY)
# Current implementation: Use separate media message due to bridge limitations
# Future enhancement: Try inline embedding first, fall back if unsupported
# Create media message content
media_content = MediaMessageEventContent(
msgtype=msgtype,
body=filename,
url=media_uri,
)
# FR-009: Preserve video metadata (dimensions, duration, thumbnail)
if msgtype == MessageType.VIDEO:
media_content.info = {
'mimetype': mime_type,
'size': len(content_info['content']),
}
if content_info.get('width'):
media_content.info['w'] = content_info['width']
if content_info.get('height'):
media_content.info['h'] = content_info['height']
if content_info.get('duration'):
# Convert to milliseconds
media_content.info['duration'] = int(content_info['duration'] * 1000)
# Upload thumbnail if available
if content_info.get('thumbnail'):
thumbnail_uri = await self.client.upload_media(
content_info['thumbnail'],
mime_type='image/jpeg',
filename=f"thumb_{filename}",
)
media_content.info['thumbnail_url'] = thumbnail_uri
# Send the media message
await self.client.send_message(room_id, media_content)
# FR-010: Format final message with platform emoji, title, caption, creator attribution
platform_emoji = content_info.get('platform_emoji', '📸')
title = content_info.get('title', 'Social Media Content')
description = content_info.get('description', '')
uploader = content_info.get('uploader', '')
# Build caption text
caption_parts = [f"{platform_emoji} **{title}**"]
if description:
caption_parts.append(f"\n\n{description}")
if uploader:
caption_parts.append(f"\n\n👤 By: @{uploader}")
caption = ''.join(caption_parts)
# Edit status message to show success
await self.edit_status_message(room_id, event_id, caption)
self.log.info(f"Successfully uploaded and sent {msgtype} content to {room_id}")
except Exception as e:
self.log.exception("Error uploading content to Matrix")
error_msg = f"❌ Error uploading content: {str(e)}"
await self.edit_status_message(room_id, event_id, error_msg)


@ -0,0 +1,61 @@
# Instagram Content Bot for Maubot
# Automatically detects Instagram URLs in messages and posts the content to Matrix
# Target maubot version
maubot: 0.1.0
# The unique ID for the plugin
id: sna.instagram
# A PEP 440 compliant version string
version: 1.0.0
# The SPDX license identifier for the plugin
license: MIT
# The list of modules to load from the plugin archive
modules:
- instagram_bot
# The main class of the plugin
main_class: SocialMediaBot
# Whether or not instances need a database
database: false
# Dependencies required for Instagram content extraction
dependencies:
- yt-dlp>=2023.1.6
- instaloader>=4.9.0
- aiohttp>=3.8.0
# Soft dependencies (optional but recommended)
soft_dependencies:
- pillow>=9.0.0
# Extra files to include in the plugin package
extra_files:
- README.md
# Plugin metadata
meta:
display_name: "Instagram Content Bot"
description: "Automatically detects Instagram URLs and posts the content to Matrix rooms"
author: "Claude Code"
homepage: "https://github.com/maubot/maubot"
# Plugin configuration
config:
enabled: true
max_file_size: 50000000 # 50MB max file size
supported_formats:
- mp4
- jpg
- jpeg
- png
- webp
# Room access control (safety feature)
# List of Matrix room IDs where bot is allowed to operate
# Format: ["!roomid1:server.domain", "!roomid2:server.domain"]
# Empty list = bot disabled in all rooms (safety default)
allowed_rooms: []

Binary file not shown.


@ -2,6 +2,8 @@ matrix-registration-token: ENC[AES256_GCM,data:H7BgtpsDLOYcywjOHru+u7t6BCbqhFrmP
acme-email: ENC[AES256_GCM,data:+tN+nRfn2kpGLdF3Vg==,iv:uZvSw4viBWCTT35C718cLOCrSLM1EnkmEZH644aVuPI=,tag:tf6+7ubiOLVj7k4rfNI3lQ==,type:str]
slack-oauth-token: ""
slack-app-token: ""
maubot-admin-password: ENC[AES256_GCM,data:Omh6VFsnlLgS+UktM5qHjj3+VK84YmMgWcQCvkiMchfb621RV0LBg1ZB3tg=,iv:cINVFlHJJGkAcasK8BJr3Sd2zqkpQOyRgF+V0JhBJXE=,tag:PnS9TdtuR/87yQfttJTLow==,type:str]
maubot-secret-key: ENC[AES256_GCM,data:krq8zjZelAYRNrFs+DYqh7j0bDd80YKRkro88hGiAxJOBCuFV6PdyyUKgqdSuGMhoFhZtMPmRKOQvAxKclOBEQ==,iv:PePSXEOcBKcReXYBzicDhGQ/yxJIZ/TNzARg4z9G7dA=,tag:ihVw9PAXScoZgrSzWkAMdQ==,type:str]
sops:
age:
- recipient: age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q
@ -22,7 +24,7 @@ sops:
TzI2NGdaVHd1RFZWRE50bjZ0cHhBOXMKRXVYFMNxNIX+8uVxf1X4hu+OfOKKs2TK
A2qdAMJIfdy9f7SPVrPnrGMIwl/prxIkbSRwYC/UNK5NNkjMrGoSwg==
-----END AGE ENCRYPTED FILE-----
lastmodified: "2025-10-02T21:33:16Z"
mac: ENC[AES256_GCM,data:B/9XWKEYWv00+xfcnsrqqRvM7mf/1/VMxeaW9V0HoD32Wv8EvjUIOptU4VV/iDHb1zGCzd41XVOulowlKfXbcuDbA2Pi8cVT38F9ZuxSyCjpssDnPYj816SvXNp5gwCHxfvIp32ekrQ7PNQLZVWhHzL/H1doalXv9XHO1xUY6X8=,iv:NKjxEOG0SlJQurfb9f2GRYUFDlNk0mjxpci87r0vmX8=,tag:sGrhVfwq18QI6MS7L5x31w==,type:str]
lastmodified: "2025-10-27T04:21:51Z"
mac: ENC[AES256_GCM,data:k1aBVnSUnpgq1y+AQjZFB7AXmQe2r/SpSVl9xVsJku2/lehBfY6vRGZutRHV4iTaB3FmxwgGCOV29gPZ5NGUQDf9tg5hMacZOREJGd7lMWoSlZbCGjjkOQEvpKLq3kJNuV66Lb1LzKQtR6ws5k/EmnXneyDtjuEbFs4AZZi+WRE=,iv:zc58CMvJqPsKbANOCGLBuo+AiUnoF4Wx3Z33j6a+sfI=,tag:ENek+3uial24ladKBqW3sg==,type:str]
unencrypted_suffix: _unencrypted
version: 3.10.2


@ -0,0 +1,360 @@
# Implementation Plan: Maubot Integration
**Branch**: `003-maubot-integration` | **Date**: 2025-10-26 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `/specs/003-maubot-integration/spec.md`
## Summary
Extract maubot bot framework from ops-base and deploy to ops-jrz1 with Instagram bot plugin. Primary approach: adapt proven ops-base maubot.nix module to ops-jrz1 patterns (conduwuit homeserver, sops-nix secrets, dev-platform wrapper), using registration token auth instead of shared secret. Instagram content fetching via yt-dlp (community scraping). Deployment validates single-instance initially, architecture supports 3+ concurrent instances.
## Technical Context
**Language/Version**: Python 3.11 (maubot runtime environment)
**Primary Dependencies**: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite, sops-nix
**Storage**: SQLite `/var/lib/maubot/bot.db` (service state), per-bot databases (plugin-specific)
**Testing**: Manual QA on production VPS (no staging environment), 7-day validation period
**Target Platform**: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49, x86_64-linux)
**Project Type**: Infrastructure service (NixOS module)
**Performance Goals**: <5 second Instagram content fetch (SC-001), 99% uptime over 7 days (SC-003), <2 second management UI load (SC-007)
**Constraints**: Localhost-only management interface (SSH tunnel required), single Instagram bot instance initially, conduwuit registration token auth (no shared secret)
**Scale/Scope**: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002), small team usage (<20 Instagram fetches/day)
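As a rough worked example of the 99% uptime goal (SC-003), the arithmetic below converts the target into an allowed-downtime budget for the 7-day validation window. The helper name `downtime_budget` is illustrative and not part of this repository.

```python
from datetime import timedelta


def downtime_budget(window: timedelta, uptime_target: float) -> timedelta:
    """Maximum downtime that still meets `uptime_target` over `window`."""
    return window * (1.0 - uptime_target)


# 1% of 7 days: the total outage time SC-003 tolerates.
budget = downtime_budget(timedelta(days=7), 0.99)
print(f"Allowed downtime: about {budget.total_seconds() / 60:.1f} minutes")
```

Under these assumptions the budget is roughly 100.8 minutes for the whole week, so a single long outage during a deployment phase could consume most of it.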
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
### Principle I: Declarative Infrastructure ✅ PASS
**Compliance**:
- All maubot configuration defined in NixOS modules (maubot.nix, dev-services.nix)
- No imperative modifications required (service managed via nixos-rebuild)
- Configuration changes deployed declaratively
- Rollback via NixOS generations
**Evidence**:
- Module adaptation documented in research.md (ops-base → ops-jrz1 pattern)
- Secrets via sops-nix (declarative encryption)
- Runtime config generated from NixOS module options
### Principle II: Security First ✅ PASS
**Compliance**:
- All secrets encrypted via sops-nix (maubot-admin-password, maubot-secret-key, registration-token)
- Runtime secrets in /run/secrets/ (tmpfs, ephemeral)
- No secrets in Nix store or configuration files (LoadCredential pattern)
- Management interface localhost-only (SSH tunnel required per FR-003)
**Evidence**:
- Secrets management pattern documented in data-model.md
- File permissions: 0400 for secrets, 0600 for config with credentials
- Pre-commit hooks scan for secret leaks (inherited from platform)
### Principle III: Presentable State Over Speed ✅ PASS
**Compliance**:
- Comprehensive specification (spec.md with 16 functional requirements, 4 user stories)
- Complete documentation suite (research.md, data-model.md, quickstart.md)
- 7-day validation period required before announcement (per constitution)
- Success criteria measurable and testable (SC-001 through SC-008)
**Evidence**:
- Spec clarification session resolved all ambiguities (5 questions answered)
- Quickstart.md provides deployment runbook with troubleshooting
- Testing checklist in quickstart.md validates all success criteria
### Principle IV: Quality Over Quick Wins ✅ PASS
**Compliance**:
- Extracted proven pattern from ops-base (391-line maubot.nix module in production)
- Research phase documented alternatives (yt-dlp vs instaloader, SQLite vs PostgreSQL)
- Follows established ops-jrz1 patterns (mautrix-slack module structure, sops-nix secrets)
- Spec-kit workflow followed (specify → clarify → plan → tasks → implement)
**Evidence**:
- Research.md documents 3 major technical decisions with rationale
- Module adaptation strategy preserves ops-base proven components
- Constitution check validates pattern consistency
**Gate Status**: ✅ ALL CHECKS PASS - Proceed to implementation
## Project Structure
### Documentation (this feature)
```text
specs/003-maubot-integration/
├── spec.md # Feature specification (✅ complete)
├── plan.md # This file (✅ complete)
├── research.md # Phase 0 output (✅ complete)
├── data-model.md # Phase 1 output (✅ complete)
├── quickstart.md # Phase 1 output (✅ complete)
├── checklists/
│ └── requirements.md # Quality validation (✅ complete)
└── tasks.md # Phase 2 output (/speckit.tasks - pending)
```
### Source Code (repository root)
**Structure Decision**: Infrastructure service (NixOS module) - no application source code
```text
/home/dan/proj/ops-jrz1/
├── modules/
│ ├── maubot.nix # Low-level maubot service module (to create)
│ ├── dev-services.nix # High-level wrapper (to update)
│ ├── mautrix-slack.nix # Reference pattern (existing)
│ └── matrix-continuwuity.nix # Matrix homeserver (existing)
├── hosts/
│ └── ops-jrz1.nix # VPS configuration (to update: enable maubot)
├── secrets/
│ └── secrets.yaml # Encrypted secrets (to update: add maubot secrets)
├── specs/
│ └── 003-maubot-integration/ # This feature directory
└── docs/
├── platform-vision.md # North star document (reference)
├── CLAUDE.md # Development guidelines (to update)
└── worklogs/ # Session logs (to create after deployment)
```
**External source files** (to copy/adapt):
```text
/home/dan/proj/ops-base/
└── vm-configs/modules/
└── maubot.nix # Source module (391 lines, proven in production)
/home/dan/proj/sna/
├── instagram_bot.py # Instagram bot source (11,643 bytes)
└── sna-instagram-bot.mbp # Packaged plugin (ready to upload)
```
**Runtime state** (on VPS after deployment):
```text
/var/lib/maubot/
├── config/
│ └── config.yaml # Generated runtime config
├── plugins/
│ └── sna.instagram-v1.0.0.mbp # Uploaded plugin
├── bot.db # SQLite database (service state)
└── trash/ # Deleted plugins
/run/secrets/ # sops-nix decrypted secrets (tmpfs)
├── maubot-admin-password
├── maubot-secret-key
└── matrix-registration-token
```
## Deployment Strategy
**Context**: ops-jrz1 is a live production server with critical services (Matrix homeserver, Slack bridge, PostgreSQL, Forgejo, nginx). Deployment must be incremental with validation checkpoints.
### Live Server Risk Assessment
**Critical Services** (must remain operational):
- conduwuit Matrix homeserver (8008) - All Matrix functionality
- mautrix-slack (29319) - ~50 Slack channels syncing bidirectionally
- PostgreSQL (5432) - Bridge database (172KB, critical state)
- Forgejo (git.clarun.xyz) - Code hosting
- nginx (443) - TLS termination for all public services
**New Service** (isolated):
- maubot (29316, localhost-only) - New SQLite database, different port, no appservice registration
### Incremental Deployment Approach
Deploy in 4 phases with git commits as rollback points:
**Phase 1: Module Files (No-Op Deployment)**
- Add modules/maubot.nix (adapted from ops-base)
- Add services.dev-platform.maubot wrapper to modules/dev-services.nix (options + config)
- **Do NOT enable**: services.dev-platform.maubot.enable remains unset
- Deploy → Verify no services changed → Git commit
- **Rollback**: nixos-rebuild switch --rollback OR git revert
**Phase 2: Secrets (Preparation)**
- Add maubot-admin-password, maubot-secret-key to secrets/secrets.yaml
- Add sops.secrets declarations to hosts/ops-jrz1.nix
- **Still disabled**: services.dev-platform.maubot.enable remains unset
- Deploy → Verify secrets decrypt to /run/secrets/ → Git commit
- **Rollback**: nixos-rebuild switch --rollback OR git revert
**Phase 3: Service Start (Module Only)**
- Enable in hosts/ops-jrz1.nix: services.dev-platform.maubot.enable = true
- Deploy → Verify maubot.service starts → Verify existing services healthy → Git commit
- **Rollback**: Set enable = false + redeploy OR nixos-rebuild switch --rollback
**Phase 4: Bot Deployment (Manual, Reversible)**
- SSH tunnel to management UI (localhost:29316)
- Create bot Matrix user via registration token
- Upload Instagram plugin (.mbp file)
- Create bot instance (test in private room first)
- **Rollback**: Delete bot instance via web UI (no code changes to revert)
### Validation Checkpoints
After each phase deployment:
```bash
# 1. Verify existing services still healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
# 2. Check for errors in last 5 minutes (excluding maubot)
ssh root@45.77.205.49 'journalctl --since "5 minutes ago" | grep -E "ERR|CRIT|FTL" | grep -v maubot'
# 3. Test Slack bridge (post in Slack, verify appears in Matrix)
# Phase-specific validations documented in tasks.md
```
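The log check in step 2 can also be expressed as a small filter, which may be easier to extend than a grep pipeline. This is a minimal sketch, not part of the repository: the function name `unexpected_errors` is hypothetical and the sample log lines are made up for illustration.

```python
import re

# Severity markers taken from the grep pattern in the checkpoint above.
SEVERITY = re.compile(r"ERR|CRIT|FTL")


def unexpected_errors(journal_lines, ignore="maubot"):
    """Return lines that carry a severity marker and do not mention `ignore`."""
    return [
        line
        for line in journal_lines
        if SEVERITY.search(line) and ignore not in line
    ]


# Made-up sample input; real input would be the output of journalctl.
sample = [
    "Dec 08 15:00:01 ops-jrz1 mautrix-slack[812]: ERR websocket closed",
    "Dec 08 15:00:02 ops-jrz1 maubot[901]: ERR plugin not loaded",
    "Dec 08 15:00:03 ops-jrz1 nginx[455]: INF reload complete",
]
for line in unexpected_errors(sample):
    print(line)
```

An empty result after a phase deployment suggests the existing services logged no new errors; any remaining line deserves a look before moving on.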
### Rollback Procedures
**NixOS Generation Rollback** (fastest):
```bash
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
```
**Git Revert** (if committed):
```bash
git revert HEAD
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
```
**Service Disable** (Phase 3 specific):
```nix
# In hosts/ops-jrz1.nix
services.dev-platform.maubot.enable = false; # Then redeploy
```
### Risk Mitigation
**Known risks from mautrix-slack deployment** (2025-10-26):
1. IPv4 vs localhost: Always use 127.0.0.1 (not localhost) in homeserverUrl
2. Conduwuit database corruption: Have database wipe procedure ready (low risk - fresh maubot install)
3. Port conflicts: Maubot uses 29316 (unique, no conflicts expected)
**Blast radius containment**:
- Phase 1 fail → Nix syntax errors only, no runtime impact
- Phase 2 fail → Secrets issue, no services affected
- Phase 3 fail → Maubot won't start, but Matrix/Slack/Forgejo unaffected (different ports, databases)
- Phase 4 fail → Bot instance only, delete via UI
### Success Criteria Per Phase
- **Phase 1**: Build succeeds, nixos-rebuild reports "no services changed"
- **Phase 2**: /run/secrets/maubot-* files exist with mode 0400, existing services healthy
- **Phase 3**: systemctl status maubot.service shows "active (running)", management UI accessible via SSH tunnel
- **Phase 4**: Bot responds to Instagram URL in <5 seconds (SC-001)
### Update/Upgrade Procedure (State-Preserving)
After initial deployment, future updates must preserve runtime state in `/var/lib/maubot/`:
- `bot.db` - Service state (bot instances, plugin configurations)
- `plugins/` - Uploaded .mbp files
- `config/config.yaml` - Generated runtime config
**Typical update scenarios**:
**Scenario 1: Module Configuration Change** (e.g., change port, add new option)
```bash
# 1. Edit modules/dev-services.nix or hosts/ops-jrz1.nix
# 2. Deploy
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
# 3. Verify service restarted cleanly
ssh root@45.77.205.49 'systemctl status maubot.service'
ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
# 4. Verify bot instances still running (check management UI)
# StateDirectory persists across service restarts
```
**Scenario 2: Maubot Version Upgrade** (nixpkgs update)
```bash
# 1. Update flake.lock or nixpkgs input
nix flake update
# 2. Review maubot changelog for breaking changes
# Check: https://github.com/maubot/maubot/releases
# 3. Deploy with build test first
nixos-rebuild build --flake .#ops-jrz1
# 4. If build succeeds, deploy
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
# 5. Monitor service restart
ssh root@45.77.205.49 'journalctl -u maubot.service -f'
# 6. Verify bot instances reconnected (check Matrix room for bot presence)
```
**Scenario 3: Plugin Update** (new Instagram bot version)
```bash
# Manual via web UI:
# 1. Upload new .mbp file (Plugins tab → Upload)
# 2. Maubot detects version change
# 3. Restart affected bot instances (Instances tab → Stop → Start)
# 4. Test in private room before production use
# No nixos-rebuild needed - plugin is runtime state
```
**Scenario 4: Add New Bot Instance** (e.g., second Instagram bot or new bot type)
```bash
# Manual via web UI:
# 1. Create bot Matrix user (via registration token)
# 2. Upload plugin if new type (Plugins tab)
# 3. Create bot instance (Instances tab → Add instance)
# 4. Configure and enable
# No nixos-rebuild needed - bot instances are runtime state
```
**State Preservation Guarantees**:
- NixOS StateDirectory (`/var/lib/maubot/`) persists across:
- Service restarts (systemctl restart maubot.service)
- System reboots
- Module configuration changes
- Maubot version upgrades (unless database schema incompatible)
- State is lost or becomes unusable only if:
- `/var/lib/maubot/` is deleted manually
- The service definition changes the StateDirectory path (the old directory stays on disk but is no longer used)
- A major maubot version ships an incompatible schema (rare, documented in release notes)
**Rollback with State**:
```bash
# NixOS generation rollback preserves StateDirectory
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
# Bot instances resume with previous configuration
# Database and plugins unchanged
```
**When to wipe database** (rare, destructive):
```bash
# Only if:
# 1. Database corruption detected
# 2. Major version migration requires clean slate (check release notes)
# 3. Testing fresh deployment
# Stop the service first so the SQLite file is not written during the backup:
ssh root@45.77.205.49 'systemctl stop maubot.service'
# Back up:
ssh root@45.77.205.49 'tar czf /root/maubot-backup-$(date +%Y%m%d).tar.gz /var/lib/maubot/'
# Wipe:
ssh root@45.77.205.49 'rm -rf /var/lib/maubot/bot.db'
ssh root@45.77.205.49 'systemctl start maubot.service'
# Reconfigure all bot instances via web UI
```
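The backup step can be sketched with the standard library, for example if it should later run from a scheduled job rather than by hand. This is an illustration only: `backup_state` is a hypothetical helper, and the archive name merely mirrors the `maubot-backup-YYYYMMDD.tar.gz` pattern used above.

```python
import tarfile
import tempfile
from datetime import date
from pathlib import Path


def backup_state(state_dir, dest_dir, today=None):
    """Write <dest_dir>/maubot-backup-YYYYMMDD.tar.gz containing state_dir."""
    stamp = (today or date.today()).strftime("%Y%m%d")
    archive = Path(dest_dir) / f"maubot-backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, e.g. "maubot/bot.db".
        tar.add(state_dir, arcname=Path(state_dir).name)
    return archive


# Demonstration on a throwaway directory instead of /var/lib/maubot/.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "maubot"
    state.mkdir()
    (state / "bot.db").write_bytes(b"placeholder")
    print(backup_state(state, tmp).name)
```

As with the shell version, the service should be stopped before the archive is written so the SQLite file is not modified mid-copy.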
## Complexity Tracking
**No violations** - All constitution principles satisfied.
This feature follows established patterns:
- Declarative infrastructure (NixOS modules)
- Security first (sops-nix encrypted secrets)
- Presentable state (comprehensive spec, 7-day validation)
- Quality over speed (extract proven ops-base module, document alternatives)
**No simpler alternatives rejected** - Chosen approach is the simplest that meets requirements while maintaining quality standards.


@ -0,0 +1,667 @@
# Quickstart: Maubot Integration Deployment
**Feature**: 003-maubot-integration
**Target**: ops-jrz1 VPS (45.77.205.49)
**Estimated time**: 2-3 hours
## Prerequisites
- [x] ops-jrz1 VPS operational with conduwuit Matrix homeserver
- [x] SSH access to VPS as root
- [x] sops-nix configured with server SSH host key
- [x] Local machine with Nix/NixOS
- [ ] Instagram bot .mbp file available (`/home/dan/proj/sna/sna-instagram-bot.mbp`)
---
## Phase 0: Secrets Preparation
### 1. Generate Maubot Secrets
```bash
# Generate admin password (32 random bytes, 44 base64 characters)
MAUBOT_ADMIN_PW=$(openssl rand -base64 32)
# Generate secret key (48 random bytes, 64 base64 characters)
MAUBOT_SECRET=$(openssl rand -base64 48)
echo "Admin Password: $MAUBOT_ADMIN_PW"
echo "Secret Key: $MAUBOT_SECRET"
```
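If `openssl` is not available on the local machine, equivalent values can be produced with Python's standard library. This is a minimal sketch; the helper name `random_base64` is an assumption made for illustration, not something this repository provides.

```python
import base64
import secrets


def random_base64(num_bytes: int) -> str:
    """Base64-encode `num_bytes` of OS randomness, like `openssl rand -base64`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


admin_password = random_base64(32)  # 32 bytes encode to 44 characters
secret_key = random_base64(48)      # 48 bytes encode to 64 characters

# Print only the lengths so the secrets do not end up in terminal scrollback.
print(len(admin_password), len(secret_key))
```

The values would then be pasted into `sops secrets/secrets.yaml` exactly as in the next step.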
### 2. Add Secrets to sops-nix
```bash
cd /home/dan/proj/ops-jrz1
# Edit encrypted secrets
sops secrets/secrets.yaml
```
Add these entries:
```yaml
maubot-admin-password: "<paste MAUBOT_ADMIN_PW>"
maubot-secret-key: "<paste MAUBOT_SECRET>"
# matrix-registration-token already exists - reuse for bot creation
```
### 3. Declare Secrets in NixOS Config
Edit `hosts/ops-jrz1.nix`:
```nix
sops.secrets.maubot-admin-password = { mode = "0400"; };
sops.secrets.maubot-secret-key = { mode = "0400"; };
```
---
## Phase 1: Module Extraction and Adaptation
### 1. Extract maubot.nix from ops-base
```bash
cd /home/dan/proj/ops-jrz1
# Copy module from ops-base
cp /home/dan/proj/ops-base/vm-configs/modules/maubot.nix \
modules/maubot.nix
```
### 2. Adapt Module Namespace
Edit `modules/maubot.nix`:
**Change module namespace**:
```nix
# From:
options.services.matrix-vm.maubot = { ... };
# To:
options.services.maubot = { ... };
```
**Update homeserver URL**:
```nix
# From:
homeserverUrl = mkOption {
default = "http://127.0.0.1:6167"; # ops-base continuwuity port
};
# To:
homeserverUrl = mkOption {
default = "http://127.0.0.1:8008"; # ops-jrz1 conduwuit port
};
```
**Remove registration_secrets** (conduwuit doesn't support this):
```nix
# REMOVE this section from config generation (around line 140-150):
# registration_secrets:
# ${cfg.serverName}:
# url: ${cfg.homeserverUrl}
# secret: REPLACE_REGISTRATION_SECRET
```
**Update config path** (move from `/run` to the StateDirectory under `/var/lib`):
```nix
# Change config path from:
/run/maubot/config.yaml
# To:
/var/lib/maubot/config/config.yaml
```
### 3. Add dev-platform Wrapper
Edit `modules/dev-services.nix`:
Add options section:
```nix
options.services.dev-platform.maubot = {
enable = mkEnableOption "maubot bot framework";
port = mkOption {
type = types.port;
default = 29316;
description = "Management interface port";
};
};
```
Add config section:
```nix
config = mkIf cfg.maubot.enable {
services.maubot = {
enable = true;
homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}";
serverName = cfg.matrix.serverName;
port = cfg.maubot.port;
adminPasswordFile = config.sops.secrets.maubot-admin-password.path;
secretKeyFile = config.sops.secrets.maubot-secret-key.path;
};
};
```
---
## Phase 2: Incremental Deployment (Live Server)
⚠️ **IMPORTANT**: ops-jrz1 is a live production server with critical services:
- conduwuit Matrix homeserver - All Matrix functionality
- mautrix-slack bridge - ~50 Slack channels syncing
- PostgreSQL, Forgejo, nginx - Core infrastructure
Deploy incrementally with validation checkpoints. Each phase creates a git commit as a rollback point.
---
### Phase 2.1: Module Files Only (No-Op Deployment)
**Goal**: Add maubot module without starting any services
**Steps**:
1. Verify services.dev-platform.maubot.enable is NOT set in `hosts/ops-jrz1.nix`
2. Deploy:
```bash
cd /home/dan/proj/ops-jrz1
nixos-rebuild switch --flake .#ops-jrz1 \
--target-host root@45.77.205.49 \
--build-host localhost
```
**Validation**:
```bash
# Should report "no services changed" or only unrelated restarts
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
# Expected: Both active (running), no recent restarts
```
**Git checkpoint**:
```bash
git add modules/maubot.nix modules/dev-services.nix
git commit -m "Add maubot module files (service disabled)"
```
**Rollback if needed**:
```bash
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
```
---
### Phase 2.2: Secrets Preparation
**Goal**: Add secrets without starting service
**Steps**:
1. Verify services.dev-platform.maubot.enable is still NOT set
2. Deploy (secrets added in Phase 0 and Phase 1 config):
```bash
nixos-rebuild switch --flake .#ops-jrz1 \
--target-host root@45.77.205.49 \
--build-host localhost
```
**Validation**:
```bash
# Verify secrets decrypted
ssh root@45.77.205.49 'ls -la /run/secrets/maubot-*'
# Expected:
# -r-------- 1 root root ... /run/secrets/maubot-admin-password
# -r-------- 1 root root ... /run/secrets/maubot-secret-key
# Verify existing services healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
```
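The permission check can also be scripted instead of read off the `ls -la` output. The sketch below assumes only the standard library; `has_mode` is a hypothetical helper and the demonstration uses a throwaway file rather than the real files under `/run/secrets/`.

```python
import os
import stat
import tempfile


def has_mode(path, expected=0o400):
    """True when the file's permission bits equal `expected`."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


# Demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o400)
print(has_mode(demo_path))  # owner read-only, as expected for secrets
os.chmod(demo_path, 0o600)
print(has_mode(demo_path))  # writable by owner, so the check fails
os.remove(demo_path)
```

On the VPS the same function could be pointed at `/run/secrets/maubot-admin-password` and `/run/secrets/maubot-secret-key`.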
**Git checkpoint**:
```bash
git add hosts/ops-jrz1.nix secrets/secrets.yaml
git commit -m "Add maubot secrets (service not enabled)"
```
---
### Phase 2.3: Enable Maubot Service
**Goal**: Start maubot service, verify isolation from existing services
**Steps**:
1. Enable in `hosts/ops-jrz1.nix`:
```nix
services.dev-platform.maubot = {
enable = true;
port = 29316;
};
```
2. Deploy:
```bash
nixos-rebuild switch --flake .#ops-jrz1 \
--target-host root@45.77.205.49 \
--build-host localhost
```
**Validation**:
```bash
# 1. Verify maubot service started
ssh root@45.77.205.49 'systemctl status maubot.service'
# Expected: active (running)
# 2. Check logs for errors
ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
# Look for: "Starting maubot on port 29316", "Connected to homeserver"
# No ERROR or CRITICAL messages
# 3. Verify existing services still healthy
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
# 4. Test Slack bridge (critical validation)
# Post message in Slack → verify appears in Matrix within 5 seconds
# 5. Test management UI access
ssh -L 29316:localhost:29316 root@45.77.205.49
# In browser: http://localhost:29316/_matrix/maubot
# Should load login page
```
**Git checkpoint**:
```bash
git add hosts/ops-jrz1.nix
git commit -m "Enable maubot service (no bots deployed yet)"
```
**Rollback if needed**:
```bash
# Option 1: NixOS generation rollback (fastest)
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
# Option 2: Disable service (if you want to keep other changes)
# Edit hosts/ops-jrz1.nix: services.dev-platform.maubot.enable = false
# Then redeploy
```
---
### Rollback Procedures
**If ANY deployment phase fails or breaks existing services**:
1. **Immediate rollback** (restores last working state):
```bash
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
```
2. **Verify services restored**:
```bash
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
# Test Slack bridge: post message, verify in Matrix
```
3. **Investigate issue** before retrying:
```bash
# Check what changed
ssh root@45.77.205.49 'journalctl --since "10 minutes ago" | grep -E "ERR|CRIT|FTL"'
# Review deployment logs
ssh root@45.77.205.49 'journalctl -u nixos-rebuild -n 100'
```
**Git-based rollback** (if committed but want to revert):
```bash
git log --oneline -5 # Find commit to revert
git revert <commit-hash>
nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
```
---
### Phase 2.4: Deployment Success Criteria
Before proceeding to bot configuration, verify:
- [ ] maubot.service is active (running)
- [ ] Management UI loads at http://localhost:29316/_matrix/maubot (via SSH tunnel)
- [ ] No errors in maubot service logs
- [ ] All existing services healthy (Matrix, Slack bridge, Forgejo, PostgreSQL, nginx)
- [ ] Slack bridge functional (test message flow Slack ↔ Matrix)
- [ ] Phase 2.3 git commit created
If all criteria pass, proceed to Phase 3 (Bot Registration). Otherwise, roll back and investigate.
---
## Phase 3: Bot Registration and Configuration
### 1. Access Management Interface
```bash
# Create SSH tunnel
ssh -L 29316:localhost:29316 root@45.77.205.49
# In browser:
# Navigate to: http://localhost:29316/_matrix/maubot
```
### 2. Log In to Maubot
- Username: `admin`
- Password: `<from sops secrets>`
### 3. Create Bot Matrix User
**Option A: Registration Token** (recommended):
1. Configure conduwuit registration token (if not already set)
2. In Maubot UI: Clients → Add client
3. Enter Matrix user ID: `@instagram-bot:clarun.xyz`
4. Select "Register" and provide registration token
5. Bot user created automatically
**Option B: Admin Room Commands**:
1. Access Matrix homeserver admin room
2. Run: `!admin users create-user instagram-bot`
3. Copy generated password
4. In Maubot UI: Create client with username/password
### 4. Upload Instagram Plugin
```bash
# Copy plugin to VPS
scp /home/dan/proj/sna/sna-instagram-bot.mbp \
root@45.77.205.49:/tmp/
# Or upload via web UI:
# - Plugins tab → Upload
# - Select sna-instagram-bot.mbp
```
### 5. Create Bot Instance
In Maubot UI:
1. Instances tab → Add instance
2. **ID**: `instagram-bot-1`
3. **Type**: `sna.instagram`
4. **Primary user**: Select `@instagram-bot:clarun.xyz`
5. **Enabled**: ✓
6. **Config**:
```json
{
"enabled": true,
"max_file_size": 50000000,
"room_subscriptions": []
}
```
7. Save
### 6. Configure Room Subscriptions
**Get Matrix room ID**: In Element (or another Matrix client), open Room Settings → Advanced → Internal Room ID. Example: `!abc123def:clarun.xyz`
**Add to bot config** (per FR-010):
Edit bot instance config in Maubot UI:
```json
{
"enabled": true,
"max_file_size": 50000000,
"room_subscriptions": [
"!abc123def:clarun.xyz"
]
}
```
**Restart bot instance**: Stop → Start in Maubot UI
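A mistyped room ID (for example a room alias starting with `#`) leaves the bot silently inactive in that room, so a quick shape check before saving the config can help. This is a loose sketch only: `is_room_id` is a hypothetical helper that checks the `!opaque:server` shape and does not validate the full Matrix identifier grammar.

```shell
#!/bin/sh
# Loose shape check for a Matrix room ID: "!" + opaque part + ":" + server
# name, with no spaces. This is NOT a full validation of the Matrix grammar.
is_room_id() {
  case "$1" in
    *" "*)    return 1 ;;
    '!'?*:?*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example:
#   is_room_id '!abc123def:clarun.xyz' && echo "looks like a room ID"
```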
---
## Phase 4: Testing
### 1. Invite Bot to Test Room
In Matrix client:
```
/invite @instagram-bot:clarun.xyz
```
### 2. Test Instagram URL Fetching
Post in the room:
```
https://www.instagram.com/p/EXAMPLE123/
```
**Expected behavior**:
- Bot responds within 5 seconds (SC-001)
- Image/video appears in room
- Caption and metadata posted as text message
### 3. Test Room Subscription Enforcement
Post Instagram URL in a room NOT in `room_subscriptions`:
**Expected behavior**:
- Bot ignores URL (no response)
### 4. Monitor Logs
```bash
ssh root@45.77.205.49 'journalctl -u maubot.service -f --since "5 minutes ago"'
# Check for:
# - Instagram URL detection
# - yt-dlp extraction
# - Matrix upload
# - Any ERROR/CRITICAL logs
```
---
## Phase 5: Health Monitoring
### 1. Verify Health Check Timer
```bash
ssh root@45.77.205.49 'systemctl list-timers | grep maubot'
# Expected:
# maubot-health.timer (runs every 5 minutes)
# maubot-health-restart.timer (runs every 10 minutes)
```
### 2. Manual Health Check
```bash
ssh root@45.77.205.49 'curl -s http://localhost:29316/_matrix/maubot/v1/version | jq .'
# Expected output:
# {
# "version": "0.5.2",
# "server": "maubot"
# }
```
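The commit message for this change notes that the health check treats a 401 response as healthy, presumably because an unauthenticated request that reaches the API still proves the service is up. A status-code based check might look like the sketch below. `maubot_status_healthy` is a hypothetical helper, and counting only 200 and 401 as healthy is an assumption drawn from that commit note.

```shell
#!/bin/sh
# Returns 0 if the HTTP status code suggests maubot is up.
# Assumption: 200 (OK) and 401 (endpoint reached, credentials required)
# both count as healthy. Everything else, including curl's "000" for a
# failed connection, counts as unhealthy.
maubot_status_healthy() {
  case "$1" in
    200|401) return 0 ;;
    *)       return 1 ;;
  esac
}

# Example, run on the VPS:
#   code=$(curl -s -o /dev/null -w '%{http_code}' \
#     http://localhost:29316/_matrix/maubot/v1/version)
#   if maubot_status_healthy "$code"; then echo "healthy ($code)"; else echo "unhealthy ($code)"; fi
```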
### 3. Check Bot Instance Status
In Maubot UI:
- Instances tab
- Verify `instagram-bot-1` shows green "Running" status
- Check "Last Sync" timestamp (should be less than 10 minutes old)
---
## Troubleshooting
### Bot Not Responding to Instagram URLs
**Check**:
1. Room ID is in `room_subscriptions` config
2. Bot has joined the room (`/invite @instagram-bot:clarun.xyz`)
3. URL is a public Instagram post (not private/story)
4. Logs show URL detection: `journalctl -u maubot.service | grep -i instagram`
**Fix**:
- Update room_subscriptions config
- Restart bot instance in Maubot UI
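When triaging a URL that got no response, it can help to rule out unsupported URL shapes first. The sketch below is illustrative only: `instagram_url_kind` is a hypothetical helper, the plugin's real matching rules are not shown in this guide, and a URL's shape cannot tell you whether the post is private.

```shell
#!/bin/sh
# Classifies an Instagram URL by its path shape. Illustrative only: the
# plugin's actual URL matching may differ.
instagram_url_kind() {
  case "$1" in
    https://www.instagram.com/stories/*|https://instagram.com/stories/*) echo story ;;
    https://www.instagram.com/p/?*|https://instagram.com/p/?*)           echo post ;;
    https://www.instagram.com/*|https://instagram.com/*)                 echo other ;;
    *)                                                                   echo not-instagram ;;
  esac
}

# Example:
#   instagram_url_kind 'https://www.instagram.com/p/EXAMPLE123/'
```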
### Service Won't Start
**Check**:
```bash
ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
```
**Common issues**:
- Port 29316 already in use → Check `ss -tlnp | grep 29316`
- Database permissions → Check `/var/lib/maubot/` ownership
- Secrets not decrypted → Check `/run/secrets/maubot-*` exists
### Bot Can't Connect to Matrix
**Check**:
1. conduwuit is running: `systemctl status matrix-continuwuity`
2. Homeserver URL is correct: `http://127.0.0.1:8008` (IPv4)
3. Bot Matrix user exists and has valid access token
**Fix**:
- Recreate bot client in Maubot UI
- Check Matrix homeserver logs: `journalctl -u matrix-continuwuity | grep instagram`
### Instagram Content Fetch Fails
**Check logs**:
```bash
ssh root@45.77.205.49 'journalctl -u maubot.service | grep -A 10 "yt-dlp"'
```
**Common issues**:
- Instagram rate limiting (429 error) → Wait 30 minutes, reduce request frequency
- Private post → Can't fetch (expected behavior)
- yt-dlp outdated → Update nixpkgs, redeploy
---
## Rollback Procedure
If deployment fails:
```bash
# List NixOS generations
ssh root@45.77.205.49 'nixos-rebuild list-generations'
# Rollback to previous generation
ssh root@45.77.205.49 'nixos-rebuild switch --rollback'
# Verify services restored
ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack'
```
---
## Success Criteria Validation
Verify all success criteria before marking feature complete:
- [ ] **SC-001**: Instagram bot responds within 5 seconds
- [ ] **SC-002**: System supports 3 concurrent bot instances (test by creating 2 more instances)
- [ ] **SC-003**: Service maintains 99% uptime over 7 days
- [ ] **SC-004**: Auto-recovery within 2 minutes after restart
- [ ] **SC-005**: New bot deployment completes in <10 minutes
- [ ] **SC-006**: 95% success rate for public Instagram URLs
- [ ] **SC-007**: Management interface loads in <2 seconds
- [ ] **SC-008**: Server reboot without data loss (test with `reboot`)
**Testing period**: 7 days operational before merging to main (per constitution Principle III)
---
## Post-Deployment
### 1. Update Documentation
Update `CLAUDE.md` with the maubot commands. Example section to add:
```markdown
### Maubot Management
- Management UI: http://localhost:29316/_matrix/maubot (via SSH tunnel)
- Bot registration: Use conduwuit registration token
- Room subscriptions: Edit config JSON, restart instance
- Logs: journalctl -u maubot.service -f
```
### 2. Commit and Tag
```bash
git add modules/maubot.nix modules/dev-services.nix hosts/ops-jrz1.nix
git commit -m "Add maubot bot framework with Instagram bot

- Extract and adapt maubot.nix from ops-base
- Configure for conduwuit (registration token auth)
- Deploy Instagram bot with room-based activation
- Add health monitoring timers

Implements feature 003-maubot-integration
"
git tag -a v0.3.0 -m "Release v0.3.0: Maubot Integration

Features:
- Maubot bot framework service
- Instagram content fetcher bot
- Room-based bot activation
- Management web interface (localhost only)
- Health monitoring and auto-recovery

Success criteria validated (SC-001 through SC-008)
Constitution compliance verified
"
git push origin main --tags
```
### 3. Create Worklog
Document the deployment session:
```bash
# Create the worklog file, then fill it in with your editor
touch docs/worklogs/2025-10-26-maubot-deployment.org
```
---
## Reference Files
**Module locations**:
- `/home/dan/proj/ops-jrz1/modules/maubot.nix` (service module)
- `/home/dan/proj/ops-jrz1/modules/dev-services.nix` (high-level wrapper)
**Secrets**:
- `/home/dan/proj/ops-jrz1/secrets/secrets.yaml` (encrypted)
- `/run/secrets/maubot-*` (runtime, on VPS)
**Runtime state** (on VPS):
- `/var/lib/maubot/bot.db` (SQLite database)
- `/var/lib/maubot/config/config.yaml` (generated config)
- `/var/lib/maubot/plugins/` (uploaded .mbp files)
**Source reference**:
- ops-base module: `/home/dan/proj/ops-base/vm-configs/modules/maubot.nix`
- Instagram plugin: `/home/dan/proj/sna/sna-instagram-bot.mbp`
- ops-base docs: `/home/dan/proj/ops-base/docs/maubot-*.md`
---
**Deployment time estimate**: 2-3 hours (including testing and validation)
**Status**: Ready for Phase 2 (implementation)

@ -0,0 +1,287 @@
# Feature Specification: Matrix Bot Framework (Maubot) Integration
**Feature Branch**: `003-maubot-integration`
**Created**: 2025-10-26
**Status**: Draft
**Input**: User description: "Begin maubot feature spec. instagram bot is one of our goals."
## Clarifications
### Session 2025-10-26
- Q: Instagram bot activation behavior - should it respond to all Instagram URLs, only when mentioned, or in designated rooms? → A: Bot responds to Instagram URLs only in designated bot-enabled rooms
- Q: Bot error notification method - how should errors be communicated to administrators? → A: Error notification behavior based on severity levels (DEBUG/INFO logs only, WARN logs + dashboard visibility, ERROR/CRITICAL logs + dashboard + Matrix admin room notifications)
- Q: Room enablement mechanism - how do administrators enable bot in specific rooms? → A: Edit bot configuration file with room IDs, restart bot instance
- Q: Admin notification room configuration - should each bot have dedicated admin room, shared room, or reuse homeserver admin room? → A: Reuse Matrix homeserver admin room for bot ERROR/CRITICAL notifications
- Q: Management interface authentication - single shared account, multi-user, or Matrix homeserver auth? → A: Single shared admin account (username/password configured in sops-nix secrets)
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Instagram Content Sharing to Matrix (Priority: P1)
A team member shares an Instagram post URL in a Matrix room, and the bot automatically fetches and displays the content (image, caption, metadata) directly in the chat, allowing team members to view and discuss Instagram content without leaving Matrix.
**Why this priority**: This is the core value proposition - bringing Instagram content into team communication. Demonstrates immediate utility of the bot framework and validates the integration works correctly.
**Independent Test**: Can be fully tested by posting an Instagram URL in a Matrix room and verifying the bot responds with content preview, delivering immediate value as an Instagram content viewer.
**Acceptance Scenarios**:
1. **Given** Instagram bot is enabled in a specific Matrix room, **When** user posts "https://instagram.com/p/ABC123/" in that room, **Then** bot responds within 5 seconds with image, caption, and post metadata (likes, comments count)
2. **Given** Instagram bot is NOT enabled in a Matrix room, **When** user posts Instagram URL in that room, **Then** bot ignores the URL and does not respond
3. **Given** bot receives Instagram URL in enabled room, **When** content is a video, **Then** bot provides video thumbnail, caption, and download link
4. **Given** bot receives Instagram URL in enabled room, **When** content is a carousel (multiple images), **Then** bot displays all images in sequence with navigation
5. **Given** bot receives Instagram profile URL in enabled room, **When** URL is "https://instagram.com/username", **Then** bot displays profile info (bio, follower count, recent posts preview)
6. **Given** bot encounters rate limiting in enabled room, **When** too many requests in short period, **Then** bot queues request and notifies user of delay
---
### User Story 2 - Bot Management Interface (Priority: P2)
Platform administrators can configure, start, stop, and monitor bots through a web-based management interface without editing configuration files or restarting services.
**Why this priority**: Essential for operational management and enables non-developer administrators to manage bots. Required for long-term maintainability but bot can work without it initially.
**Independent Test**: Can be tested by accessing management interface, creating a test bot instance, and verifying it appears in Matrix - demonstrates full bot lifecycle management.
**Acceptance Scenarios**:
1. **Given** administrator accesses Maubot management UI, **When** they log in with shared admin credentials, **Then** dashboard displays all bot instances, their status, and health metrics
2. **Given** administrator wants to deploy Instagram bot, **When** they upload maubot plugin file (.mbp), **Then** plugin appears in available plugins list
3. **Given** plugin is uploaded, **When** administrator creates new bot instance with Matrix user credentials and room subscription list, **Then** bot appears online in Matrix within 30 seconds and only responds in configured rooms
4. **Given** administrator wants to change enabled rooms, **When** they edit bot configuration file with new room IDs and restart bot instance, **Then** bot begins responding only in newly configured rooms
5. **Given** bot is running, **When** administrator clicks "Stop" button, **Then** bot goes offline and stops responding to commands
6. **Given** bot encounters error, **When** viewing bot logs in UI, **Then** error messages are displayed with timestamps, severity level, and context
7. **Given** bot experiences CRITICAL error, **When** error occurs, **Then** notification is sent to Matrix homeserver admin room with error details and affected bot instance
---
### User Story 3 - Bot Framework Service Reliability (Priority: P2)
The Maubot service starts automatically on server boot, maintains bot instances across restarts, and recovers from failures without manual intervention.
**Why this priority**: Critical for production use but can be validated after basic functionality works. Prevents the bot framework from being a maintenance burden.
**Independent Test**: Can be tested by rebooting the server and verifying Maubot service auto-starts and all bot instances resume operation automatically.
**Acceptance Scenarios**:
1. **Given** server reboots, **When** system comes back online, **Then** Maubot service starts automatically within 2 minutes and all bot instances reconnect to Matrix
2. **Given** Matrix homeserver restarts, **When** homeserver is available again, **Then** bot instances re-establish connections and resume operation without manual intervention
3. **Given** bot instance crashes, **When** Maubot detects failure, **Then** service attempts automatic restart with exponential backoff
4. **Given** bot encounters persistent error (ERROR/CRITICAL severity), **When** restart attempts fail, **Then** service logs detailed diagnostics, updates dashboard status, and sends notification to Matrix homeserver admin room
5. **Given** database connection lost, **When** connectivity is restored, **Then** Maubot reconnects automatically and restores bot state
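Scenario 3 calls for exponential backoff but does not fix any parameters. As a sketch of what such a schedule looks like, the hypothetical helper below doubles the delay on every attempt up to a cap. The base delay and cap in the example are illustrative values, not taken from the module.

```shell
#!/bin/sh
# Prints the delay in seconds before restart attempt N (1-based):
# base * 2^(N-1), capped at max. All three inputs are illustrative.
backoff_delay() {
  attempt=$1
  base=$2
  max=$3
  delay=$base
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$max" ]; then
    delay=$max
  fi
  echo "$delay"
}

# Example: with a 5 s base and a 300 s cap, attempts 1..4 wait 5, 10, 20, 40 s.
#   backoff_delay 4 5 300
```

Capping the delay keeps a persistently failing bot from drifting into hours-long gaps between retries, which would conflict with the 2-minute recovery target in SC-004.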
---
### User Story 4 - Additional Bot Deployment (Priority: P3)
Platform administrators can deploy additional custom bots beyond Instagram bot by uploading plugin files and configuring bot instances, enabling extensible bot functionality for future team needs.
**Why this priority**: Demonstrates platform extensibility and future-proofs the investment, but not required for initial value delivery. Can be added after Instagram bot proves value.
**Independent Test**: Can be tested by deploying a simple echo bot or reaction bot from maubot plugin repository and verifying it works independently.
**Acceptance Scenarios**:
1. **Given** administrator has custom maubot plugin (.mbp file), **When** they upload via management interface, **Then** plugin is validated and added to available plugins
2. **Given** plugin requires configuration, **When** creating bot instance, **Then** administrator can provide plugin-specific settings through UI
3. **Given** multiple bot instances exist, **When** administrator views dashboard, **Then** all bots are clearly listed with their types, status, and resource usage
4. **Given** bot requires database storage, **When** bot instance is created, **Then** Maubot automatically provisions isolated database for that bot
5. **Given** plugin has dependencies, **When** uploading plugin, **Then** Maubot validates dependencies and reports missing requirements
---
## Requirements *(mandatory)*
### Functional Requirements
- **FR-001**: System MUST extract and deploy maubot module from ops-base repository to ops-jrz1 infrastructure
- **FR-002**: System MUST integrate Maubot with existing conduwuit Matrix homeserver on clarun.xyz
- **FR-003**: System MUST provide web-based management interface on dedicated port (default: 29316) accessible to platform administrators via single shared admin account credentials stored in sops-nix secrets
- **FR-004**: Maubot service MUST support automatic startup on system boot and auto-recovery from failures
- **FR-005**: System MUST support Instagram bot plugin deployment with content fetching capabilities
- **FR-006**: Instagram bot MUST fetch and display images, videos, captions, and metadata from Instagram URLs posted only in designated bot-enabled Matrix rooms (bot ignores URLs in rooms where it is not explicitly enabled)
- **FR-007**: Instagram bot MUST handle rate limiting gracefully with user-friendly error messages
- **FR-008**: System MUST support multiple bot instances running concurrently with isolated configurations (architecture supports 3+ instances per SC-002, production deploys 1 instance initially per quickstart.md)
- **FR-009**: System MUST persist bot configurations and state to survive service restarts
- **FR-010**: Administrators MUST be able to configure bot room subscriptions by editing bot configuration file with Matrix room IDs and restarting the bot instance
- **FR-011**: System MUST provide health monitoring for bot instances with status indicators (health check API endpoint and dashboard status display via management interface)
- **FR-012**: System MUST integrate with existing sops-nix secrets management for bot credentials
- **FR-013**: System MUST support uploading and deploying additional maubot plugins (.mbp files) - functionality inherited from ops-base maubot.nix module, validated in T029
- **FR-014**: System MUST provide logging capabilities for bot activity and errors accessible via management interface with severity-based propagation (DEBUG/INFO to logs only, WARN to logs and dashboard, ERROR/CRITICAL to logs, dashboard, and Matrix homeserver admin room)
- **FR-015**: Bot instances MUST authenticate with Matrix homeserver using registration tokens (conduwuit compatibility requirement, shared secret not supported)
- **FR-016**: System MUST support per-bot database storage with automatic provisioning
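The severity-based propagation in FR-014 amounts to a small routing table. The sketch below spells it out. `severity_targets` and the target names (`logs`, `dashboard`, `admin-room`) are hypothetical labels for illustration, not identifiers from maubot or the module.

```shell
#!/bin/sh
# Prints the notification targets for a log severity, following FR-014:
#   DEBUG/INFO     -> logs only
#   WARN           -> logs + dashboard
#   ERROR/CRITICAL -> logs + dashboard + Matrix homeserver admin room
severity_targets() {
  case "$1" in
    DEBUG|INFO)     echo "logs" ;;
    WARN)           echo "logs dashboard" ;;
    ERROR|CRITICAL) echo "logs dashboard admin-room" ;;
    *)              echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Example:
#   severity_targets CRITICAL
```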
### Key Entities
- **Maubot Service**: Plugin-based Matrix bot framework that manages multiple bot instances, provides management interface, and handles Matrix homeserver integration
- **Bot Instance**: Individual bot deployment with specific configuration, Matrix user account, and plugin assignment (e.g., "instagram-bot-1")
- **Plugin**: Packaged bot functionality (.mbp file) containing code, metadata, and dependencies (e.g., Instagram content fetcher, echo bot, reaction bot)
- **Bot Configuration**: Settings specific to bot instance including Matrix credentials, plugin settings, room subscriptions (list of enabled room IDs), and command prefixes
- **Management Interface**: Web UI for administrators to create, configure, monitor, and control bot instances, displaying logs with severity levels and real-time status updates
- **Admin Notification**: ERROR and CRITICAL level bot notifications sent to existing Matrix homeserver admin room (shared with other platform notifications)
- **Bot Database**: Per-instance isolated SQLite database for plugin state and data persistence
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: Instagram bot responds to Instagram URLs with content preview within 5 seconds under normal conditions
- **SC-002**: System supports at least 3 concurrent bot instances without performance degradation
- **SC-003**: Maubot service maintains 99% uptime over 7-day testing period
- **SC-004**: Bot instances automatically recover within 2 minutes after service restart
- **SC-005**: Administrators can deploy a new bot instance from scratch in under 10 minutes
- **SC-006**: Instagram bot successfully fetches content for 95% of public Instagram post URLs
- **SC-007**: Management interface loads and displays bot status within 2 seconds
- **SC-008**: System handles server reboot without data loss or manual intervention required
**Validation Note**: SC-001, SC-002, SC-003, SC-004, SC-008 have explicit task validation (T026, T042, T038, T034, T034). SC-005, SC-006, SC-007 are measured during the 7-day operational validation period (T038) and documented in deployment worklog (T044).
## Scope *(mandatory)*
### In Scope
- Extract and adapt maubot.nix module from ops-base to ops-jrz1
- Configure Maubot to integrate with conduwuit Matrix homeserver
- Deploy Instagram bot plugin as primary use case
- Set up management web interface with authentication
- Implement health monitoring and auto-recovery mechanisms
- Configure sops-nix secrets for bot credentials
- Document bot deployment and management procedures including room subscription configuration workflow
- Support for uploading additional maubot plugins
### Out of Scope
- Custom Instagram bot development (use existing maubot Instagram plugin from community)
- Migration of other bots from ops-base besides Instagram bot
- Advanced analytics or metrics dashboard for bot performance
- Multi-homeserver support (only clarun.xyz)
- Custom plugin development beyond Instagram bot deployment
- Mobile app for bot management (web interface only)
- Automatic Instagram authentication (manual token provisioning acceptable)
- Real-time Instagram feed monitoring or notifications
## Constraints *(mandatory)*
### Technical Constraints
- Must work with conduwuit Matrix homeserver (ops-base used continuwuity, may require compatibility testing)
- Limited to Python 3.11 for maubot runtime (nixpkgs availability)
- Instagram bot functionality depends on Instagram API/scraping availability and rate limits
- Must adapt from ops-base VM-based deployment pattern to ops-jrz1 VPS single-host pattern
- Dependent on deprecated olm-3.2.16 library for Matrix encryption (known CVEs, acceptable risk documented in ops-base)
### Operational Constraints
- Deployment must not disrupt existing services (Matrix homeserver, Slack bridge, Forgejo)
- Management interface must be secured (single admin account authentication, localhost-only access)
- Management interface credentials must be stored in sops-nix encrypted secrets
- Bot Matrix accounts require registration tokens from homeserver
- Instagram tokens may require periodic renewal based on Instagram API policies
### Resource Constraints
- Maubot service limited to 512M memory (as per ops-base configuration)
- Additional database space required for bot state (estimated <100MB initially)
- Management interface port 29316 must not conflict with existing services
## Dependencies *(mandatory)*
### External Dependencies
- ops-base repository access to extract maubot.nix module and documentation
- Instagram bot plugin from maubot community or ops-base implementation
- Instagram authentication tokens (if required by current Instagram API policies)
- Matrix homeserver registration token for bot user creation
### Internal Dependencies
- conduwuit Matrix homeserver must be operational on clarun.xyz
- sops-nix secrets management must be configured for bot credentials
- SQLite for bot state storage (decision per plan.md research: lightweight isolation better than shared PostgreSQL)
- Existing NixOS infrastructure and deployment patterns
### Blocking Issues
- Need to verify conduwuit compatibility with maubot (ops-base used continuwuity)
- Need to assess current Instagram API access requirements and scraping feasibility
- Need to extract and adapt ops-base module configuration options from `services.matrix-vm.maubot` to `services.dev-platform.maubot`
## Assumptions *(mandatory)*
- Instagram content fetching remains technically feasible (no major Instagram API changes blocking access)
- Maubot works with conduwuit Matrix homeserver with minimal or no modifications
- ops-base maubot module can be adapted to VPS deployment with reasonable effort
- Instagram bot plugin from ops-base is functional and can be reused or community plugin exists
- Team accepts olm-3.2.16 security risk with documented mitigation plan (migration to vodozemac when available)
- Bot traffic will remain under Instagram rate limits for small team usage (<100 requests/hour)
- Single VPS deployment sufficient (no distributed bot architecture needed)
- Single shared admin account sufficient for initial deployment (no multi-user management required)
## Non-Goals *(optional)*
- Automated Instagram post monitoring or scheduled fetching
- Direct posting to Instagram from Matrix (read-only integration)
- Instagram DM integration or two-way messaging
- Advanced content moderation or filtering
- Custom Instagram analytics or engagement tracking
- Multi-tenant bot hosting for external teams
- Commercial Instagram API integration (acceptable to use community scraping approaches)
- Real-time Instagram notifications or webhooks
## Known Limitations *(optional)*
The following edge cases are known limitations not addressed in MVP scope:
- **Deleted/private Instagram posts**: Bot does not handle posts that become private or deleted after initial fetch (content remains in Matrix chat history)
- **Instagram rate limiting**: System may experience delays during high-traffic periods (429 responses). FR-007 requires graceful handling with user notifications.
- **Matrix account credential expiry**: Bot user account credentials are managed via registration tokens and do not expire automatically. Manual re-authentication required if revoked.
- **Instagram story URLs**: 24-hour expiry stories not supported (yt-dlp limitation for ephemeral content)
- **Command collision**: Multiple bot instances in same room may respond to overlapping triggers. Recommendation: enable only one bot per room or use distinct command prefixes.
- **Age-restricted/geo-blocked content**: Instagram content with access restrictions may fail to fetch depending on VPS location and yt-dlp capabilities
- **Homeserver connection loss**: If Maubot loses its connection to the Matrix homeserver, bot instances stop responding until the connection is restored (monitored via health checks in FR-011)
- **Database corruption**: No automated backup/recovery. Recommendation: implement manual backup procedure for /var/lib/maubot/ during operational period.
## Risks *(optional)*
### Technical Risks
- **Risk**: Instagram API/scraping methods may break with Instagram updates
- **Mitigation**: Document bot as best-effort, plan for periodic maintenance, monitor Instagram bot community for updates
- **Risk**: Conduwuit compatibility issues with maubot not discovered until integration
- **Mitigation**: Test maubot registration and basic functionality early in implementation phase
- **Risk**: olm-3.2.16 vulnerabilities may be exploited
- **Mitigation**: Follow ops-base mitigation strategy - monitor for vodozemac migration, limit bot network exposure, document accepted risk
### Operational Risks
- **Risk**: Instagram rate limiting may impact bot responsiveness during high usage
- **Mitigation**: Implement request queuing, user notifications for delays, consider rate limit monitoring
- **Risk**: Bot management interface security breach could compromise Matrix homeserver
- **Mitigation**: Require strong authentication, limit network exposure, regular security audits, use sops-nix for credential storage
- **Risk**: Bot instance failure may go unnoticed without monitoring
- **Mitigation**: Implement health checks, automated restarts, log monitoring, administrator alerts for persistent failures
## Clarified Requirements *(resolved 2025-10-26)*
### Instagram Authentication Approach
**Decision**: Use community scraping methods (instaloader, yt-dlp) for Instagram content fetching.
**Rationale**: Easier to set up immediately without requiring Facebook developer account approval. Acceptable for internal team use with understanding that scraping methods may require periodic updates if Instagram changes their interface.
### Management Interface Network Exposure
**Decision**: Restrict management interface to localhost only, requiring SSH tunnel for remote administration.
**Rationale**: Maximizes security by eliminating network attack surface. Administrators already have SSH access for deployment, so tunnel setup is acceptable operational overhead for the security benefit.
### Bot Instance Quantity Planning
**Decision**: Support single Instagram bot instance initially (1 instance).
**Rationale**: Minimal resource requirements, proves concept quickly, demonstrates value before scaling. Architecture can support additional instances later if needed without major rework.
**Note**: SC-002 requires validating 3-instance capability during testing to ensure architecture can scale when needed, but production deployment starts with single instance.

@ -0,0 +1,348 @@
# Implementation Tasks: Maubot Integration
**Feature**: 003-maubot-integration
**Branch**: `003-maubot-integration`
**Target**: ops-jrz1 VPS (45.77.205.49)
**Estimated Duration**: 2-3 hours deployment + 7 days validation
## Task Summary
- **Total Tasks**: 55 (updated for incremental deployment strategy)
- **Setup Phase**: 4 tasks
- **Foundational Phase**: 6 tasks
- **User Story 1 (P1)**: 28 tasks - Instagram content sharing (MVP)
- Infrastructure: 3 tasks (T011-T013)
- Phase 1 deployment: 4 tasks (T013a-d)
- Phase 2 deployment: 4 tasks (T013e-h)
- Phase 3 deployment: 7 tasks (T014-T017c)
- Phase 4 bot config: 6 tasks (T018-T023)
- Testing: 4 tasks (T024-T027)
- **User Story 2 (P2)**: 6 tasks - Management interface
- **User Story 3 (P2)**: 5 tasks - Service reliability
- **User Story 4 (P3)**: 3 tasks - Additional bot deployment
- **Polish Phase**: 3 tasks
**MVP Scope**: User Story 1 (28 tasks) - validates core value proposition with incremental deployment
---
## Phase 1: Setup (Project Initialization)
**Goal**: Prepare development environment and extract source modules from ops-base
- [X] T001 Create feature branch 003-maubot-integration from main
- [X] T002 Copy maubot.nix module from /home/dan/proj/ops-base/vm-configs/modules/maubot.nix to modules/maubot.nix
- [X] T003 Copy Instagram bot plugin from /home/dan/proj/sna/sna-instagram-bot.mbp to local working directory
- [X] T004 Generate maubot secrets (admin password 32 chars, secret key 48 bytes) using openssl rand -base64
**Checkpoint**: Source files ready for adaptation
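For T004, note that `openssl rand -base64 N` takes a byte count, not an output length. The sketch below shows the arithmetic. `base64_len` is a hypothetical helper, and reading "32 chars" as the length of the base64 output is an assumption.

```shell
#!/bin/sh
# Prints the number of base64 characters produced for N random bytes:
# 4 * ceil(N / 3), padding included.
base64_len() {
  echo $(( (($1 + 2) / 3) * 4 ))
}

# Example:
#   openssl rand -base64 24   # 24 bytes -> 32-character admin password
#   openssl rand -base64 48   # 48-byte secret key -> 64 characters
```

By the same arithmetic, passing 32 to `openssl rand -base64` yields 44 characters rather than 32.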
---
## Phase 2: Foundational (Blocking Prerequisites)
**Goal**: Adapt maubot module for ops-jrz1 and configure secrets
**Independent Test**: Deploy adapted module and verify service starts without errors
### Module Adaptation
- [X] T005 Update module namespace from services.matrix-vm.maubot to services.maubot in modules/maubot.nix
- [X] T006 Update homeserver URL from http://127.0.0.1:6167 to http://127.0.0.1:8008 in modules/maubot.nix
- [X] T007 Remove registration_secrets section from config generation in modules/maubot.nix (lines ~140-150, conduwuit doesn't support shared secret)
- [X] T008 Change config path from /run/maubot/config.yaml to /var/lib/maubot/config/config.yaml in modules/maubot.nix
- [X] T009 Remove LoadCredential entry for registration-secret (keep admin-password and secret-key only) in modules/maubot.nix systemd service section
- [X] T010 [P] Add maubot secrets to secrets/secrets.yaml (maubot-admin-password, maubot-secret-key) using sops secrets/secrets.yaml
**Checkpoint**: Module adapted for conduwuit, secrets encrypted
---
## Phase 3: User Story 1 - Instagram Content Sharing to Matrix (Priority: P1)
**Goal**: Deploy maubot service with Instagram bot and validate content fetching
**Independent Test**: Post Instagram URL in enabled Matrix room and verify bot responds with image/video/caption within 5 seconds
**Why MVP**: Core value proposition - brings Instagram content into team communication, validates integration works
### Infrastructure Deployment
- [X] T011 [US1] Add sops secret declarations to hosts/ops-jrz1.nix (sops.secrets.maubot-admin-password, sops.secrets.maubot-secret-key)
- [X] T012 [US1] Create dev-platform wrapper options in modules/dev-services.nix (services.dev-platform.maubot with enable and port options)
- [X] T013 [US1] Add dev-platform config block in modules/dev-services.nix (maps to services.maubot with homeserverUrl, serverName, port, secret paths)
### Service Deployment - Phase 1: Module Files
- [ ] T013a [US1] Deploy Phase 1 to VPS (modules added, service disabled) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
- [ ] T013b [US1] Verify Phase 1: Check nixos-rebuild output reports "no services changed" or only unrelated service restarts
- [ ] T013c [US1] Verify existing services healthy: ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx'
- [ ] T013d [US1] Git commit Phase 1 with message "Add maubot module files (service disabled)"
### Service Deployment - Phase 2: Secrets
- [ ] T013e [US1] Deploy Phase 2 to VPS (secrets added in Phase 0 and Phase 1, service still disabled) using nixos-rebuild switch
- [ ] T013f [US1] Verify Phase 2: Check secrets decrypted via ssh root@45.77.205.49 'ls -la /run/secrets/maubot-*' (expect 0400 permissions)
- [ ] T013g [US1] Verify existing services healthy (same command as T013c)
- [ ] T013h [US1] Git commit Phase 2 with message "Add maubot secrets (service not enabled)"
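The permission check in T013f can be automated rather than read off `ls` output by eye. A minimal sketch follows: `check_perms` is a hypothetical helper, it compares the symbolic mode string printed by `ls -ld` (`-r--------` corresponds to 0400), and it assumes the secret paths resolve to regular files.

```shell
#!/bin/sh
# Prints a line for each given path that is missing or whose symbolic mode
# (first 10 characters of `ls -ld`) differs from the expected one.
# Returns non-zero if any problem was found.
check_perms() {
  expected=$1
  shift
  bad=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "MISSING: $f"
      bad=1
      continue
    fi
    actual=$(ls -ld "$f" | cut -c1-10)
    if [ "$actual" != "$expected" ]; then
      echo "MODE $actual (expected $expected): $f"
      bad=1
    fi
  done
  return "$bad"
}

# Example, run on the VPS:
#   check_perms '-r--------' /run/secrets/maubot-*
```

If the glob matches nothing, the shell passes the literal pattern through and the helper reports it as missing, which also covers the "secrets not decrypted" case.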
### Service Deployment - Phase 3: Enable Service
- [ ] T014 [US1] Enable maubot service in hosts/ops-jrz1.nix (services.dev-platform.maubot.enable = true, port = 29316)
- [ ] T015 [US1] Deploy Phase 3 to VPS (enable maubot service) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
- [ ] T016 [US1] Verify service status via ssh root@45.77.205.49 'systemctl status maubot.service' (expect active running)
- [ ] T017 [US1] Check logs for errors via ssh root@45.77.205.49 'journalctl -u maubot.service -n 50'
- [ ] T017a [US1] Verify existing services still healthy after maubot deployment (same command as T013c)
- [ ] T017b [US1] Test Slack bridge functionality (post message in Slack, verify appears in Matrix within 5 seconds)
- [ ] T017c [US1] Git commit Phase 3 with message "Enable maubot service (no bots deployed yet)"
### Bot Configuration - Phase 4: Manual Deployment
- [ ] T018 [US1] Create SSH tunnel to management interface: ssh -L 29316:localhost:29316 root@45.77.205.49
- [ ] T019 [US1] Login to maubot web UI at http://localhost:29316/_matrix/maubot (username: admin, password from sops secrets)
- [ ] T020 [US1] Create bot Matrix user @instagram-bot:clarun.xyz via conduwuit registration token (Clients tab → Add client → Register)
- [ ] T021 [US1] Upload Instagram plugin sna-instagram-bot.mbp via web UI (Plugins tab → Upload)
- [ ] T022 [US1] Create bot instance instagram-bot-1 (type: sna.instagram, primary_user: @instagram-bot:clarun.xyz, config: {"enabled": true, "max_file_size": 50000000, "room_subscriptions": []})
- [ ] T023 [US1] Invite bot to test Matrix room via /invite @instagram-bot:clarun.xyz
### Testing & Validation
- [ ] T024 [US1] Add test room ID to bot config room_subscriptions in maubot web UI
- [ ] T025 [US1] Restart bot instance (Stop → Start in web UI)
- [ ] T026 [US1] Post public Instagram URL in test room and verify bot responds within 5 seconds with image/video/caption (SC-001)
- [ ] T027 [US1] Post Instagram URL in non-subscribed room and verify bot ignores it (FR-006 enforcement)
**Acceptance Criteria**:
- ✅ Bot responds to Instagram URLs in subscribed rooms only
- ✅ Content fetched within 5 seconds (SC-001)
- ✅ Images, videos, and captions displayed correctly
- ✅ Bot ignores URLs in non-subscribed rooms
**MVP Checkpoint**: Core functionality working - Instagram content visible in Matrix
---
## Phase 4: User Story 2 - Bot Management Interface (Priority: P2)
**Goal**: Validate management interface functionality for bot lifecycle operations
**Independent Test**: Access management UI, create/stop/restart bot instance, view logs and status
**Why this priority**: Essential for operations but bot works without admin features initially
### Management Interface Validation
- [ ] T028 [US2] Access management dashboard via SSH tunnel and verify all bot instances listed with status (instances tab)
- [ ] T029 [US2] Test plugin upload via web UI (upload test .mbp file, verify appears in plugins list)
- [ ] T030 [US2] Test bot instance creation via web UI (create test instance, verify appears online in Matrix within 30 seconds)
- [ ] T031 [US2] Test bot configuration edit (edit room_subscriptions via config JSON, restart instance, verify bot responds only in new rooms)
- [ ] T032 [US2] Test bot stop/start via web UI (click Stop button, verify bot goes offline, click Start, verify reconnects)
- [ ] T033 [US2] View bot logs in UI and verify error messages display with timestamps and severity levels
**Acceptance Criteria**:
- ✅ Dashboard displays all bot instances with status
- ✅ Plugin upload succeeds and validates
- ✅ Bot lifecycle operations (create/stop/start) work via UI
- ✅ Configuration changes take effect after restart
- ✅ Logs visible with proper formatting
---
## Phase 5: User Story 3 - Bot Framework Service Reliability (Priority: P2)
**Goal**: Validate auto-start, auto-recovery, and failure handling
**Independent Test**: Reboot server and verify maubot service and all bot instances resume automatically
**Why this priority**: Critical for production reliability but can be validated after basic functionality proven
### Reliability Testing
- [ ] T034 [US3] Test server reboot recovery (ssh root@45.77.205.49 'reboot', wait 2 minutes, verify service auto-starts via systemctl status maubot)
- [ ] T035 [US3] Test Matrix homeserver restart handling (restart matrix-continuwuity service, verify bot reconnects automatically without manual intervention)
- [ ] T036 [US3] Verify health check timers active (ssh root@45.77.205.49 'systemctl list-timers | grep maubot', expect maubot-health.timer and maubot-health-restart.timer)
- [ ] T037 [US3] Test manual health check (curl http://localhost:29316/_matrix/maubot/v1/version, verify JSON response with version field)
- [ ] T038 [US3] Monitor 7-day uptime for SC-003 validation (99% uptime target, check periodically: uptime -p, journalctl -u maubot | grep -i error)
**Acceptance Criteria**:
- ✅ Service auto-starts on server boot within 2 minutes
- ✅ Bot instances reconnect after Matrix homeserver restart
- ✅ Health timers operational
- ✅ 99% uptime achieved over 7-day period
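
The commit message for this change notes that the maubot health check treats HTTP 401 as healthy. A minimal sketch of that classification follows; the function name is hypothetical, and the assumption is that a 401 means the API is up but the request was unauthenticated, while a 200 means the version JSON was served.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an HTTP status code from the maubot API to a health verdict.
# Assumption: 200 and 401 both prove the service is answering; anything else is unhealthy.
maubot_status_is_healthy() {
  case "$1" in
    200|401) return 0 ;;
    *)       return 1 ;;
  esac
}

# Example (run on the VPS); curl reports 000 when the connection itself fails.
# code="$(curl -s -o /dev/null -w '%{http_code}' http://localhost:29316/_matrix/maubot/v1/version)"
# maubot_status_is_healthy "$code" && echo healthy || echo unhealthy
```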
---
## Phase 6: User Story 4 - Additional Bot Deployment (Priority: P3)
**Goal**: Demonstrate platform extensibility by deploying a second bot type
**Independent Test**: Deploy echo bot or reaction bot from maubot plugin repository and verify independent operation
**Why this priority**: Future-proofs investment, not required for initial Instagram bot value
### Extensibility Validation
- [ ] T039 [US4] Download additional maubot plugin from community repository (e.g., echo bot, reaction bot)
- [ ] T040 [US4] Upload second plugin via management UI and verify validation succeeds
- [ ] T041 [US4] Create second bot instance using new plugin and verify appears in dashboard with type, status, and resource usage
- [ ] T042 [US4] Test SC-002 multi-instance validation (run 3 concurrent bot instances, verify no performance degradation)
**Acceptance Criteria**:
- ✅ Multiple plugin types supported
- ✅ Dashboard shows all bots with clear differentiation
- ✅ 3+ concurrent instances run without degradation (SC-002)
---
## Phase 7: Polish & Cross-Cutting Concerns
**Goal**: Complete documentation and prepare for merge
### Documentation
- [ ] T043 Update CLAUDE.md with maubot management commands (service status, logs, SSH tunnel, room subscription workflow)
- [ ] T044 Create deployment worklog in docs/worklogs/2025-10-26-maubot-deployment.org documenting session
- [ ] T045 Commit changes and tag release v0.3.0 (message: "Add maubot bot framework with Instagram bot - Implements 003-maubot-integration")
**Final Checkpoint**: All documentation complete, ready for 7-day validation period
---
## Dependencies & Execution Order
### User Story Dependencies
```
Phase 1 (Setup)
    ↓
Phase 2 (Foundational) ← BLOCKING for all user stories
    ├─→ User Story 1 (P1) ← MVP, no dependencies on other user stories
    ├─→ User Story 2 (P2) ← depends on US1 (needs running bot to manage)
    ├─→ User Story 3 (P2) ← depends on US1 (needs service deployed to test reliability)
    └─→ User Story 4 (P3) ← depends on US2 (needs management UI working)
    ↓
Phase 7 (Polish) ← depends on all user stories complete
```
### Critical Path
1. Setup (T001-T004)
2. Foundational (T005-T010) - **MUST complete before user stories**
3. User Story 1 (T011-T027) - **MVP - Deploy first, validate before continuing**
4. Validate MVP success before proceeding to US2/US3/US4
5. User Stories 2 and 3 can proceed in parallel after US1 validates; User Story 4 waits for US2 to confirm the management UI works
6. Polish (T043-T045) after all user stories complete
---
## Parallel Execution Opportunities
### Phase 2 (Foundational)
**Parallel**:
- T010 can run in parallel with T005-T009 (secrets vs module editing, different files)
### Phase 3 (User Story 1)
**Parallel**:
- T011 can run in parallel with T012 and T013 (different files: hosts/ops-jrz1.nix vs modules/dev-services.nix); T012 and T013 both edit modules/dev-services.nix, so run them in order
- After T015 deploys: T016, T017 can run in parallel (both read-only checks)
**Sequential**:
- T014 depends on T011, T012, T013 (needs config in place)
- T015 depends on T014 (deployment needs config)
- T018-T027 must run sequentially (UI workflow dependencies)
### Phase 4-6 (User Stories 2, 3, 4)
**Parallel after US1**:
- US2 tasks (T028-T033) can run in parallel with US3 tasks (T034-T038) if US1 validates
- US4 tasks (T039-T042) should wait for US2 to confirm management UI working
---
## Implementation Strategy
### MVP-First Approach
**Week 1**: Focus exclusively on User Story 1 (T001-T027)
- Goal: Working Instagram bot responding to URLs in designated rooms
- Success: Can demo "post Instagram URL → see content in Matrix"
- Decision point: If MVP fails, stop and reassess before continuing
**Week 2**: Expand to User Stories 2 & 3 (T028-T038) in parallel
- Goal: Operational management and reliability validated
- Success: Admins can manage bots via UI, service survives restarts
**Week 3**: Add extensibility (User Story 4) if needed (T039-T042)
- Goal: Prove multi-bot capability
- Success: 3 concurrent bot instances running
**Week 4+**: 7-day validation period
- Monitor uptime (SC-003: 99% target)
- Monitor Instagram fetch success rate (SC-006: 95% target)
- Collect user feedback
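
For the SC-003 target it helps to know how much downtime the validation window actually allows. A minimal sketch of the arithmetic, using integer math only; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: allowed downtime in seconds for a given window and uptime target.
# Example: 7 days at 99% leaves 1% of 604800 s.
downtime_budget_seconds() {
  local days="$1" uptime_percent="$2"
  echo $(( days * 86400 * (100 - uptime_percent) / 100 ))
}

downtime_budget_seconds 7 99   # prints 6048
```

A budget of 6048 seconds is about 100 minutes across the whole week, so a single long outage can consume it.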
### Incremental Delivery
Each user story delivers independently testable value:
- **US1**: Instagram content in Matrix (core value)
- **US2**: Self-service bot management (operational efficiency)
- **US3**: Production reliability (reduces maintenance burden)
- **US4**: Platform extensibility (future-proofing)
Can stop after any user story and still have working system.
---
## Testing Strategy
**Manual QA** (no automated tests per plan.md):
- Each user story has "Independent Test" criteria
- Acceptance scenarios from spec.md validated manually
- Success criteria (SC-001 through SC-008) checked via quickstart.md checklist
**Validation Period**:
- 7 days operational before merging to main (per constitution Principle III)
- Monitor metrics: uptime, response time, fetch success rate
- Document issues in worklog
---
## Risk Mitigation
**High-risk tasks**:
- T007: Removing registration_secrets (conduwuit incompatibility) - carefully test bot registration after change
- T015: Initial deployment (first time on ops-jrz1) - have rollback ready via nixos-rebuild switch --rollback
- T020: Bot user registration (new auth pattern) - document exact steps in worklog for repeatability
**Rollback points**:
- After T010: Can rollback before deployment if module adaptation fails
- After T015: NixOS generation rollback if service won't start
- After T027: Can remove bot and redeploy if issues found
---
## Success Metrics
**Per User Story**:
- US1: Bot responds to Instagram URLs within 5 seconds (SC-001)
- US2: Management UI loads within 2 seconds (SC-007)
- US3: 99% uptime over 7 days (SC-003), auto-recovery within 2 minutes (SC-004)
- US4: 3 concurrent instances without degradation (SC-002)
**Overall**:
- [ ] All 8 success criteria validated (SC-001 through SC-008)
- [ ] Constitution check passes (all 4 principles compliant)
- [ ] 7-day stability period completed without critical issues
- [ ] Documentation complete (spec, plan, quickstart, worklog, CLAUDE.md updated)
---
**Estimated Timeline**:
- **MVP (US1)**: 2-3 hours deployment + testing
- **Full Feature (US1-4)**: 1 week implementation + 1 week validation
- **Production Ready**: 2 weeks total (including 7-day stability period)
**Next Command**: `/speckit.implement` to begin execution (start with T001)


@ -0,0 +1,532 @@
# Browser-Based Development Environment
## Overview
Provide VS Code in the browser via code-server, with:
- **opencode** AI coding agent pre-installed (CLI + VS Code extension)
- Container-based isolation for security against LLM-generated code risks
- Zero-setup experience for users of varying skill levels
## User Personas
| Persona | Description | Needs |
|---------|-------------|-------|
| **Non-programmer** | Learning to code with AI assistance | GUI-first, minimal friction, no terminal knowledge required |
| **Programmer (testing)** | Evaluating AI coding tools | Fast setup, full terminal access, multiple language support |
| **Learner** | Learning AI-assisted dev or new languages | Gentle on-ramp, room to grow, pre-configured tools |
## Requirements
| Requirement | Value |
|-------------|-------|
| Users | 1-5, separate workspaces |
| Inter-user isolation | Not required |
| Security model | Container sandbox per user |
| Access | HTTPS via existing nginx |
| Persistence | User workspaces survive restarts |
| AI tooling | opencode pre-installed and configured |
## Architecture
**Routing**: Subdomain-based (`dan.code.clarun.xyz`) for clean isolation.
Path-based routing (`/code/dan/`) was considered but rejected:
- VS Code extensions assume root path, break with subpaths
- Cookie scoping issues across users
- PWA installation fails
- WebSocket URL construction breaks
```
                 ┌──────────────────────────────────────────┐
                 │               DNS (Vultr)                │
                 │     *.code.clarun.xyz → 45.77.205.49     │
                 └────────────────────┬─────────────────────┘
┌─────────────────────────────────────┴─────────────────────────────────────┐
│                                nginx :443                                 │
│                (wildcard ACME cert for *.code.clarun.xyz)                 │
└─────────────────────┬───────────────────────────────────────────────┬─────┘
                      │                                               │
      ┌───────────────┼───────────────┬───────────────┐               │
      ▼               ▼               ▼               ▼               ▼
  dan.code.      alice.code.      bob.code.        *.code.       clarun.xyz
 clarun.xyz      clarun.xyz      clarun.xyz      clarun.xyz      (existing)
      │               │               │               │
      ▼               ▼               ▼               ▼
┌───────────┐   ┌───────────┐   ┌───────────┐ ┌───────────────┐
│ Podman    │   │ Podman    │   │ Podman    │ │ 404 landing   │
│ Container │   │ Container │   │ Container │ │ page          │
│           │   │           │   │           │ │ (unknown user)│
│ code-     │   │ code-     │   │ code-     │ └───────────────┘
│ server    │   │ server    │   │ server    │
│ +opencode │   │ +opencode │   │ +opencode │
│ :8081     │   │ :8082     │   │ :8083     │
└─────┬─────┘   └─────┬─────┘   └─────┬─────┘
      │               │               │
      ▼               ▼               ▼
┌───────────┐   ┌───────────┐   ┌───────────┐
│ /var/lib/ │   │ /var/lib/ │   │ /var/lib/ │
│ vscode/   │   │ vscode/   │   │ vscode/   │
│ dan/      │   │ alice/    │   │ bob/      │
└───────────┘   └───────────┘   └───────────┘
                (bind mount)
```
### User Experience Flow
```
User opens browser
┌───────────────────────────────────────────────────────────────┐
│                     VS Code (in browser)                      │
│                                                               │
│ ┌─────────────────────────┐  ┌──────────────────────────────┐ │
│ │       Editor Pane       │  │        opencode Panel        │ │
│ │                         │  │      (Ctrl+Esc to open)      │ │
│ │  [select code] ─────────┼──► Context auto-shared          │ │
│ │                         │  │                              │ │
│ │  ◄──────────────────────┼─── AI suggests/edits            │ │
│ │                         │  │                              │ │
│ └─────────────────────────┘  └──────────────────────────────┘ │
│                                                               │
│ Keybindings:                                                  │
│ • Ctrl+Esc       → Open opencode in split terminal            │
│ • Ctrl+Shift+Esc → New opencode session                       │
│ • Alt+Ctrl+K     → Insert file reference (@File#L37-42)       │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
## Technology Choices
### code-server (not openvscode-server)
| Factor | code-server | openvscode-server |
|--------|-------------|-------------------|
| Built-in auth | ✅ Password | ❌ Need proxy |
| Maintenance | Active (Coder) | Active (Gitpod) |
| NixOS module | ✅ `services.code-server` | ❌ Manual |
| Features | More batteries | Pure VS Code |
**Decision**: code-server for built-in auth and NixOS integration.
### Podman Rootless (not Docker)
| Factor | Podman | Docker |
|--------|--------|--------|
| Rootless | ✅ Native | ⚠️ Requires setup |
| Daemonless | ✅ Yes | ❌ dockerd required |
| NixOS integration | ✅ `virtualisation.oci-containers` | ✅ Also supported |
| Security | Container root → unprivileged user | Root unless configured |
**Decision**: Podman rootless for better security defaults and systemd integration.
### Bind Mounts (not Docker volumes)
| Factor | Bind Mounts | Docker Volumes |
|--------|-------------|----------------|
| Transparency | Standard directories | Opaque blobs |
| Backup | rsync, restic, tar | docker cp required |
| Recovery | Host filesystem tools | Volume commands |
| Permissions | Standard Unix perms | Volume driver dependent |
**Decision**: Bind mounts to `/var/lib/vscode/<user>/` for simplicity and backup compatibility.
### Authentication
| Option | Pros | Cons |
|--------|------|------|
| code-server password | Simple, per-user | Manual password management |
| nginx basic auth | Centralized | WebSocket conflicts, breaks PWA |
| OAuth proxy | SSO, enterprise | Complexity, RAM overhead |
**Decision**: code-server password auth, managed via sops-nix. nginx handles HTTPS only.
## Resource Planning
### Per-Container Limits
| Resource | Limit | Rationale |
|----------|-------|-----------|
| Memory (soft) | 2.5GB | Normal operation headroom for VS Code + opencode |
| Memory (hard) | 3GB | Comfortable for AI agent workloads; confines OOM kills to the container |
| CPU | 1.5 cores | Fair share, prevent monopolization |
### Server Sizing
| Users | RAM Required | CPU | Recommendation |
|-------|--------------|-----|----------------|
| 1 | ~3.5GB (3GB container + system) | 1-2 | Exceeds a 2GB VPS at the hard limit |
| 2-3 | ~7-10GB | 2 | Upgrade to 8GB |
| 4-5 | ~12-16GB | 2-4 | Upgrade to 16GB |
**Action**: Upgrade VPS to 8GB RAM before deployment (supports 2 users comfortably).
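
The sizing table can be reproduced with simple arithmetic. A minimal sketch, assuming the 3GB hard limit per container from the limits table and roughly 512 MB of system overhead (inferred from the "~3.5GB" single-user row, not stated explicitly); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate host RAM in MB for N code-server containers.
# Assumptions: 3072 MB hard limit per container, 512 MB system overhead.
required_ram_mb() {
  local users="$1" per_container_mb="${2:-3072}" system_mb="${3:-512}"
  echo $(( users * per_container_mb + system_mb ))
}

for n in 1 2 3 5; do
  echo "$n user(s): $(required_ram_mb "$n") MB"
done
```

Under these assumptions two users need 6656 MB (about 6.5GB), which fits an 8GB host, while three users need 9728 MB and do not.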
## Storage Layout
```
/var/lib/vscode/
├── dan/
│   ├── workspace/    # Project files (bind mount → container /home/coder/project)
│   └── config/       # VS Code settings, extensions (bind mount → container ~/.local/share/code-server)
├── alice/
│   ├── workspace/
│   └── config/
└── ...
```
### Backup Integration
Existing backup service (`modules/backup.nix`) can be extended:
```bash
# Add to backup script
tar czf "$TMP/vscode-workspaces.tar.gz" /var/lib/vscode/
```
## NixOS Implementation
### Module Structure
```
modules/
└── code-server-containers.nix # New module
```
### Configuration Interface
```nix
services.code-server-multi = {
  enable = true;

  users = {
    dan = {
      port = 8081;
      passwordFile = config.sops.secrets.code-server-dan.path;
      memoryLimit = "3G";  # Hard limit, matching the Per-Container Limits table
      cpuLimit = "1.5";
    };
    alice = {
      port = 8082;
      passwordFile = config.sops.secrets.code-server-alice.path;
    };
  };

  # Shared settings
  baseImage = "codercom/code-server:latest";  # Or custom image with Nix
  workspaceBase = "/var/lib/vscode";
};
```
### Generated Resources
For each user, the module generates:
1. **Podman container** via `virtualisation.oci-containers`
2. **Storage directories** via `systemd.tmpfiles.rules`
3. **nginx virtual host** (`<user>.code.clarun.xyz`) with WebSocket support
4. **sops secret** reference for password
**DNS requirement**: Wildcard A record `*.code.clarun.xyz` → server IP (configured in Vultr DNS)
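
Because each username becomes a DNS label under `code.clarun.xyz`, it is worth validating names before generating a virtual host. A minimal sketch of such a check; the helper name `user_vhost` and the exact label rules applied (lowercase letters, digits, inner hyphens, at most 63 characters) are assumptions for illustration, not part of the module.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the per-user hostname, rejecting names that are
# not usable as a single DNS label.
user_vhost() {
  local LC_ALL=C   # keep [a-z] ranges ASCII-only regardless of the caller's locale
  local user="$1" base="${2:-code.clarun.xyz}"
  local re='^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$'
  if [[ ! "$user" =~ $re ]]; then
    echo "invalid subdomain label: $user" >&2
    return 1
  fi
  echo "${user}.${base}"
}

user_vhost dan   # prints dan.code.clarun.xyz
```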
### nginx Configuration
Per-user virtual hosts generated by module (one per user):
```nix
# Generated for each user (e.g., dan)
services.nginx.virtualHosts."dan.code.clarun.xyz" = {
  forceSSL = true;
  useACMEHost = "code.clarun.xyz";  # Wildcard cert
  locations."/" = {
    proxyPass = "http://127.0.0.1:8081";  # User's port
    # proxyWebsockets already emits proxy_http_version 1.1 and the
    # Upgrade/Connection headers, so they are not repeated in extraConfig.
    proxyWebsockets = true;
    extraConfig = ''
      proxy_set_header Host $host;
      proxy_set_header Accept-Encoding gzip;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
    '';
  };
};

# Wildcard cert for all subdomains
security.acme.certs."code.clarun.xyz" = {
  domain = "code.clarun.xyz";
  extraDomainNames = [ "*.code.clarun.xyz" ];
  dnsProvider = "vultr";  # Requires DNS-01 challenge for wildcard
  credentialsFile = config.sops.secrets.vultr-api-key.path;
};

# Catch-all for unknown subdomains
services.nginx.virtualHosts."*.code.clarun.xyz" = {
  forceSSL = true;  # Without an SSL option the vhost would listen on port 80 only
  useACMEHost = "code.clarun.xyz";
  locations."/" = {
    return = "404";
  };
};
```
**Note**: Wildcard certs require DNS-01 challenge (HTTP-01 won't work). Need Vultr API key for DNS automation.
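
A certificate wildcard covers exactly one extra label, which is why the cert above lists both `code.clarun.xyz` and `*.code.clarun.xyz`. The matching rule can be sketched as follows; the helper name is hypothetical and the function only illustrates the single-label rule, it is not how nginx or ACME clients implement it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: does a hostname fall under a single-label wildcard pattern?
matches_wildcard() {
  local host="$1" pattern="$2"
  local suffix="${pattern#\*.}"       # strip the leading "*."
  local label="${host%".$suffix"}"    # what remains to the left of ".suffix"
  # Must have matched the suffix, left a non-empty label, and that label has no dots.
  [ "$label" != "$host" ] && [ -n "$label" ] && [[ "$label" != *.* ]]
}

matches_wildcard dan.code.clarun.xyz '*.code.clarun.xyz' && echo "covered"
matches_wildcard code.clarun.xyz     '*.code.clarun.xyz' || echo "apex needs its own entry"
matches_wildcard a.b.code.clarun.xyz '*.code.clarun.xyz' || echo "nested names are not covered"
```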
## API Key Management
opencode requires API keys for AI providers (Anthropic, OpenAI). Strategy for managing these in multi-user environment:
### Phase 1: Shared Keys (MVP)
For 1-5 trusted users, inject shared API keys via environment variables:
```nix
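# Per-user container gets keys from sops-nix
services.code-server-multi.users.dan = {
  # ... other config ...
  # NOTE: `.path` evaluates to the location of the decrypted secret file, not
  # the key itself. The module has to load the file contents at container
  # start (for example via oci-containers `environmentFiles`) instead of
  # passing these paths through as literal values.
  environment = {
    ANTHROPIC_API_KEY = config.sops.secrets.opencode-anthropic.path;
    OPENAI_API_KEY = config.sops.secrets.opencode-openai.path;
  };
};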
```
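
With sops-nix, `config.sops.secrets.<name>.path` is the path of the decrypted file, so something has to turn that path into the key value before opencode sees it. One possible shape, as a minimal sketch: a container entrypoint step that exports a variable from a file. The helper name `export_secret` is hypothetical, and the `/run/secrets/...` paths in the usage comment assume the sops-nix default location.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export VAR with the contents of a secret file.
export_secret() {
  local var="$1" file="$2"
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # Command substitution strips the trailing newline most secret files end with.
  export "$var=$(cat "$file")"
}

# Example (inside the container entrypoint, before starting code-server):
# export_secret ANTHROPIC_API_KEY /run/secrets/opencode-anthropic
# export_secret OPENAI_API_KEY    /run/secrets/opencode-openai
```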
**Cost control at provider level:**
- Set monthly spend limits on API keys ($50-100/month)
- Create project-specific keys for this use case
- Monitor usage via provider dashboards
**Pros**: Simple, no additional infrastructure
**Cons**: Users can see keys via `env`, no per-user tracking
### Phase 2: Proxy with BYOK (Future)
If scale or cost becomes an issue:
```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Container   │────►│  API Proxy   │────►│ AI Provider  │
│  (opencode)  │     │   (host)     │     │              │
│              │     │ - Rate limit │     │              │
│  Base URL:   │     │ - Log usage  │     │              │
│  proxy:8080  │     │ - Add API key│     │              │
└──────────────┘     └──────────────┘     └──────────────┘
```
Options:
- **litellm**: Proxy supporting multiple providers, usage tracking
- **Custom**: Minimal proxy that adds keys and logs requests
**Bring Your Own Key (BYOK)**: Users provide their own API keys, stored in their container's persistent config.
### Decision: Phase 1 for MVP
For initial deployment with 1-5 users:
1. Shared keys injected via sops-nix environment variables
2. Per-key spend limits set at provider level (OpenAI: $50, Anthropic: $50)
3. Trust model: users are known/trusted, not adversarial
4. Re-evaluate when hitting limits or adding untrusted users
## Port Forwarding
### Phase 1: No User-Controlled Ports (MVP)
Users cannot expose their own web apps externally. Dev servers run inside container, accessible only via VS Code's built-in port forwarding (localhost within the browser session).
**Rationale**: Simplifies security model, avoids wildcard subdomain proliferation, reduces attack surface.
### Phase 2: Platform-Controlled Ports (Future)
If needed, platform team can expose specific user apps:
```
# Per-user app subdomain (requires platform team to configure)
dan-app.code.clarun.xyz → container port 8080
# Or numbered ports per user
dan.code.clarun.xyz:8080 → container port 8080
```
**Design consideration**: Reserve subdomain/port space in DNS and nginx config for future expansion without architectural changes.
## Security Model
### Container Isolation
| Threat | Mitigation |
|--------|------------|
| Filesystem escape | Bind mounts limit visible paths |
| Credential theft | Don't mount ~/.ssh, secrets |
| Host process access | Container namespaces |
| Resource exhaustion | Memory/CPU limits, OOM targets container |
| Network exfil | Possible future: network policy |
### What Containers Don't Prevent
- Malicious code running inside container
- Package supply chain attacks (npm, pip)
- Data exfiltration via allowed network
- Container escape via kernel vulnerability (rare)
### Defense in Depth
1. **Container**: Limits blast radius
2. **No host secrets**: ~/.ssh, AWS creds not mounted
3. **Resource limits**: Can't fork bomb host
4. **Easy reset**: Nuke container, keep workspace
5. **Backup**: Restore workspace from backup if compromised
## Image Strategy
### Custom Image with opencode (Required)
Since we need opencode pre-installed, a custom image is required:
```dockerfile
FROM codercom/code-server:latest

# Install opencode CLI
RUN curl -fsSL https://opencode.ai/install | bash

# Pre-install opencode VS Code extension (from Open VSX)
RUN code-server --install-extension sst-dev.opencode

# Install common language toolchains. The base image runs as the unprivileged
# `coder` user, so switch to root for apt and back afterwards.
USER root
RUN apt-get update && apt-get install -y \
    python3 python3-pip \
    nodejs npm \
    git \
    && rm -rf /var/lib/apt/lists/*
USER coder

# Optional: Install Nix for on-demand packages
# RUN curl -L https://nixos.org/nix/install | sh
# ENV PATH="/root/.nix-profile/bin:$PATH"
```
### Container Contents
| Component | Purpose |
|-----------|---------|
| code-server | VS Code in browser |
| opencode CLI | AI coding agent |
| sst-dev.opencode extension | VS Code integration for opencode |
| Python 3 | Common language |
| Node.js | Common language |
| Git | Version control |
### Image Management
Options for keeping image updated:
1. **Manual rebuild**: Rebuild and redeploy periodically
2. **CI/CD**: Auto-rebuild on Dockerfile changes
3. **Watchtower equivalent**: Auto-pull new tags (risky for stability)
**Decision**: Manual rebuild initially, automate via CI later if needed.
### Extension Pre-Installation
The opencode extension is available on Open VSX (required for code-server):
- Registry: [open-vsx.org/extension/sst-dev/opencode](https://open-vsx.org/extension/sst-dev/opencode)
- Install command: `code-server --install-extension sst-dev.opencode`
## Rollout Plan
### Phase 1: Single User (SSH Tunnel)
1. Deploy one container for testing
2. Access via SSH tunnel only
3. Validate WebSocket, extensions, terminal
4. Test memory usage under load
### Phase 2: nginx Integration
1. Add nginx reverse proxy route
2. Enable HTTPS via ACME
3. Test from external network
4. Validate PWA install works
### Phase 3: Multi-User
1. Add additional users
2. Upgrade server RAM if needed
3. Test concurrent usage
4. Document onboarding
### Phase 4: Hardening
1. Custom image with Nix (if needed)
2. Network policies (if needed)
3. Automated backup of workspaces
4. Monitoring/alerting
## Open Questions
1. ~~**Domain**: `code.clarun.xyz` or path under existing domain?~~ → Resolved: Subdomain routing (`dan.code.clarun.xyz`)
2. ~~**API keys**: How to provision opencode API keys (OpenAI, Anthropic, etc.) per user?~~ → Resolved: Phase 1 shared keys via sops-nix, provider-level spend limits
3. ~~**Git credentials**: How do users authenticate to git remotes?~~ → Resolved: Deferred - local-only projects initially, add git auth in Phase 2 if needed
4. **Onboarding docs**: What documentation do non-programmers need?
## References
### code-server
- [code-server GitHub](https://github.com/coder/code-server)
- [code-server multi-user blog](https://coder.com/blog/code-server-multiple-users)
- [NixOS Wiki: Podman (oci-containers)](https://nixos.wiki/wiki/Podman)
### opencode
- [opencode.ai](https://opencode.ai/)
- [opencode GitHub](https://github.com/sst/opencode)
- [opencode VS Code extension (Open VSX)](https://open-vsx.org/extension/sst-dev/opencode)
- [opencode VS Code extension (MS Marketplace)](https://marketplace.visualstudio.com/items?itemName=sst-dev.opencode)
### Other
- [Tailscale code-server guide](https://tailscale.com/kb/1166/vscode-ipad) (for iPad/PWA patterns)
## Appendix: Alternatives Considered
### VS Code Remote SSH
Users run VS Code locally, SSH to server for compute.
| Pros | Cons |
|------|------|
| Less server RAM (UI on laptop) | Not browser-only |
| Native VS Code experience | Requires local VS Code install |
| No container complexity | Less isolation |
| Better keyboard shortcuts | Higher barrier for non-programmers |
**Why not chosen**: Non-programmer users need zero-install browser access.
### openvscode-server (instead of code-server)
| Factor | code-server | openvscode-server |
|--------|-------------|-------------------|
| Built-in auth | ✅ | ❌ |
| NixOS module | ✅ | ❌ |
| Maintenance | Active | Active |
**Why not chosen**: code-server has built-in auth and better NixOS integration.
### Coder Platform (instead of DIY)
Enterprise platform for provisioning dev environments.
| Pros | Cons |
|------|------|
| Multi-user built-in | Terraform complexity |
| SSO, audit logs | Overkill for 1-5 users |
| Auto-shutdown | Designed for cloud provisioning |
**Why not chosen**: We have existing infrastructure; Coder adds unnecessary complexity.
### Terminal-Only (SSH + tmux + neovim)
| Pros | Cons |
|------|------|
| Minimal resources | High learning curve |
| Power user friendly | Non-programmers excluded |
**Why not chosen**: Must support non-programmer learners with GUI.