diff --git a/.beads/.gitignore b/.beads/.gitignore
new file mode 100644
index 0000000..f438450
--- /dev/null
+++ b/.beads/.gitignore
@@ -0,0 +1,29 @@
+# SQLite databases
+*.db
+*.db?*
+*.db-journal
+*.db-wal
+*.db-shm
+
+# Daemon runtime files
+daemon.lock
+daemon.log
+daemon.pid
+bd.sock
+
+# Legacy database files
+db.sqlite
+bd.db
+
+# Merge artifacts (temporary files from 3-way merge)
+beads.base.jsonl
+beads.base.meta.json
+beads.left.jsonl
+beads.left.meta.json
+beads.right.jsonl
+beads.right.meta.json
+
+# Keep JSONL exports and config (source of truth for git)
+!issues.jsonl
+!metadata.json
+!config.json
diff --git a/.beads/.local_version b/.beads/.local_version
new file mode 100644
index 0000000..ae6dd4e
--- /dev/null
+++ b/.beads/.local_version
@@ -0,0 +1 @@
+0.29.0
diff --git a/.beads/README.md b/.beads/README.md
new file mode 100644
index 0000000..50f281f
--- /dev/null
+++ b/.beads/README.md
@@ -0,0 +1,81 @@
+# Beads - AI-Native Issue Tracking
+
+Welcome to Beads! This repository uses **Beads** for issue tracking - a modern, AI-native tool designed to live directly in your codebase alongside your code.
+
+## What is Beads?
+
+Beads is issue tracking that lives in your repo, making it perfect for AI coding agents and developers who want their issues close to their code. No web UI required - everything works through the CLI and integrates seamlessly with git.
+
+**Learn more:** [github.com/steveyegge/beads](https://github.com/steveyegge/beads)
+
+## Quick Start
+
+### Essential Commands
+
+```bash
+# Create new issues
+bd create "Add user authentication"
+
+# View all issues
+bd list
+
+# View issue details
+bd show
+
+# Update issue status
+bd update --status in_progress
+bd update --status done
+
+# Sync with git remote
+bd sync
+```
+
+### Working with Issues
+
+Issues in Beads are:
+- **Git-native**: Stored in `.beads/issues.jsonl` and synced like code
+- **AI-friendly**: CLI-first design works perfectly with AI coding agents
+- **Branch-aware**: Issues can follow your branch workflow
+- **Always in sync**: Auto-syncs with your commits
+
+## Why Beads?
+
+✨ **AI-Native Design**
+- Built specifically for AI-assisted development workflows
+- CLI-first interface works seamlessly with AI coding agents
+- No context switching to web UIs
+
+🚀 **Developer Focused**
+- Issues live in your repo, right next to your code
+- Works offline, syncs when you push
+- Fast, lightweight, and stays out of your way
+
+🔧 **Git Integration**
+- Automatic sync with git commits
+- Branch-aware issue tracking
+- Intelligent JSONL merge resolution
+
+## Get Started with Beads
+
+Try Beads in your own projects:
+
+```bash
+# Install Beads
+curl -sSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash
+
+# Initialize in your repo
+bd init
+
+# Create your first issue
+bd create "Try out Beads"
+```
+
+## Learn More
+
+- **Documentation**: [github.com/steveyegge/beads/docs](https://github.com/steveyegge/beads/tree/main/docs)
+- **Quick Start Guide**: Run `bd quickstart`
+- **Examples**: [github.com/steveyegge/beads/examples](https://github.com/steveyegge/beads/tree/main/examples)
+
+---
+
+*Beads: Issue tracking that moves at the speed of thought* ⚡
diff --git a/.beads/config.yaml b/.beads/config.yaml
new file mode 100644
index 0000000..f242785
--- /dev/null
+++ b/.beads/config.yaml
@@ -0,0 +1,62 @@
+# Beads Configuration File
+# This file configures default behavior for all bd commands in this repository
+# All settings can also be set via environment variables (BD_* prefix)
+# or overridden with command-line flags
+
+# Issue prefix for this repository (used by bd init)
+# If not set, bd init will auto-detect from directory name
+# Example: issue-prefix: "myproject" creates issues like "myproject-1", "myproject-2", etc.
+# issue-prefix: ""
+
+# Use no-db mode: load from JSONL, no SQLite, write back after each command
+# When true, bd will use .beads/issues.jsonl as the source of truth
+# instead of SQLite database
+# no-db: false
+
+# Disable daemon for RPC communication (forces direct database access)
+# no-daemon: false
+
+# Disable auto-flush of database to JSONL after mutations
+# no-auto-flush: false
+
+# Disable auto-import from JSONL when it's newer than database
+# no-auto-import: false
+
+# Enable JSON output by default
+# json: false
+
+# Default actor for audit trails (overridden by BD_ACTOR or --actor)
+# actor: ""
+
+# Path to database (overridden by BEADS_DB or --db)
+# db: ""
+
+# Auto-start daemon if not running (can also use BEADS_AUTO_START_DAEMON)
+# auto-start-daemon: true
+
+# Debounce interval for auto-flush (can also use BEADS_FLUSH_DEBOUNCE)
+# flush-debounce: "5s"
+
+# Git branch for beads commits (bd sync will commit to this branch)
+# IMPORTANT: Set this for team projects so all clones use the same sync branch.
+# This setting persists across clones (unlike database config which is gitignored).
+# Can also use BEADS_SYNC_BRANCH env var for local override.
+# If not set, bd sync will require you to run 'bd config set sync.branch '.
+# sync-branch: "beads-sync"
+
+# Multi-repo configuration (experimental - bd-307)
+# Allows hydrating from multiple repositories and routing writes to the correct JSONL
+# repos:
+#   primary: "."  # Primary repo (where this database lives)
+#   additional:   # Additional repos to hydrate from (read-only)
+#     - ~/beads-planning  # Personal planning repo
+#     - ~/work-planning   # Work planning repo
+
+# Integration settings (access with 'bd config get/set')
+# These are stored in the database, not in this file:
+# - jira.url
+# - jira.project
+# - linear.url
+# - linear.api-key
+# - github.org
+# - github.repo
diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl
new file mode 100644
index 0000000..8276628
--- /dev/null
+++ b/.beads/issues.jsonl
@@ -0,0 +1,39 @@
+{"id":"ops-jrz1-00e","title":"Upgrade NixOS from 24.05 to 24.11","description":"Running NixOS 24.05.20241230 (Uakari). Current stable is 24.11. May be missing security patches. Low priority as no known critical CVEs, but should plan upgrade.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-04T21:03:22.760228514-08:00","updated_at":"2025-12-04T21:04:35.805980055-08:00","comments":[{"id":1,"issue_id":"ops-jrz1-00e","author":"dan","text":"Analysis Findings:\n1. Version Mismatch: Local flake.nix is pinned to 'nixos-24.05', but the dev environment reports '25.11' (Unstable), indicating state divergence.\n2. Upstream Bugs: Blocking issues in mautrix-slack (ops-jrz1-blh) and maubot (sync failure) are present in the current unstable revision (2025-12-02).\n3. Recommendation: Upgrade platform to NixOS 24.11 (Stable) to align environment, ensure stability, and pull fresh upstream fixes.","created_at":"2025-12-08T23:54:57Z"}]}
+{"id":"ops-jrz1-03o","title":"Upgrade mautrix-slack to v25.11","description":"Upgrade is just flake update + deploy. Current deployed: v0.2.3+dev.unknown (Oct 13). Flake lock: v25.10 (Oct 22). Latest nixpkgs-unstable: v25.11. Run: nix flake update nixpkgs-unstable \u0026\u0026 deploy. May fix edit panic (ops-jrz1-qxr).","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T18:24:18.332067067-08:00","updated_at":"2025-12-05T19:07:09.156981447-08:00","closed_at":"2025-12-05T19:07:09.156981447-08:00"}
+{"id":"ops-jrz1-3ca","title":"Persist opencode state/cache across restarts","description":"opencode may store index/cache in ~/.cache or other dirs not covered by current bind mounts. AI context could be lost on container restart. Verify and add mounts.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:30.90315778-08:00","updated_at":"2025-12-05T15:32:30.90315778-08:00","dependencies":[{"issue_id":"ops-jrz1-3ca","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.247361009-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-3fd","title":"Deploy and test single-user instance (Phase 1)","description":"Deploy one container for testing. Validate: WebSocket, extensions, terminal, opencode, memory usage. Access via SSH tunnel initially.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.783260036-08:00","updated_at":"2025-12-05T17:16:54.783260036-08:00","dependencies":[{"issue_id":"ops-jrz1-3fd","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.400677984-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-3fd","depends_on_id":"ops-jrz1-5oe","type":"blocks","created_at":"2025-12-05T17:17:38.708397909-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-3fd","depends_on_id":"ops-jrz1-av0","type":"blocks","created_at":"2025-12-05T17:17:38.721665448-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-3fd","depends_on_id":"ops-jrz1-9gd","type":"blocks","created_at":"2025-12-05T17:17:38.737824478-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-3so","title":"Browser-based dev environment with opencode","description":"Epic: Provide VS Code in browser via code-server with opencode AI integration.\n\nKey decisions:\n- code-server in Podman containers (rootless)\n- opencode CLI + VS Code extension pre-installed\n- Subdomain routing (dan.code.clarun.xyz)\n- Custom container image\n- Target users: non-programmers, testers, learners\n\nDesign doc: specs/004-browser-dev-environment/design.md\n\nMigrated from ops-jrz1-ndl","status":"open","priority":1,"issue_type":"epic","created_at":"2025-12-05T17:04:36.709352529-08:00","updated_at":"2025-12-05T17:04:36.709352529-08:00"}
+{"id":"ops-jrz1-3x4","title":"Add maubot SDK and deploy script to container image","description":"Container image needs:\n- Python 3.11 + maubot SDK\n- deploy.sh script (zip → .mbp → curl to maubot API)\n- maubot API reachable from container (host network or port forward)\n\nPart of learner onboarding for bot development.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-06T12:18:06.841708662-08:00","updated_at":"2025-12-06T12:18:06.841708662-08:00","dependencies":[{"issue_id":"ops-jrz1-3x4","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-06T12:18:16.085519885-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-3x4","depends_on_id":"ops-jrz1-d58","type":"blocks","created_at":"2025-12-06T12:18:16.110944935-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-45v","title":"Matrix/Slack identity mismatch: dan vs vlad","description":"Matrix user @dan:clarun.xyz is linked to Slack user 'vlad'. Messages appear as vlad in Slack but dan in Element. Cosmetic confusion. Options: rename Matrix display name, or re-login bridge with different Slack account.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T19:38:19.899555475-08:00","updated_at":"2025-12-05T19:38:19.899555475-08:00"}
+{"id":"ops-jrz1-46y","title":"Write onboarding documentation","description":"Critical for non-programmers. Cover: login, opencode usage, Git setup (PAT workflow), resource limits, security hygiene. Keep concise.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:41.586544583-08:00","updated_at":"2025-12-05T15:32:41.586544583-08:00","dependencies":[{"issue_id":"ops-jrz1-46y","depends_on_id":"ops-jrz1-7j4","type":"blocks","created_at":"2025-12-05T15:33:25.328712413-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-46y","depends_on_id":"ops-jrz1-wj2","type":"blocks","created_at":"2025-12-05T15:33:25.351559821-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-46y","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.401868669-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-4jm","title":"Smoke test Matrix server (conduwuit)","description":"Verify Matrix homeserver is healthy: check /_matrix/client/versions endpoint, test registration, verify federation status (disabled). Quick health check after deployments.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T18:09:47.220765063-08:00","updated_at":"2025-12-05T18:19:33.059734881-08:00","closed_at":"2025-12-05T18:19:33.059734881-08:00"}
+{"id":"ops-jrz1-5fk","title":"Smoke test Maubot service","description":"Verify Maubot is healthy: check management UI accessible via SSH tunnel, verify bot instances running, test plugin functionality. Quick health check after deployments.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T18:09:47.33773092-08:00","updated_at":"2025-12-05T18:19:33.061388913-08:00","closed_at":"2025-12-05T18:19:33.061388913-08:00"}
+{"id":"ops-jrz1-5ki","title":"Set up programmatic QA test user for bridge testing","description":"","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T20:17:04.312571398-08:00","updated_at":"2025-12-05T20:17:04.312571398-08:00"}
+{"id":"ops-jrz1-5oe","title":"Create NixOS module for code-server containers","description":"Module to manage per-user Podman containers, nginx routing, secrets. Use virtualisation.oci-containers. Generate systemd units.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.656121092-08:00","updated_at":"2025-12-05T17:16:54.656121092-08:00","dependencies":[{"issue_id":"ops-jrz1-5oe","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.386278268-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-5oe","depends_on_id":"ops-jrz1-d58","type":"blocks","created_at":"2025-12-05T17:17:38.694752468-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-6of","title":"AI cost/rate limiting per user","description":"One user could drain API credits with runaway script. Need rate limiting per user, either via proxy middleware or opencode config. Track usage.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:30.772304538-08:00","updated_at":"2025-12-05T17:42:42.773613559-08:00","closed_at":"2025-12-05T17:42:42.773613559-08:00","dependencies":[{"issue_id":"ops-jrz1-6of","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.206816868-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-6of","depends_on_id":"ops-jrz1-wj2","type":"blocks","created_at":"2025-12-05T17:17:38.658742196-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-7j4","title":"Git credential strategy for non-programmers","description":"Non-programmers can't manage SSH keys. Pre-configure git-credential-store or provide simple PAT workflow with docs. Store in persistent home with 600 perms.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:19.673999683-08:00","updated_at":"2025-12-05T17:38:54.788694408-08:00","closed_at":"2025-12-05T17:38:54.788694408-08:00","dependencies":[{"issue_id":"ops-jrz1-7j4","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.139749437-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-88o","title":"Implement backup strategy for VPS","description":"No backups configured. Critical data: Matrix DB (622M), PostgreSQL (161M), Forgejo (2.5M), maubot (320K). No recovery path if disk fails. Need automated backups with off-site storage.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-04T22:55:25.546850172-08:00","updated_at":"2025-12-05T00:56:27.720623612-08:00","closed_at":"2025-12-05T00:56:27.720623612-08:00"}
+{"id":"ops-jrz1-9gd","title":"Upgrade VPS RAM for dev environments","description":"Current: 2GB. Need 4-8GB for multiple code-server containers. Coordinate with Vultr, plan maintenance window.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.267689439-08:00","updated_at":"2025-12-05T17:16:54.267689439-08:00","dependencies":[{"issue_id":"ops-jrz1-9gd","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.331146543-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-av0","title":"Configure wildcard DNS and ACME cert","description":"Set up *.code.clarun.xyz DNS record and wildcard SSL cert via ACME. Depends on subdomain routing decision (kg0).","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.387356964-08:00","updated_at":"2025-12-05T17:16:54.387356964-08:00","dependencies":[{"issue_id":"ops-jrz1-av0","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.34918436-08:00","created_by":"daemon"},{"issue_id":"ops-jrz1-av0","depends_on_id":"ops-jrz1-kg0","type":"blocks","created_at":"2025-12-05T17:17:38.676800677-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-bhk","title":"Add disk quotas for user workspaces","description":"User could fill host disk via /var/lib/vscode/\u003cuser\u003e/. Add per-directory quotas or monitoring/alerting on disk usage.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:41.199417226-08:00","updated_at":"2025-12-05T15:32:41.199417226-08:00","dependencies":[{"issue_id":"ops-jrz1-bhk","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.309592029-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-blh","title":"mautrix-slack edit panic persists in v25.11","description":"mautrix-slack panic on rapid message edits (race condition)\n\n**Root cause**: Edit event arrives before original message is stored in DB. ConvertEdit accesses nil metadata.\n\n**Location**: handleslack.go:575 - has TODO comment: 'this can panic?'\n\n**Reproduction**: Edit a Slack message within ~1 second of sending\n\n**Upstream status**: \n- v25.11 is latest (we're on it)\n- Known to devs (TODO in code)\n- No open issue filed yet\n\n**Stack trace**:\ngo.mau.fi/mautrix-slack/pkg/connector.(*SlackMessage).ConvertEdit\n handleslack.go:575\nmaunium.net/go/mautrix/bridgev2.(*Portal).handleRemoteEdit\n portal.go:2838","status":"open","priority":2,"issue_type":"bug","created_at":"2025-12-05T19:40:33.255395189-08:00","updated_at":"2025-12-05T23:05:05.344825241-08:00","comments":[{"id":2,"issue_id":"ops-jrz1-blh","author":"dan","text":"Confirmed panic exists in nixpkgs-unstable from 2025-12-02. Fix will be addressed via platform upgrade (see ops-jrz1-00e).","created_at":"2025-12-08T23:54:57Z"}]}
+{"id":"ops-jrz1-d58","title":"Build custom code-server container image","description":"Dockerfile with: code-server, opencode CLI, opencode VS Code extension (Open VSX), Python, Node, Git. Push to registry or build locally.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.507577308-08:00","updated_at":"2025-12-05T17:16:54.507577308-08:00","dependencies":[{"issue_id":"ops-jrz1-d58","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.369590207-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-dhj","title":"Port forwarding strategy for user apps","description":"When user runs app on localhost:3000, how do they view it? code-server has /proxy/\u003cport\u003e but URL is confusing for learners. Need clear UX or docs.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:30.649292743-08:00","updated_at":"2025-12-05T17:41:01.486505687-08:00","closed_at":"2025-12-05T17:41:01.486505687-08:00","dependencies":[{"issue_id":"ops-jrz1-dhj","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.175857247-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-dt9","title":"Increase container RAM limits (2GB too tight)","description":"2GB hard limit will OOM with code-server + opencode + LSP + user app. Gemini/GPT recommend 3-4GB per container or add swap. Need to size server appropriately.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:19.400171408-08:00","updated_at":"2025-12-05T17:38:54.770433169-08:00","closed_at":"2025-12-05T17:38:54.770433169-08:00","dependencies":[{"issue_id":"ops-jrz1-dt9","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.066130377-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-dux","title":"Container isolation: maubot API access only","description":"Security design for learner containers:\n\n**Container CAN access**:\n- maubot API (:29316) for plugin deploy\n- Matrix rooms via bot (through maubot)\n- Slack via bridge (through Matrix)\n\n**Container CANNOT access**:\n- Host filesystem\n- Other containers\n- PostgreSQL directly\n- Matrix homeserver directly\n- sops secrets\n\nImplementation: Podman network config, no --privileged, limited port exposure.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-06T12:18:16.212646624-08:00","updated_at":"2025-12-06T12:18:16.212646624-08:00","dependencies":[{"issue_id":"ops-jrz1-dux","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-06T12:18:21.627621772-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-ezf","title":"Maubot plugin dev workflow for learners","description":"Design frictionless dev workflow for Python/Go learners building maubot plugins.\n\n**Requirements**:\n- No SSH tunnel setup for learners\n- Fast feedback loop (edit → see bot respond)\n- Circuit breakers (allowed_rooms, rate limits)\n- Test channel: #vlads-pad (Slack) ↔ Matrix\n\n**Options being considered**:\n1. Git-push deploy: push to repo → CI builds .mbp → deploys to maubot\n2. Code-server containers: browser IDE on VPS, deploy script talks to maubot locally\n3. Hybrid: code-server + git workflow\n\n**Related**: ops-jrz1-3so (browser-dev-environment epic)","status":"open","priority":2,"issue_type":"feature","created_at":"2025-12-06T01:36:26.529372206-08:00","updated_at":"2025-12-06T01:36:26.529372206-08:00","dependencies":[{"issue_id":"ops-jrz1-ezf","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-06T12:18:06.743837766-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-gci","title":"Enable fail2ban for SSH brute force protection","description":"SSH brute force attempts generate log noise but don't pose security risk (key-only auth). fail2ban would help but is low priority. Deferred pending RFC on SSH log management strategy.","status":"open","priority":2,"issue_type":"task","created_at":"2025-12-04T21:03:22.651495544-08:00","updated_at":"2025-12-04T22:55:13.805471391-08:00","dependencies":[{"issue_id":"ops-jrz1-gci","depends_on_id":"ops-jrz1-nir","type":"blocks","created_at":"2025-12-04T22:56:14.777377818-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-glk","title":"VS Code extension policy (security)","description":"Extensions can run arbitrary code. Decide: allow arbitrary installs, or curate/restrict? For non-programmers, pre-install safe set and optionally disable marketplace.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:41.463030936-08:00","updated_at":"2025-12-05T15:32:41.463030936-08:00","dependencies":[{"issue_id":"ops-jrz1-glk","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.372120465-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-i8i","title":"Enable mautrix-slack relay mode for bot bridging","description":"","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-06T19:09:42.087506995-08:00","updated_at":"2025-12-06T19:09:47.612545472-08:00","closed_at":"2025-12-06T19:09:47.612545472-08:00"}
+{"id":"ops-jrz1-iok","title":"Instagram bot missing base-config.yaml","description":"Plugin was missing base-config.yaml required by maubot Config class. Fixed in commit 4b9481d.","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-12-06T13:02:10.103730128-08:00","updated_at":"2025-12-06T13:02:15.055396318-08:00","closed_at":"2025-12-06T13:02:15.055396318-08:00"}
+{"id":"ops-jrz1-jit","title":"Logging and monitoring for dev environments","description":"No observability plan. Need: container CPU/mem metrics, nginx logs, disk usage monitoring, alert on repeated 401s or resource exhaustion.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:41.318448038-08:00","updated_at":"2025-12-05T15:32:41.318448038-08:00","dependencies":[{"issue_id":"ops-jrz1-jit","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.343610481-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-kg0","title":"Switch to subdomain routing (dan.code.clarun.xyz)","description":"Path-based routing (/code/dan/) is fragile. Extensions assume root path, cookies scope incorrectly, PWA breaks. Switch to wildcard subdomains for cleaner isolation.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-05T15:32:19.283887085-08:00","updated_at":"2025-12-05T17:23:11.983564455-08:00","closed_at":"2025-12-05T17:23:11.983564455-08:00","dependencies":[{"issue_id":"ops-jrz1-kg0","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.043217984-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-kia","title":"Container reset mechanism (keep workspace)","description":"If user breaks their environment, need simple way to wipe container and restore default image while preserving /workspace. Script or admin command.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:31.045592689-08:00","updated_at":"2025-12-05T15:32:31.045592689-08:00","dependencies":[{"issue_id":"ops-jrz1-kia","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.275530016-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-ndl","title":"Browser-based dev environment (code-server)","description":"Explore setting up browser-based development:\n\nOptions:\n- code-server / openvscode-server - VS Code in browser\n- ttyd / wetty - terminal in browser \n- PWA install to home screen for native app feel\n\nCould combine with Tailscale for secure access without exposing ports.\n\nRef: ops-dev thin client brainstorm session","notes":"Design doc created: specs/004-browser-dev-environment/design.md - covers architecture, tech choices, resource planning, security model, rollout phases","status":"closed","priority":2,"issue_type":"feature","created_at":"2025-12-04T15:08:02.406274744-08:00","updated_at":"2025-12-05T17:05:52.872944892-08:00","closed_at":"2025-12-05T17:05:52.872944892-08:00"}
+{"id":"ops-jrz1-nir","title":"RFC: SSH log noise reduction strategy","description":"Research showed 99.8% of SSH logs are scanner noise (9000 failed attempts/day). Options: (1) Change SSH port - simple, ~99% reduction (2) journald filter - surgical but complex (3) LogLevel ERROR - loses successful login audit trail (4) fail2ban - bans IPs, partial reduction. Orch consensus: Gemini opposed LogLevel ERROR due to losing audit trail, GPT supported. Need RFC to decide approach. See posture review from Dec 2025 session.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-04T22:55:13.990334935-08:00","updated_at":"2025-12-04T22:55:13.990334935-08:00"}
+{"id":"ops-jrz1-nvx","title":"Slack bot architecture: Matrix-first approach","description":"**Decision**: Use Matrix as primary platform for Slack bot development.\n\n**Architecture**: Bots run as maubot plugins (or Matrix bots), communicate to Slack via mautrix-slack bridge.\n\n**Rationale**:\n- Existing infrastructure (maubot deployed, bridge working)\n- Single platform to manage\n- Bots work with Matrix users too\n- Avoid Socket Mode contention (only one xapp- connection allowed)\n\n**Trade-offs accepted**:\n- Bridge dependency (edit panic bug exists)\n- Extra latency through bridge hop\n- Limited to bridged channels\n\n**Alternative considered (Option B - direct Slack API)**:\n- Could use xoxb- token for outbound-only (REST)\n- Would need new Slack app for full Socket Mode independence\n- Deferred for now\n\n**Credentials available**:\n- slack-oauth-token (xoxb-) - shareable for REST calls if needed\n- slack-app-token (xapp-) - reserved for bridge Socket Mode\n\n**Status**: DECIDED - staying with Matrix-first","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-05T23:12:22.011872713-08:00","updated_at":"2025-12-05T23:12:28.329467732-08:00","closed_at":"2025-12-05T23:12:28.329467732-08:00"}
+{"id":"ops-jrz1-qxr","title":"mautrix-slack message edit panic (upstream bug)","description":"Bridge upgraded to v25.11. Need to verify if edit panic is fixed by testing a Slack message edit. Watch logs: journalctl -u mautrix-slack -f | grep -E 'ERR|panic|edit'","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-12-05T18:22:38.18203834-08:00","updated_at":"2025-12-05T19:36:00.556011621-08:00","closed_at":"2025-12-05T19:36:00.556011621-08:00","dependencies":[{"issue_id":"ops-jrz1-qxr","depends_on_id":"ops-jrz1-03o","type":"blocks","created_at":"2025-12-05T18:24:23.259399275-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-u0w","title":"Security review of running server","description":"","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-04T21:03:22.420507724-08:00","updated_at":"2025-12-04T21:04:31.989886731-08:00","closed_at":"2025-12-04T21:04:31.989886731-08:00"}
+{"id":"ops-jrz1-wj2","title":"Design API key provisioning strategy","description":"opencode needs API keys (OpenAI, Anthropic). Options: 1) Shared key with proxy + rate limiting, 2) Per-user keys in sops-nix. Need to prevent key exposure and enable usage tracking.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-05T15:32:19.526073243-08:00","updated_at":"2025-12-05T17:25:10.534718515-08:00","closed_at":"2025-12-05T17:25:10.534718515-08:00","dependencies":[{"issue_id":"ops-jrz1-wj2","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.103332379-08:00","created_by":"daemon"}]}
+{"id":"ops-jrz1-xz1","title":"Fix maubot admin UI exposed to internet (port 29316)","description":"Maubot admin UI on port 29316 is publicly accessible (returns 401 but API surface exposed). Firewall explicitly allows this port. Risk: brute force on admin password, direct exploit of any maubot vulnerabilities. Fix: bind to 127.0.0.1 only, remove from firewall, access via SSH tunnel.","status":"closed","priority":1,"issue_type":"bug","created_at":"2025-12-04T21:03:22.531676543-08:00","updated_at":"2025-12-04T22:35:24.162735368-08:00","closed_at":"2025-12-04T22:35:24.162735368-08:00"}
+{"id":"ops-jrz1-zvh","title":"Fix maubot health check (failing every 5 min)","description":"Health check at /_matrix/maubot/v1/version returns 401 (auth required). Check script doesn't provide auth token. Spamming error logs every 5 minutes.","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-12-04T22:55:25.755541054-08:00","updated_at":"2025-12-05T02:00:19.284410671-08:00","closed_at":"2025-12-05T02:00:19.284410671-08:00"}
diff --git a/.beads/metadata.json b/.beads/metadata.json
new file mode 100644
index 0000000..c787975
--- /dev/null
+++ b/.beads/metadata.json
@@ -0,0 +1,4 @@
+{
+  "database": "beads.db",
+  "jsonl_export": "issues.jsonl"
+}
\ No newline at end of file
diff --git a/.gitignore b/.gitignore
index 61a730d..438b561 100644
--- a/.gitignore
+++ b/.gitignore
@@ -48,6 +48,7 @@ venv/
 
 # Spec-kit framework (auto-updated by framework)
 .claude/commands/speckit.*.md
+.codex/
 .specify/memory/
 .specify/scripts/
 .specify/templates/
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..d64d8ba
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,48 @@
+# Beads Issue Tracking
+
+**Session start**: Run `bd ready` to see available work.
+
+## Commands
+- `bd ready` - Issues with no blockers
+- `bd show ` - Issue details
+- `bd update --status=in_progress` - Claim work
+- `bd close ` - Complete work
+- `bd create --title="..." --type=task|bug|feature` - New issue
+- `bd dep add ` - Add dependency
+
+## Session End
+Before finishing: `git status`, `git add`, `git commit`. This is an ephemeral branch - merge to main locally.
+
+# Repository Guidelines
+
+## Project Structure & Module Organization
+- `configuration.nix` holds shared system defaults; adjust service toggles in host overlays instead of editing it directly.
+- `hosts/ops-jrz1.nix` and `hosts/ops-jrz1-vm.nix` override environment-specific networking, secrets, and hardware details; mirror changes across both when possible.
+- `modules/` contains composable NixOS modules (`matrix-continuwuity.nix`, `mautrix-*.nix`, `security/*`); keep new modules kebab-cased and expose options via `lib.mkOption`.
+- `scripts/` provides sanitization utilities. Stage external imports under `staging/`, run `./scripts/sanitize-files.sh SRC staging/modules`, then promote files into `modules/` once validation passes.
+- `specs/` and `docs/` capture design intent and runbooks; update the relevant spec when changing feature scope.
+
+## Build, Test, and Development Commands
+- `nix flake check` validates module wiring, options, and formatting before review.
+- `nix build .#nixosConfigurations.ops-jrz1` produces the deployable system closure; use this to catch evaluation regressions.
+- `nixos-rebuild switch --flake .#ops-jrz1 --target-host root@ops-jrz1` deploys to the VPS; replace the target host when testing elsewhere.
+- `./scripts/validate-sanitization.sh modules/` ensures redacted content before commit; rerun after manual edits to sanitized files.
+
+## Coding Style & Naming Conventions
+- Prefer two-space indentation in Nix files; align attribute sets and option blocks for readability.
+- Use `lowerCamelCase` for option names, kebab-case for file names, and leave explanatory comments above non-obvious logic paths only.
+- Format Nix with `nix fmt` (nixpkgs-fmt) or equivalent before committing to keep diffs minimal.
+
+## Testing Guidelines
+- Treat `nix flake check` as the minimum gate; add targeted VM tests in `hosts/ops-jrz1-vm.nix` when introducing new services.
+- Name ad-hoc verification scripts under `scripts/local-*` and avoid committing transient debug helpers. +- Capture manual verification steps in `docs/worklogs/` immediately after deploys for traceability. + +## Commit & Pull Request Guidelines +- Follow the existing Git log style: single-line, capitalized summaries in ~70 characters (e.g., `Tighten bridge secret validation`). +- Reference related specs or worklogs in the body, and list `nix flake check` (and any VM smoke tests) under a short "Validation" block. +- PRs should link the tracked task, summarize scope, highlight sanitization steps, and mention any secrets or infra touchpoints reviewers must provision. + +## Security & Secrets Handling +- Never commit decrypted material; use `sops secrets/secrets.yaml` for edits and confirm `git status` shows only encrypted blobs. +- Replace real domains, IPs, and tokens with repository-safe placeholders. When importing upstream configs, run the sanitize and validate scripts before staging changes. 
diff --git a/CLAUDE.md b/CLAUDE.md index e7f1f5a..dbd92fd 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -98,6 +98,21 @@ ssh root@45.77.205.49 'sudo -u postgres psql mautrix_slack -c "\dt"' ssh root@45.77.205.49 'sudo -u postgres pg_dump mautrix_slack' > backup.sql ``` +### SSH Tunnels +```bash +# Maubot web UI (admin interface for managing bot instances) +ssh -L 29316:localhost:29316 root@45.77.205.49 +# Then access: http://localhost:29316 +# Login: admin / (password from secrets/secrets.yaml) + +# Matrix homeserver (for debugging) +ssh -L 8008:localhost:8008 root@45.77.205.49 +# Then access: http://localhost:8008 + +# Keep tunnel open in background +ssh -fN -L 29316:localhost:29316 root@45.77.205.49 +``` + ## Code Style - Nix 2.x, NixOS 24.05+, Bash 5.x: Follow standard conventions - NixOS modules: Use nixpkgs module pattern (options, config, mkIf) @@ -199,6 +214,7 @@ git branch -d 003-feature-name - Tag releases for deployment milestones ## Recent Changes +- 003-maubot-integration: Added Maubot bot framework (SQLite storage, sops-nix secrets) and the sna-instagram-bot plugin - 001-extract-matrix-platform: Added Nix 2.x, NixOS 24.05+, Bash 5.x (for scripts) - 002-slack-bridge-integration: Deployed mautrix-slack bridge with Socket Mode (2025-10-26) - Phase 0-1: Research and design complete @@ -458,4 +474,4 @@ postgresql.service - Fresh database recommended after conduwuit version upgrades - Debug logging currently enabled on conduwuit - \ No newline at end of file + diff --git a/flake.lock b/flake.lock index 00a16f2..208aa95 100644 --- a/flake.lock +++ b/flake.lock @@ -34,11 +34,11 @@ }, "nixpkgs-unstable": { "locked": { - "lastModified": 1761114652, - "narHash": "sha256-f/QCJM/YhrV/lavyCVz8iU3rlZun6d+dAiC3H+CDle4=", + "lastModified": 1764667669, + "narHash": "sha256-7WUCZfmqLAssbDqwg9cUDAXrSoXN79eEEq17qhTNM/Y=", "owner": "NixOS", "repo": "nixpkgs", - "rev": "01f116e4df6a15f4ccdffb1bcd41096869fb385c", + "rev": "418468ac9527e799809c900eda37cbff999199b6", "type": "github" }, "original": { diff --git
a/hosts/ops-jrz1-vm.nix b/hosts/ops-jrz1-vm.nix index 5ade6df..3a9a08d 100644 --- a/hosts/ops-jrz1-vm.nix +++ b/hosts/ops-jrz1-vm.nix @@ -3,12 +3,16 @@ { config, pkgs, pkgs-unstable, lib, ... }: { + # Disable built-in NixOS maubot module to use our sops-nix enhanced version + disabledModules = [ "services/matrix/maubot.nix" ]; + imports = [ # Import all modules (same as production) ../modules/matrix-continuwuity.nix ../modules/mautrix-slack.nix ../modules/mautrix-whatsapp.nix ../modules/mautrix-gmessages.nix + ../modules/maubot.nix ../modules/dev-services.nix ../modules/security/fail2ban.nix ../modules/security/ssh-hardening.nix @@ -74,5 +78,11 @@ allowedTCPPorts = [ 22 80 443 8008 3000 ]; }; + # Dummy filesystem for VM evaluation + fileSystems."/" = { + device = "/dev/vda1"; + fsType = "ext4"; + }; + system.stateVersion = "24.05"; } diff --git a/hosts/ops-jrz1.nix b/hosts/ops-jrz1.nix index 7f5587d..4e02367 100644 --- a/hosts/ops-jrz1.nix +++ b/hosts/ops-jrz1.nix @@ -4,6 +4,9 @@ # ops-jrz1 production VPS configuration # Imports extracted Matrix modules from ops-base + # Disable built-in NixOS maubot module to use our sops-nix enhanced version + disabledModules = [ "services/matrix/maubot.nix" ]; + imports = [ # Hardware configuration ../hardware-configuration.nix @@ -11,10 +14,12 @@ # Matrix platform modules ../modules/matrix-continuwuity.nix ../modules/mautrix-slack.nix + ../modules/maubot.nix ../modules/dev-services.nix ../modules/security/fail2ban.nix ../modules/security/ssh-hardening.nix ../modules/matrix-secrets + ../modules/backup.nix ]; # System configuration @@ -35,6 +40,16 @@ mode = "0444"; }; + sops.secrets.maubot-admin-password = { + # Maubot management interface admin password + mode = "0400"; + }; + + sops.secrets.maubot-secret-key = { + # Maubot session secret key + mode = "0400"; + }; + # Matrix homeserver configuration # NOTE: Disabled in favor of dev-platform.matrix which provides integrated # bridge coordination and systemd credential-based 
secrets management @@ -68,7 +83,16 @@ workspace = "chochacho"; port = 29319; }; + + maubot = { + enable = true; + port = 29316; + plugins = [ ../modules/plugins/sna-instagram-bot.mbp ]; + }; }; + # Local backup service (Phase 1: manual trigger) + services.backup.enable = true; + system.stateVersion = "24.05"; } diff --git a/modules/backup.nix b/modules/backup.nix new file mode 100644 index 0000000..4d3c3f5 --- /dev/null +++ b/modules/backup.nix @@ -0,0 +1,108 @@ +# Local backup service for PostgreSQL and Maubot +# Phase 1: Manual trigger via `systemctl start backup` +# Phase 2: Enable timer for daily automation +{ config, pkgs, lib, ... }: + +with lib; + +let + cfg = config.services.backup; +in +{ + options.services.backup = { + enable = mkEnableOption "local backup service"; + + location = mkOption { + type = types.str; + default = "/var/backup"; + description = "Backup storage directory"; + }; + + retention = mkOption { + type = types.int; + default = 4; + description = "Days to retain backups"; + }; + }; + + config = mkIf cfg.enable { + # Ensure backup directory exists + systemd.tmpfiles.rules = [ + "d ${cfg.location} 0750 root root -" + ]; + + # Backup service (oneshot, manual trigger) + systemd.services.backup = { + description = "Local backup service"; + after = [ "postgresql.service" ]; + requires = [ "postgresql.service" ]; + + serviceConfig = { + Type = "oneshot"; + User = "root"; + # Low priority - don't impact running services + IOSchedulingClass = "idle"; + Nice = 19; + }; + + path = [ + config.services.postgresql.package # pg_dumpall + pkgs.gzip + pkgs.sqlite + pkgs.util-linux # runuser + pkgs.coreutils + pkgs.findutils + ]; + + script = '' + set -euo pipefail + + DATE=$(date +%Y-%m-%d) + BASE="${cfg.location}" + TMP="$BASE/.incomplete-$DATE" + DEST="$BASE/$DATE" + + # Skip if today's backup exists + if [ -d "$DEST" ]; then + echo "Backup already exists: $DEST" + exit 0 + fi + + # Clean up any previous incomplete attempts + rm -rf "$BASE"/.incomplete-* 
+ mkdir -p "$TMP" + + # PostgreSQL (hot, consistent via MVCC) + echo "Backing up PostgreSQL..." + runuser -u postgres -- pg_dumpall | gzip > "$TMP/postgres.sql.gz" + gzip -t "$TMP/postgres.sql.gz" + + # Maubot SQLite (consistent via .backup API) + if [ -f /var/lib/maubot/bot.db ]; then + echo "Backing up Maubot..." + sqlite3 /var/lib/maubot/bot.db ".backup '$TMP/maubot.db'" + else + echo "Maubot DB not found, skipping" + fi + + # Atomic publish + mv "$TMP" "$DEST" + + # Prune old backups (keep ${toString cfg.retention} days) + find "$BASE" -mindepth 1 -maxdepth 1 -type d -mtime +${toString cfg.retention} -exec rm -rf {} + + + echo "Backup complete: $DEST" + ls -lh "$DEST" + ''; + }; + + # Timer (disabled by default, enable for Phase 2) + # systemd.timers.backup = { + # wantedBy = [ "timers.target" ]; + # timerConfig = { + # OnCalendar = "daily"; + # Persistent = true; + # }; + # }; + }; +} diff --git a/modules/dev-services.nix b/modules/dev-services.nix index bea7e71..4a7672b 100644 --- a/modules/dev-services.nix +++ b/modules/dev-services.nix @@ -75,6 +75,26 @@ in description = "Slack bridge port"; }; }; + + maubot = { + enable = mkOption { + type = types.bool; + default = false; + description = "Enable Maubot bot framework"; + }; + + port = mkOption { + type = types.port; + default = 29316; + description = "Maubot management interface port"; + }; + + plugins = mkOption { + type = types.listOf types.path; + default = []; + description = "Maubot plugins to deploy"; + }; + }; }; config = mkIf cfg.enable { @@ -217,8 +237,30 @@ in }; bridge.permissions = { - "${cfg.matrix.serverName}" = "user"; + "${cfg.matrix.serverName}" = "admin"; }; + + encryption.enable = false; + + # Enable relay mode so non-logged-in Matrix users (like bots) + # can send messages to Slack via a logged-in relay account + extraConfig = { + bridge.relay = { + enabled = true; + admin_only = false; # Allow room admins to set relay + }; + }; + }; + + # Maubot bot framework (using custom module with 
sops-nix integration) + services.maubot = mkIf cfg.maubot.enable { + enable = true; + homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}"; + serverName = cfg.matrix.serverName; + port = cfg.maubot.port; + adminPasswordFile = "/run/secrets/maubot-admin-password"; + secretKeyFile = "/run/secrets/maubot-secret-key"; + plugins = cfg.maubot.plugins; }; # Basic Nginx reverse proxy diff --git a/modules/matrix-continuwuity.nix b/modules/matrix-continuwuity.nix index cbb37d8..faf4009 100644 --- a/modules/matrix-continuwuity.nix +++ b/modules/matrix-continuwuity.nix @@ -20,7 +20,7 @@ let allow_federation = ${boolToString cfg.enableFederation} database_backend = "rocksdb" database_path = "${cfg.dataDir}/db/" - log = "info,continuwuity=debug" + log = "info" ${optionalString cfg.enableFederation '' trusted_servers = ["matrix.org"] ''} diff --git a/modules/maubot.nix b/modules/maubot.nix new file mode 100644 index 0000000..3a5914f --- /dev/null +++ b/modules/maubot.nix @@ -0,0 +1,393 @@ +# Maubot Matrix bot framework module +# Plugin-based Matrix bot system following established infrastructure patterns +{ config, pkgs, lib, ... 
}: + +with lib; + +let + cfg = config.services.maubot; + + # Python environment with maubot and Instagram bot dependencies + maubotEnv = pkgs.python3.withPackages (ps: with ps; [ + maubot + yt-dlp + # instaloader # Not available in nixpkgs, fallback to yt-dlp only + aiohttp + pillow + ]); +in +{ + options.services.maubot = { + enable = mkEnableOption "Maubot Matrix bot framework"; + + homeserverUrl = mkOption { + type = types.str; + default = "http://127.0.0.1:8008"; + description = "Matrix homeserver URL for bot connections"; + }; + + serverName = mkOption { + type = types.str; + default = "matrix.talu.uno"; + description = "Matrix server name for bot users"; + }; + + port = mkOption { + type = types.port; + default = 29316; + description = "Port for Maubot management interface"; + }; + + adminUser = mkOption { + type = types.str; + default = "admin"; + description = "Admin username for Maubot management interface"; + }; + + adminPasswordFile = mkOption { + type = types.nullOr types.path; + default = null; + description = "Path to file containing admin password (more secure than adminPassword option)"; + }; + + secretKeyFile = mkOption { + type = types.nullOr types.path; + default = null; + description = "Path to file containing Maubot secret key for sessions"; + }; + + registrationSecretFile = mkOption { + type = types.nullOr types.path; + default = null; + description = "Path to file containing Matrix homeserver registration secret"; + }; + + database = mkOption { + type = types.str; + default = "sqlite:/var/lib/maubot/bot.db"; + description = "Database connection string (sqlite:// or postgresql://)"; + }; + + logLevel = mkOption { + type = types.str; + default = "INFO"; + description = "Log level (DEBUG, INFO, WARNING, ERROR)"; + }; + + enableEncryption = mkOption { + type = types.bool; + default = true; + description = "Enable end-to-end encryption support for bots"; + }; + + publicUrl = mkOption { + type = types.str; + default = "http://localhost:29316"; + 
description = "Public URL where Maubot management interface is accessible"; + }; + + plugins = mkOption { + type = types.listOf types.path; + default = []; + description = "List of maubot plugin .mbp files to deploy"; + }; + }; + + config = mkIf cfg.enable { + # User and group + users.users.maubot = { + isSystemUser = true; + group = "maubot"; + home = "/var/lib/maubot"; + createHome = true; + }; + users.groups.maubot = {}; + + # Configuration file generation + environment.etc."maubot/config.yaml" = { + text = '' + # Maubot configuration - generated by NixOS + # Database configuration + database: "${cfg.database}" + + # Server configuration + server: + hostname: 127.0.0.1 + port: ${toString cfg.port} + public_url: ${cfg.publicUrl} + + # Admin users for management interface + admins: + ${cfg.adminUser}: ${if cfg.adminPasswordFile != null then "REPLACE_ADMIN_PASSWORD" else "changeme-set-password"} + + # Bot configuration + api_features: + login: true + plugin: true + plugin_upload: true + instance: true + instance_database: true + log: true + + # Logging configuration + logging: + version: 1 + formatters: + precise: + format: '[%(levelname)s@%(name)s] %(message)s' + handlers: + console: + class: logging.StreamHandler + formatter: precise + file: + class: logging.handlers.RotatingFileHandler + formatter: precise + filename: /var/log/maubot/maubot.log + maxBytes: 52428800 + backupCount: 10 + loggers: + maubot: + level: ${cfg.logLevel} + mau: + level: ${cfg.logLevel} + aiohttp: + level: WARNING + root: + level: WARNING + handlers: [console, file] + + # Plugin directories - using flat keys as expected by maubot + plugin_directories.upload: /var/lib/maubot/plugins + plugin_directories.load: + - /var/lib/maubot/plugins + plugin_directories.trash: /var/lib/maubot/trash + + # Plugin databases configuration + plugin_databases: + sqlite: /var/lib/maubot/plugins + postgres: null + postgres_max_conns_per_plugin: 3 + postgres_opts: {} + + # Crypto configuration + crypto: + allow: 
${if cfg.enableEncryption then "true" else "false"} + allow_level: warn + + # Secret key for sessions + secret_key: ${if cfg.secretKeyFile != null then "REPLACE_SECRET_KEY" else "insecure-default-change-me"} + ''; + user = "maubot"; + group = "maubot"; + mode = "0440"; + }; + + # Systemd service with hardening + systemd.services.maubot = { + description = "Maubot Matrix bot framework"; + after = [ "network.target" ]; + wantedBy = [ "multi-user.target" ]; + + serviceConfig = { + Type = "simple"; + User = "maubot"; + Group = "maubot"; + WorkingDirectory = "/var/lib/maubot"; + + # Use StateDirectory for runtime data + # RuntimeDirectory removed to avoid race condition with manual creation + StateDirectory = "maubot"; + LogsDirectory = "maubot"; + + # LoadCredential directives for secure secret injection + LoadCredential = + (optional (cfg.adminPasswordFile != null) "admin-password:${cfg.adminPasswordFile}") ++ + (optional (cfg.secretKeyFile != null) "secret-key:${cfg.secretKeyFile}"); + + # Pre-start script to generate runtime config with secrets + ExecStartPre = + if (cfg.adminPasswordFile != null || cfg.secretKeyFile != null) then + [ + (pkgs.writeShellScript "maubot-prepare-config" '' + set -e + + # Ensure config directory exists + ${pkgs.coreutils}/bin/mkdir -p /var/lib/maubot/config + + # Use text substitution to preserve YAML structure while injecting secrets + ${pkgs.python3.withPackages (ps: [ ps.pyyaml ])}/bin/python3 << 'EOF' +import os +import re + +# Read base configuration as text +with open('/etc/maubot/config.yaml', 'r') as f: + config_text = f.read() + +# Read secrets from CREDENTIALS_DIRECTORY if available +creds_dir = os.environ.get('CREDENTIALS_DIRECTORY') +if creds_dir: + # Replace admin password placeholder + admin_password_file = os.path.join(creds_dir, 'admin-password') + if os.path.exists(admin_password_file): + with open(admin_password_file, 'r') as f: + admin_password = f.read().strip() + config_text = 
config_text.replace('REPLACE_ADMIN_PASSWORD', admin_password) + + # Replace secret key placeholder + secret_key_file = os.path.join(creds_dir, 'secret-key') + if os.path.exists(secret_key_file): + with open(secret_key_file, 'r') as f: + secret_key = f.read().strip() + config_text = config_text.replace('REPLACE_SECRET_KEY', secret_key) + +# Write runtime config with restrictive permissions +os.umask(0o077) # Ensure only owner can read +with open('/var/lib/maubot/config/config.yaml', 'w') as f: + f.write(config_text) +EOF + '') + ] + else + [ + (pkgs.writeShellScript "maubot-prepare-config-simple" '' + ${pkgs.coreutils}/bin/mkdir -p /var/lib/maubot/config + ${pkgs.coreutils}/bin/cp /etc/maubot/config.yaml /var/lib/maubot/config/config.yaml + '') + ]; + + # Start Maubot with runtime config + ExecStart = "${maubotEnv}/bin/maubot -c /var/lib/maubot/config/config.yaml"; + + # Restart policy + Restart = "always"; + RestartSec = 10; + + # Security hardening following established patterns + NoNewPrivileges = true; + ProtectSystem = "strict"; + ProtectHome = true; + # PrivateTmp disabled to allow access to /run/maubot + PrivateTmp = false; + PrivateDevices = true; + ProtectKernelTunables = true; + ProtectKernelModules = true; + ProtectControlGroups = true; + + # Allow writing to data, log, and runtime directories + ReadWritePaths = [ + "/var/lib/maubot" + "/var/log/maubot" + "/run/maubot" + ]; + + # Network restrictions + RestrictAddressFamilies = [ "AF_INET" "AF_INET6" "AF_UNIX" ]; + + # System calls - Python application needs broader access + SystemCallArchitectures = "native"; + SystemCallFilter = [ + "@system-service" + "@network-io" + "@file-system" + "~@privileged" + ]; + + # Resource limits + MemoryMax = "512M"; + CPUWeight = 50; # Lower priority than Matrix server + IOWeight = 50; + + # Process security + UMask = "0027"; + LockPersonality = true; + RestrictRealtime = true; + RestrictSUIDSGID = true; + RemoveIPC = true; + + # Logging + StandardOutput = "journal"; + 
StandardError = "journal"; + SyslogIdentifier = "maubot"; + }; + }; + + # Directory permissions + systemd.tmpfiles.rules = [ + "d /var/lib/maubot 0755 maubot maubot -" + "d /var/lib/maubot/plugins 0755 maubot maubot -" + "d /var/lib/maubot/trash 0755 maubot maubot -" + "d /var/log/maubot 0755 maubot maubot -" + "d /run/maubot 0700 maubot maubot -" + ] ++ (map (plugin: + "L+ /var/lib/maubot/plugins/${baseNameOf plugin} - - - - ${plugin}" + ) cfg.plugins); + + # Health check service + systemd.services.maubot-health = { + description = "Maubot health check"; + after = [ "maubot.service" ]; + + serviceConfig = { + Type = "oneshot"; + User = "nobody"; + Group = "nogroup"; + ExecStart = pkgs.writeShellScript "maubot-health" '' + # Check if Maubot management interface is responding + # Note: All maubot endpoints require auth, so 401 is expected and healthy + HTTP_CODE=$(${pkgs.curl}/bin/curl -s -o /dev/null -w "%{http_code}" "http://localhost:${toString cfg.port}/_matrix/maubot/v1/login" 2>/dev/null) + if [ "$HTTP_CODE" = "401" ] || [ "$HTTP_CODE" = "200" ]; then + echo "Maubot health check: OK (HTTP $HTTP_CODE)" + exit 0 + else + echo "Maubot health check: FAILED (HTTP $HTTP_CODE)" + exit 1 + fi + ''; + StandardOutput = "journal"; + StandardError = "journal"; + }; + }; + + systemd.timers.maubot-health = { + description = "Maubot health check timer"; + wantedBy = [ "timers.target" ]; + timerConfig = { + OnCalendar = "*:0/5"; # Every 5 minutes + Persistent = true; + }; + }; + + # Health check failure handling - restart service if health check fails consistently + systemd.services.maubot-health-restart = { + description = "Restart Maubot on health check failure"; + serviceConfig = { + Type = "oneshot"; + ExecStart = pkgs.writeShellScript "maubot-health-restart" '' + # Check if maubot health service failed recently + if systemctl is-failed maubot-health.service >/dev/null 2>&1; then + echo "Maubot health check failed, restarting maubot service" + systemctl restart 
maubot.service + + # Reset health check failure state + systemctl reset-failed maubot-health.service + fi + ''; + User = "root"; + StandardOutput = "journal"; + StandardError = "journal"; + }; + }; + + systemd.timers.maubot-health-restart = { + description = "Monitor Maubot health check failures and restart if needed"; + wantedBy = [ "timers.target" ]; + timerConfig = { + OnCalendar = "*:2/10"; # Every 10 minutes, offset from health check + Persistent = true; + }; + }; + + # Maubot management interface only accessible via SSH tunnel (localhost:29316) + # Do NOT expose to internet - admin UI has no rate limiting + }; +} \ No newline at end of file diff --git a/modules/plugins/sna-instagram-bot-src/base-config.yaml b/modules/plugins/sna-instagram-bot-src/base-config.yaml new file mode 100644 index 0000000..1cf10a4 --- /dev/null +++ b/modules/plugins/sna-instagram-bot-src/base-config.yaml @@ -0,0 +1,9 @@ +enabled: true +max_file_size: 50000000 +supported_formats: + - mp4 + - jpg + - jpeg + - png + - webp +allowed_rooms: [] diff --git a/modules/plugins/sna-instagram-bot-src/instagram_bot.py b/modules/plugins/sna-instagram-bot-src/instagram_bot.py new file mode 100644 index 0000000..394cf80 --- /dev/null +++ b/modules/plugins/sna-instagram-bot-src/instagram_bot.py @@ -0,0 +1,553 @@ +import re +import asyncio +import tempfile +import os +from typing import Optional, Tuple +from urllib.parse import urlparse + +from mautrix.types import EventType, MediaMessageEventContent, MessageType, TextMessageEventContent, Format +from mautrix.util.config import BaseProxyConfig +from maubot import Plugin, MessageEvent +from maubot.handlers import event +import aiohttp +import yt_dlp + +try: + import instaloader + HAS_INSTALOADER = True +except ImportError: + HAS_INSTALOADER = False + + +class Config(BaseProxyConfig): + def do_update(self, helper): + helper.copy("enabled") + helper.copy("max_file_size") + helper.copy("supported_formats") + helper.copy("allowed_rooms") + + +class 
SocialMediaBot(Plugin): + """ + Maubot plugin for automatic social media content extraction. + + Detects Instagram and TikTok URLs in Matrix messages and automatically + extracts and uploads the media content to the room. + """ + + @classmethod + def get_config_class(cls): + return Config + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + + # Set up logging levels + self.log.info("SocialMediaBot initialized") + if self.config is not None: + self.log.debug(f"Configuration: enabled={self.config['enabled']}, " + f"max_file_size={self.config['max_file_size']}, " + f"allowed_rooms={self.config['allowed_rooms']}") + + # Compile URL patterns for Instagram and TikTok + self.instagram_pattern = re.compile( + r'https?://(?:www\.)?instagram\.com/(?:p|reel|stories)/[\w-]+/?' + ) + self.tiktok_pattern = re.compile( + r'https?://(?:www\.)?(?:tiktok\.com|vm\.tiktok\.com)/(?:@[\w.-]+/video/\d+|[\w-]+)/?' + ) + + # Combined pattern for finding any social media URL + self.combined_pattern = re.compile( + r'https?://(?:www\.)?(?:instagram\.com/(?:p|reel|stories)/[\w-]+|' + r'(?:tiktok\.com/@[\w.-]+/video/\d+|vm\.tiktok\.com/[\w-]+))/?' + ) + + @event.on(EventType.ROOM_MESSAGE) + async def handle_message(self, event: MessageEvent) -> None: + """ + Process incoming Matrix room messages for social media URLs. 
+ + Implements: + - FR-021: Only processes messages from allowed_rooms + - FR-022: Silently ignores non-allowed rooms with debug logging + - FR-014: Ignores own messages to prevent loops + - FR-015: Processes only first URL by text position + """ + # FR-021/FR-022: Check allowed_rooms filter (early return with debug logging) + self.log.debug(f"Received message in room {event.room_id}") + + if self.config is None: + self.log.info(f"Config not loaded, ignoring message in {event.room_id}") + return + + allowed_rooms = self.config.get("allowed_rooms", []) + self.log.debug(f"allowed_rooms config: {allowed_rooms}") + + if not allowed_rooms: + self.log.debug(f"Bot disabled (empty allowed_rooms list), ignoring message in {event.room_id}") + return + + if event.room_id not in allowed_rooms: + self.log.debug(f"Room {event.room_id} not in allowed_rooms, ignoring message") + return + + # FR-014: Ignore messages sent by the bot itself (prevent loops) + if event.sender == self.client.mxid: + self.log.debug(f"Ignoring own message from {event.sender}") + return + + # Check if message has body content + if not hasattr(event.content, 'body') or not event.content.body: + return + + message_text = event.content.body + + # FR-015: Find first URL by text position (left-to-right scan, platform-agnostic) + url, platform = self._find_first_url(message_text) + + if url: + self.log.info(f"Processing first URL from message: {url} (platform: {platform})") + await self.process_url(event, url, platform) + + def _find_first_url(self, text: str) -> Tuple[Optional[str], Optional[str]]: + """ + Find the first social media URL in text by position (left-to-right).
+ + Returns: + Tuple of (url, platform) or (None, None) if no URL found + + Implements FR-015: First-URL detection by text position, platform-agnostic + """ + # Find all matches with their positions + instagram_matches = [(m.group(), m.start(), 'instagram') + for m in self.instagram_pattern.finditer(text)] + tiktok_matches = [(m.group(), m.start(), 'tiktok') + for m in self.tiktok_pattern.finditer(text)] + + # Combine and sort by position + all_matches = instagram_matches + tiktok_matches + if not all_matches: + return None, None + + all_matches.sort(key=lambda x: x[1]) # Sort by position + + # Log any additional URLs at debug level (FR-015) + if len(all_matches) > 1: + extra_urls = [m[0] for m in all_matches[1:]] + self.log.debug(f"Found {len(all_matches)} URLs, ignoring extras: {extra_urls}") + + first_url, _, platform = all_matches[0] + return first_url, platform + + async def process_url(self, event: MessageEvent, url: str, platform: str) -> None: + """ + Process a single social media URL with progressive status message editing. + + Implements: + - FR-011: Progressive message editing from "Processing..." to final result + - FR-012: Inline media embedding with fallback to separate message + - FR-013: Error messages via status message editing + + Args: + event: Matrix message event + url: Single URL string to process + platform: Platform identifier ("instagram" or "tiktok") + """ + platform_name = platform.title() + platform_emoji = "📸" if platform == "instagram" else "🎵" + + # FR-011: Send initial "Processing..." 
message and store event_id + try: + status_msg = await event.respond(f"🔍 Processing {platform_name} content...") + status_event_id = status_msg.event_id + except Exception as e: + self.log.error(f"Failed to send initial status message: {e}") + return + + try: + # Extract content + content_info = await self.extract_content_with_ytdlp(url) + + if content_info: + # Log successful extraction + self.log.info(f"Extracted {platform} content: type={content_info.get('type')}, " + f"size={len(content_info.get('content', []))}, " + f"dimensions={content_info.get('width')}x{content_info.get('height')}, " + f"duration={content_info.get('duration')}") + + # Upload media and edit status message with result + content_info['platform'] = platform + content_info['platform_emoji'] = platform_emoji + await self.upload_media_and_edit_message(event.room_id, status_event_id, content_info) + else: + # FR-013: Edit status message to show error + error_msg = f"❌ Failed to extract content: Content unavailable or private" + await self.edit_status_message(event.room_id, status_event_id, error_msg) + self.log.warning(f"Failed to extract content from {url}") + + except Exception as e: + # FR-013: Edit status message with error description + self.log.exception(f"Error processing {platform} URL {url}") + error_msg = f"❌ Failed to extract content: {str(e)}" + await self.edit_status_message(event.room_id, status_event_id, error_msg) + + async def extract_content_with_ytdlp(self, url: str) -> Optional[dict]: + """ + Extract media content from social media URL using yt-dlp with fallback. 
+ + Implements graceful degradation (Constitution III): + - Primary: yt-dlp (works for both Instagram and TikTok) + - Fallback: instaloader (Instagram only) + - Specific error handling for private content, deleted posts, rate limits + + Args: + url: Social media URL string + + Returns: + dict with content data or None if extraction fails + + Raises: + Exception with specific error messages for known failure cases + """ + # Try primary extraction with yt-dlp + try: + return await self.extract_with_ytdlp(url) + except Exception as e: + error_str = str(e).lower() + + # Check for specific error cases + if any(keyword in error_str for keyword in ['private', 'login', 'auth', 'permission']): + self.log.warning(f"yt-dlp: Private content or authentication required for {url}") + raise Exception("Private content or authentication required") + elif any(keyword in error_str for keyword in ['not found', '404', 'deleted', 'unavailable']): + self.log.warning(f"yt-dlp: Content deleted or unavailable for {url}") + raise Exception("Content deleted or unavailable") + elif any(keyword in error_str for keyword in ['rate limit', 'too many requests', '429']): + self.log.warning(f"yt-dlp: Rate limited for {url}") + raise Exception("Rate limit exceeded - please try again later") + else: + self.log.warning(f"yt-dlp extraction failed for {url}: {e}") + + # For Instagram URLs, try instaloader as fallback + if 'instagram.com' in url and HAS_INSTALOADER: + try: + return await self.extract_with_instaloader(url) + except Exception as e: + error_str = str(e).lower() + + # Check for specific error cases in fallback + if any(keyword in error_str for keyword in ['private', 'login', 'auth']): + raise Exception("Private content or authentication required") + elif any(keyword in error_str for keyword in ['not found', '404', 'deleted']): + raise Exception("Content deleted or unavailable") + elif any(keyword in error_str for keyword in ['rate limit', 'too many']): + raise Exception("Rate limit exceeded - 
please try again later") + else: + self.log.warning(f"instaloader extraction failed for {url}: {e}") + + return None + + async def extract_with_ytdlp(self, url: str) -> Optional[dict]: + """ + Extract content using yt-dlp library. + + Implements: + - Constitution II: Async-first via run_in_executor + - Constitution IV: Temporary file cleanup via context managers + """ + def _extract(): + # Constitution IV: Context manager ensures cleanup + with tempfile.TemporaryDirectory() as temp_dir: + ydl_opts = { + 'outtmpl': os.path.join(temp_dir, '%(title)s.%(ext)s'), + 'writeinfojson': True, + 'writethumbnail': True, + 'quiet': True, + 'no_warnings': True, + } + + with yt_dlp.YoutubeDL(ydl_opts) as ydl: + info = ydl.extract_info(url, download=True) + + # Find downloaded files + files = os.listdir(temp_dir) + media_file = None + thumbnail_file = None + + # Prioritize video files over images + video_file = None + image_file = None + + for file in files: + full_path = os.path.join(temp_dir, file) + if file.endswith('.mp4'): + video_file = full_path + elif file.endswith(('.jpg', '.jpeg', '.png')) and 'thumbnail' not in file.lower(): + image_file = full_path + elif 'thumbnail' in file.lower() or file.endswith('.webp'): + thumbnail_file = full_path + + # Prefer video over image + media_file = video_file or image_file + + if media_file and os.path.exists(media_file): + # Read file content into memory + with open(media_file, 'rb') as f: + content = f.read() + + thumbnail_content = None + if thumbnail_file and os.path.exists(thumbnail_file): + with open(thumbnail_file, 'rb') as f: + thumbnail_content = f.read() + + return { + 'type': 'video' if media_file.endswith('.mp4') else 'image', + 'content': content, + 'thumbnail': thumbnail_content, + 'filename': os.path.basename(media_file), + 'title': info.get('title', 'Social Media Content'), + 'description': info.get('description', ''), + 'uploader': info.get('uploader', ''), + 'width': info.get('width'), + 'height': info.get('height'), 
+ 'duration': info.get('duration'), + } + + return None + + # Constitution II: Run blocking operation in thread pool (non-blocking) + loop = asyncio.get_running_loop() + return await loop.run_in_executor(None, _extract) + + async def extract_with_instaloader(self, url: str) -> Optional[dict]: + """ + Extract Instagram content using instaloader library (fallback). + + Only called for Instagram URLs when yt-dlp fails. + """ + if not HAS_INSTALOADER: + return None + + def _extract(): + with tempfile.TemporaryDirectory() as temp_dir: + loader = instaloader.Instaloader( + download_videos=True, + download_comments=False, + save_metadata=False, + download_geotags=False, + quiet=True, + ) + + # Extract shortcode from URL + shortcode_match = re.search(r'/(?:p|reel)/([^/]+)', url) + if not shortcode_match: + return None + + shortcode = shortcode_match.group(1) + + try: + post = instaloader.Post.from_shortcode(loader.context, shortcode) + loader.download_post(post, temp_dir) + + # Find downloaded files; check .mp4 first so a reel's .jpg thumbnail + # (also saved by instaloader) is never picked instead of the video + files = sorted(os.listdir(temp_dir)) + media_file = None + + for ext in ('.mp4', '.jpg', '.jpeg'): + match = next((f for f in files if f.endswith(ext)), None) + if match: + media_file = os.path.join(temp_dir, match) + break + + if media_file and os.path.exists(media_file): + with open(media_file, 'rb') as f: + content = f.read() + + return { + 'type': 'video' if media_file.endswith('.mp4') else 'image', + 'content': content, + 'filename': os.path.basename(media_file), + 'title': f"Post by @{post.owner_username}", + 'description': post.caption or '', + 'uploader': post.owner_username, + } + except Exception as e: + self.log.error(f"Instaloader error: {e}") + return None + + return None + + # Run in thread pool + loop = asyncio.get_running_loop() + return await loop.run_in_executor(None, _extract) + + async def edit_status_message(self, room_id: str, event_id: str, new_content: str, + media_uri: Optional[str] = None) -> None: + """ + Edit a previously sent status message with updated content.
+ + Implements FR-011, FR-012: Progressive message editing + + Args: + room_id: Matrix room ID where message was sent + event_id: Event ID of message to edit + new_content: Updated message content (text, formatted with markdown) + media_uri: Optional mxc:// URI for inline media embedding + """ + try: + # Convert the markdown subset used in status messages (**bold**, newlines) to HTML + formatted_body = re.sub(r'\*\*(.+?)\*\*', r'<strong>\1</strong>', new_content).replace('\n', '<br>') + + # Create edited message content + content = TextMessageEventContent( + msgtype=MessageType.TEXT, + body=new_content, + format=Format.HTML, + formatted_body=formatted_body, + ) + + # Set up the edit relationship; m.new_content must carry the full replacement, including the HTML + content["m.new_content"] = { + "msgtype": "m.text", + "body": new_content, + "format": "org.matrix.custom.html", + "formatted_body": formatted_body, + } + content["m.relates_to"] = { + "rel_type": "m.replace", + "event_id": event_id, + } + + # FR-012: Attempt inline media embedding if media_uri provided + # Note: This is a SHOULD - fallback handled in upload_media_and_edit_message + if media_uri: + self.log.debug("Attempting inline media embedding in edited message") + # Media embedding in edits depends on client/bridge support + # For now, we'll use separate media message as fallback + + await self.client.send_message_event(room_id, EventType.ROOM_MESSAGE, content) + + except Exception as e: + self.log.error(f"Failed to edit status message: {e}") + # Don't raise - graceful degradation + + async def upload_media_and_edit_message(self, room_id: str, event_id: str, + content_info: dict) -> None: + """ + Upload media to Matrix and edit status message to show result. + + Implements: + - FR-008: Upload extracted content to Matrix room + - FR-009: Preserve video metadata (dimensions, duration, thumbnail) + - FR-010: Preserve content metadata (title, caption, uploader) + - FR-012: SHOULD inline embed, MAY use separate message + - FR-016: Enforce file size and format limits + + Args: + room_id: Matrix room ID + event_id: Status message event ID to edit + content_info: Content dictionary from extraction + """ + try: + filename = content_info['filename'] + content_bytes = content_info['content'] + + # FR-016: Validate file size (50MB max by default) + max_file_size = self.config.get("max_file_size", 50000000) if self.config else 50000000 + if len(content_bytes) > max_file_size: + error_msg = f"❌ File too large: {len(content_bytes) / 1000000:.1f}MB (max: {max_file_size / 1000000}MB)" + await self.edit_status_message(room_id, event_id, error_msg) + self.log.warning(f"File size {len(content_bytes)} exceeds limit {max_file_size}") + return + + # FR-016: Validate file
format + supported_formats = self.config.get("supported_formats", ["mp4", "jpg", "jpeg", "png", "webp"]) if self.config else ["mp4", "jpg", "jpeg", "png", "webp"] + file_ext = filename.rsplit('.', 1)[-1].lower() if '.' in filename else '' + + if file_ext not in supported_formats: + error_msg = f"❌ Unsupported format: .{file_ext} (supported: {', '.join(supported_formats)})" + await self.edit_status_message(room_id, event_id, error_msg) + self.log.warning(f"File format .{file_ext} not in supported formats {supported_formats}") + return + + # Determine MIME type from the lowercased extension validated above (handles .JPG, .MP4, etc.) + if file_ext == 'mp4': + mime_type = 'video/mp4' + msgtype = MessageType.VIDEO + elif file_ext in ('jpg', 'jpeg'): + mime_type = 'image/jpeg' + msgtype = MessageType.IMAGE + elif file_ext == 'png': + mime_type = 'image/png' + msgtype = MessageType.IMAGE + elif file_ext == 'webp': + mime_type = 'image/webp' + msgtype = MessageType.IMAGE + else: + mime_type = 'application/octet-stream' + msgtype = MessageType.FILE + + # Upload binary content to Matrix homeserver (gets mxc:// URI) + media_uri = await self.client.upload_media( + content_info['content'], + mime_type=mime_type, + filename=filename, + ) + + # FR-012: Attempt inline embedding (SHOULD), fall back to separate message (MAY) + # Current implementation: Use separate media message due to bridge limitations + # Future enhancement: Try inline embedding first, fall back if unsupported + + # Create media message content + media_content = MediaMessageEventContent( + msgtype=msgtype, + body=filename, + url=media_uri, + ) + + # FR-009: Preserve video metadata (dimensions, duration, thumbnail) + if msgtype == MessageType.VIDEO: + media_content.info = { + 'mimetype': mime_type, + 'size': len(content_info['content']), + } + + if content_info.get('width'): + media_content.info['w'] = content_info['width'] + if content_info.get('height'): + media_content.info['h'] = content_info['height'] + if
content_info.get('duration'): + # Convert to milliseconds + media_content.info['duration'] = int(content_info['duration'] * 1000) + + # Upload thumbnail if available; yt-dlp often saves WebP (or PNG) thumbnails, + # so detect the format from the file signature instead of assuming JPEG + if content_info.get('thumbnail'): + thumb_bytes = content_info['thumbnail'] + if thumb_bytes[:4] == b'RIFF' and thumb_bytes[8:12] == b'WEBP': + thumb_mime = 'image/webp' + elif thumb_bytes[:8] == b'\x89PNG\r\n\x1a\n': + thumb_mime = 'image/png' + else: + thumb_mime = 'image/jpeg' + thumbnail_uri = await self.client.upload_media( + thumb_bytes, + mime_type=thumb_mime, + filename=f"thumb_{filename}", + ) + media_content.info['thumbnail_url'] = thumbnail_uri + + # Send the media message + await self.client.send_message(room_id, media_content) + + # FR-010: Format final message with platform emoji, title, caption, creator attribution + platform_emoji = content_info.get('platform_emoji', '📸') + title = content_info.get('title', 'Social Media Content') + description = content_info.get('description', '') + uploader = content_info.get('uploader', '') + + # Build caption text + caption_parts = [f"{platform_emoji} **{title}**"] + if description: + caption_parts.append(f"\n\n{description}") + if uploader: + caption_parts.append(f"\n\n👤 By: @{uploader}") + + caption = ''.join(caption_parts) + + # Edit status message to show success + await self.edit_status_message(room_id, event_id, caption) + + self.log.info(f"Successfully uploaded and sent {msgtype} content to {room_id}") + + except Exception as e: + self.log.exception("Error uploading content to Matrix") + error_msg = f"❌ Error uploading content: {str(e)}" + await self.edit_status_message(room_id, event_id, error_msg) diff --git a/modules/plugins/sna-instagram-bot-src/maubot.yaml b/modules/plugins/sna-instagram-bot-src/maubot.yaml new file mode 100644 index 0000000..bab33a6 --- /dev/null +++ b/modules/plugins/sna-instagram-bot-src/maubot.yaml @@ -0,0 +1,61 @@ +# Instagram Content Bot for Maubot +# Automatically detects Instagram URLs in messages and posts the content to Matrix + +# Target maubot version +maubot: 0.1.0 + +# The unique ID for the plugin +id: sna.instagram + +# A PEP 440 compliant version string +version: 1.0.0 + +# The SPDX license identifier for the plugin +license: MIT
+ +# The list of modules to load from the plugin archive +modules: +- instagram_bot + +# The main class of the plugin +main_class: SocialMediaBot + +# Whether or not instances need a database +database: false + +# Dependencies required for Instagram content extraction +dependencies: +- yt-dlp>=2023.1.6 +- instaloader>=4.9.0 +- aiohttp>=3.8.0 + +# Soft dependencies (optional but recommended) +soft_dependencies: +- pillow>=9.0.0 + +# Extra files to include in the plugin package +extra_files: +- README.md + +# Plugin metadata +meta: + display_name: "Instagram Content Bot" + description: "Automatically detects Instagram URLs and posts the content to Matrix rooms" + author: "Claude Code" + homepage: "https://github.com/maubot/maubot" + +# Plugin configuration +config: + enabled: true + max_file_size: 50000000 # 50MB max file size + supported_formats: + - mp4 + - jpg + - jpeg + - png + - webp + # Room access control (safety feature) + # List of Matrix room IDs where bot is allowed to operate + # Format: ["!roomid1:server.domain", "!roomid2:server.domain"] + # Empty list = bot disabled in all rooms (safety default) + allowed_rooms: [] \ No newline at end of file diff --git a/modules/plugins/sna-instagram-bot.mbp b/modules/plugins/sna-instagram-bot.mbp new file mode 100644 index 0000000..bfa4e81 Binary files /dev/null and b/modules/plugins/sna-instagram-bot.mbp differ diff --git a/secrets/secrets.yaml b/secrets/secrets.yaml index 0cfd242..bcf86ec 100644 --- a/secrets/secrets.yaml +++ b/secrets/secrets.yaml @@ -2,6 +2,8 @@ matrix-registration-token: ENC[AES256_GCM,data:H7BgtpsDLOYcywjOHru+u7t6BCbqhFrmP acme-email: ENC[AES256_GCM,data:+tN+nRfn2kpGLdF3Vg==,iv:uZvSw4viBWCTT35C718cLOCrSLM1EnkmEZH644aVuPI=,tag:tf6+7ubiOLVj7k4rfNI3lQ==,type:str] slack-oauth-token: "" slack-app-token: "" +maubot-admin-password: ENC[AES256_GCM,data:Omh6VFsnlLgS+UktM5qHjj3+VK84YmMgWcQCvkiMchfb621RV0LBg1ZB3tg=,iv:cINVFlHJJGkAcasK8BJr3Sd2zqkpQOyRgF+V0JhBJXE=,tag:PnS9TdtuR/87yQfttJTLow==,type:str] 
+maubot-secret-key: ENC[AES256_GCM,data:krq8zjZelAYRNrFs+DYqh7j0bDd80YKRkro88hGiAxJOBCuFV6PdyyUKgqdSuGMhoFhZtMPmRKOQvAxKclOBEQ==,iv:PePSXEOcBKcReXYBzicDhGQ/yxJIZ/TNzARg4z9G7dA=,tag:ihVw9PAXScoZgrSzWkAMdQ==,type:str] sops: age: - recipient: age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q @@ -22,7 +24,7 @@ sops: TzI2NGdaVHd1RFZWRE50bjZ0cHhBOXMKRXVYFMNxNIX+8uVxf1X4hu+OfOKKs2TK A2qdAMJIfdy9f7SPVrPnrGMIwl/prxIkbSRwYC/UNK5NNkjMrGoSwg== -----END AGE ENCRYPTED FILE----- - lastmodified: "2025-10-02T21:33:16Z" - mac: ENC[AES256_GCM,data:B/9XWKEYWv00+xfcnsrqqRvM7mf/1/VMxeaW9V0HoD32Wv8EvjUIOptU4VV/iDHb1zGCzd41XVOulowlKfXbcuDbA2Pi8cVT38F9ZuxSyCjpssDnPYj816SvXNp5gwCHxfvIp32ekrQ7PNQLZVWhHzL/H1doalXv9XHO1xUY6X8=,iv:NKjxEOG0SlJQurfb9f2GRYUFDlNk0mjxpci87r0vmX8=,tag:sGrhVfwq18QI6MS7L5x31w==,type:str] + lastmodified: "2025-10-27T04:21:51Z" + mac: ENC[AES256_GCM,data:k1aBVnSUnpgq1y+AQjZFB7AXmQe2r/SpSVl9xVsJku2/lehBfY6vRGZutRHV4iTaB3FmxwgGCOV29gPZ5NGUQDf9tg5hMacZOREJGd7lMWoSlZbCGjjkOQEvpKLq3kJNuV66Lb1LzKQtR6ws5k/EmnXneyDtjuEbFs4AZZi+WRE=,iv:zc58CMvJqPsKbANOCGLBuo+AiUnoF4Wx3Z33j6a+sfI=,tag:ENek+3uial24ladKBqW3sg==,type:str] unencrypted_suffix: _unencrypted version: 3.10.2 diff --git a/specs/003-maubot-integration/plan.md b/specs/003-maubot-integration/plan.md new file mode 100644 index 0000000..85cb31f --- /dev/null +++ b/specs/003-maubot-integration/plan.md @@ -0,0 +1,360 @@ +# Implementation Plan: Maubot Integration + +**Branch**: `003-maubot-integration` | **Date**: 2025-10-26 | **Spec**: [spec.md](./spec.md) +**Input**: Feature specification from `/specs/003-maubot-integration/spec.md` + +## Summary + +Extract maubot bot framework from ops-base and deploy to ops-jrz1 with Instagram bot plugin. Primary approach: adapt proven ops-base maubot.nix module to ops-jrz1 patterns (conduwuit homeserver, sops-nix secrets, dev-platform wrapper), using registration token auth instead of shared secret. Instagram content fetching via yt-dlp (community scraping). 
Deployment validates single-instance initially, architecture supports 3+ concurrent instances. + +## Technical Context + +**Language/Version**: Python 3.11 (maubot runtime environment) +**Primary Dependencies**: maubot 0.5.2+, yt-dlp >=2023.1.6, aiohttp, SQLite, sops-nix +**Storage**: SQLite `/var/lib/maubot/bot.db` (service state), per-bot databases (plugin-specific) +**Testing**: Manual QA on production VPS (no staging environment), 7-day validation period +**Target Platform**: NixOS 24.05+ on ops-jrz1 VPS (45.77.205.49, x86_64-linux) +**Project Type**: Infrastructure service (NixOS module) +**Performance Goals**: <5 second Instagram content fetch (SC-001), 99% uptime over 7 days (SC-003), <2 second management UI load (SC-007) +**Constraints**: Localhost-only management interface (SSH tunnel required), single Instagram bot instance initially, conduwuit registration token auth (no shared secret) +**Scale/Scope**: 1 Instagram bot instance MVP, architecture validated for 3 concurrent instances (SC-002), small team usage (<20 Instagram fetches/day) + +## Constitution Check + +*GATE: Must pass before Phase 0 research. 
Re-check after Phase 1 design.* + +### Principle I: Declarative Infrastructure ✅ PASS + +**Compliance**: +- All maubot configuration defined in NixOS modules (maubot.nix, dev-services.nix) +- No imperative modifications required (service managed via nixos-rebuild) +- Configuration changes deployed declaratively +- Rollback via NixOS generations + +**Evidence**: +- Module adaptation documented in research.md (ops-base → ops-jrz1 pattern) +- Secrets via sops-nix (declarative encryption) +- Runtime config generated from NixOS module options + +### Principle II: Security First ✅ PASS + +**Compliance**: +- All secrets encrypted via sops-nix (maubot-admin-password, maubot-secret-key, registration-token) +- Runtime secrets in /run/secrets/ (tmpfs, ephemeral) +- No secrets in the Nix store or in committed configuration files; credentials are injected at service start (LoadCredential pattern) into the generated runtime config +- Management interface localhost-only (SSH tunnel required per FR-003) + +**Evidence**: +- Secrets management pattern documented in data-model.md +- File permissions: 0400 for secrets, 0600 for config with credentials +- Pre-commit hooks scan for secret leaks (inherited from platform) + +### Principle III: Presentable State Over Speed ✅ PASS + +**Compliance**: +- Comprehensive specification (spec.md with 16 functional requirements, 4 user stories) +- Complete documentation suite (research.md, data-model.md, quickstart.md) +- 7-day validation period required before announcement (per constitution) +- Success criteria measurable and testable (SC-001 through SC-008) + +**Evidence**: +- Spec clarification session resolved all ambiguities (5 questions answered) +- Quickstart.md provides deployment runbook with troubleshooting +- Testing checklist in quickstart.md validates all success criteria + +### Principle IV: Quality Over Quick Wins ✅ PASS + +**Compliance**: +- Extracted proven pattern from ops-base (391-line maubot.nix module in production) +- Research phase documented alternatives (yt-dlp vs instaloader, SQLite vs PostgreSQL) +- Follows
established ops-jrz1 patterns (mautrix-slack module structure, sops-nix secrets) +- Spec-kit workflow followed (specify → clarify → plan → tasks → implement) + +**Evidence**: +- Research.md documents 3 major technical decisions with rationale +- Module adaptation strategy preserves ops-base proven components +- Constitution check validates pattern consistency + +**Gate Status**: ✅ ALL CHECKS PASS - Proceed to implementation + +## Project Structure + +### Documentation (this feature) + +```text +specs/003-maubot-integration/ +├── spec.md # Feature specification (✅ complete) +├── plan.md # This file (✅ complete) +├── research.md # Phase 0 output (✅ complete) +├── data-model.md # Phase 1 output (✅ complete) +├── quickstart.md # Phase 1 output (✅ complete) +├── checklists/ +│ └── requirements.md # Quality validation (✅ complete) +└── tasks.md # Phase 2 output (/speckit.tasks - pending) +``` + +### Source Code (repository root) + +**Structure Decision**: Infrastructure service (NixOS module) - no application source code + +```text +/home/dan/proj/ops-jrz1/ +├── modules/ +│ ├── maubot.nix # Low-level maubot service module (to create) +│ ├── dev-services.nix # High-level wrapper (to update) +│ ├── mautrix-slack.nix # Reference pattern (existing) +│ └── matrix-continuwuity.nix # Matrix homeserver (existing) +├── hosts/ +│ └── ops-jrz1.nix # VPS configuration (to update: enable maubot) +├── secrets/ +│ └── secrets.yaml # Encrypted secrets (to update: add maubot secrets) +├── specs/ +│ └── 003-maubot-integration/ # This feature directory +└── docs/ + ├── platform-vision.md # North star document (reference) + ├── CLAUDE.md # Development guidelines (to update) + └── worklogs/ # Session logs (to create after deployment) +``` + +**External source files** (to copy/adapt): +```text +/home/dan/proj/ops-base/ +└── vm-configs/modules/ + └── maubot.nix # Source module (391 lines, proven in production) + +/home/dan/proj/sna/ +├── instagram_bot.py # Instagram bot source (11,643 bytes) 
+└── sna-instagram-bot.mbp # Packaged plugin (ready to upload) +``` + +**Runtime state** (on VPS after deployment): +```text +/var/lib/maubot/ +├── config/ +│ └── config.yaml # Generated runtime config +├── plugins/ +│ └── sna.instagram-v1.0.0.mbp # Uploaded plugin +├── bot.db # SQLite database (service state) +└── trash/ # Deleted plugins + +/run/secrets/ # sops-nix decrypted secrets (tmpfs) +├── maubot-admin-password +├── maubot-secret-key +└── matrix-registration-token +``` + +## Deployment Strategy + +**Context**: ops-jrz1 is a live production server with critical services (Matrix homeserver, Slack bridge, PostgreSQL, Forgejo, nginx). Deployment must be incremental with validation checkpoints. + +### Live Server Risk Assessment + +**Critical Services** (must remain operational): +- conduwuit Matrix homeserver (8008) - All Matrix functionality +- mautrix-slack (29319) - ~50 Slack channels syncing bidirectionally +- PostgreSQL (5432) - Bridge database (172KB, critical state) +- Forgejo (git.clarun.xyz) - Code hosting +- nginx (443) - TLS termination for all public services + +**New Service** (isolated): +- maubot (29316, localhost-only) - New SQLite database, different port, no appservice registration + +### Incremental Deployment Approach + +Deploy in 4 phases with git commits as rollback points: + +**Phase 1: Module Files (No-Op Deployment)** +- Add modules/maubot.nix (adapted from ops-base) +- Add services.dev-platform.maubot wrapper to modules/dev-services.nix (options + config) +- **Do NOT enable**: services.dev-platform.maubot.enable remains unset +- Deploy → Verify no services changed → Git commit +- **Rollback**: nixos-rebuild switch --rollback OR git revert + +**Phase 2: Secrets (Preparation)** +- Add maubot-admin-password, maubot-secret-key to secrets/secrets.yaml +- Add sops.secrets declarations to hosts/ops-jrz1.nix +- **Still disabled**: services.dev-platform.maubot.enable remains unset +- Deploy → Verify secrets decrypt to /run/secrets/ → Git commit 
+- **Rollback**: nixos-rebuild switch --rollback OR git revert + +**Phase 3: Service Start (Module Only)** +- Enable in hosts/ops-jrz1.nix: services.dev-platform.maubot.enable = true +- Deploy → Verify maubot.service starts → Verify existing services healthy → Git commit +- **Rollback**: Set enable = false + redeploy OR nixos-rebuild switch --rollback + +**Phase 4: Bot Deployment (Manual, Reversible)** +- SSH tunnel to management UI (localhost:29316) +- Create bot Matrix user via registration token +- Upload Instagram plugin (.mbp file) +- Create bot instance (test in private room first) +- **Rollback**: Delete bot instance via web UI (no code changes to revert) + +### Validation Checkpoints + +After each phase deployment: +```bash +# 1. Verify existing services still healthy +ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx' + +# 2. Check for errors in last 5 minutes (excluding maubot) +ssh root@45.77.205.49 'journalctl --since "5 minutes ago" | grep -E "ERR|CRIT|FTL" | grep -v maubot' + +# 3. Test Slack bridge (post in Slack, verify appears in Matrix) + +# Phase-specific validations documented in tasks.md +``` + +### Rollback Procedures + +**NixOS Generation Rollback** (fastest): +```bash +ssh root@45.77.205.49 'nixos-rebuild switch --rollback' +ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack' +``` + +**Git Revert** (if committed): +```bash +git revert HEAD +nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost +``` + +**Service Disable** (Phase 3 specific): +```nix +# In hosts/ops-jrz1.nix +services.dev-platform.maubot.enable = false; # Then redeploy +``` + +### Risk Mitigation + +**Known risks from mautrix-slack deployment** (2025-10-26): +1. IPv4 vs localhost: Always use 127.0.0.1 (not localhost) in homeserverUrl +2. Conduwuit database corruption: Have database wipe procedure ready (low risk - fresh maubot install) +3. 
Port conflicts: Maubot uses 29316 (unique, no conflicts expected) + +**Blast radius containment**: +- Phase 1 fail → Nix syntax errors only, no runtime impact +- Phase 2 fail → Secrets issue, no services affected +- Phase 3 fail → Maubot won't start, but Matrix/Slack/Forgejo unaffected (different ports, databases) +- Phase 4 fail → Bot instance only, delete via UI + +### Success Criteria Per Phase + +- **Phase 1**: Build succeeds, nixos-rebuild reports "no services changed" +- **Phase 2**: /run/secrets/maubot-* files exist with mode 0400, existing services healthy +- **Phase 3**: systemctl status maubot.service shows "active (running)", management UI accessible via SSH tunnel +- **Phase 4**: Bot responds to Instagram URL in <5 seconds (SC-001) + +### Update/Upgrade Procedure (State-Preserving) + +After initial deployment, future updates must preserve runtime state in `/var/lib/maubot/`: +- `bot.db` - Service state (bot instances, plugin configurations) +- `plugins/` - Uploaded .mbp files +- `config/config.yaml` - Generated runtime config + +**Typical update scenarios**: + +**Scenario 1: Module Configuration Change** (e.g., change port, add new option) +```bash +# 1. Edit modules/dev-services.nix or hosts/ops-jrz1.nix +# 2. Deploy +nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost + +# 3. Verify service restarted cleanly +ssh root@45.77.205.49 'systemctl status maubot.service' +ssh root@45.77.205.49 'journalctl -u maubot.service -n 50' + +# 4. Verify bot instances still running (check management UI) +# StateDirectory persists across service restarts +``` + +**Scenario 2: Maubot Version Upgrade** (nixpkgs update) +```bash +# 1. Update flake.lock or nixpkgs input +nix flake update + +# 2. Review maubot changelog for breaking changes +# Check: https://github.com/maubot/maubot/releases + +# 3. Deploy with build test first +nixos-rebuild build --flake .#ops-jrz1 + +# 4. 
If build succeeds, deploy +nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost + +# 5. Monitor service restart +ssh root@45.77.205.49 'journalctl -u maubot.service -f' + +# 6. Verify bot instances reconnected (check Matrix room for bot presence) +``` + +**Scenario 3: Plugin Update** (new Instagram bot version) +```bash +# Manual via web UI: +# 1. Upload new .mbp file (Plugins tab → Upload) +# 2. Maubot detects version change +# 3. Restart affected bot instances (Instances tab → Stop → Start) +# 4. Test in private room before production use + +# No nixos-rebuild needed - plugin is runtime state +``` + +**Scenario 4: Add New Bot Instance** (e.g., second Instagram bot or new bot type) +```bash +# Manual via web UI: +# 1. Create bot Matrix user (via registration token) +# 2. Upload plugin if new type (Plugins tab) +# 3. Create bot instance (Instances tab → Add instance) +# 4. Configure and enable + +# No nixos-rebuild needed - bot instances are runtime state +``` + +**State Preservation Guarantees**: +- NixOS StateDirectory (`/var/lib/maubot/`) persists across: + - Service restarts (systemctl restart maubot.service) + - System reboots + - Module configuration changes + - Maubot version upgrades (unless database schema incompatible) +- StateDirectory only wiped if: + - Explicitly deleted manually + - Service definition changes StateDirectory path + - Major maubot version with incompatible schema (rare, documented in release notes) + +**Rollback with State**: +```bash +# NixOS generation rollback preserves StateDirectory +ssh root@45.77.205.49 'nixos-rebuild switch --rollback' + +# Bot instances resume with previous configuration +# Database and plugins unchanged +``` + +**When to wipe database** (rare, destructive): +```bash +# Only if: +# 1. Database corruption detected +# 2. Major version migration requires clean slate (check release notes) +# 3. 
Testing fresh deployment + + # Stop the service first so SQLite is not mid-write during the backup: + ssh root@45.77.205.49 'systemctl stop maubot.service' + + # Backup: + ssh root@45.77.205.49 'tar czf /root/maubot-backup-$(date +%Y%m%d).tar.gz /var/lib/maubot/' + + # Wipe and restart: + ssh root@45.77.205.49 'rm -f /var/lib/maubot/bot.db' + ssh root@45.77.205.49 'systemctl start maubot.service' + + # Reconfigure all bot instances via web UI + ``` + + ## Complexity Tracking + + **No violations** - All constitution principles satisfied. + + This feature follows established patterns: + - Declarative infrastructure (NixOS modules) + - Security first (sops-nix encrypted secrets) + - Presentable state (comprehensive spec, 7-day validation) + - Quality over speed (extract proven ops-base module, document alternatives) + + **No simpler alternatives rejected** - Chosen approach is the simplest that meets requirements while maintaining quality standards. diff --git a/specs/003-maubot-integration/quickstart.md b/specs/003-maubot-integration/quickstart.md new file mode 100644 index 0000000..556e299 --- /dev/null +++ b/specs/003-maubot-integration/quickstart.md @@ -0,0 +1,667 @@ +# Quickstart: Maubot Integration Deployment + +**Feature**: 003-maubot-integration +**Target**: ops-jrz1 VPS (45.77.205.49) +**Estimated time**: 2-3 hours + +## Prerequisites + +- [x] ops-jrz1 VPS operational with conduwuit Matrix homeserver +- [x] SSH access to VPS as root +- [x] sops-nix configured with server SSH host key +- [x] Local machine with Nix/NixOS +- [ ] Instagram bot .mbp file available (`/home/dan/proj/sna/sna-instagram-bot.mbp`) + +--- + +## Phase 0: Secrets Preparation + +### 1. Generate Maubot Secrets + +```bash +# Generate admin password (32 random bytes, 44 base64 characters) +MAUBOT_ADMIN_PW=$(openssl rand -base64 32) + +# Generate secret key (48 random bytes, 64 base64 characters) +MAUBOT_SECRET=$(openssl rand -base64 48) + +echo "Admin Password: $MAUBOT_ADMIN_PW" +echo "Secret Key: $MAUBOT_SECRET" +``` + +### 2.
Add Secrets to sops-nix + +```bash +cd /home/dan/proj/ops-jrz1 + +# Edit encrypted secrets +sops secrets/secrets.yaml +``` + +Add these entries (paste the values generated in step 1): +```yaml +maubot-admin-password: "PASTE_MAUBOT_ADMIN_PW_HERE" +maubot-secret-key: "PASTE_MAUBOT_SECRET_HERE" +# matrix-registration-token already exists - reuse for bot creation +``` + +### 3. Declare Secrets in NixOS Config + +Edit `hosts/ops-jrz1.nix`: +```nix +sops.secrets.maubot-admin-password = { mode = "0400"; }; +sops.secrets.maubot-secret-key = { mode = "0400"; }; +``` + +--- + +## Phase 1: Module Extraction and Adaptation + +### 1. Extract maubot.nix from ops-base + +```bash +cd /home/dan/proj/ops-jrz1 + +# Copy module from ops-base +cp /home/dan/proj/ops-base/vm-configs/modules/maubot.nix \ + modules/maubot.nix +``` + +### 2. Adapt Module Namespace + +Edit `modules/maubot.nix`: + +**Change module namespace**: +```nix +# From: +options.services.matrix-vm.maubot = { ... }; + +# To: +options.services.maubot = { ... }; +``` + +**Update homeserver URL**: +```nix +# From: +homeserverUrl = mkOption { + default = "http://127.0.0.1:6167"; # ops-base continuwuity port +}; + +# To: +homeserverUrl = mkOption { + default = "http://127.0.0.1:8008"; # ops-jrz1 conduwuit port +}; +``` + +**Remove registration_secrets** (conduwuit doesn't support shared-secret registration; bot users are registered with a registration token instead): +```nix +# REMOVE this section from config generation (around line 140-150): +# registration_secrets: +# ${cfg.serverName}: +# url: ${cfg.homeserverUrl} +# secret: REPLACE_REGISTRATION_SECRET +``` + +**Move the config path** (from /run to the persistent /var/lib/maubot state directory): +```nix +# Change config path from: +/run/maubot/config.yaml + +# To: +/var/lib/maubot/config/config.yaml +``` + +### 3.
Add dev-platform Wrapper + +Edit `modules/dev-services.nix`: + +Add options section: +```nix +options.services.dev-platform.maubot = { + enable = mkEnableOption "maubot bot framework"; + + port = mkOption { + type = types.port; + default = 29316; + description = "Management interface port"; + }; +}; +``` + +Add config section: +```nix +config = mkIf cfg.maubot.enable { + services.maubot = { + enable = true; + homeserverUrl = "http://127.0.0.1:${toString cfg.matrix.port}"; + serverName = cfg.matrix.serverName; + port = cfg.maubot.port; + + adminPasswordFile = config.sops.secrets.maubot-admin-password.path; + secretKeyFile = config.sops.secrets.maubot-secret-key.path; + }; +}; +``` + +--- + +## Phase 2: Incremental Deployment (Live Server) + +⚠️ **IMPORTANT**: ops-jrz1 is a live production server with critical services: +- conduwuit Matrix homeserver - All Matrix functionality +- mautrix-slack bridge - ~50 Slack channels syncing +- PostgreSQL, Forgejo, nginx - Core infrastructure + +Deploy incrementally with validation checkpoints. Each phase creates a git commit as a rollback point. + +--- + +### Phase 2.1: Module Files Only (No-Op Deployment) + +**Goal**: Add maubot module without starting any services + +**Steps**: + +1. Verify services.dev-platform.maubot.enable is NOT set in `hosts/ops-jrz1.nix` + +2. 
Deploy: +```bash +cd /home/dan/proj/ops-jrz1 +nixos-rebuild switch --flake .#ops-jrz1 \ + --target-host root@45.77.205.49 \ + --build-host localhost +``` + +**Validation**: +```bash +# Should report "no services changed" or only unrelated restarts +ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack' +# Expected: Both active (running), no recent restarts +``` + +**Git checkpoint**: +```bash +git add modules/maubot.nix modules/dev-services.nix +git commit -m "Add maubot module files (service disabled)" +``` + +**Rollback if needed**: +```bash +ssh root@45.77.205.49 'nixos-rebuild switch --rollback' +``` + +--- + +### Phase 2.2: Secrets Preparation + +**Goal**: Add secrets without starting service + +**Steps**: + +1. Verify services.dev-platform.maubot.enable is still NOT set + +2. Deploy (secrets added in Phase 0 and Phase 1 config): +```bash +nixos-rebuild switch --flake .#ops-jrz1 \ + --target-host root@45.77.205.49 \ + --build-host localhost +``` + +**Validation**: +```bash +# Verify secrets decrypted +ssh root@45.77.205.49 'ls -la /run/secrets/maubot-*' +# Expected: +# -r-------- 1 root root ... /run/secrets/maubot-admin-password +# -r-------- 1 root root ... /run/secrets/maubot-secret-key + +# Verify existing services healthy +ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx' +``` + +**Git checkpoint**: +```bash +git add hosts/ops-jrz1.nix secrets/secrets.yaml +git commit -m "Add maubot secrets (service not enabled)" +``` + +--- + +### Phase 2.3: Enable Maubot Service + +**Goal**: Start maubot service, verify isolation from existing services + +**Steps**: + +1. Enable in `hosts/ops-jrz1.nix`: +```nix +services.dev-platform.maubot = { + enable = true; + port = 29316; +}; +``` + +2. Deploy: +```bash +nixos-rebuild switch --flake .#ops-jrz1 \ + --target-host root@45.77.205.49 \ + --build-host localhost +``` + +**Validation**: +```bash +# 1. 
Verify maubot service started +ssh root@45.77.205.49 'systemctl status maubot.service' +# Expected: active (running) + +# 2. Check logs for errors +ssh root@45.77.205.49 'journalctl -u maubot.service -n 50' +# Look for: "Starting maubot on port 29316", "Connected to homeserver" +# No ERROR or CRITICAL messages + +# 3. Verify existing services still healthy +ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx' + +# 4. Test Slack bridge (critical validation) +# Post message in Slack → verify appears in Matrix within 5 seconds + +# 5. Test management UI access +ssh -L 29316:localhost:29316 root@45.77.205.49 +# In browser: http://localhost:29316/_matrix/maubot +# Should load login page +``` + +**Git checkpoint**: +```bash +git add hosts/ops-jrz1.nix +git commit -m "Enable maubot service (no bots deployed yet)" +``` + +**Rollback if needed**: +```bash +# Option 1: NixOS generation rollback (fastest) +ssh root@45.77.205.49 'nixos-rebuild switch --rollback' + +# Option 2: Disable service (if you want to keep other changes) +# Edit hosts/ops-jrz1.nix: services.dev-platform.maubot.enable = false +# Then redeploy +``` + +--- + +### Rollback Procedures + +**If ANY deployment phase fails or breaks existing services**: + +1. **Immediate rollback** (restores last working state): +```bash +ssh root@45.77.205.49 'nixos-rebuild switch --rollback' +``` + +2. **Verify services restored**: +```bash +ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack' +# Test Slack bridge: post message, verify in Matrix +``` + +3. 
**Investigate issue** before retrying: +```bash +# Check what changed +ssh root@45.77.205.49 'journalctl --since "10 minutes ago" | grep -E "ERR|CRIT|FTL"' + +# Review deployment logs +ssh root@45.77.205.49 'journalctl -u nixos-rebuild -n 100' +``` + +**Git-based rollback** (if committed but want to revert): +```bash +git log --oneline -5 # Find commit to revert +git revert +nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost +``` + +--- + +### Phase 2.4: Deployment Success Criteria + +Before proceeding to bot configuration, verify: +- [ ] maubot.service is active (running) +- [ ] Management UI loads at http://localhost:29316/_matrix/maubot (via SSH tunnel) +- [ ] No errors in maubot service logs +- [ ] All existing services healthy (Matrix, Slack bridge, Forgejo, PostgreSQL, nginx) +- [ ] Slack bridge functional (test message flow Slack ↔ Matrix) +- [ ] Phase 2.3 git commit created + +If all criteria pass, proceed to Phase 3 (Bot Registration). Otherwise, rollback and investigate. + +--- + +## Phase 3: Bot Registration and Configuration + +### 1. Access Management Interface + +```bash +# Create SSH tunnel +ssh -L 29316:localhost:29316 root@45.77.205.49 + +# In browser: +# Navigate to: http://localhost:29316/_matrix/maubot +``` + +### 2. Login to Maubot + +- Username: `admin` +- Password: `` + +### 3. Create Bot Matrix User + +**Option A: Registration Token** (recommended): + +1. Configure conduwuit registration token (if not already set) +2. In Maubot UI: Clients → Add client +3. Enter Matrix user ID: `@instagram-bot:clarun.xyz` +4. Select "Register" and provide registration token +5. Bot user created automatically + +**Option B: Admin Room Commands**: + +1. Access Matrix homeserver admin room +2. Run: `!admin users create-user instagram-bot` +3. Copy generated password +4. In Maubot UI: Create client with username/password + +### 4. 
Upload Instagram Plugin + +```bash +# Copy plugin to VPS +scp /home/dan/proj/sna/sna-instagram-bot.mbp \ + root@45.77.205.49:/tmp/ + +# Or upload via web UI: +# - Plugins tab → Upload +# - Select sna-instagram-bot.mbp +``` + +### 5. Create Bot Instance + +In Maubot UI: +1. Instances tab → Add instance +2. **ID**: `instagram-bot-1` +3. **Type**: `sna.instagram` +4. **Primary user**: Select `@instagram-bot:clarun.xyz` +5. **Enabled**: ✓ +6. **Config**: +```json +{ + "enabled": true, + "max_file_size": 50000000, + "room_subscriptions": [] +} +``` +7. Save + +### 6. Configure Room Subscriptions + +**Get Matrix room ID**: +```bash +# In Element or Matrix client: +# Room Settings → Advanced → Internal Room ID +# Example: !abc123def:clarun.xyz +``` + +**Add to bot config** (per FR-010): + +Edit bot instance config in Maubot UI: +```json +{ + "enabled": true, + "max_file_size": 50000000, + "room_subscriptions": [ + "!abc123def:clarun.xyz" + ] +} +``` + +**Restart bot instance**: Stop → Start in Maubot UI + +--- + +## Phase 4: Testing + +### 1. Invite Bot to Test Room + +In Matrix client: +``` +/invite @instagram-bot:clarun.xyz +``` + +### 2. Test Instagram URL Fetching + +Post in the room: +``` +https://www.instagram.com/p/EXAMPLE123/ +``` + +**Expected behavior**: +- Bot responds within 5 seconds (SC-001) +- Image/video appears in room +- Caption and metadata posted as text message + +### 3. Test Room Subscription Enforcement + +Post Instagram URL in a room NOT in `room_subscriptions`: + +**Expected behavior**: +- Bot ignores URL (no response) + +### 4. Monitor Logs + +```bash +ssh root@45.77.205.49 'journalctl -u maubot.service -f --since "5 minutes ago"' + +# Check for: +# - Instagram URL detection +# - yt-dlp extraction +# - Matrix upload +# - Any ERROR/CRITICAL logs +``` + +--- + +## Phase 5: Health Monitoring + +### 1. 
Verify Health Check Timer + +```bash +ssh root@45.77.205.49 'systemctl list-timers | grep maubot' + +# Expected: +# maubot-health.timer (runs every 5 minutes) +# maubot-health-restart.timer (runs every 10 minutes) +``` + +### 2. Manual Health Check + +```bash +ssh root@45.77.205.49 'curl -s http://localhost:29316/_matrix/maubot/v1/version | jq .' + +# Expected output: +# { +# "version": "0.5.2", +# "server": "maubot" +# } +``` + +### 3. Check Bot Instance Status + +In Maubot UI: +- Instances tab +- Verify `instagram-bot-1` shows green "Running" status +- Check "Last Sync" timestamp (should be <10 minutes) + +--- + +## Troubleshooting + +### Bot Not Responding to Instagram URLs + +**Check**: +1. Room ID is in `room_subscriptions` config +2. Bot has joined the room (`/invite @instagram-bot:clarun.xyz`) +3. URL is public Instagram post (not private/story) +4. Logs show URL detection: `journalctl -u maubot.service | grep -i instagram` + +**Fix**: +- Update room_subscriptions config +- Restart bot instance in Maubot UI + +### Service Won't Start + +**Check**: +```bash +ssh root@45.77.205.49 'journalctl -u maubot.service -n 50' +``` + +**Common issues**: +- Port 29316 already in use → Check `ss -tlnp | grep 29316` +- Database permissions → Check `/var/lib/maubot/` ownership +- Secrets not decrypted → Check `/run/secrets/maubot-*` exists + +### Bot Can't Connect to Matrix + +**Check**: +1. conduwuit is running: `systemctl status matrix-continuwuity` +2. Homeserver URL is correct: `http://127.0.0.1:8008` (IPv4) +3. 
Bot Matrix user exists and has valid access token + +**Fix**: +- Recreate bot client in Maubot UI +- Check Matrix homeserver logs: `journalctl -u matrix-continuwuity | grep instagram` + +### Instagram Content Fetch Fails + +**Check logs**: +```bash +ssh root@45.77.205.49 'journalctl -u maubot.service | grep -A 10 "yt-dlp"' +``` + +**Common issues**: +- Instagram rate limiting (429 error) → Wait 30 minutes, reduce request frequency +- Private post → Can't fetch (expected behavior) +- yt-dlp outdated → Update nixpkgs, redeploy + +--- + +## Rollback Procedure + +If deployment fails: + +```bash +# List NixOS generations +ssh root@45.77.205.49 'nixos-rebuild list-generations' + +# Rollback to previous generation +ssh root@45.77.205.49 'nixos-rebuild switch --rollback' + +# Verify services restored +ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack' +``` + +--- + +## Success Criteria Validation + +Verify all success criteria before marking feature complete: + +- [ ] **SC-001**: Instagram bot responds within 5 seconds +- [ ] **SC-002**: System supports 3 concurrent bot instances (test by creating 2 more instances) +- [ ] **SC-003**: Service maintains 99% uptime over 7 days +- [ ] **SC-004**: Auto-recovery within 2 minutes after restart +- [ ] **SC-005**: New bot deployment completes in <10 minutes +- [ ] **SC-006**: 95% success rate for public Instagram URLs +- [ ] **SC-007**: Management interface loads in <2 seconds +- [ ] **SC-008**: Server reboot without data loss (test with `reboot`) + +**Testing period**: 7 days operational before merging to main (per constitution Principle III) + +--- + +## Post-Deployment + +### 1. 
Update Documentation + +Add a Maubot section to `CLAUDE.md`, for example: + +```markdown +### Maubot Management +- Management UI: http://localhost:29316/_matrix/maubot (via SSH tunnel) +- Bot registration: Use conduwuit registration token +- Room subscriptions: Edit config JSON, restart instance +- Logs: `journalctl -u maubot.service -f` +``` + +### 2. Commit and Tag + +```bash +git add modules/maubot.nix modules/dev-services.nix hosts/ops-jrz1.nix +git commit -m "Add maubot bot framework with Instagram bot + +- Extract and adapt maubot.nix from ops-base +- Configure for conduwuit (registration token auth) +- Deploy Instagram bot with room-based activation +- Add health monitoring timers + +Implements feature 003-maubot-integration +" + +git tag -a v0.3.0 -m "Release v0.3.0: Maubot Integration + +Features: +- Maubot bot framework service +- Instagram content fetcher bot +- Room-based bot activation +- Management web interface (localhost only) +- Health monitoring and auto-recovery + +Success criteria validated (SC-001 through SC-008) +Constitution compliance verified +" + +git push origin main --tags +``` + +### 3.
Create Worklog + +Document the deployment session in a new worklog: +```bash +# Create the worklog file (Org-mode format) +mkdir -p docs/worklogs +touch docs/worklogs/2025-10-26-maubot-deployment.org +``` + +--- + +## Reference Files + +**Module locations**: +- `/home/dan/proj/ops-jrz1/modules/maubot.nix` (service module) +- `/home/dan/proj/ops-jrz1/modules/dev-services.nix` (high-level wrapper) + +**Secrets**: +- `/home/dan/proj/ops-jrz1/secrets/secrets.yaml` (encrypted) +- `/run/secrets/maubot-*` (runtime, on VPS) + +**Runtime state** (on VPS): +- `/var/lib/maubot/bot.db` (SQLite database) +- `/var/lib/maubot/config/config.yaml` (generated config) +- `/var/lib/maubot/plugins/` (uploaded .mbp files) + +**Source reference**: +- ops-base module: `/home/dan/proj/ops-base/vm-configs/modules/maubot.nix` +- Instagram plugin: `/home/dan/proj/sna/sna-instagram-bot.mbp` +- ops-base docs: `/home/dan/proj/ops-base/docs/maubot-*.md` + +--- + +**Deployment time estimate**: 2-3 hours (including testing and validation) +**Status**: Ready for Phase 2 (implementation) diff --git a/specs/003-maubot-integration/spec.md b/specs/003-maubot-integration/spec.md new file mode 100644 index 0000000..08808d7 --- /dev/null +++ b/specs/003-maubot-integration/spec.md @@ -0,0 +1,287 @@ +# Feature Specification: Matrix Bot Framework (Maubot) Integration + +**Feature Branch**: `003-maubot-integration` +**Created**: 2025-10-26 +**Status**: Draft +**Input**: User description: "Begin maubot feature spec. instagram bot is one of our goals." + +## Clarifications + +### Session 2025-10-26 + +- Q: Instagram bot activation behavior - should it respond to all Instagram URLs, only when mentioned, or in designated rooms? → A: Bot responds to Instagram URLs only in designated bot-enabled rooms +- Q: Bot error notification method - how should errors be communicated to administrators?
→ A: Error notification behavior based on severity levels (DEBUG/INFO logs only, WARN logs + dashboard visibility, ERROR/CRITICAL logs + dashboard + Matrix admin room notifications) +- Q: Room enablement mechanism - how do administrators enable bot in specific rooms? → A: Edit bot configuration file with room IDs, restart bot instance +- Q: Admin notification room configuration - should each bot have dedicated admin room, shared room, or reuse homeserver admin room? → A: Reuse Matrix homeserver admin room for bot ERROR/CRITICAL notifications +- Q: Management interface authentication - single shared account, multi-user, or Matrix homeserver auth? → A: Single shared admin account (username/password configured in sops-nix secrets) + +## User Scenarios & Testing *(mandatory)* + +### User Story 1 - Instagram Content Sharing to Matrix (Priority: P1) + +A team member shares an Instagram post URL in a Matrix room, and the bot automatically fetches and displays the content (image, caption, metadata) directly in the chat, allowing team members to view and discuss Instagram content without leaving Matrix. + +**Why this priority**: This is the core value proposition - bringing Instagram content into team communication. Demonstrates immediate utility of the bot framework and validates the integration works correctly. + +**Independent Test**: Can be fully tested by posting an Instagram URL in a Matrix room and verifying the bot responds with content preview, delivering immediate value as an Instagram content viewer. + +**Acceptance Scenarios**: + +1. **Given** Instagram bot is enabled in a specific Matrix room, **When** user posts "https://instagram.com/p/ABC123/" in that room, **Then** bot responds within 5 seconds with image, caption, and post metadata (likes, comments count) +2. **Given** Instagram bot is NOT enabled in a Matrix room, **When** user posts Instagram URL in that room, **Then** bot ignores the URL and does not respond +3. 
**Given** bot receives Instagram URL in enabled room, **When** content is a video, **Then** bot provides video thumbnail, caption, and download link +4. **Given** bot receives Instagram URL in enabled room, **When** content is a carousel (multiple images), **Then** bot displays all images in sequence with navigation +5. **Given** bot receives Instagram profile URL in enabled room, **When** URL is "https://instagram.com/username", **Then** bot displays profile info (bio, follower count, recent posts preview) +6. **Given** bot encounters rate limiting in enabled room, **When** too many requests in short period, **Then** bot queues request and notifies user of delay + +--- + +### User Story 2 - Bot Management Interface (Priority: P2) + +Platform administrators can configure, start, stop, and monitor bots through a web-based management interface without editing configuration files or restarting services. + +**Why this priority**: Essential for operational management and enables non-developer administrators to manage bots. Required for long-term maintainability but bot can work without it initially. + +**Independent Test**: Can be tested by accessing management interface, creating a test bot instance, and verifying it appears in Matrix - demonstrates full bot lifecycle management. + +**Acceptance Scenarios**: + +1. **Given** administrator accesses Maubot management UI, **When** they log in with shared admin credentials, **Then** dashboard displays all bot instances, their status, and health metrics +2. **Given** administrator wants to deploy Instagram bot, **When** they upload maubot plugin file (.mbp), **Then** plugin appears in available plugins list +3. **Given** plugin is uploaded, **When** administrator creates new bot instance with Matrix user credentials and room subscription list, **Then** bot appears online in Matrix within 30 seconds and only responds in configured rooms +4. 
**Given** administrator wants to change enabled rooms, **When** they edit bot configuration file with new room IDs and restart bot instance, **Then** bot begins responding only in newly configured rooms +5. **Given** bot is running, **When** administrator clicks "Stop" button, **Then** bot goes offline and stops responding to commands +6. **Given** bot encounters error, **When** viewing bot logs in UI, **Then** error messages are displayed with timestamps, severity level, and context +7. **Given** bot experiences CRITICAL error, **When** error occurs, **Then** notification is sent to Matrix homeserver admin room with error details and affected bot instance + +--- + +### User Story 3 - Bot Framework Service Reliability (Priority: P2) + +The Maubot service starts automatically on server boot, maintains bot instances across restarts, and recovers from failures without manual intervention. + +**Why this priority**: Critical for production use but can be validated after basic functionality works. Prevents the bot framework from being a maintenance burden. + +**Independent Test**: Can be tested by rebooting the server and verifying Maubot service auto-starts and all bot instances resume operation automatically. + +**Acceptance Scenarios**: + +1. **Given** server reboots, **When** system comes back online, **Then** Maubot service starts automatically within 2 minutes and all bot instances reconnect to Matrix +2. **Given** Matrix homeserver restarts, **When** homeserver is available again, **Then** bot instances re-establish connections and resume operation without manual intervention +3. **Given** bot instance crashes, **When** Maubot detects failure, **Then** service attempts automatic restart with exponential backoff +4. **Given** bot encounters persistent error (ERROR/CRITICAL severity), **When** restart attempts fail, **Then** service logs detailed diagnostics, updates dashboard status, and sends notification to Matrix homeserver admin room +5. 
**Given** database connection lost, **When** connectivity is restored, **Then** Maubot reconnects automatically and restores bot state + +--- + +### User Story 4 - Additional Bot Deployment (Priority: P3) + +Platform administrators can deploy additional custom bots beyond Instagram bot by uploading plugin files and configuring bot instances, enabling extensible bot functionality for future team needs. + +**Why this priority**: Demonstrates platform extensibility and future-proofs the investment, but not required for initial value delivery. Can be added after Instagram bot proves value. + +**Independent Test**: Can be tested by deploying a simple echo bot or reaction bot from maubot plugin repository and verifying it works independently. + +**Acceptance Scenarios**: + +1. **Given** administrator has custom maubot plugin (.mbp file), **When** they upload via management interface, **Then** plugin is validated and added to available plugins +2. **Given** plugin requires configuration, **When** creating bot instance, **Then** administrator can provide plugin-specific settings through UI +3. **Given** multiple bot instances exist, **When** administrator views dashboard, **Then** all bots are clearly listed with their types, status, and resource usage +4. **Given** bot requires database storage, **When** bot instance is created, **Then** Maubot automatically provisions isolated database for that bot +5. 
**Given** plugin has dependencies, **When** uploading plugin, **Then** Maubot validates dependencies and reports missing requirements + +--- + +## Requirements *(mandatory)* + +### Functional Requirements + +- **FR-001**: System MUST extract and deploy maubot module from ops-base repository to ops-jrz1 infrastructure +- **FR-002**: System MUST integrate Maubot with existing conduwuit Matrix homeserver on clarun.xyz +- **FR-003**: System MUST provide web-based management interface on dedicated port (default: 29316) accessible to platform administrators via single shared admin account credentials stored in sops-nix secrets +- **FR-004**: Maubot service MUST support automatic startup on system boot and auto-recovery from failures +- **FR-005**: System MUST support Instagram bot plugin deployment with content fetching capabilities +- **FR-006**: Instagram bot MUST fetch and display images, videos, captions, and metadata from Instagram URLs posted only in designated bot-enabled Matrix rooms (bot ignores URLs in rooms where it is not explicitly enabled) +- **FR-007**: Instagram bot MUST handle rate limiting gracefully with user-friendly error messages +- **FR-008**: System MUST support multiple bot instances running concurrently with isolated configurations (architecture supports 3+ instances per SC-002, production deploys 1 instance initially per quickstart.md) +- **FR-009**: System MUST persist bot configurations and state to survive service restarts +- **FR-010**: Administrators MUST be able to configure bot room subscriptions by editing bot configuration file with Matrix room IDs and restarting the bot instance +- **FR-011**: System MUST provide health monitoring for bot instances with status indicators (health check API endpoint and dashboard status display via management interface) +- **FR-012**: System MUST integrate with existing sops-nix secrets management for bot credentials +- **FR-013**: System MUST support uploading and deploying additional maubot plugins 
(.mbp files) - functionality inherited from ops-base maubot.nix module, validated in T029 +- **FR-014**: System MUST provide logging capabilities for bot activity and errors accessible via management interface with severity-based propagation (DEBUG/INFO to logs only, WARN to logs and dashboard, ERROR/CRITICAL to logs, dashboard, and Matrix homeserver admin room) +- **FR-015**: Bot instances MUST authenticate with Matrix homeserver using registration tokens (conduwuit compatibility requirement, shared secret not supported) +- **FR-016**: System MUST support per-bot database storage with automatic provisioning + +### Key Entities + +- **Maubot Service**: Plugin-based Matrix bot framework that manages multiple bot instances, provides management interface, and handles Matrix homeserver integration +- **Bot Instance**: Individual bot deployment with specific configuration, Matrix user account, and plugin assignment (e.g., "instagram-bot-1") +- **Plugin**: Packaged bot functionality (.mbp file) containing code, metadata, and dependencies (e.g., Instagram content fetcher, echo bot, reaction bot) +- **Bot Configuration**: Settings specific to bot instance including Matrix credentials, plugin settings, room subscriptions (list of enabled room IDs), and command prefixes +- **Management Interface**: Web UI for administrators to create, configure, monitor, and control bot instances, displaying logs with severity levels and real-time status updates +- **Admin Notification**: ERROR and CRITICAL level bot notifications sent to existing Matrix homeserver admin room (shared with other platform notifications) +- **Bot Database**: Per-instance isolated SQLite database for plugin state and data persistence + +## Success Criteria *(mandatory)* + +### Measurable Outcomes + +- **SC-001**: Instagram bot responds to Instagram URLs with content preview within 5 seconds under normal conditions +- **SC-002**: System supports at least 3 concurrent bot instances without performance degradation 
+- **SC-003**: Maubot service maintains 99% uptime over 7-day testing period +- **SC-004**: Bot instances automatically recover within 2 minutes after service restart +- **SC-005**: Administrators can deploy a new bot instance from scratch in under 10 minutes +- **SC-006**: Instagram bot successfully fetches content for 95% of public Instagram post URLs +- **SC-007**: Management interface loads and displays bot status within 2 seconds +- **SC-008**: System handles server reboot without data loss or manual intervention required + +**Validation Note**: SC-001, SC-002, SC-003, SC-004, SC-008 have explicit task validation (T026, T042, T038, T034, T034). SC-005, SC-006, SC-007 are measured during the 7-day operational validation period (T038) and documented in deployment worklog (T044). + +## Scope *(mandatory)* + +### In Scope + +- Extract and adapt maubot.nix module from ops-base to ops-jrz1 +- Configure Maubot to integrate with conduwuit Matrix homeserver +- Deploy Instagram bot plugin as primary use case +- Set up management web interface with authentication +- Implement health monitoring and auto-recovery mechanisms +- Configure sops-nix secrets for bot credentials +- Document bot deployment and management procedures including room subscription configuration workflow +- Support for uploading additional maubot plugins + +### Out of Scope + +- Custom Instagram bot development (use existing maubot Instagram plugin from community) +- Migration of other bots from ops-base besides Instagram bot +- Advanced analytics or metrics dashboard for bot performance +- Multi-homeserver support (only clarun.xyz) +- Custom plugin development beyond Instagram bot deployment +- Mobile app for bot management (web interface only) +- Automatic Instagram authentication (manual token provisioning acceptable) +- Real-time Instagram feed monitoring or notifications + +## Constraints *(mandatory)* + +### Technical Constraints + +- Must work with conduwuit Matrix homeserver (ops-base used 
continuwuity, may require compatibility testing) +- Limited to Python 3.11 for maubot runtime (nixpkgs availability) +- Instagram bot functionality depends on Instagram API/scraping availability and rate limits +- Must adapt from ops-base VM-based deployment pattern to ops-jrz1 VPS single-host pattern +- Dependent on deprecated olm-3.2.16 library for Matrix encryption (known CVEs, acceptable risk documented in ops-base) + +### Operational Constraints + +- Deployment must not disrupt existing services (Matrix homeserver, Slack bridge, Forgejo) +- Management interface must be secured (single admin account authentication, localhost-only access) +- Management interface credentials must be stored in sops-nix encrypted secrets +- Bot Matrix accounts require registration tokens from homeserver +- Instagram tokens may require periodic renewal based on Instagram API policies + +### Resource Constraints + +- Maubot service limited to 512M memory (as per ops-base configuration) +- Additional database space required for bot state (estimated <100MB initially) +- Management interface port 29316 must not conflict with existing services + +## Dependencies *(mandatory)* + +### External Dependencies + +- ops-base repository access to extract maubot.nix module and documentation +- Instagram bot plugin from maubot community or ops-base implementation +- Instagram authentication tokens (if required by current Instagram API policies) +- Matrix homeserver registration token for bot user creation + +### Internal Dependencies + +- conduwuit Matrix homeserver must be operational on clarun.xyz +- sops-nix secrets management must be configured for bot credentials +- SQLite for bot state storage (decision per plan.md research: lightweight isolation better than shared PostgreSQL) +- Existing NixOS infrastructure and deployment patterns + +### Blocking Issues + +- Need to verify conduwuit compatibility with maubot (ops-base used continuwuity) +- Need to assess current Instagram API access 
requirements and scraping feasibility +- Need to extract and adapt ops-base module configuration options from `services.matrix-vm.maubot` to `services.dev-platform.maubot` + +## Assumptions *(mandatory)* + +- Instagram content fetching remains technically feasible (no major Instagram API changes blocking access) +- Maubot works with conduwuit Matrix homeserver with minimal or no modifications +- ops-base maubot module can be adapted to VPS deployment with reasonable effort +- Instagram bot plugin from ops-base is functional and can be reused or community plugin exists +- Team accepts olm-3.2.16 security risk with documented mitigation plan (migration to vodozemac when available) +- Bot traffic will remain under Instagram rate limits for small team usage (<100 requests/hour) +- Single VPS deployment sufficient (no distributed bot architecture needed) +- Single shared admin account sufficient for initial deployment (no multi-user management required) + +## Non-Goals *(optional)* + +- Automated Instagram post monitoring or scheduled fetching +- Direct posting to Instagram from Matrix (read-only integration) +- Instagram DM integration or two-way messaging +- Advanced content moderation or filtering +- Custom Instagram analytics or engagement tracking +- Multi-tenant bot hosting for external teams +- Commercial Instagram API integration (acceptable to use community scraping approaches) +- Real-time Instagram notifications or webhooks + +## Known Limitations *(optional)* + +The following edge cases are known limitations not addressed in MVP scope: + +- **Deleted/private Instagram posts**: Bot does not handle posts that become private or deleted after initial fetch (content remains in Matrix chat history) +- **Instagram rate limiting**: System may experience delays during high-traffic periods (429 responses). FR-007 requires graceful handling with user notifications. 
+ +- **Matrix account credential expiry**: Bot user account credentials are managed via registration tokens and do not expire automatically. Manual re-authentication is required if they are revoked. +- **Instagram story URLs**: Stories (24-hour expiry) are not supported (yt-dlp limitation for ephemeral content) +- **Command collision**: Multiple bot instances in the same room may respond to overlapping triggers. Recommendation: enable only one bot per room or use distinct command prefixes. +- **Age-restricted/geo-blocked content**: Instagram content with access restrictions may fail to fetch depending on VPS location and yt-dlp capabilities +- **Homeserver connection loss**: If Maubot loses its connection to the Matrix homeserver, bot instances stop responding until the connection is restored (monitored via health checks in FR-011) +- **Database corruption**: No automated backup/recovery. Recommendation: implement a manual backup procedure for `/var/lib/maubot/` during the operational period. + +## Risks *(optional)* + +### Technical Risks + +- **Risk**: Instagram API/scraping methods may break with Instagram updates + - **Mitigation**: Document bot as best-effort, plan for periodic maintenance, monitor Instagram bot community for updates + +- **Risk**: Conduwuit compatibility issues with maubot not discovered until integration + - **Mitigation**: Test maubot registration and basic functionality early in implementation phase + +- **Risk**: olm-3.2.16 vulnerabilities may be exploited + - **Mitigation**: Follow ops-base mitigation strategy - monitor for vodozemac migration, limit bot network exposure, document accepted risk + +### Operational Risks + +- **Risk**: Instagram rate limiting may impact bot responsiveness during high usage + - **Mitigation**: Implement request queuing, user notifications for delays, consider rate limit monitoring + +- **Risk**: Bot management interface security breach could compromise Matrix homeserver + - **Mitigation**: Require strong authentication, limit network exposure,
regular security audits, use sops-nix for credential storage + +- **Risk**: Bot instance failure may go unnoticed without monitoring + - **Mitigation**: Implement health checks, automated restarts, log monitoring, administrator alerts for persistent failures + +## Clarified Requirements *(resolved 2025-10-26)* + +### Instagram Authentication Approach +**Decision**: Use community scraping methods (instaloader, yt-dlp) for Instagram content fetching. + +**Rationale**: Easier to set up immediately without requiring Facebook developer account approval. Acceptable for internal team use with understanding that scraping methods may require periodic updates if Instagram changes their interface. + +### Management Interface Network Exposure +**Decision**: Restrict management interface to localhost only, requiring SSH tunnel for remote administration. + +**Rationale**: Maximizes security by eliminating network attack surface. Administrators already have SSH access for deployment, so tunnel setup is acceptable operational overhead for the security benefit. + +### Bot Instance Quantity Planning +**Decision**: Support single Instagram bot instance initially (1 instance). + +**Rationale**: Minimal resource requirements, proves concept quickly, demonstrates value before scaling. Architecture can support additional instances later if needed without major rework. + +**Note**: SC-002 requires validating 3-instance capability during testing to ensure architecture can scale when needed, but production deployment starts with single instance. 
diff --git a/specs/003-maubot-integration/tasks.md b/specs/003-maubot-integration/tasks.md new file mode 100644 index 0000000..1dc1243 --- /dev/null +++ b/specs/003-maubot-integration/tasks.md @@ -0,0 +1,348 @@ +# Implementation Tasks: Maubot Integration + +**Feature**: 003-maubot-integration +**Branch**: `003-maubot-integration` +**Target**: ops-jrz1 VPS (45.77.205.49) +**Estimated Duration**: 2-3 hours deployment + 7 days validation + +## Task Summary + +- **Total Tasks**: 56 (updated for incremental deployment strategy) +- **Setup Phase**: 4 tasks +- **Foundational Phase**: 6 tasks +- **User Story 1 (P1)**: 28 tasks - Instagram content sharing (MVP) + - Infrastructure: 3 tasks (T011-T013) + - Phase 1 deployment: 4 tasks (T013a-d) + - Phase 2 deployment: 4 tasks (T013e-h) + - Phase 3 deployment: 7 tasks (T014-T017c) + - Phase 4 bot config: 6 tasks (T018-T023) + - Testing: 4 tasks (T024-T027) +- **User Story 2 (P2)**: 6 tasks - Management interface +- **User Story 3 (P2)**: 5 tasks - Service reliability +- **User Story 4 (P3)**: 4 tasks - Additional bot deployment +- **Polish Phase**: 3 tasks + +**MVP Scope**: User Story 1 (28 tasks) - validates core value proposition with incremental deployment + +--- + +## Phase 1: Setup (Project Initialization) + +**Goal**: Prepare development environment and extract source modules from ops-base + +- [X] T001 Create feature branch 003-maubot-integration from main +- [X] T002 Copy maubot.nix module from /home/dan/proj/ops-base/vm-configs/modules/maubot.nix to modules/maubot.nix +- [X] T003 Copy Instagram bot plugin from /home/dan/proj/sna/sna-instagram-bot.mbp to local working directory +- [X] T004 Generate maubot secrets (admin password 32 chars, secret key 48 bytes) using openssl rand -base64 + +**Checkpoint**: Source files ready for adaptation + +--- + +## Phase 2: Foundational (Blocking Prerequisites) + +**Goal**: Adapt maubot module for ops-jrz1 and configure secrets + +**Independent Test**: Deploy adapted module and verify
service starts without errors + +### Module Adaptation + +- [X] T005 Update module namespace from services.matrix-vm.maubot to services.maubot in modules/maubot.nix +- [X] T006 Update homeserver URL from http://127.0.0.1:6167 to http://127.0.0.1:8008 in modules/maubot.nix +- [X] T007 Remove registration_secrets section from config generation in modules/maubot.nix (lines ~140-150; conduwuit doesn't support shared-secret registration) +- [X] T008 Change config path from /run/maubot/config.yaml to /var/lib/maubot/config/config.yaml in modules/maubot.nix +- [X] T009 Remove registration-secret from LoadCredential (keep admin-password and secret-key only) in the systemd service section of modules/maubot.nix +- [X] T010 [P] Add maubot secrets to secrets/secrets.yaml (maubot-admin-password, maubot-secret-key) using sops secrets/secrets.yaml + +**Checkpoint**: Module adapted for conduwuit, secrets encrypted + +--- + +## Phase 3: User Story 1 - Instagram Content Sharing to Matrix (Priority: P1) + +**Goal**: Deploy maubot service with Instagram bot and validate content fetching + +**Independent Test**: Post Instagram URL in enabled Matrix room and verify bot responds with image/video/caption within 5 seconds + +**Why MVP**: Core value proposition - brings Instagram content into team communication, validates integration works + +### Infrastructure Deployment + +- [X] T011 [US1] Add sops secret declarations to hosts/ops-jrz1.nix (sops.secrets.maubot-admin-password, sops.secrets.maubot-secret-key) +- [X] T012 [US1] Create dev-platform wrapper options in modules/dev-services.nix (services.dev-platform.maubot with enable and port options) +- [X] T013 [US1] Add dev-platform config block in modules/dev-services.nix (maps to services.maubot with homeserverUrl, serverName, port, secret paths) + +### Service Deployment - Phase 1: Module Files + +- [ ] T013a [US1] Deploy Phase 1 to VPS (modules added, service disabled) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49
--build-host localhost +- [ ] T013b [US1] Verify Phase 1: Check nixos-rebuild output reports "no services changed" or only unrelated service restarts +- [ ] T013c [US1] Verify existing services healthy: ssh root@45.77.205.49 'systemctl status matrix-continuwuity mautrix-slack forgejo postgresql nginx' +- [ ] T013d [US1] Git commit Phase 1 with message "Add maubot module files (service disabled)" + +### Service Deployment - Phase 2: Secrets + +- [ ] T013e [US1] Deploy Phase 2 to VPS (secrets added in Phase 0 and Phase 1, service still disabled) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost +- [ ] T013f [US1] Verify Phase 2: Check secrets decrypted via ssh root@45.77.205.49 'ls -la /run/secrets/maubot-*' (expect 0400 permissions) +- [ ] T013g [US1] Verify existing services healthy (same command as T013c) +- [ ] T013h [US1] Git commit Phase 2 with message "Add maubot secrets (service not enabled)" + +### Service Deployment - Phase 3: Enable Service + +- [ ] T014 [US1] Enable maubot service in hosts/ops-jrz1.nix (services.dev-platform.maubot.enable = true, port = 29316) +- [ ] T015 [US1] Deploy Phase 3 to VPS (enable maubot service) using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost +- [ ] T016 [US1] Verify service status via ssh root@45.77.205.49 'systemctl status maubot.service' (expect active running) +- [ ] T017 [US1] Check logs for errors via ssh root@45.77.205.49 'journalctl -u maubot.service -n 50' +- [ ] T017a [US1] Verify existing services still healthy after maubot deployment (same command as T013c) +- [ ] T017b [US1] Test Slack bridge functionality (post message in Slack, verify appears in Matrix within 5 seconds) +- [ ] T017c [US1] Git commit Phase 3 with message "Enable maubot service (no bots deployed yet)" + +### Bot Configuration - Phase 4: Manual Deployment + +- [ ] T018 [US1] Create SSH tunnel to management interface: ssh -L 29316:localhost:29316 root@45.77.205.49 +- [ ] T019 [US1] Login to maubot web UI at
http://localhost:29316/_matrix/maubot (username: admin, password from sops secrets) +- [ ] T020 [US1] Create bot Matrix user @instagram-bot:clarun.xyz via conduwuit registration token (Clients tab → Add client → Register) +- [ ] T021 [US1] Upload Instagram plugin sna-instagram-bot.mbp via web UI (Plugins tab → Upload) +- [ ] T022 [US1] Create bot instance instagram-bot-1 (type: sna.instagram, primary_user: @instagram-bot:clarun.xyz, config: {"enabled": true, "max_file_size": 50000000, "room_subscriptions": []}) +- [ ] T023 [US1] Invite bot to test Matrix room via /invite @instagram-bot:clarun.xyz + +### Testing & Validation + +- [ ] T024 [US1] Add test room ID to bot config room_subscriptions in maubot web UI +- [ ] T025 [US1] Restart bot instance (Stop → Start in web UI) +- [ ] T026 [US1] Post public Instagram URL in test room and verify bot responds within 5 seconds with image/video/caption (SC-001) +- [ ] T027 [US1] Post Instagram URL in non-subscribed room and verify bot ignores it (FR-006 enforcement) + +**Acceptance Criteria**: +- ✅ Bot responds to Instagram URLs in subscribed rooms only +- ✅ Content fetched within 5 seconds (SC-001) +- ✅ Images, videos, and captions displayed correctly +- ✅ Bot ignores URLs in non-subscribed rooms + +**MVP Checkpoint**: Core functionality working - Instagram content visible in Matrix + +--- + +## Phase 4: User Story 2 - Bot Management Interface (Priority: P2) + +**Goal**: Validate management interface functionality for bot lifecycle operations + +**Independent Test**: Access management UI, create/stop/restart bot instance, view logs and status + +**Why this priority**: Essential for operations but bot works without admin features initially + +### Management Interface Validation + +- [ ] T028 [US2] Access management dashboard via SSH tunnel and verify all bot instances listed with status (instances tab) +- [ ] T029 [US2] Test plugin upload via web UI (upload test .mbp file, verify appears in plugins list) +- [ ] T030 [US2] 
Test bot instance creation via web UI (create test instance, verify appears online in Matrix within 30 seconds) +- [ ] T031 [US2] Test bot configuration edit (edit room_subscriptions via config JSON, restart instance, verify bot responds only in new rooms) +- [ ] T032 [US2] Test bot stop/start via web UI (click Stop button, verify bot goes offline, click Start, verify reconnects) +- [ ] T033 [US2] View bot logs in UI and verify error messages display with timestamps and severity levels + +**Acceptance Criteria**: +- ✅ Dashboard displays all bot instances with status +- ✅ Plugin upload succeeds and validates +- ✅ Bot lifecycle operations (create/stop/start) work via UI +- ✅ Configuration changes take effect after restart +- ✅ Logs visible with proper formatting + +--- + +## Phase 5: User Story 3 - Bot Framework Service Reliability (Priority: P2) + +**Goal**: Validate auto-start, auto-recovery, and failure handling + +**Independent Test**: Reboot server and verify maubot service and all bot instances resume automatically + +**Why this priority**: Critical for production reliability but can be validated after basic functionality proven + +### Reliability Testing + +- [ ] T034 [US3] Test server reboot recovery (ssh root@45.77.205.49 'reboot', wait 2 minutes, verify service auto-starts via systemctl status maubot) +- [ ] T035 [US3] Test Matrix homeserver restart handling (restart matrix-continuwuity service, verify bot reconnects automatically without manual intervention) +- [ ] T036 [US3] Verify health check timers active (ssh root@45.77.205.49 'systemctl list-timers | grep maubot', expect maubot-health.timer and maubot-health-restart.timer) +- [ ] T037 [US3] Test manual health check (curl http://localhost:29316/_matrix/maubot/v1/version, verify JSON response with version field) +- [ ] T038 [US3] Monitor 7-day uptime for SC-003 validation (99% uptime target, check periodically: uptime -p, journalctl -u maubot | grep -i error) + +**Acceptance Criteria**: +- ✅ Service 
auto-starts on server boot within 2 minutes +- ✅ Bot instances reconnect after Matrix homeserver restart +- ✅ Health timers operational +- ✅ 99% uptime achieved over 7-day period + +--- + +## Phase 6: User Story 4 - Additional Bot Deployment (Priority: P3) + +**Goal**: Demonstrate platform extensibility by deploying a second bot type + +**Independent Test**: Deploy echo bot or reaction bot from maubot plugin repository and verify independent operation + +**Why this priority**: Future-proofs investment, not required for initial Instagram bot value + +### Extensibility Validation + +- [ ] T039 [US4] Download additional maubot plugin from community repository (e.g., echo bot, reaction bot) +- [ ] T040 [US4] Upload second plugin via management UI and verify validation succeeds +- [ ] T041 [US4] Create second bot instance using new plugin and verify appears in dashboard with type, status, and resource usage +- [ ] T042 [US4] Test SC-002 multi-instance validation (run 3 concurrent bot instances, verify no performance degradation) + +**Acceptance Criteria**: +- ✅ Multiple plugin types supported +- ✅ Dashboard shows all bots with clear differentiation +- ✅ 3+ concurrent instances run without degradation (SC-002) + +--- + +## Phase 7: Polish & Cross-Cutting Concerns + +**Goal**: Complete documentation and prepare for merge + +### Documentation + +- [ ] T043 Update CLAUDE.md with maubot management commands (service status, logs, SSH tunnel, room subscription workflow) +- [ ] T044 Create deployment worklog in docs/worklogs/2025-10-26-maubot-deployment.org documenting session +- [ ] T045 Commit changes and tag release v0.3.0 (message: "Add maubot bot framework with Instagram bot - Implements 003-maubot-integration") + +**Final Checkpoint**: All documentation complete, ready for 7-day validation period + +--- + +## Dependencies & Execution Order + +### User Story Dependencies + +``` +Phase 1 (Setup) + ↓ +Phase 2 (Foundational) ← BLOCKING for all user stories + ↓ +├─→ User Story 
1 (P1) ← MVP, no dependencies +├─→ User Story 2 (P2) ← depends on US1 (needs running bot to manage) +├─→ User Story 3 (P2) ← depends on US1 (needs service deployed to test reliability) +└─→ User Story 4 (P3) ← depends on US2 (needs management UI working) + ↓ +Phase 7 (Polish) ← depends on all user stories complete +``` + +### Critical Path + +1. Setup (T001-T004) +2. Foundational (T005-T010) - **MUST complete before user stories** +3. User Story 1 (T011-T027) - **MVP - Deploy first, validate before continuing** +4. Validate MVP success before proceeding to US2/US3/US4 +5. User Stories 2 and 3 can proceed in parallel after US1 validates; User Story 4 follows US2 +6. Polish (T043-T045) after all user stories complete + +--- + +## Parallel Execution Opportunities + +### Phase 2 (Foundational) + +**Parallel**: +- T010 can run in parallel with T005-T009 (secrets vs module editing, different files) + +### Phase 3 (User Story 1) + +**Parallel**: +- T011 can run in parallel with T012-T013 (different files: hosts/ops-jrz1.nix vs modules/dev-services.nix); T012 and T013 both edit modules/dev-services.nix, so run them in sequence +- After T015 deploys: T016, T017 can run in parallel (both read-only checks) + +**Sequential**: +- T014 depends on T011, T012, T013 (needs config in place) +- T015 depends on T014 (deployment needs config) +- T018-T027 must run sequentially (UI workflow dependencies) + +### Phase 4-6 (User Stories 2, 3, 4) + +**Parallel after US1**: +- US2 tasks (T028-T033) can run in parallel with US3 tasks (T034-T038) if US1 validates +- US4 tasks (T039-T042) should wait for US2 to confirm management UI working + +--- + +## Implementation Strategy + +### MVP-First Approach + +**Week 1**: Focus exclusively on User Story 1 (T001-T027) +- Goal: Working Instagram bot responding to URLs in designated rooms +- Success: Can demo "post Instagram URL → see content in Matrix" +- Decision point: If MVP fails, stop and reassess before continuing + +**Week 2**: Expand to User Stories 2 & 3 (T028-T038) in parallel +- Goal: Operational management and reliability validated +-
Success: Admins can manage bots via UI, service survives restarts + +**Week 3**: Add extensibility (User Story 4) if needed (T039-T042) +- Goal: Prove multi-bot capability +- Success: 3 concurrent bot instances running + +**Week 4+**: 7-day validation period +- Monitor uptime (SC-003: 99% target) +- Monitor Instagram fetch success rate (SC-006: 95% target) +- Collect user feedback + +### Incremental Delivery + +Each user story delivers independently testable value: +- **US1**: Instagram content in Matrix (core value) +- **US2**: Self-service bot management (operational efficiency) +- **US3**: Production reliability (reduces maintenance burden) +- **US4**: Platform extensibility (future-proofing) + +Can stop after any user story and still have working system. + +--- + +## Testing Strategy + +**Manual QA** (no automated tests per plan.md): +- Each user story has "Independent Test" criteria +- Acceptance scenarios from spec.md validated manually +- Success criteria (SC-001 through SC-008) checked via quickstart.md checklist + +**Validation Period**: +- 7 days operational before merging to main (per constitution Principle III) +- Monitor metrics: uptime, response time, fetch success rate +- Document issues in worklog + +--- + +## Risk Mitigation + +**High-risk tasks**: +- T007: Removing registration_secrets (conduwuit incompatibility) - carefully test bot registration after change +- T015: Initial deployment (first time on ops-jrz1) - have rollback ready via nixos-rebuild switch --rollback +- T020: Bot user registration (new auth pattern) - document exact steps in worklog for repeatability + +**Rollback points**: +- After T010: Can rollback before deployment if module adaptation fails +- After T015: NixOS generation rollback if service won't start +- After T027: Can remove bot and redeploy if issues found + +--- + +## Success Metrics + +**Per User Story**: +- US1: Bot responds to Instagram URLs within 5 seconds (SC-001) +- US2: Management UI loads within 2 seconds 
(SC-007) +- US3: 99% uptime over 7 days (SC-003), auto-recovery within 2 minutes (SC-004) +- US4: 3 concurrent instances without degradation (SC-002) + +**Overall**: +- [ ] All 8 success criteria validated (SC-001 through SC-008) +- [ ] Constitution check passes (all 4 principles compliant) +- [ ] 7-day stability period completed without critical issues +- [ ] Documentation complete (spec, plan, quickstart, worklog, CLAUDE.md updated) + +--- + +**Estimated Timeline**: +- **MVP (US1)**: 2-3 hours deployment + testing +- **Full Feature (US1-4)**: 1 week implementation + 1 week validation +- **Production Ready**: 2 weeks total (including 7-day stability period) + +**Next Command**: `/speckit.implement` to begin execution (start with T001) diff --git a/specs/004-browser-dev-environment/design.md b/specs/004-browser-dev-environment/design.md new file mode 100644 index 0000000..70fec80 --- /dev/null +++ b/specs/004-browser-dev-environment/design.md @@ -0,0 +1,532 @@ +# Browser-Based Development Environment + +## Overview + +Provide VS Code in the browser via code-server, with: +- **opencode** AI coding agent pre-installed (CLI + VS Code extension) +- Container-based isolation for security against LLM-generated code risks +- Zero-setup experience for users of varying skill levels + +## User Personas + +| Persona | Description | Needs | +|---------|-------------|-------| +| **Non-programmer** | Learning to code with AI assistance | GUI-first, minimal friction, no terminal knowledge required | +| **Programmer (testing)** | Evaluating AI coding tools | Fast setup, full terminal access, multiple language support | +| **Learner** | Learning AI-assisted dev or new languages | Gentle on-ramp, room to grow, pre-configured tools | + +## Requirements + +| Requirement | Value | +|-------------|-------| +| Users | 1-5, separate workspaces | +| Inter-user isolation | Not required | +| Security model | Container sandbox per user | +| Access | HTTPS via existing nginx | +| Persistence | 
User workspaces survive restarts | +| AI tooling | opencode pre-installed and configured | + +## Architecture + +**Routing**: Subdomain-based (`dan.code.clarun.xyz`) for clean isolation. + +Path-based routing (`/code/dan/`) was considered but rejected: +- VS Code extensions assume root path, break with subpaths +- Cookie scoping issues across users +- PWA installation fails +- WebSocket URL construction breaks + +``` + ┌──────────────────────────────────────────┐ + │ DNS (Vultr) │ + │ *.code.clarun.xyz → 45.77.205.49 │ + └────────────────────┬─────────────────────┘ + │ +┌─────────────────────────────────────────────┴─────────────────────────────────┐ +│ nginx :443 │ +│ (wildcard ACME cert for *.code.clarun.xyz) │ +└─────────────────────┬───────────────────────────────────────────────┬─────────┘ + │ │ + ┌────────────────┼────────────────┬────────────────┐ │ + ▼ ▼ ▼ ▼ ▼ +dan.code. alice.code. bob.code. *.code. clarun.xyz +clarun.xyz clarun.xyz clarun.xyz clarun.xyz (existing) + │ │ │ │ + ▼ ▼ ▼ ▼ +┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────────┐ +│ Podman │ │ Podman │ │ Podman │ │ 404 landing │ +│ Container │ │ Container │ │ Container │ │ page │ +│ │ │ │ │ │ │ (unknown user)│ +│ code- │ │ code- │ │ code- │ └───────────────┘ +│ server │ │ server │ │ server │ +│ +opencode │ │ +opencode │ │ +opencode │ +│ :8081 │ │ :8082 │ │ :8083 │ +└─────┬─────┘ └─────┬─────┘ └─────┬─────┘ + │ │ │ + ▼ ▼ ▼ +┌───────────┐ ┌───────────┐ ┌───────────┐ +│ /var/lib/ │ │ /var/lib/ │ │ /var/lib/ │ +│ vscode/ │ │ vscode/ │ │ vscode/ │ +│ dan/ │ │ alice/ │ │ bob/ │ +└───────────┘ └───────────┘ └───────────┘ + (bind mount) +``` + +### User Experience Flow + +``` +User opens browser + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ VS Code (in browser) │ +│ │ +│ ┌─────────────────────────┐ ┌──────────────────────────────┐ │ +│ │ Editor Pane │ │ opencode Panel │ │ +│ │ │ │ (Ctrl+Esc to open) │ │ +│ │ [select code] ────────┼──► Context auto-shared │ │ +│ │ │ 
│ │ │ +│ │ ◄─────────────────────┼── AI suggests/edits │ │ +│ │ │ │ │ │ +│ └─────────────────────────┘ └──────────────────────────────┘ │ +│ │ +│ Keybindings: │ +│ • Ctrl+Esc → Open opencode in split terminal │ +│ • Ctrl+Shift+Esc → New opencode session │ +│ • Alt+Ctrl+K → Insert file reference (@File#L37-42) │ +│ │ +└─────────────────────────────────────────────────────────────────┘ +``` + +## Technology Choices + +### code-server (not openvscode-server) + +| Factor | code-server | openvscode-server | +|--------|-------------|-------------------| +| Built-in auth | ✅ Password | ❌ Need proxy | +| Maintenance | Active (Coder) | Active (Gitpod) | +| NixOS module | ✅ `services.code-server` | ❌ Manual | +| Features | More batteries | Pure VS Code | + +**Decision**: code-server for built-in auth and NixOS integration. + +### Podman Rootless (not Docker) + +| Factor | Podman | Docker | +|--------|--------|--------| +| Rootless | ✅ Native | ⚠️ Requires setup | +| Daemonless | ✅ Yes | ❌ dockerd required | +| NixOS integration | ✅ `virtualisation.oci-containers` | ✅ Also supported | +| Security | Container root → unprivileged user | Root unless configured | + +**Decision**: Podman rootless for better security defaults and systemd integration. + +### Bind Mounts (not Docker volumes) + +| Factor | Bind Mounts | Docker Volumes | +|--------|-------------|----------------| +| Transparency | Standard directories | Opaque blobs | +| Backup | rsync, restic, tar | docker cp required | +| Recovery | Host filesystem tools | Volume commands | +| Permissions | Standard Unix perms | Volume driver dependent | + +**Decision**: Bind mounts to `/var/lib/vscode//` for simplicity and backup compatibility. 
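The bind-mount decision maps each user's host directories onto fixed container paths. The sketch below assumes the `/var/lib/vscode/dan/{workspace,config}` layout shown under Storage Layout and the `/home/coder` home directory of the `codercom/code-server` image. `vscode_mounts` and the `VSCODE_BASE` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates one user's bind-mount directories and prints
# the matching `podman run` volume flags, one per line.
VSCODE_BASE="${VSCODE_BASE:-/var/lib/vscode}"

vscode_mounts() {
  local user="$1"
  # Accept only simple lowercase names so a value like "../etc" cannot
  # resolve outside VSCODE_BASE.
  if [[ ! "$user" =~ ^[a-z][a-z0-9_-]*$ ]]; then
    echo "invalid user name: $user" >&2
    return 1
  fi
  local root="$VSCODE_BASE/$user"
  mkdir -p "$root/workspace" "$root/config" || return 1
  # Container-side targets follow the Storage Layout section.
  printf -- '-v %s:%s\n' \
    "$root/workspace" /home/coder/project \
    "$root/config" /home/coder/.local/share/code-server
}
```

For example, `podman run -d $(vscode_mounts dan) codercom/code-server:latest` splices the flags in through word splitting, so it assumes paths without spaces. The sketch does not set ownership. Under rootless Podman, the directories usually need to belong to the host UID that maps to the container's `coder` user, for example via `podman unshare chown`.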
+ +### Authentication + +| Option | Pros | Cons | +|--------|------|------| +| code-server password | Simple, per-user | Manual password management | +| nginx basic auth | Centralized | WebSocket conflicts, breaks PWA | +| OAuth proxy | SSO, enterprise | Complexity, RAM overhead | + +**Decision**: code-server password auth, managed via sops-nix. nginx handles HTTPS only. + +## Resource Planning + +### Per-Container Limits + +| Resource | Limit | Rationale | +|----------|-------|-----------| +| Memory (soft) | 2.5GB | Normal operation headroom for VS Code + opencode | +| Memory (hard) | 3GB | Comfortable for AI agent workloads, prevents host-wide OOM | +| CPU | 1.5 cores | Fair share, prevent monopolization | + +### Server Sizing + +| Users | RAM Required | CPU | Recommendation | +|-------|--------------|-----|----------------| +| 1 | ~3.5GB (3GB container + system) | 1-2 | Exceeds a 2GB VPS | +| 2-3 | ~7-10GB | 2 | Upgrade to 8GB | +| 4-5 | ~12-16GB | 2-4 | Upgrade to 16GB | + +**Action**: Upgrade VPS to 8GB RAM before deployment (supports 2 users comfortably). + +## Storage Layout + +``` +/var/lib/vscode/ +├── dan/ +│ ├── workspace/ # Project files (bind mount → container /home/coder/project) +│ └── config/ # VS Code settings, extensions (bind mount → container ~/.local/share/code-server) +├── alice/ +│ ├── workspace/ +│ └── config/ +└── ...
+``` + +### Backup Integration + +Existing backup service (`modules/backup.nix`) can be extended: + +```bash +# Add to backup script +tar czf "$TMP/vscode-workspaces.tar.gz" /var/lib/vscode/ +``` + +## NixOS Implementation + +### Module Structure + +``` +modules/ +└── code-server-containers.nix # New module +``` + +### Configuration Interface + +```nix +services.code-server-multi = { + enable = true; + + users = { + dan = { + port = 8081; + passwordFile = config.sops.secrets.code-server-dan.path; + memoryLimit = "3G"; # Matches the 3GB hard limit under Per-Container Limits + cpuLimit = "1.5"; + }; + alice = { + port = 8082; + passwordFile = config.sops.secrets.code-server-alice.path; + }; + }; + + # Shared settings + baseImage = "codercom/code-server:latest"; # Or custom image with Nix + workspaceBase = "/var/lib/vscode"; +}; +``` + +### Generated Resources + +For each user, the module generates: + +1. **Podman container** via `virtualisation.oci-containers` +2. **Storage directories** via `systemd.tmpfiles.rules` +3. **nginx virtual host** (`.code.clarun.xyz`) with WebSocket support +4.
**sops secret** reference for password + +**DNS requirement**: Wildcard A record `*.code.clarun.xyz` → server IP (configured in Vultr DNS) + +### nginx Configuration + +Per-user virtual hosts generated by module (one per user): + +```nix +# Generated for each user (e.g., dan) +services.nginx.virtualHosts."dan.code.clarun.xyz" = { + forceSSL = true; + useACMEHost = "code.clarun.xyz"; # Wildcard cert + + locations."/" = { + proxyPass = "http://127.0.0.1:8081"; # User's port + proxyWebsockets = true; # Sets HTTP/1.1 and the Upgrade/Connection headers, so they are not repeated below + extraConfig = '' + proxy_set_header Host $host; + proxy_set_header Accept-Encoding gzip; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + ''; + }; +}; + +# Wildcard cert for all subdomains +security.acme.certs."code.clarun.xyz" = { + domain = "code.clarun.xyz"; + extraDomainNames = [ "*.code.clarun.xyz" ]; + dnsProvider = "vultr"; # Requires DNS-01 challenge for wildcard + credentialsFile = config.sops.secrets.vultr-api-key.path; + group = config.services.nginx.group; # Lets nginx read the cert via useACMEHost +}; + +# Catch-all for unknown subdomains +services.nginx.virtualHosts."*.code.clarun.xyz" = { + forceSSL = true; # Without an SSL option this vhost would listen on port 80 only + useACMEHost = "code.clarun.xyz"; + locations."/" = { + return = "404"; + }; +}; +``` + +**Note**: Wildcard certs require a DNS-01 challenge (HTTP-01 won't work), so a Vultr API key is needed for DNS automation. + +## API Key Management + +opencode requires API keys for AI providers (Anthropic, OpenAI). Strategy for managing these in a multi-user environment: + +### Phase 1: Shared Keys (MVP) + +For 1-5 trusted users, inject shared API keys via environment variables: + +```nix +# Per-user container gets keys from sops-nix +services.code-server-multi.users.dan = { + # ... other config ...
+ # Env file of KEY=value lines; a secret's .path alone would inject the file path, not the key + environmentFiles = [ config.sops.templates."opencode-env".path ]; +}; + +# Rendered by sops-nix at activation time +sops.templates."opencode-env".content = '' + ANTHROPIC_API_KEY=${config.sops.placeholder.opencode-anthropic} + OPENAI_API_KEY=${config.sops.placeholder.opencode-openai} +''; +``` + +**Cost control at provider level:** +- Set monthly spend limits on API keys ($50-100/month) +- Create project-specific keys for this use case +- Monitor usage via provider dashboards + +**Pros**: Simple, no additional infrastructure +**Cons**: Users can see keys via `env`, no per-user tracking + +### Phase 2: Proxy with BYOK (Future) + +If scale or cost becomes an issue: + +``` +┌──────────────┐ ┌──────────────┐ ┌──────────────┐ +│ Container │────►│ API Proxy │────►│ AI Provider │ +│ (opencode) │ │ (host) │ │ │ +│ │ │ - Rate limit │ │ │ +│ Base URL: │ │ - Log usage │ │ │ +│ proxy:8080 │ │ - Add API key│ │ │ +└──────────────┘ └──────────────┘ └──────────────┘ +``` + +Options: +- **litellm**: Proxy supporting multiple providers, usage tracking +- **Custom**: Minimal proxy that adds keys and logs requests + +**Bring Your Own Key (BYOK)**: Users provide their own API keys, stored in their container's persistent config. + +### Decision: Phase 1 for MVP + +For initial deployment with 1-5 users: +1. Shared keys injected via sops-nix environment variables +2. Per-key spend limits set at provider level (OpenAI: $50, Anthropic: $50) +3. Trust model: users are known/trusted, not adversarial +4. Re-evaluate when hitting limits or adding untrusted users + +## Port Forwarding + +### Phase 1: No User-Controlled Ports (MVP) + +Users cannot expose their own web apps externally. Dev servers run inside the container, accessible only via VS Code's built-in port forwarding (localhost within the browser session). + +**Rationale**: Simplifies security model, avoids wildcard subdomain proliferation, reduces attack surface.
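In code-server, built-in port forwarding is path-based. A port opened inside the container is served under `/proxy/<port>/` on the same host as the editor, behind the user's code-server login. A related `/absproxy/<port>/` route keeps the path prefix, for apps configured with that base path. Phase 1 therefore needs no extra DNS records or nginx vhosts.

The sketch below shows the URL a user would open, assuming the per-user subdomain scheme above. `dev_server_url` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the URL under which code-server's built-in
# port proxy would serve a dev server listening inside a user's container.
dev_server_url() {
  local user="$1" port="$2"
  # Digits only, no leading zero, within the TCP port range.
  if [[ ! "$port" =~ ^[1-9][0-9]*$ ]] || (( port > 65535 )); then
    echo "invalid port: $port" >&2
    return 1
  fi
  printf 'https://%s.code.clarun.xyz/proxy/%s/\n' "$user" "$port"
}
```

For a dev server on port 3000 in dan's container, `dev_server_url dan 3000` prints `https://dan.code.clarun.xyz/proxy/3000/`.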
+ +### Phase 2: Platform-Controlled Ports (Future) + +If needed, platform team can expose specific user apps: + +``` +# Per-user app subdomain (requires platform team to configure) +dan-app.code.clarun.xyz → container port 8080 + +# Or numbered ports per user +dan.code.clarun.xyz:8080 → container port 8080 +``` + +**Design consideration**: Reserve subdomain/port space in DNS and nginx config for future expansion without architectural changes. + +## Security Model + +### Container Isolation + +| Threat | Mitigation | +|--------|------------| +| Filesystem escape | Bind mounts limit visible paths | +| Credential theft | Don't mount ~/.ssh, secrets | +| Host process access | Container namespaces | +| Resource exhaustion | Memory/CPU limits, OOM targets container | +| Network exfil | Possible future: network policy | + +### What Containers Don't Prevent + +- Malicious code running inside container +- Package supply chain attacks (npm, pip) +- Data exfiltration via allowed network +- Container escape via kernel vulnerability (rare) + +### Defense in Depth + +1. **Container**: Limits blast radius +2. **No host secrets**: ~/.ssh, AWS creds not mounted +3. **Resource limits**: Can't fork bomb host +4. **Easy reset**: Nuke container, keep workspace +5. 
**Backup**: Restore workspace from backup if compromised + +## Image Strategy + +### Custom Image with opencode (Required) + +Since we need opencode pre-installed, a custom image is required: + +```dockerfile +FROM codercom/code-server:latest + +# Install opencode CLI +RUN curl -fsSL https://opencode.ai/install | bash + +# Pre-install opencode VS Code extension (from Open VSX) +RUN code-server --install-extension sst-dev.opencode + +# Install common language toolchains (apt needs root; the base image runs as the unprivileged coder user) +USER root +RUN apt-get update && apt-get install -y \ + python3 python3-pip \ + nodejs npm \ + git \ + && rm -rf /var/lib/apt/lists/* +USER coder + +# Optional: Install Nix for on-demand packages +# RUN curl -L https://nixos.org/nix/install | sh +# ENV PATH="/root/.nix-profile/bin:$PATH" +``` + +### Container Contents + +| Component | Purpose | +|-----------|---------| +| code-server | VS Code in browser | +| opencode CLI | AI coding agent | +| sst-dev.opencode extension | VS Code integration for opencode | +| Python 3 | Common language | +| Node.js | Common language | +| Git | Version control | + +### Image Management + +Options for keeping image updated: + +1. **Manual rebuild**: Rebuild and redeploy periodically +2. **CI/CD**: Auto-rebuild on Dockerfile changes +3.
Test from external network +4. Validate PWA install works + +### Phase 3: Multi-User + +1. Add additional users +2. Upgrade server RAM if needed +3. Test concurrent usage +4. Document onboarding + +### Phase 4: Hardening + +1. Custom image with Nix (if needed) +2. Network policies (if needed) +3. Automated backup of workspaces +4. Monitoring/alerting + +## Open Questions + +1. ~~**Domain**: `code.clarun.xyz` or path under existing domain?~~ → Resolved: Subdomain routing (`dan.code.clarun.xyz`) +2. ~~**API keys**: How to provision opencode API keys (OpenAI, Anthropic, etc.) per user?~~ → Resolved: Phase 1 shared keys via sops-nix, provider-level spend limits +3. ~~**Git credentials**: How do users authenticate to git remotes?~~ → Resolved: Deferred - local-only projects initially, add git auth in Phase 2 if needed +4. **Onboarding docs**: What documentation do non-programmers need? + +## References + +### code-server +- [code-server GitHub](https://github.com/coder/code-server) +- [code-server multi-user blog](https://coder.com/blog/code-server-multiple-users) +- [NixOS oci-containers](https://nixos.wiki/wiki/Podman) + +### opencode +- [opencode.ai](https://opencode.ai/) +- [opencode GitHub](https://github.com/sst/opencode) +- [opencode VS Code extension (Open VSX)](https://open-vsx.org/extension/sst-dev/opencode) +- [opencode VS Code extension (MS Marketplace)](https://marketplace.visualstudio.com/items?itemName=sst-dev.opencode) + +### Other +- [Tailscale code-server guide](https://tailscale.com/kb/1166/vscode-ipad) (for iPad/PWA patterns) + +## Appendix: Alternatives Considered + +### VS Code Remote SSH + +Users run VS Code locally, SSH to server for compute. 
+ +| Pros | Cons | +|------|------| +| Less server RAM (UI on laptop) | Not browser-only | +| Native VS Code experience | Requires local VS Code install | +| No container complexity | Less isolation | +| Better keyboard shortcuts | Higher barrier for non-programmers | + +**Why not chosen**: Non-programmer users need zero-install browser access. + +### openvscode-server (instead of code-server) + +| Factor | code-server | openvscode-server | +|--------|-------------|-------------------| +| Built-in auth | ✅ | ❌ | +| NixOS module | ✅ | ❌ | +| Maintenance | Active | Active | + +**Why not chosen**: code-server has built-in auth and better NixOS integration. + +### Coder Platform (instead of DIY) + +Enterprise platform for provisioning dev environments. + +| Pros | Cons | +|------|------| +| Multi-user built-in | Terraform complexity | +| SSO, audit logs | Overkill for 1-5 users | +| Auto-shutdown | Designed for cloud provisioning | + +**Why not chosen**: We have existing infrastructure; Coder adds unnecessary complexity. + +### Terminal-Only (SSH + tmux + neovim) + +| Pros | Cons | +|------|------| +| Minimal resources | High learning curve | +| Power user friendly | Non-programmers excluded | + +**Why not chosen**: Must support non-programmer learners with GUI.