Dan 8826d62bcc Add maubot integration and infrastructure updates

- maubot.nix: Declarative bot framework with plugin deployment
- backup.nix: Local backup service for Matrix/bridge data
- sna-instagram-bot: Instagram content bridge plugin
- beads: Issue tracking workflow integrated
- spec 004: Browser-based dev environment design
- nixpkgs bump: Oct 22 → Dec 2
- Fix maubot health check (401 = healthy)

2025-12-08 15:55:12 -08:00

20 KiB


Browser-Based Development Environment

Overview

Provide VS Code in the browser via code-server, with:

opencode AI coding agent pre-installed (CLI + VS Code extension)
Container-based isolation to limit the damage LLM-generated code can do
Zero-setup experience for users of varying skill levels

User Personas

Persona	Description	Needs
Non-programmer	Learning to code with AI assistance	GUI-first, minimal friction, no terminal knowledge required
Programmer (testing)	Evaluating AI coding tools	Fast setup, full terminal access, multiple language support
Learner	Learning AI-assisted dev or new languages	Gentle on-ramp, room to grow, pre-configured tools

Requirements

Requirement	Value
Users	1-5, separate workspaces
Inter-user isolation	Not required
Security model	Container sandbox per user
Access	HTTPS via existing nginx
Persistence	User workspaces survive restarts
AI tooling	opencode pre-installed and configured

Architecture

Routing: one subdomain per user (e.g. dan.code.clarun.xyz), giving each user a separate browser origin.

Path-based routing (/code/dan/) was considered but rejected:

VS Code extensions and webviews that assume they are served from the root path break under a subpath
All users share one origin, so session cookies cannot be scoped per user
PWA installation fails
WebSocket URL construction breaks

                         ┌──────────────────────────────────────────┐
                         │              DNS (Vultr)                 │
                         │   *.code.clarun.xyz → 45.77.205.49       │
                         └────────────────────┬─────────────────────┘
                                              │
┌─────────────────────────────────────────────┴─────────────────────────────────┐
│                            nginx :443                                          │
│                   (wildcard ACME cert for *.code.clarun.xyz)                   │
└─────────────────────┬───────────────────────────────────────────────┬─────────┘
                      │                                               │
     ┌────────────────┼────────────────┬────────────────┐            │
     ▼                ▼                ▼                ▼            ▼
dan.code.        alice.code.      bob.code.       *.code.       clarun.xyz
clarun.xyz       clarun.xyz       clarun.xyz      clarun.xyz    (existing)
     │                │                │                │
     ▼                ▼                ▼                ▼
┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────────┐
│ Podman    │  │ Podman    │  │ Podman    │  │ 404 landing   │
│ Container │  │ Container │  │ Container │  │ page          │
│           │  │           │  │           │  │ (unknown user)│
│ code-     │  │ code-     │  │ code-     │  └───────────────┘
│ server    │  │ server    │  │ server    │
│ +opencode │  │ +opencode │  │ +opencode │
│ :8081     │  │ :8082     │  │ :8083     │
└─────┬─────┘  └─────┬─────┘  └─────┬─────┘
      │              │              │
      ▼              ▼              ▼
┌───────────┐  ┌───────────┐  ┌───────────┐
│ /var/lib/ │  │ /var/lib/ │  │ /var/lib/ │
│ vscode/   │  │ vscode/   │  │ vscode/   │
│ dan/      │  │ alice/    │  │ bob/      │
└───────────┘  └───────────┘  └───────────┘
   (bind mount)
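The routing rule in the diagram amounts to a lookup from the request's Host header to a loopback port, with unknown users falling through to the 404 catch-all. A minimal model of that mapping, assuming the user-to-port assignments shown in the diagram; nginx does the real routing, and `port_for_host` is a hypothetical helper for reasoning and testing only:

```shell
# Hypothetical model of the per-user nginx virtual hosts.
# The user -> port table is an assumption copied from the diagram.
port_for_host() {
  host=$1
  case "$host" in
    *.code.clarun.xyz) user=${host%.code.clarun.xyz} ;;
    *) return 1 ;;  # not under the code subdomain at all
  esac
  case "$user" in
    dan)   echo 8081 ;;
    alice) echo 8082 ;;
    bob)   echo 8083 ;;
    *)     return 1 ;;  # unknown user: the catch-all vhost answers 404
  esac
}
```

For example, `port_for_host dan.code.clarun.xyz` prints 8081, while a host such as `carol.code.clarun.xyz` returns a non-zero status, mirroring the catch-all.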

User Experience Flow

User opens browser
       │
       ▼
┌─────────────────────────────────────────────────────────────────┐
│                     VS Code (in browser)                         │
│                                                                  │
│  ┌─────────────────────────┐  ┌──────────────────────────────┐  │
│  │      Editor Pane        │  │   opencode Panel             │  │
│  │                         │  │   (Ctrl+Esc to open)         │  │
│  │   [select code] ────────┼──►  Context auto-shared         │  │
│  │                         │  │                              │  │
│  │   ◄─────────────────────┼──   AI suggests/edits           │  │
│  │                         │  │                              │  │
│  └─────────────────────────┘  └──────────────────────────────┘  │
│                                                                  │
│  Keybindings:                                                    │
│  • Ctrl+Esc       → Open opencode in split terminal              │
│  • Ctrl+Shift+Esc → New opencode session                         │
│  • Alt+Ctrl+K     → Insert file reference (@File#L37-42)         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Technology Choices

code-server (not openvscode-server)

Factor	code-server	openvscode-server
Built-in auth	✅ Password	❌ Need proxy
Maintenance	Active (Coder)	Active (Gitpod)
NixOS module	✅ `services.code-server`	❌ Manual
Features	More bundled features	Closer to upstream VS Code

Decision: code-server for built-in auth and NixOS integration.

Podman Rootless (not Docker)

Factor	Podman	Docker
Rootless	✅ Native	⚠️ Requires setup
Daemonless	✅ Yes	❌ dockerd required
NixOS integration	✅ `virtualisation.oci-containers`	✅ Also supported
Security	Container root maps to an unprivileged host user	Daemon and containers run as root unless rootless mode is configured

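Decision: Podman rootless for better security defaults and systemd integration. Note that `virtualisation.oci-containers` runs Podman containers as root by default, so rootless operation has to be configured explicitly in the module.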

Bind Mounts (not Docker volumes)

Factor	Bind Mounts	Docker Volumes
Transparency	Standard host directories	Engine-managed storage, addressed by volume name
Backup	rsync, restic, tar	Needs engine commands (e.g. docker cp) or the volume's storage path
Recovery	Host filesystem tools	Volume commands
Permissions	Standard Unix perms	Volume driver dependent

Decision: Bind mounts to /var/lib/vscode/<user>/ for simplicity and backup compatibility.

Authentication

Option	Pros	Cons
code-server password	Simple, per-user	Manual password management
nginx basic auth	Centralized	WebSocket conflicts, breaks PWA
OAuth proxy	SSO, enterprise	Complexity, RAM overhead

Decision: code-server password auth, managed via sops-nix. nginx handles HTTPS only.

Resource Planning

Per-Container Limits

Resource	Limit	Rationale
Memory (soft)	2.5GB	Reservation: normal-operation headroom for VS Code + opencode
Memory (hard)	3GB	Room for AI agent workloads; past this, the OOM killer acts inside the container instead of on the host
CPU	1.5 cores	Fair share, prevent monopolization
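With Podman, the three limits map onto `podman run` flags: `--memory-reservation` (soft), `--memory` (hard) and `--cpus`. A small sketch that emits them from the table values; the helper name is hypothetical, and on NixOS the same strings could go into a container's `extraOptions`. Rootless Podman only enforces these on cgroup v2 with the memory and cpu controllers delegated to the user:

```shell
# Sketch: translate the per-container limits table into Podman run flags.
# Defaults are the table values; the helper itself is hypothetical.
limit_flags() {
  soft=${1:-2560m}   # soft limit (2.5GB), becomes --memory-reservation
  hard=${2:-3g}      # hard limit (3GB), becomes --memory
  cpus=${3:-1.5}     # CPU quota in cores, becomes --cpus
  printf '%s\n' "--memory-reservation=$soft" "--memory=$hard" "--cpus=$cpus"
}
```

Usage: `podman run $(limit_flags) ...` for a quick manual test, or pass the printed lines as `extraOptions` entries.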

Server Sizing

Users	RAM Required	CPU	Recommendation
1	~3.5GB (3GB container + system)	1-2	Exceeds a 2GB VPS at full limits
2-3	~7-10GB	2	Upgrade to 8GB
4-5	~12-16GB	2-4	Upgrade to 16GB

Action: Upgrade VPS to 8GB RAM before deployment (supports 2 users comfortably).
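The sizing table follows from simple arithmetic. A worked sketch, assuming the 3GB hard limit per container and roughly 512 MiB for the host system (inferred from the one-user row); both constants are assumptions, not measurements:

```shell
# Worked sizing arithmetic (all values in MiB).
# Assumed: 3072 MiB hard limit per container, ~512 MiB for the host system.
PER_USER_MIB=3072
SYSTEM_MIB=512

ram_needed_mib() {   # $1 = number of users
  echo $(( $1 * PER_USER_MIB + SYSTEM_MIB ))
}

max_users() {        # $1 = total VPS RAM in MiB; integer division rounds down
  echo $(( ($1 - SYSTEM_MIB) / PER_USER_MIB ))
}
```

`max_users 8192` prints 2, matching the 8GB upgrade recommendation, and `max_users 16384` prints 5, matching the 4-5 user row.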

Storage Layout

/var/lib/vscode/
├── dan/
│   ├── workspace/          # Project files (bind mount → container /home/coder/project)
│   └── config/             # VS Code settings, extensions (bind mount → container ~/.local/share/code-server)
├── alice/
│   ├── workspace/
│   └── config/
└── ...

Backup Integration

Existing backup service (modules/backup.nix) can be extended:

# Add to backup script (-C stores paths relative to /var/lib, e.g. vscode/dan/...)
tar -czf "$TMP/vscode-workspaces.tar.gz" -C /var/lib vscode
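One archive per user lets a single compromised or broken workspace be restored without unpacking everyone else's data. A sketch of that variant; `backup_workspaces` is a hypothetical helper, and in the real backup script the arguments would be `/var/lib/vscode` and `$TMP`:

```shell
# Sketch: write one tarball per user directory under $1 into $2.
backup_workspaces() {   # $1 = base dir (e.g. /var/lib/vscode), $2 = output dir
  base=$1 dest=$2
  for dir in "$base"/*/; do
    [ -d "$dir" ] || continue            # no users yet: the glob stays literal
    user=$(basename "$dir")
    # -C keeps member paths relative (user/workspace/...), so an archive
    # can be restored under any base directory.
    tar -czf "$dest/vscode-$user.tar.gz" -C "$base" "$user" || return 1
  done
}
```

Restoring one user is then `tar -xzf vscode-dan.tar.gz -C /var/lib/vscode`, ideally with that user's container stopped.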

NixOS Implementation

Module Structure

modules/
└── code-server-containers.nix    # New module

Configuration Interface

services.code-server-multi = {
  enable = true;

  users = {
    dan = {
      port = 8081;
      passwordFile = config.sops.secrets.code-server-dan.path;
      memoryLimit = "3G";  # hard limit from Resource Planning
      cpuLimit = "1.5";
    };
    alice = {
      port = 8082;
      passwordFile = config.sops.secrets.code-server-alice.path;
    };
  };

  # Shared settings
  baseImage = "codercom/code-server:latest";  # Placeholder: the custom image with opencode is required (see Image Strategy)
  workspaceBase = "/var/lib/vscode";
};

Generated Resources

For each user, the module generates:

Podman container via virtualisation.oci-containers
Storage directories via systemd.tmpfiles.rules
nginx virtual host (<user>.code.clarun.xyz) with WebSocket support
sops secret reference for password

DNS requirement: Wildcard A record *.code.clarun.xyz → server IP (configured in Vultr DNS)
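The `systemd.tmpfiles.rules` entries follow the tmpfiles.d line format (type, path, mode, user, group, age). A sketch of what the module might emit for one user; the helper is hypothetical, and UID/GID 1000 is an assumption (the `coder` user in the code-server image, which equals host UID 1000 only when Podman runs rootful):

```shell
# Sketch: print tmpfiles.d "d" rules for one user's storage directories.
# UID/GID 1000 is an assumption (the image's "coder" user).
tmpfiles_rules() {   # $1 = user, $2 = base dir (default /var/lib/vscode)
  base=${2:-/var/lib/vscode}
  for sub in "" /workspace /config; do
    printf 'd %s 0750 1000 1000 -\n' "$base/$1$sub"
  done
}
```

Each printed line is a valid element for `systemd.tmpfiles.rules`, e.g. `d /var/lib/vscode/dan/workspace 0750 1000 1000 -`.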

nginx Configuration

Per-user virtual hosts generated by module (one per user):

# Generated for each user (e.g., dan)
services.nginx.virtualHosts."dan.code.clarun.xyz" = {
  forceSSL = true;
  useACMEHost = "code.clarun.xyz";  # Wildcard cert

  locations."/" = {
    proxyPass = "http://127.0.0.1:8081";  # User's port
    proxyWebsockets = true;
    # proxyWebsockets already adds proxy_http_version 1.1 plus the
    # Upgrade/Connection headers, so they are not repeated here.
    extraConfig = ''
      proxy_set_header Host $host;
      proxy_set_header Accept-Encoding gzip;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
    '';
  };
};

# Wildcard cert for all subdomains
security.acme.certs."code.clarun.xyz" = {
  domain = "code.clarun.xyz";
  extraDomainNames = [ "*.code.clarun.xyz" ];
  dnsProvider = "vultr";  # Requires DNS-01 challenge for wildcard
  credentialsFile = config.sops.secrets.vultr-api-key.path;  # file contains VULTR_API_KEY=...
  group = "nginx";  # lets nginx read the cert via useACMEHost
};

# Catch-all for unknown subdomains
services.nginx.virtualHosts."*.code.clarun.xyz" = {
  forceSSL = true;
  useACMEHost = "code.clarun.xyz";
  locations."/" = {
    return = "404";
  };
};

Note: Wildcard certs require the DNS-01 challenge (HTTP-01 cannot issue wildcard certificates), so ACME needs a Vultr API key to create the DNS TXT records.
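A wildcard certificate matches exactly one DNS label, so `*.code.clarun.xyz` covers `dan.code.clarun.xyz` but neither the bare `code.clarun.xyz` (hence the explicit `domain`) nor deeper names like `app.dan.code.clarun.xyz`. This matters when reserving names for future per-user apps. A hypothetical checker for that rule:

```shell
# Sketch: is a hostname covered by the *.code.clarun.xyz certificate?
# Certificate wildcards match exactly one leftmost label.
covered_by_wildcard() {  # $1 = hostname
  label=${1%.code.clarun.xyz}
  [ "$label" != "$1" ] || return 1   # suffix missing
  case "$label" in
    ""|*.*) return 1 ;;              # empty, or more than one label
  esac
  return 0
}
```

Under this rule a name like `dan-app.code.clarun.xyz` stays within the existing certificate, while `app.dan.code.clarun.xyz` would need another one.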

API Key Management

opencode requires API keys for AI providers (Anthropic, OpenAI). Strategy for managing them in a multi-user environment:

Phase 1: Shared Keys (MVP)

For 1-5 trusted users, inject shared API keys via environment variables:

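# Per-user container gets keys from sops-nix.
# A secret's `.path` is the location of the decrypted file, not its value,
# so the keys are rendered into an env file with a sops-nix template.
sops.templates."opencode-env".content = ''
  ANTHROPIC_API_KEY=${config.sops.placeholder.opencode-anthropic}
  OPENAI_API_KEY=${config.sops.placeholder.opencode-openai}
'';

services.code-server-multi.users.dan = {
  # ... other config ...
  # Hypothetical module option, modelled on oci-containers' environmentFiles
  environmentFiles = [ config.sops.templates."opencode-env".path ];
};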
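Whatever feeds the container must contain `KEY=value` lines holding the key itself, not the path of a secret file. For manual testing outside sops-nix (e.g. rollout Phase 1), a sketch that renders such a file from individual secret files; the helper and the secret paths in the usage line are hypothetical:

```shell
# Sketch: build a KEY=value env file (for `podman run --env-file`) from
# secret files that each hold one raw key.
render_env_file() {  # $1 = output file; remaining args = VAR=secret-file pairs
  out=$1; shift
  ( umask 077; : > "$out" ) || return 1   # create the file owner-readable only
  for pair in "$@"; do
    var=${pair%%=*} file=${pair#*=}
    [ -r "$file" ] || return 1            # refuse missing or unreadable secrets
    # Write the secret's contents (command substitution drops the trailing newline)
    printf '%s=%s\n' "$var" "$(cat "$file")" >> "$out" || return 1
  done
}
```

Usage: `render_env_file /run/opencode.env ANTHROPIC_API_KEY=/run/secrets/opencode-anthropic`, then `podman run --env-file /run/opencode.env ...`.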

Cost control at provider level:

Set monthly spend limits on API keys ($50-100/month)
Create project-specific keys for this use case
Monitor usage via provider dashboards

Pros: Simple, no additional infrastructure
Cons: Users can see keys via env, no per-user tracking

Phase 2: Proxy with BYOK (Future)

If scale or cost becomes an issue:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Container    │────►│ API Proxy    │────►│ AI Provider  │
│ (opencode)   │     │ (host)       │     │              │
│              │     │ - Rate limit │     │              │
│ Base URL:    │     │ - Log usage  │     │              │
│ proxy:8080   │     │ - Add API key│     │              │
└──────────────┘     └──────────────┘     └──────────────┘

Options:

litellm: Proxy supporting multiple providers, usage tracking
Custom: Minimal proxy that adds keys and logs requests

Bring Your Own Key (BYOK): Users provide their own API keys, stored in their container's persistent config.

Decision: Phase 1 for MVP

For initial deployment with 1-5 users:

Shared keys injected via sops-nix environment variables
Per-key spend limits set at provider level (OpenAI: $50, Anthropic: $50)
Trust model: users are known/trusted, not adversarial
Re-evaluate when hitting limits or adding untrusted users

Port Forwarding

Phase 1: No User-Controlled Ports (MVP)

Users cannot expose their own web apps publicly. Dev servers run inside the container and are reachable only through code-server's built-in port forwarding, which serves them under the user's own subdomain (e.g. /proxy/3000/) behind the same password login.

Rationale: Simplifies security model, avoids wildcard subdomain proliferation, reduces attack surface.
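code-server's built-in forwarding serves a dev server listening on port N inside the container at `/proxy/N/` on the user's own host, behind the code-server login. A hypothetical helper that builds that URL under the subdomain scheme used here:

```shell
# Sketch: URL at which code-server's built-in proxy exposes a container port.
# The per-user subdomain scheme is taken from this design.
forward_url() {   # $1 = user, $2 = port inside the container
  case "$2" in
    ''|*[!0-9]*) return 1 ;;   # port must be numeric
  esac
  printf 'https://%s.code.clarun.xyz/proxy/%s/\n' "$1" "$2"
}
```

`/proxy/N/` strips the prefix before forwarding; apps that need to see their full path can use code-server's `/absproxy/N/` instead.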

Phase 2: Platform-Controlled Ports (Future)

If needed, platform team can expose specific user apps:

# Per-user app subdomain (requires platform team to configure)
dan-app.code.clarun.xyz → container port 8080

# Or numbered ports per user
dan.code.clarun.xyz:8080 → container port 8080

Design consideration: Reserve subdomain/port space in DNS and nginx config for future expansion without architectural changes.

Security Model

Container Isolation

Threat	Mitigation
Filesystem escape	Bind mounts limit visible paths
Credential theft	Don't mount ~/.ssh, secrets
Host process access	Container namespaces
Resource exhaustion	Memory/CPU limits, OOM targets container
Network exfil	Possible future: network policy

What Containers Don't Prevent

Malicious code running inside container
Package supply chain attacks (npm, pip)
Data exfiltration via allowed network
Container escape via kernel vulnerability (rare)

Defense in Depth

Container: Limits blast radius
No host secrets: ~/.ssh, AWS creds not mounted
Resource limits: Can't fork bomb host
Easy reset: Nuke container, keep workspace
Backup: Restore workspace from backup if compromised
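The "no host secrets" rule can be checked mechanically before a container starts. A sketch that rejects bind-mount sources pointing at credentials or a container-engine socket (mounting the socket is equivalent to giving the container control of the engine); the deny-list is an illustrative assumption, not exhaustive:

```shell
# Sketch: refuse bind-mount sources that would expose host credentials.
check_mounts() {   # args: host paths that will be bind-mounted
  for src in "$@"; do
    case "$src" in
      */.ssh|*/.ssh/*|*/.aws|*/.aws/*|/run/secrets|/run/secrets/*|*.sock)
        echo "refusing to mount $src" >&2
        return 1 ;;
    esac
  done
}
```

Only the per-user `workspace/` and `config/` directories should pass.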

Image Strategy

Custom Image with opencode (Required)

Since we need opencode pre-installed, a custom image is required:

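FROM codercom/code-server:latest

# The base image runs as the unprivileged "coder" user; apt needs root
USER root

# Install common language toolchains
RUN apt-get update && apt-get install -y \
    python3 python3-pip \
    nodejs npm \
    git \
    && rm -rf /var/lib/apt/lists/*

# Back to coder, so opencode and the extension install into /home/coder
USER coder

# Install opencode CLI (install dir assumed to be ~/.opencode/bin)
RUN curl -fsSL https://opencode.ai/install | bash
ENV PATH="/home/coder/.opencode/bin:${PATH}"

# Pre-install opencode VS Code extension (from Open VSX). It lands in
# ~/.local/share/code-server/extensions, which the config/ bind mount
# shadows unless the host directory is seeded with it.
RUN code-server --install-extension sst-dev.opencode

# Optional: Install Nix for on-demand packages
# RUN curl -L https://nixos.org/nix/install | sh
# ENV PATH="/home/coder/.nix-profile/bin:$PATH"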

Container Contents

Component	Purpose
code-server	VS Code in browser
opencode CLI	AI coding agent
sst-dev.opencode extension	VS Code integration for opencode
Python 3	Common language
Node.js	Common language
Git	Version control

Image Management

Options for keeping image updated:

Manual rebuild: Rebuild and redeploy periodically
CI/CD: Auto-rebuild on Dockerfile changes
Watchtower equivalent: Auto-pull new tags (risky for stability); Podman provides this as `podman auto-update`

Decision: Manual rebuild initially, automate via CI later if needed.

Extension Pre-Installation

code-server installs extensions from Open VSX rather than the Microsoft Marketplace; the opencode extension is published there:

Registry: open-vsx.org/extension/sst-dev/opencode
Install command: code-server --install-extension sst-dev.opencode
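To confirm the extension survived both the image build and the config/ bind mount, the output of `code-server --list-extensions` can be checked for its id. A minimal sketch; `has_extension` is a hypothetical helper, and the match is case-insensitive and literal (the `.` in the id is not treated as a regex wildcard):

```shell
# Sketch: succeed if an extension id appears in a list read from stdin.
has_extension() {  # $1 = extension id, e.g. sst-dev.opencode
  grep -qixF -- "$1"
}
```

Usage inside a running container: `code-server --list-extensions | has_extension sst-dev.opencode`, e.g. via `podman exec`.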

Rollout Plan

Phase 1: Single User (SSH Tunnel)

Deploy one container for testing
Access via SSH tunnel only
Validate WebSocket, extensions, terminal
Test memory usage under load
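For the SSH-tunnel phase, the container should publish its port on loopback only, and the tester forwards that port over SSH. A sketch that prints both pieces; the helpers and the server address in the usage line are hypothetical placeholders, and 8080 is the port code-server listens on inside the codercom/code-server image:

```shell
# Sketch for rollout Phase 1 (helpers are hypothetical).
publish_flag() {   # $1 = host port assigned to the user
  # Loopback-only publish: unreachable from outside until nginx is added
  printf '%s\n' "-p 127.0.0.1:$1:8080"
}

tunnel_cmd() {     # $1 = ssh destination (user@host), $2 = host port
  printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$2" "$2" "$1"
}
```

With `tunnel_cmd dan@server.example 8081` running, the tester opens `http://localhost:8081` locally.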

Phase 2: nginx Integration

Add nginx reverse proxy route
Enable HTTPS via ACME
Test from external network
Validate PWA install works

Phase 3: Multi-User

Add additional users
Upgrade server RAM if needed
Test concurrent usage
Document onboarding

Phase 4: Hardening

Custom image with Nix (if needed)
Network policies (if needed)
Automated backup of workspaces
Monitoring/alerting

Open Questions

Domain: code.clarun.xyz or path under existing domain? → Resolved: Subdomain routing (dan.code.clarun.xyz)
API keys: How to provision opencode API keys (OpenAI, Anthropic, etc.) per user? → Resolved: Phase 1 shared keys via sops-nix, provider-level spend limits
Git credentials: How do users authenticate to git remotes? → Resolved: Deferred - local-only projects initially, add git auth in Phase 2 if needed
Onboarding docs: What documentation do non-programmers need?


Appendix: Alternatives Considered

VS Code Remote SSH

Users run VS Code locally, SSH to server for compute.

Pros	Cons
Less server RAM (UI on laptop)	Not browser-only
Native VS Code experience	Requires local VS Code install
No container complexity	Less isolation
Better keyboard shortcuts	Higher barrier for non-programmers

Why not chosen: Non-programmer users need zero-install browser access.

openvscode-server (instead of code-server)

Factor	code-server	openvscode-server
Built-in auth	✅	❌
NixOS module	✅	❌
Maintenance	Active	Active

Why not chosen: code-server has built-in auth and better NixOS integration.

Coder Platform (instead of DIY)

Enterprise platform for provisioning dev environments.

Pros	Cons
Multi-user built-in	Terraform complexity
SSO, audit logs	Overkill for 1-5 users
Auto-shutdown	Designed for cloud provisioning

Why not chosen: We have existing infrastructure; Coder adds unnecessary complexity.

Terminal-Only (SSH + tmux + neovim)

Pros	Cons
Minimal resources	High learning curve
Power user friendly	Non-programmers excluded

Why not chosen: Must support non-programmer learners with GUI.
