ops-jrz1/specs/004-browser-dev-environment/design.md
Dan 8826d62bcc Add maubot integration and infrastructure updates
2025-12-08 15:55:12 -08:00

Browser-Based Development Environment

Overview

Provide VS Code in the browser via code-server, with:

  • opencode AI coding agent pre-installed (CLI + VS Code extension)
  • Container-based isolation for security against LLM-generated code risks
  • Zero-setup experience for users of varying skill levels

User Personas

| Persona | Description | Needs |
|---|---|---|
| Non-programmer | Learning to code with AI assistance | GUI-first, minimal friction, no terminal knowledge required |
| Programmer (testing) | Evaluating AI coding tools | Fast setup, full terminal access, multiple language support |
| Learner | Learning AI-assisted dev or new languages | Gentle on-ramp, room to grow, pre-configured tools |

Requirements

| Requirement | Value |
|---|---|
| Users | 1-5, separate workspaces |
| Inter-user isolation | Not required |
| Security model | Container sandbox per user |
| Access | HTTPS via existing nginx |
| Persistence | User workspaces survive restarts |
| AI tooling | opencode pre-installed and configured |

Architecture

Routing: Subdomain-based (dan.code.clarun.xyz), so each user gets a separate browser origin.

Path-based routing (/code/dan/) was considered but rejected:

  • VS Code extensions assume root path, break with subpaths
  • Cookie scoping issues across users
  • PWA installation fails
  • WebSocket URL construction breaks
                         ┌──────────────────────────────────────────┐
                         │              DNS (Vultr)                  │
                         │   *.code.clarun.xyz → 45.77.205.49       │
                         └────────────────────┬─────────────────────┘
                                              │
┌─────────────────────────────────────────────┴─────────────────────────────────┐
│                            nginx :443                                          │
│                   (wildcard ACME cert for *.code.clarun.xyz)                   │
└─────────────────────┬───────────────────────────────────────────────┬─────────┘
                      │                                               │
     ┌────────────────┼────────────────┬────────────────┐            │
     ▼                ▼                ▼                ▼            ▼
dan.code.        alice.code.      bob.code.       *.code.       clarun.xyz
clarun.xyz       clarun.xyz       clarun.xyz      clarun.xyz    (existing)
     │                │                │                │
     ▼                ▼                ▼                ▼
┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────────┐
│ Podman    │  │ Podman    │  │ Podman    │  │ 404 landing   │
│ Container │  │ Container │  │ Container │  │ page          │
│           │  │           │  │           │  │ (unknown user)│
│ code-     │  │ code-     │  │ code-     │  └───────────────┘
│ server    │  │ server    │  │ server    │
│ +opencode │  │ +opencode │  │ +opencode │
│ :8081     │  │ :8082     │  │ :8083     │
└─────┬─────┘  └─────┬─────┘  └─────┬─────┘
      │              │              │
      ▼              ▼              ▼
┌───────────┐  ┌───────────┐  ┌───────────┐
│ /var/lib/ │  │ /var/lib/ │  │ /var/lib/ │
│ vscode/   │  │ vscode/   │  │ vscode/   │
│ dan/      │  │ alice/    │  │ bob/      │
└───────────┘  └───────────┘  └───────────┘
   (bind mount)

User Experience Flow

User opens browser
       │
       ▼
┌─────────────────────────────────────────────────────────────────┐
│                     VS Code (in browser)                         │
│                                                                  │
│  ┌─────────────────────────┐  ┌──────────────────────────────┐  │
│  │      Editor Pane        │  │   opencode Panel             │  │
│  │                         │  │   (Ctrl+Esc to open)         │  │
│  │   [select code] ────────┼──►  Context auto-shared         │  │
│  │                         │  │                              │  │
│  │   ◄─────────────────────┼──   AI suggests/edits           │  │
│  │                         │  │                              │  │
│  └─────────────────────────┘  └──────────────────────────────┘  │
│                                                                  │
│  Keybindings:                                                    │
│  • Ctrl+Esc       → Open opencode in split terminal              │
│  • Ctrl+Shift+Esc → New opencode session                         │
│  • Alt+Ctrl+K     → Insert file reference (@File#L37-42)         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Technology Choices

code-server (not openvscode-server)

| Factor | code-server | openvscode-server |
|---|---|---|
| Built-in auth | Password | Need proxy |
| Maintenance | Active (Coder) | Active (Gitpod) |
| NixOS module | services.code-server | Manual |
| Features | More batteries | Pure VS Code |

Decision: code-server for built-in auth and NixOS integration.

Podman Rootless (not Docker)

| Factor | Podman | Docker |
|---|---|---|
| Rootless | Native | ⚠️ Requires setup |
| Daemonless | Yes | dockerd required |
| NixOS integration | virtualisation.oci-containers | Also supported |
| Security | Container root → unprivileged user | Root unless configured |

Decision: Podman rootless for better security defaults and systemd integration.

Bind Mounts (not Docker volumes)

| Factor | Bind Mounts | Docker Volumes |
|---|---|---|
| Transparency | Standard directories | Opaque blobs |
| Backup | rsync, restic, tar | docker cp required |
| Recovery | Host filesystem tools | Volume commands |
| Permissions | Standard Unix perms | Volume driver dependent |

Decision: Bind mounts to /var/lib/vscode/<user>/ for simplicity and backup compatibility.

Authentication

| Option | Pros | Cons |
|---|---|---|
| code-server password | Simple, per-user | Manual password management |
| nginx basic auth | Centralized | WebSocket conflicts, breaks PWA |
| OAuth proxy | SSO, enterprise | Complexity, RAM overhead |

Decision: code-server password auth, managed via sops-nix. nginx handles HTTPS only.

Resource Planning

Per-Container Limits

| Resource | Limit | Rationale |
|---|---|---|
| Memory (soft) | 2.5GB | Normal operation headroom for VS Code + opencode |
| Memory (hard) | 3GB | Comfortable for AI agent workloads; an OOM kill hits the container, not the host |
| CPU | 1.5 cores | Fair share, prevent monopolization |
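
These limits could be handed to Podman through the container's extraOptions. The following is a minimal sketch, not the module's actual implementation: it assumes a container named code-server-dan (a hypothetical name) that is defined elsewhere via virtualisation.oci-containers. The flags themselves are standard podman run options.

```nix
{
  # Fragment only: merges with the container definition made elsewhere.
  # "code-server-dan" is an assumed container name.
  virtualisation.oci-containers.containers."code-server-dan".extraOptions = [
    "--memory-reservation=2560m" # soft limit (2.5GB)
    "--memory=3g"                # hard limit (3GB); exceeding it OOM-kills inside the container
    "--cpus=1.5"                 # CPU quota of 1.5 cores
  ];
}
```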

Server Sizing

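| Users | RAM Required | CPU | Recommendation |
|---|---|---|---|
| 1 | ~3.5GB (3GB container + system) | 1-2 | Exceeds a 2GB VPS at the hard limit |
| 2-3 | ~7-10GB | 2 | Upgrade to 8GB (fits 2 users; 3 users need ~10GB) |
| 4-5 | ~12-16GB | 2-4 | Upgrade to 16GB |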

Action: Upgrade VPS to 8GB RAM before deployment (supports 2 users comfortably).
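
The sizing figures follow from a simple worst-case estimate: every container at its 3GB hard limit, plus system overhead. The 0.5GB overhead is an assumption read off the single-user row (3.5GB total for one 3GB container).

```latex
\[
\mathrm{RAM}(n) \approx n \times 3\,\mathrm{GB} + 0.5\,\mathrm{GB}
\]
\[
\mathrm{RAM}(1) \approx 3.5\,\mathrm{GB}, \quad
\mathrm{RAM}(2) \approx 6.5\,\mathrm{GB}, \quad
\mathrm{RAM}(3) \approx 9.5\,\mathrm{GB}, \quad
\mathrm{RAM}(5) \approx 15.5\,\mathrm{GB}
\]
```

Under this estimate an 8GB server holds two users with headroom, while a third user at the hard limit would exceed it.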

Storage Layout

/var/lib/vscode/
├── dan/
│   ├── workspace/          # Project files (bind mount → container /home/coder/project)
│   └── config/             # VS Code settings, extensions (bind mount → container ~/.local/share/code-server)
├── alice/
│   ├── workspace/
│   └── config/
└── ...
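
One way to create this layout declaratively is with systemd.tmpfiles.rules. This is a minimal sketch under stated assumptions: the user list is hard-coded for illustration, and UID/GID 1000 is assumed to match the coder user inside the image. With rootless Podman the host-side owner may need to differ because of user-namespace mapping.

```nix
{ lib, ... }:
let
  base = "/var/lib/vscode";
  users = [ "dan" "alice" ];
  # Assumption: UID/GID 1000 corresponds to "coder" inside the container.
  dirsFor = user: [
    "d ${base}/${user} 0750 1000 1000 -"
    "d ${base}/${user}/workspace 0750 1000 1000 -"
    "d ${base}/${user}/config 0750 1000 1000 -"
  ];
in
{
  # One "d" (create directory) rule per path, for every user.
  systemd.tmpfiles.rules = lib.concatMap dirsFor users;
}
```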

Backup Integration

Existing backup service (modules/backup.nix) can be extended:

# Add to backup script
tar czf "$TMP/vscode-workspaces.tar.gz" /var/lib/vscode/
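
If the workspace archive were run as its own unit instead of inside the existing script, it might look like the sketch below. The unit name and the /var/backup/vscode destination are hypothetical, and the real modules/backup.nix may be structured quite differently.

```nix
{ pkgs, ... }:
{
  # Hypothetical standalone unit; names and paths are illustrative.
  systemd.services.vscode-workspace-backup = {
    description = "Archive code-server workspaces";
    startAt = "daily"; # generates a matching systemd timer
    path = [ pkgs.gnutar pkgs.gzip ];
    serviceConfig.Type = "oneshot";
    script = ''
      mkdir -p /var/backup/vscode
      # -C keeps archive paths relative (vscode/...) instead of absolute
      tar czf /var/backup/vscode/vscode-workspaces.tar.gz -C /var/lib vscode
    '';
  };
}
```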

NixOS Implementation

Module Structure

modules/
└── code-server-containers.nix    # New module

Configuration Interface

services.code-server-multi = {
  enable = true;

  users = {
    dan = {
      port = 8081;
      passwordFile = config.sops.secrets.code-server-dan.path;
      memoryLimit = "3G";  # Matches the 3GB hard limit in Per-Container Limits
      cpuLimit = "1.5";
    };
    alice = {
      port = 8082;
      passwordFile = config.sops.secrets.code-server-alice.path;
    };
  };

  # Shared settings
  baseImage = "codercom/code-server:latest";  # Placeholder; the custom opencode image goes here (see Image Strategy)
  workspaceBase = "/var/lib/vscode";
};
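
Because every user is assigned a host port by hand, a duplicate port is an easy mistake to make. The module could guard against it with an assertion. A minimal sketch, assuming the services.code-server-multi.users option shown above exists as an attribute set with a port field:

```nix
{ config, lib, ... }:
let
  cfg = config.services.code-server-multi;
  # Collect every configured port into a list.
  ports = lib.mapAttrsToList (_name: user: user.port) cfg.users;
in
{
  assertions = [
    {
      # lib.unique drops duplicates, so the lengths differ when a port repeats.
      assertion = builtins.length ports == builtins.length (lib.unique ports);
      message = "services.code-server-multi: each user needs a distinct port";
    }
  ];
}
```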

Generated Resources

For each user, the module generates:

  1. Podman container via virtualisation.oci-containers
  2. Storage directories via systemd.tmpfiles.rules
  3. nginx virtual host (<user>.code.clarun.xyz) with WebSocket support
  4. sops secret reference for password

DNS requirement: Wildcard A record *.code.clarun.xyz → server IP (configured in Vultr DNS)
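
The per-user generation could be sketched as a NixOS module along the following lines. This is an illustrative outline rather than the final module: memoryLimit and cpuLimit are omitted for brevity, the container name prefix code-server- is an assumption, port 8080 is assumed to be code-server's listen port inside the image, and passwordFile is assumed to be an env-style file containing a PASSWORD=... line.

```nix
{ config, lib, ... }:
let
  cfg = config.services.code-server-multi;
in
{
  options.services.code-server-multi = {
    enable = lib.mkEnableOption "per-user code-server containers";
    baseImage = lib.mkOption {
      type = lib.types.str;
      default = "codercom/code-server:latest";
    };
    workspaceBase = lib.mkOption {
      type = lib.types.path;
      default = "/var/lib/vscode";
    };
    users = lib.mkOption {
      default = { };
      type = lib.types.attrsOf (lib.types.submodule {
        options = {
          port = lib.mkOption { type = lib.types.port; };
          passwordFile = lib.mkOption { type = lib.types.path; };
        };
      });
    };
  };

  config = lib.mkIf cfg.enable {
    virtualisation.oci-containers.backend = "podman";

    # One container per entry in cfg.users.
    virtualisation.oci-containers.containers = lib.mapAttrs'
      (name: user: lib.nameValuePair "code-server-${name}" {
        image = cfg.baseImage;
        # Bind to loopback only; nginx is the sole public entry point.
        ports = [ "127.0.0.1:${toString user.port}:8080" ];
        volumes = [
          "${cfg.workspaceBase}/${name}/workspace:/home/coder/project"
          "${cfg.workspaceBase}/${name}/config:/home/coder/.local/share/code-server"
        ];
        # Assumed to contain a line of the form PASSWORD=<secret>.
        environmentFiles = [ user.passwordFile ];
      })
      cfg.users;
  };
}
```

Publishing on 127.0.0.1 keeps the containers unreachable from outside the host, which also suits the SSH-tunnel phase of the rollout.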

nginx Configuration

Per-user virtual hosts generated by module (one per user):

# Generated for each user (e.g., dan)
services.nginx.virtualHosts."dan.code.clarun.xyz" = {
  forceSSL = true;
  useACMEHost = "code.clarun.xyz";  # Wildcard cert

  locations."/" = {
    proxyPass = "http://127.0.0.1:8081";  # User's port
    proxyWebsockets = true;  # Already sets HTTP/1.1 plus the Upgrade and Connection headers
    extraConfig = ''
      proxy_set_header Host $host;
      proxy_set_header Accept-Encoding gzip;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
    '';
  };
};

# Wildcard cert for all subdomains
security.acme.certs."code.clarun.xyz" = {
  domain = "code.clarun.xyz";
  extraDomainNames = [ "*.code.clarun.xyz" ];
  dnsProvider = "vultr";  # Requires DNS-01 challenge for wildcard
  credentialsFile = config.sops.secrets.vultr-api-key.path;
};

# Catch-all for unknown subdomains
services.nginx.virtualHosts."*.code.clarun.xyz" = {
  forceSSL = true;  # Without an SSL option the wildcard cert is never served
  useACMEHost = "code.clarun.xyz";
  locations."/" = {
    return = "404";
  };
};

Note: Wildcard certs require DNS-01 challenge (HTTP-01 won't work). Need Vultr API key for DNS automation.
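
Rather than writing one virtual host per user by hand, the module could derive them from the users attribute set. A minimal sketch, assuming the services.code-server-multi.users option from the Configuration Interface section with a port field per user:

```nix
{ config, lib, ... }:
let
  cfg = config.services.code-server-multi;
in
{
  # Produces "<user>.code.clarun.xyz" for every configured user.
  services.nginx.virtualHosts = lib.mapAttrs'
    (name: user: lib.nameValuePair "${name}.code.clarun.xyz" {
      forceSSL = true;
      useACMEHost = "code.clarun.xyz"; # shared wildcard certificate
      locations."/" = {
        proxyPass = "http://127.0.0.1:${toString user.port}";
        proxyWebsockets = true;
      };
    })
    cfg.users;
}
```

Exact server names take precedence over wildcard names in nginx, so these hosts win over the *.code.clarun.xyz catch-all.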

API Key Management

opencode requires API keys for AI providers (Anthropic, OpenAI). The strategy for managing them in a multi-user environment is phased:

Phase 1: Shared Keys (MVP)

For 1-5 trusted users, inject shared API keys via environment variables:

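# Per-user container gets keys from sops-nix
services.code-server-multi.users.dan = {
  # ... other config ...
  # A secret's .path is a file location, not the key itself, so setting
  # ANTHROPIC_API_KEY to it would export a path. Pass the decrypted file as an
  # env file instead (lines of the form ANTHROPIC_API_KEY=..., OPENAI_API_KEY=...).
  # The environmentFiles option and the opencode-env secret are illustrative names.
  environmentFiles = [ config.sops.secrets.opencode-env.path ];
};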

Cost control at provider level:

  • Set monthly spend limits on API keys ($50-100/month)
  • Create project-specific keys for this use case
  • Monitor usage via provider dashboards

Pros: Simple, no additional infrastructure

Cons: Users can see keys via env, no per-user tracking
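
One way to deliver the shared keys is a single sops-nix secret in env-file format, attached to the container as an environment file. This is a sketch under assumptions: the secret name opencode-env, the sopsFile path, and the container name code-server-dan are all hypothetical.

```nix
{ config, ... }:
{
  # Hypothetical secret; the decrypted file is expected to hold lines such as
  #   ANTHROPIC_API_KEY=<key>
  #   OPENAI_API_KEY=<key>
  sops.secrets."opencode-env" = {
    sopsFile = ./secrets/opencode.yaml; # illustrative path
    # Restart the container when the keys are rotated.
    restartUnits = [ "podman-code-server-dan.service" ];
  };

  virtualisation.oci-containers.containers."code-server-dan".environmentFiles = [
    config.sops.secrets."opencode-env".path
  ];
}
```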

Phase 2: Proxy with BYOK (Future)

If scale or cost becomes an issue:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Container    │────►│ API Proxy    │────►│ AI Provider  │
│ (opencode)   │     │ (host)       │     │              │
│              │     │ - Rate limit │     │              │
│ Base URL:    │     │ - Log usage  │     │              │
│ proxy:8080   │     │ - Add API key│     │              │
└──────────────┘     └──────────────┘     └──────────────┘

Options:

  • litellm: Proxy supporting multiple providers, usage tracking
  • Custom: Minimal proxy that adds keys and logs requests

Bring Your Own Key (BYOK): Users provide their own API keys, stored in their container's persistent config.

Decision: Phase 1 for MVP

For initial deployment with 1-5 users:

  1. Shared keys injected via sops-nix environment variables
  2. Per-key spend limits set at provider level (OpenAI: $50, Anthropic: $50)
  3. Trust model: users are known/trusted, not adversarial
  4. Re-evaluate when hitting limits or adding untrusted users

Port Forwarding

Phase 1: No User-Controlled Ports (MVP)

Users cannot expose their own web apps externally. Dev servers run inside the container and are reachable only through VS Code's built-in port forwarding (localhost within the browser session).

Rationale: Simplifies security model, avoids wildcard subdomain proliferation, reduces attack surface.

Phase 2: Platform-Controlled Ports (Future)

If needed, the platform team can expose specific user apps:

# Per-user app subdomain (requires platform team to configure)
dan-app.code.clarun.xyz → container port 8080

# Or numbered ports per user
dan.code.clarun.xyz:8080 → container port 8080

Design consideration: Reserve subdomain/port space in DNS and nginx config for future expansion without architectural changes.
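
For the per-user app subdomain variant, the platform-side configuration might look like the sketch below. The container name, host port 9081, and in-container port 3000 are all assumptions. A port other than 8080 is used inside the container because the stock code-server image listens on 8080 by default, which would collide with an app on the same port.

```nix
{
  # Publish the user's dev server on a loopback-only host port.
  # 9081 (host) and 3000 (container) are arbitrary illustrative values.
  virtualisation.oci-containers.containers."code-server-dan".ports = [
    "127.0.0.1:9081:3000"
  ];

  # dan-app.code.clarun.xyz is one label below code.clarun.xyz,
  # so the existing wildcard certificate covers it.
  services.nginx.virtualHosts."dan-app.code.clarun.xyz" = {
    forceSSL = true;
    useACMEHost = "code.clarun.xyz";
    locations."/" = {
      proxyPass = "http://127.0.0.1:9081";
      proxyWebsockets = true;
    };
  };
}
```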

Security Model

Container Isolation

| Threat | Mitigation |
|---|---|
| Filesystem escape | Bind mounts limit visible paths |
| Credential theft | Don't mount ~/.ssh, secrets |
| Host process access | Container namespaces |
| Resource exhaustion | Memory/CPU limits, OOM targets container |
| Network exfiltration | Not mitigated yet; possible future network policy |

What Containers Don't Prevent

  • Malicious code running inside container
  • Package supply chain attacks (npm, pip)
  • Data exfiltration via allowed network
  • Container escape via kernel vulnerability (rare)

Defense in Depth

  1. Container: Limits blast radius
  2. No host secrets: ~/.ssh, AWS creds not mounted
  3. Resource limits: Can't fork bomb host
  4. Easy reset: Nuke container, keep workspace
  5. Backup: Restore workspace from backup if compromised
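
Memory and CPU limits slow a fork bomb but do not bound the number of processes directly; a PID limit does. A hedged sketch of two additional podman run flags, assuming a container named code-server-dan (hypothetical) and an arbitrary limit of 512 processes:

```nix
{
  # Fragment only: merges with the container definition made elsewhere.
  virtualisation.oci-containers.containers."code-server-dan".extraOptions = [
    "--pids-limit=512"                 # caps process count, the direct fork-bomb guard
    "--security-opt=no-new-privileges" # blocks setuid escalation; this also disables sudo in the container
  ];
}
```

The no-new-privileges flag is a trade-off: users who need sudo inside the container for package installs would lose it, so it may belong in the Phase 4 hardening step rather than the MVP.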

Image Strategy

Custom Image with opencode (Required)

Since we need opencode pre-installed, a custom image is required:

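FROM codercom/code-server:latest

# The base image runs as the unprivileged user "coder"; apt needs root
USER root

# Install common language toolchains
RUN apt-get update && apt-get install -y \
    python3 python3-pip \
    nodejs npm \
    git \
    && rm -rf /var/lib/apt/lists/*

# Drop back to the unprivileged user for per-user installs
USER coder

# Install opencode CLI (installs under the current user's home directory)
RUN curl -fsSL https://opencode.ai/install | bash

# Pre-install opencode VS Code extension (from Open VSX)
# Caveat: extensions land under ~/.local/share/code-server, which the config
# bind mount covers at runtime; the install may need to be repeated at first
# start or pointed at a separate directory with --extensions-dir.
RUN code-server --install-extension sst-dev.opencode

# Optional: Install Nix for on-demand packages
# RUN curl -L https://nixos.org/nix/install | sh
# ENV PATH="/root/.nix-profile/bin:$PATH"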

Container Contents

| Component | Purpose |
|---|---|
| code-server | VS Code in browser |
| opencode CLI | AI coding agent |
| sst-dev.opencode extension | VS Code integration for opencode |
| Python 3 | Common language |
| Node.js | Common language |
| Git | Version control |

Image Management

Options for keeping image updated:

  1. Manual rebuild: Rebuild and redeploy periodically
  2. CI/CD: Auto-rebuild on Dockerfile changes
  3. Watchtower equivalent: Auto-pull new tags (risky for stability)

Decision: Manual rebuild initially, automate via CI later if needed.

Extension Pre-Installation

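code-server installs extensions from Open VSX rather than the Microsoft Marketplace, so any pre-installed extension must be published there. The opencode extension is available on Open VSX.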

Rollout Plan

Phase 1: Single User (SSH Tunnel)

  1. Deploy one container for testing
  2. Access via SSH tunnel only
  3. Validate WebSocket, extensions, terminal
  4. Test memory usage under load

Phase 2: nginx Integration

  1. Add nginx reverse proxy route
  2. Enable HTTPS via ACME
  3. Test from external network
  4. Validate PWA install works

Phase 3: Multi-User

  1. Add additional users
  2. Upgrade server RAM if needed
  3. Test concurrent usage
  4. Document onboarding

Phase 4: Hardening

  1. Custom image with Nix (if needed)
  2. Network policies (if needed)
  3. Automated backup of workspaces
  4. Monitoring/alerting

Open Questions

  1. Domain: code.clarun.xyz or path under existing domain? → Resolved: Subdomain routing (dan.code.clarun.xyz)
  2. API keys: How to provision opencode API keys (OpenAI, Anthropic, etc.) per user? → Resolved: Phase 1 shared keys via sops-nix, provider-level spend limits
  3. Git credentials: How do users authenticate to git remotes? → Resolved: Deferred - local-only projects initially, add git auth in Phase 2 if needed
  4. Onboarding docs: What documentation do non-programmers need?

Appendix: Alternatives Considered

VS Code Remote SSH

Users run VS Code locally, SSH to server for compute.

| Pros | Cons |
|---|---|
| Less server RAM (UI on laptop) | Not browser-only |
| Native VS Code experience | Requires local VS Code install |
| No container complexity | Less isolation |
| Better keyboard shortcuts | Higher barrier for non-programmers |

Why not chosen: Non-programmer users need zero-install browser access.

openvscode-server (instead of code-server)

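| Factor | code-server | openvscode-server |
|---|---|---|
| Built-in auth | Yes (password) | No (needs proxy) |
| NixOS module | Yes (services.code-server) | No (manual) |
| Maintenance | Active | Active |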

Why not chosen: code-server has built-in auth and better NixOS integration.

Coder Platform (instead of DIY)

Enterprise platform for provisioning dev environments.

| Pros | Cons |
|---|---|
| Multi-user built-in | Terraform complexity |
| SSO, audit logs | Overkill for 1-5 users |
| Auto-shutdown | Designed for cloud provisioning |

Why not chosen: We have existing infrastructure; Coder adds unnecessary complexity.

Terminal-Only (SSH + tmux + neovim)

| Pros | Cons |
|---|---|
| Minimal resources | High learning curve |
| Power user friendly | Non-programmers excluded |

Why not chosen: Must support non-programmer learners with GUI.