# Browser-Based Development Environment
## Overview
Provide VS Code in the browser via code-server, with:
- **opencode** AI coding agent pre-installed (CLI + VS Code extension)
- Container-based isolation for security against LLM-generated code risks
- Zero-setup experience for users of varying skill levels
## User Personas
| Persona | Description | Needs |
|---------|-------------|-------|
| **Non-programmer** | Learning to code with AI assistance | GUI-first, minimal friction, no terminal knowledge required |
| **Programmer (testing)** | Evaluating AI coding tools | Fast setup, full terminal access, multiple language support |
| **Learner** | Learning AI-assisted dev or new languages | Gentle on-ramp, room to grow, pre-configured tools |
## Requirements
| Requirement | Value |
|-------------|-------|
| Users | 1-5, separate workspaces |
| Inter-user isolation | Not required |
| Security model | Container sandbox per user |
| Access | HTTPS via existing nginx |
| Persistence | User workspaces survive restarts |
| AI tooling | opencode pre-installed and configured |
## Architecture
**Routing**: Subdomain-based (`dan.code.clarun.xyz`) for clean isolation.
Path-based routing (`/code/dan/`) was considered but rejected:
- VS Code extensions assume root path, break with subpaths
- Cookie scoping issues across users
- PWA installation fails
- WebSocket URL construction breaks
```
                   ┌──────────────────────────────────────────┐
                   │               DNS (Vultr)                │
                   │     *.code.clarun.xyz → 45.77.205.49     │
                   └────────────────────┬─────────────────────┘
┌───────────────────────────────────────┴────────────────────────────────────────┐
│                                   nginx :443                                   │
│                   (wildcard ACME cert for *.code.clarun.xyz)                   │
└──────────────────────┬────────────────────────────────────────────────┬────────┘
                       │                                                │
      ┌────────────────┼────────────────┬────────────────┐              │
      ▼                ▼                ▼                ▼              ▼
  dan.code.       alice.code.       bob.code.         *.code.      clarun.xyz
 clarun.xyz       clarun.xyz       clarun.xyz       clarun.xyz     (existing)
      │                │                │                │
      ▼                ▼                ▼                ▼
┌───────────┐    ┌───────────┐    ┌───────────┐  ┌───────────────┐
│  Podman   │    │  Podman   │    │  Podman   │  │  404 landing  │
│ Container │    │ Container │    │ Container │  │     page      │
│           │    │           │    │           │  │ (unknown user)│
│   code-   │    │   code-   │    │   code-   │  └───────────────┘
│  server   │    │  server   │    │  server   │
│ +opencode │    │ +opencode │    │ +opencode │
│   :8081   │    │   :8082   │    │   :8083   │
└─────┬─────┘    └─────┬─────┘    └─────┬─────┘
      │                │                │
      ▼                ▼                ▼
┌───────────┐    ┌───────────┐    ┌───────────┐
│ /var/lib/ │    │ /var/lib/ │    │ /var/lib/ │
│  vscode/  │    │  vscode/  │    │  vscode/  │
│   dan/    │    │  alice/   │    │   bob/    │
└───────────┘    └───────────┘    └───────────┘
                 (bind mount)
```
### User Experience Flow
```
User opens browser
┌─────────────────────────────────────────────────────────────────┐
│ VS Code (in browser) │
│ │
│ ┌─────────────────────────┐ ┌──────────────────────────────┐ │
│ │ Editor Pane │ │ opencode Panel │ │
│ │ │ │ (Ctrl+Esc to open) │ │
│ │ [select code] ────────┼──► Context auto-shared │ │
│ │ │ │ │ │
│ │ ◄─────────────────────┼── AI suggests/edits │ │
│ │ │ │ │ │
│ └─────────────────────────┘ └──────────────────────────────┘ │
│ │
│ Keybindings: │
│ • Ctrl+Esc → Open opencode in split terminal │
│ • Ctrl+Shift+Esc → New opencode session │
│ • Alt+Ctrl+K → Insert file reference (@File#L37-42) │
│ │
└─────────────────────────────────────────────────────────────────┘
```
## Technology Choices
### code-server (not openvscode-server)
| Factor | code-server | openvscode-server |
|--------|-------------|-------------------|
| Built-in auth | ✅ Password | ❌ Need proxy |
| Maintenance | Active (Coder) | Active (Gitpod) |
| NixOS module | ✅ `services.code-server` | ❌ Manual |
| Features | More batteries | Pure VS Code |
**Decision**: code-server for built-in auth and NixOS integration.
### Podman Rootless (not Docker)
| Factor | Podman | Docker |
|--------|--------|--------|
| Rootless | ✅ Native | ⚠️ Requires setup |
| Daemonless | ✅ Yes | ❌ dockerd required |
| NixOS integration | ✅ `virtualisation.oci-containers` | ✅ Also supported |
| Security | Container root → unprivileged user | Root unless configured |
**Decision**: Podman rootless for better security defaults and systemd integration.
### Bind Mounts (not Docker volumes)
| Factor | Bind Mounts | Docker Volumes |
|--------|-------------|----------------|
| Transparency | Standard directories | Opaque blobs |
| Backup | rsync, restic, tar | docker cp required |
| Recovery | Host filesystem tools | Volume commands |
| Permissions | Standard Unix perms | Volume driver dependent |
**Decision**: Bind mounts to `/var/lib/vscode/<user>/` for simplicity and backup compatibility.
### Authentication
| Option | Pros | Cons |
|--------|------|------|
| code-server password | Simple, per-user | Manual password management |
| nginx basic auth | Centralized | WebSocket conflicts, breaks PWA |
| OAuth proxy | SSO, enterprise | Complexity, RAM overhead |
**Decision**: code-server password auth, managed via sops-nix. nginx handles HTTPS only.
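A minimal sketch of the secret side of this decision, assuming sops-nix is already configured with a default secrets file. The unit name `podman-code-server-dan.service` assumes a container named `code-server-dan` under the Podman backend, and the `PASSWORD=...` file layout is an assumption about how the password reaches code-server (it reads the `PASSWORD` environment variable).

```nix
{
  # Assumption: the decrypted secret is a single line, `PASSWORD=<value>`,
  # so it can be handed to the container as an environment file.
  sops.secrets."code-server-dan" = {
    # Restart the user's container whenever the password is rotated.
    # Assumed unit name: <backend>-<container name>.service
    restartUnits = [ "podman-code-server-dan.service" ];
  };
}
```

Rotating a password then means editing the encrypted secrets file and redeploying; nginx is not involved.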
## Resource Planning
### Per-Container Limits
| Resource | Limit | Rationale |
|----------|-------|-----------|
| Memory (soft) | 2.5GB | Normal operation headroom for VS Code + opencode |
| Memory (hard) | 3GB | Comfortable for AI agent workloads; a runaway process is OOM-killed inside its container, not on the host |
| CPU | 1.5 cores | Fair share, prevent monopolization |
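One way these limits could map onto Podman flags is sketched below. The container name `code-server-dan` and the `--pids-limit` value are assumptions for illustration and not part of the design; `--memory`, `--memory-reservation`, and `--cpus` are the standard Podman options for the hard memory cap, soft memory cap, and CPU quota.

```nix
{
  # Hypothetical container name; this fragment merges into the full
  # container definition (image, ports, volumes) declared elsewhere.
  virtualisation.oci-containers.containers."code-server-dan".extraOptions = [
    "--memory=3g"                 # hard limit: OOM kill happens inside the container
    "--memory-reservation=2560m"  # soft limit (2.5GB), applied under host memory pressure
    "--cpus=1.5"                  # CPU quota of 1.5 cores
    "--pids-limit=512"            # assumption: process cap, not specified in the table
  ];
}
```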
### Server Sizing
| Users | RAM Required | CPU | Recommendation |
|-------|--------------|-----|----------------|
| 1 | ~3.5GB (3GB container + system) | 1-2 | Does not fit a 2GB VPS; needs 4GB or more |
| 2-3 | ~7-10GB | 2 | Upgrade to 8GB (fits 2 users at hard limits; 3 needs more) |
| 4-5 | ~12-16GB | 2-4 | Upgrade to 16GB |
**Action**: Upgrade VPS to 8GB RAM before deployment (supports 2 users comfortably).
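The sizing figures follow from a simple per-user calculation. The sketch below assumes each container is budgeted at its 3GB hard limit and that the host itself needs about 0.5GB (inferred from the ~3.5GB single-user row, not stated elsewhere).

```nix
# sizing.nix (illustrative file name)
# Evaluate with: nix-instantiate --eval --strict sizing.nix
let
  hardLimitGB = 3;   # per-container hard memory limit
  systemGB = 0.5;    # assumption: host overhead implied by the 1-user row
  # RAM needed when every container sits at its hard limit
  ramFor = users: users * hardLimitGB + systemGB;
in
  map ramFor [ 1 2 3 4 5 ]
```

For 1 through 5 users this works out to 3.5, 6.5, 9.5, 12.5, and 15.5 GB, so an 8GB host covers two users at their hard limits but not three.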
## Storage Layout
```
/var/lib/vscode/
├── dan/
│ ├── workspace/ # Project files (bind mount → container /home/coder/project)
│ └── config/ # VS Code settings, extensions (bind mount → container ~/.local/share/code-server)
├── alice/
│ ├── workspace/
│ └── config/
└── ...
```
### Backup Integration
Existing backup service (`modules/backup.nix`) can be extended:
```bash
# Add to backup script
tar czf "$TMP/vscode-workspaces.tar.gz" /var/lib/vscode/
```
## NixOS Implementation
### Module Structure
```
modules/
└── code-server-containers.nix # New module
```
### Configuration Interface
```nix
services.code-server-multi = {
  enable = true;
  users = {
    dan = {
      port = 8081;
      passwordFile = config.sops.secrets.code-server-dan.path;
      memoryLimit = "3G"; # Hard limit from Per-Container Limits
      cpuLimit = "1.5";
    };
    alice = {
      port = 8082;
      passwordFile = config.sops.secrets.code-server-alice.path;
    };
  };
  # Shared settings
  baseImage = "codercom/code-server:latest"; # Replace with the custom opencode image (see Image Strategy)
  workspaceBase = "/var/lib/vscode";
};
```
```
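The interface above implies a set of option declarations in the new module. A possible shape is sketched here; the option types, descriptions, and the defaults for `memoryLimit` and `cpuLimit` are assumptions chosen to match Per-Container Limits, not decisions recorded in this spec.

```nix
# modules/code-server-containers.nix (option declarations only, sketch)
{ lib, ... }:
let
  inherit (lib) mkEnableOption mkOption types;
in
{
  options.services.code-server-multi = {
    enable = mkEnableOption "per-user code-server containers";

    baseImage = mkOption {
      type = types.str;
      default = "codercom/code-server:latest";
      description = "OCI image used for every user container.";
    };

    workspaceBase = mkOption {
      type = types.str;
      default = "/var/lib/vscode";
      description = "Host directory holding one subdirectory per user.";
    };

    users = mkOption {
      default = { };
      description = "Per-user container settings, keyed by user name.";
      type = types.attrsOf (types.submodule {
        options = {
          port = mkOption {
            type = types.port;
            description = "Host loopback port that nginx proxies to.";
          };
          passwordFile = mkOption {
            type = types.str;
            description = "Path to the sops-managed password secret.";
          };
          # Assumed defaults, so a user entry may omit both limits.
          memoryLimit = mkOption { type = types.str; default = "3G"; };
          cpuLimit = mkOption { type = types.str; default = "1.5"; };
        };
      });
    };
  };
}
```

With defaults in place, an entry such as `alice` that sets only `port` and `passwordFile` still evaluates.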
### Generated Resources
For each user, the module generates:
1. **Podman container** via `virtualisation.oci-containers`
2. **Storage directories** via `systemd.tmpfiles.rules`
3. **nginx virtual host** (`<user>.code.clarun.xyz`) with WebSocket support
4. **sops secret** reference for password
**DNS requirement**: Wildcard A record `*.code.clarun.xyz` → server IP (configured in Vultr DNS)
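The container and storage halves of that list could be generated roughly as follows. This is a sketch under several assumptions: the `services.code-server-multi` options (including `memoryLimit` and `cpuLimit` defaults) are declared elsewhere in the module, the image's `coder` user has UID/GID 1000, the image listens on port 8080, the password secret is an env file containing `PASSWORD=...`, and containers run from the default root-owned units of `virtualisation.oci-containers` (a rootless setup would map UIDs differently).

```nix
# modules/code-server-containers.nix (config half, sketch)
{ config, lib, ... }:
let
  cfg = config.services.code-server-multi;
  uid = "1000"; # assumption: UID/GID of `coder` inside the image
in
{
  config = lib.mkIf cfg.enable {
    virtualisation.oci-containers.backend = "podman";

    # One container per user, named code-server-<user>.
    virtualisation.oci-containers.containers = lib.mapAttrs' (name: user:
      lib.nameValuePair "code-server-${name}" {
        image = cfg.baseImage;
        # Publish on loopback only so nginx is the sole external entry point.
        ports = [ "127.0.0.1:${toString user.port}:8080" ];
        volumes = [
          "${cfg.workspaceBase}/${name}/workspace:/home/coder/project"
          "${cfg.workspaceBase}/${name}/config:/home/coder/.local/share/code-server"
        ];
        # Assumption: the secret file holds a line `PASSWORD=...`.
        environmentFiles = [ user.passwordFile ];
        extraOptions = [
          "--memory=${user.memoryLimit}"
          "--cpus=${user.cpuLimit}"
        ];
      }) cfg.users;

    # Create both bind-mount sources before the containers start.
    systemd.tmpfiles.rules = lib.concatLists (lib.mapAttrsToList (name: _user: [
      "d ${cfg.workspaceBase}/${name}/workspace 0750 ${uid} ${uid} -"
      "d ${cfg.workspaceBase}/${name}/config 0750 ${uid} ${uid} -"
    ]) cfg.users);
  };
}
```

Adding a user then only requires a new entry under `users`; the container and its directories follow from the attribute name.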
### nginx Configuration
Per-user virtual hosts generated by module (one per user):
```nix
# Generated for each user (e.g., dan)
services.nginx.virtualHosts."dan.code.clarun.xyz" = {
  forceSSL = true;
  useACMEHost = "code.clarun.xyz"; # Wildcard cert
  locations."/" = {
    proxyPass = "http://127.0.0.1:8081"; # User's port
    proxyWebsockets = true; # Already sets the Upgrade and Connection headers
    extraConfig = ''
      proxy_set_header Host $host;
      proxy_set_header Accept-Encoding gzip;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
    '';
  };
};
# Wildcard cert for all subdomains
security.acme.certs."code.clarun.xyz" = {
  domain = "code.clarun.xyz";
  extraDomainNames = [ "*.code.clarun.xyz" ];
  dnsProvider = "vultr"; # Requires DNS-01 challenge for wildcard
  credentialsFile = config.sops.secrets.vultr-api-key.path;
};
# Catch-all for unknown subdomains
services.nginx.virtualHosts."*.code.clarun.xyz" = {
  forceSSL = true; # Without an SSL option the vhost would only listen on port 80
  useACMEHost = "code.clarun.xyz";
  locations."/" = {
    return = "404";
  };
};
```
**Note**: Wildcard certs require the DNS-01 challenge (HTTP-01 won't work), so a Vultr API key is needed for DNS automation.
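The per-user virtual hosts can be derived from the same `users` attribute set that drives the containers. A sketch, assuming the `services.code-server-multi.users` option exists with a `port` per user; `baseDomain` is a local name introduced here for illustration.

```nix
{ config, lib, ... }:
let
  cfg = config.services.code-server-multi;
  baseDomain = "code.clarun.xyz"; # illustrative local binding
in
{
  # Maps { dan = { port = 8081; ... }; } to a "dan.code.clarun.xyz" vhost.
  services.nginx.virtualHosts = lib.mapAttrs' (name: user:
    lib.nameValuePair "${name}.${baseDomain}" {
      forceSSL = true;
      useACMEHost = baseDomain; # reuse the wildcard cert
      locations."/" = {
        proxyPass = "http://127.0.0.1:${toString user.port}";
        proxyWebsockets = true;
      };
    }) cfg.users;
}
```

Because exact server names take precedence over wildcard names in nginx, these hosts win over the `*.code.clarun.xyz` catch-all for known users.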
## API Key Management
opencode requires API keys for AI providers (Anthropic, OpenAI). The strategy for managing them in a multi-user environment:
### Phase 1: Shared Keys (MVP)
For 1-5 trusted users, inject shared API keys via environment variables:
```nix
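# Per-user container gets keys from sops-nix
services.code-server-multi.users.dan = {
  # ... other config ...
  # A sops `.path` is the location of the decrypted file, not the key itself,
  # so it cannot be assigned directly to an environment variable. Pass an env
  # file instead: `opencode-env` is a hypothetical secret containing
  # ANTHROPIC_API_KEY=... and OPENAI_API_KEY=... lines.
  environmentFiles = [ config.sops.secrets.opencode-env.path ];
};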
```
**Cost control at provider level:**
- Set monthly spend limits on API keys ($50-100/month)
- Create project-specific keys for this use case
- Monitor usage via provider dashboards
**Pros**: Simple, no additional infrastructure
**Cons**: Users can see keys via `env`, no per-user tracking
### Phase 2: Proxy with BYOK (Future)
If scale or cost becomes an issue:
```
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Container │────►│ API Proxy │────►│ AI Provider │
│ (opencode) │ │ (host) │ │ │
│ │ │ - Rate limit │ │ │
│ Base URL: │ │ - Log usage │ │ │
│ proxy:8080 │ │ - Add API key│ │ │
└──────────────┘ └──────────────┘ └──────────────┘
```
Options:
- **litellm**: Proxy supporting multiple providers, usage tracking
- **Custom**: Minimal proxy that adds keys and logs requests
**Bring Your Own Key (BYOK)**: Users provide their own API keys, stored in their container's persistent config.
### Decision: Phase 1 for MVP
For initial deployment with 1-5 users:
1. Shared keys injected via sops-nix environment variables
2. Per-key spend limits set at provider level (OpenAI: $50, Anthropic: $50)
3. Trust model: users are known/trusted, not adversarial
4. Re-evaluate when hitting limits or adding untrusted users
## Port Forwarding
### Phase 1: No User-Controlled Ports (MVP)
Users cannot expose their own web apps externally. Dev servers run inside the container and are reachable only through VS Code's built-in port forwarding (localhost within the browser session).
**Rationale**: Simplifies security model, avoids wildcard subdomain proliferation, reduces attack surface.
### Phase 2: Platform-Controlled Ports (Future)
If needed, platform team can expose specific user apps:
```
# Per-user app subdomain (requires platform team to configure)
dan-app.code.clarun.xyz → container port 8080
# Or numbered ports per user
dan.code.clarun.xyz:8080 → container port 8080
```
**Design consideration**: Reserve subdomain/port space in DNS and nginx config for future expansion without architectural changes.
## Security Model
### Container Isolation
| Threat | Mitigation |
|--------|------------|
| Filesystem escape | Bind mounts limit visible paths |
| Credential theft | Don't mount ~/.ssh, secrets |
| Host process access | Container namespaces |
| Resource exhaustion | Memory/CPU limits, OOM targets container |
| Network exfil | Possible future: network policy |
### What Containers Don't Prevent
- Malicious code running inside container
- Package supply chain attacks (npm, pip)
- Data exfiltration via allowed network
- Container escape via kernel vulnerability (rare)
### Defense in Depth
1. **Container**: Limits blast radius
2. **No host secrets**: ~/.ssh, AWS creds not mounted
3. **Resource limits**: Memory and CPU caps keep a runaway or fork-bombing process from starving the host
4. **Easy reset**: Nuke container, keep workspace
5. **Backup**: Restore workspace from backup if compromised
## Image Strategy
### Custom Image with opencode (Required)
Since we need opencode pre-installed, a custom image is required:
```dockerfile
FROM codercom/code-server:latest
# The base image runs as the unprivileged `coder` user; apt needs root
USER root
# Install common language toolchains
RUN apt-get update && apt-get install -y \
    python3 python3-pip \
    nodejs npm \
    git \
    && rm -rf /var/lib/apt/lists/*
# Switch back so opencode and the extension land in coder's home
USER coder
# Install opencode CLI
RUN curl -fsSL https://opencode.ai/install | bash
# Pre-install opencode VS Code extension (from Open VSX)
RUN code-server --install-extension sst-dev.opencode
# Optional: Install Nix for on-demand packages
# RUN curl -L https://nixos.org/nix/install | sh
# ENV PATH="/home/coder/.nix-profile/bin:$PATH"
```
### Container Contents
| Component | Purpose |
|-----------|---------|
| code-server | VS Code in browser |
| opencode CLI | AI coding agent |
| sst-dev.opencode extension | VS Code integration for opencode |
| Python 3 | Common language |
| Node.js | Common language |
| Git | Version control |
### Image Management
Options for keeping image updated:
1. **Manual rebuild**: Rebuild and redeploy periodically
2. **CI/CD**: Auto-rebuild on Dockerfile changes
3. **Watchtower equivalent**: Auto-pull new tags (risky for stability)
**Decision**: Manual rebuild initially, automate via CI later if needed.
### Extension Pre-Installation
The opencode extension is available on Open VSX (required for code-server):
- Registry: [open-vsx.org/extension/sst-dev/opencode](https://open-vsx.org/extension/sst-dev/opencode)
- Install command: `code-server --install-extension sst-dev.opencode`
## Rollout Plan
### Phase 1: Single User (SSH Tunnel)
1. Deploy one container for testing
2. Access via SSH tunnel only
3. Validate WebSocket, extensions, terminal
4. Test memory usage under load
### Phase 2: nginx Integration
1. Add nginx reverse proxy route
2. Enable HTTPS via ACME
3. Test from external network
4. Validate PWA install works
### Phase 3: Multi-User
1. Add additional users
2. Upgrade server RAM if needed
3. Test concurrent usage
4. Document onboarding
### Phase 4: Hardening
1. Custom image with Nix (if needed)
2. Network policies (if needed)
3. Automated backup of workspaces
4. Monitoring/alerting
## Open Questions
1. ~~**Domain**: `code.clarun.xyz` or path under existing domain?~~ → Resolved: Subdomain routing (`dan.code.clarun.xyz`)
2. ~~**API keys**: How to provision opencode API keys (OpenAI, Anthropic, etc.) per user?~~ → Resolved: Phase 1 shared keys via sops-nix, provider-level spend limits
3. ~~**Git credentials**: How do users authenticate to git remotes?~~ → Resolved: Deferred - local-only projects initially, add git auth in Phase 2 if needed
4. **Onboarding docs**: What documentation do non-programmers need?
## References
### code-server
- [code-server GitHub](https://github.com/coder/code-server)
- [code-server multi-user blog](https://coder.com/blog/code-server-multiple-users)
- [NixOS Wiki: Podman](https://nixos.wiki/wiki/Podman)
### opencode
- [opencode.ai](https://opencode.ai/)
- [opencode GitHub](https://github.com/sst/opencode)
- [opencode VS Code extension (Open VSX)](https://open-vsx.org/extension/sst-dev/opencode)
- [opencode VS Code extension (MS Marketplace)](https://marketplace.visualstudio.com/items?itemName=sst-dev.opencode)
### Other
- [Tailscale code-server guide](https://tailscale.com/kb/1166/vscode-ipad) (for iPad/PWA patterns)
## Appendix: Alternatives Considered
### VS Code Remote SSH
Users run VS Code locally, SSH to server for compute.
| Pros | Cons |
|------|------|
| Less server RAM (UI on laptop) | Not browser-only |
| Native VS Code experience | Requires local VS Code install |
| No container complexity | Less isolation |
| Better keyboard shortcuts | Higher barrier for non-programmers |
**Why not chosen**: Non-programmer users need zero-install browser access.
### openvscode-server (instead of code-server)
| Factor | code-server | openvscode-server |
|--------|-------------|-------------------|
| Built-in auth | ✅ | ❌ |
| NixOS module | ✅ | ❌ |
| Maintenance | Active | Active |
**Why not chosen**: code-server has built-in auth and better NixOS integration.
### Coder Platform (instead of DIY)
Enterprise platform for provisioning dev environments.
| Pros | Cons |
|------|------|
| Multi-user built-in | Terraform complexity |
| SSO, audit logs | Overkill for 1-5 users |
| Auto-shutdown | Designed for cloud provisioning |
**Why not chosen**: We have existing infrastructure; Coder adds unnecessary complexity.
### Terminal-Only (SSH + tmux + neovim)
| Pros | Cons |
|------|------|
| Minimal resources | High learning curve |
| Power user friendly | Non-programmers excluded |
**Why not chosen**: Must support non-programmer learners with GUI.