ops-jrz1/specs/004-browser-dev-environment/design.md

# Browser-Based Development Environment

## Overview

Provide VS Code in the browser via code-server, with:
- **opencode** AI coding agent pre-installed (CLI + VS Code extension)
- Container-based isolation for security against LLM-generated code risks
- Zero-setup experience for users of varying skill levels

## User Personas

| Persona | Description | Needs |
|---------|-------------|-------|
| **Non-programmer** | Learning to code with AI assistance | GUI-first, minimal friction, no terminal knowledge required |
| **Programmer (testing)** | Evaluating AI coding tools | Fast setup, full terminal access, multiple language support |
| **Learner** | Learning AI-assisted dev or new languages | Gentle on-ramp, room to grow, pre-configured tools |

## Requirements

| Requirement | Value |
|-------------|-------|
| Users | 1-5, separate workspaces |
| Inter-user isolation | Not required |
| Security model | Container sandbox per user |
| Access | HTTPS via existing nginx |
| Persistence | User workspaces survive restarts |
| AI tooling | opencode pre-installed and configured |

## Architecture

**Routing**: Subdomain-based (`dan.code.clarun.xyz`) for clean isolation.

Path-based routing (`/code/dan/`) was considered but rejected:
- VS Code extensions assume root path, break with subpaths
- Cookie scoping issues across users
- PWA installation fails
- WebSocket URL construction breaks

```
                         ┌──────────────────────────────────────────┐
                         │              DNS (Vultr)                  │
                         │   *.code.clarun.xyz → 45.77.205.49       │
                         └────────────────────┬─────────────────────┘
                                              │
┌─────────────────────────────────────────────┴─────────────────────────────────┐
│                            nginx :443                                          │
│                   (wildcard ACME cert for *.code.clarun.xyz)                   │
└─────────────────────┬───────────────────────────────────────────────┬─────────┘
                      │                                               │
     ┌────────────────┼────────────────┬────────────────┐            │
     ▼                ▼                ▼                ▼            ▼
dan.code.        alice.code.      bob.code.       *.code.       clarun.xyz
clarun.xyz       clarun.xyz       clarun.xyz      clarun.xyz    (existing)
     │                │                │                │
     ▼                ▼                ▼                ▼
┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────────┐
│ Podman    │  │ Podman    │  │ Podman    │  │ 404 landing   │
│ Container │  │ Container │  │ Container │  │ page          │
│           │  │           │  │           │  │ (unknown user)│
│ code-     │  │ code-     │  │ code-     │  └───────────────┘
│ server    │  │ server    │  │ server    │
│ +opencode │  │ +opencode │  │ +opencode │
│ :8081     │  │ :8082     │  │ :8083     │
└─────┬─────┘  └─────┬─────┘  └─────┬─────┘
      │              │              │
      ▼              ▼              ▼
┌───────────┐  ┌───────────┐  ┌───────────┐
│ /var/lib/ │  │ /var/lib/ │  │ /var/lib/ │
│ vscode/   │  │ vscode/   │  │ vscode/   │
│ dan/      │  │ alice/    │  │ bob/      │
└───────────┘  └───────────┘  └───────────┘
   (bind mount)
```

### User Experience Flow

```
User opens browser
       │
       ▼
┌─────────────────────────────────────────────────────────────────┐
│                     VS Code (in browser)                         │
│                                                                  │
│  ┌─────────────────────────┐  ┌──────────────────────────────┐  │
│  │      Editor Pane        │  │   opencode Panel             │  │
│  │                         │  │   (Ctrl+Esc to open)         │  │
│  │   [select code] ────────┼──►  Context auto-shared         │  │
│  │                         │  │                              │  │
│  │   ◄─────────────────────┼──   AI suggests/edits           │  │
│  │                         │  │                              │  │
│  └─────────────────────────┘  └──────────────────────────────┘  │
│                                                                  │
│  Keybindings:                                                    │
│  • Ctrl+Esc       → Open opencode in split terminal              │
│  • Ctrl+Shift+Esc → New opencode session                         │
│  • Alt+Ctrl+K     → Insert file reference (@File#L37-42)         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

## Technology Choices

### code-server (not openvscode-server)

| Factor | code-server | openvscode-server |
|--------|-------------|-------------------|
| Built-in auth | ✅ Password | ❌ Need proxy |
| Maintenance | Active (Coder) | Active (Gitpod) |
| NixOS module | ✅ `services.code-server` | ❌ Manual |
| Features | More batteries | Pure VS Code |

**Decision**: code-server for built-in auth and NixOS integration.

### Podman Rootless (not Docker)

| Factor | Podman | Docker |
|--------|--------|--------|
| Rootless | ✅ Native | ⚠️ Requires setup |
| Daemonless | ✅ Yes | ❌ dockerd required |
| NixOS integration | ✅ `virtualisation.oci-containers` | ✅ Also supported |
| Security | Container root → unprivileged user | Root unless configured |

**Decision**: Podman rootless for better security defaults and systemd integration.

### Bind Mounts (not Docker volumes)

| Factor | Bind Mounts | Docker Volumes |
|--------|-------------|----------------|
| Transparency | Standard directories | Opaque blobs |
| Backup | rsync, restic, tar | docker cp required |
| Recovery | Host filesystem tools | Volume commands |
| Permissions | Standard Unix perms | Volume driver dependent |

**Decision**: Bind mounts to `/var/lib/vscode/<user>/` for simplicity and backup compatibility.

### Authentication

| Option | Pros | Cons |
|--------|------|------|
| code-server password | Simple, per-user | Manual password management |
| nginx basic auth | Centralized | WebSocket conflicts, breaks PWA |
| OAuth proxy | SSO, enterprise | Complexity, RAM overhead |

**Decision**: code-server password auth, managed via sops-nix. nginx handles HTTPS only.

## Resource Planning

### Per-Container Limits

| Resource | Limit | Rationale |
|----------|-------|-----------|
| Memory (soft) | 2.5GB | Normal operation headroom for VS Code + opencode |
| Memory (hard) | 3GB | Ceiling for AI agent workloads; an OOM kill stays inside the container instead of hitting the host |
| CPU | 1.5 cores | Fair share, prevent monopolization |
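
As a rough sketch, these limits could be passed to Podman through `extraOptions` on an `oci-containers` entry. The container name `code-server-dan` and the image tag are placeholders, not names the module is known to use; `--memory`, `--memory-reservation`, and `--cpus` are standard `podman run` flags.

```nix
{ ... }:
{
  virtualisation.oci-containers.backend = "podman";

  # Hypothetical container name and image tag, for illustration only.
  virtualisation.oci-containers.containers."code-server-dan" = {
    image = "localhost/code-server-opencode:latest";
    extraOptions = [
      "--memory=3g"                 # hard limit: processes above this are OOM-killed inside the container
      "--memory-reservation=2560m"  # soft limit (2.5GB), applied when the host is under memory pressure
      "--cpus=1.5"                  # CPU quota equivalent to 1.5 cores
    ];
  };
}
```

The soft limit is written as `2560m` because whole-number values with a unit suffix are the safest form for these flags.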

### Server Sizing

| Users | RAM Required | CPU | Recommendation |
|-------|--------------|-----|----------------|
| 1 | ~3.5GB (3GB container + system) | 1-2 | Does not fit on a 2GB VPS; needs 4GB |
| 2-3 | ~7-10GB | 2 | Upgrade to 8GB (covers 2 users; 3 users at the hard limit need more) |
| 4-5 | ~12-16GB | 2-4 | Upgrade to 16GB |

**Action**: Upgrade VPS to 8GB RAM before deployment (supports 2 users comfortably).
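
The RAM column follows from the 3GB hard limit per container plus host overhead. The small Nix expression below makes that arithmetic explicit; the 0.5GB overhead figure is an assumption inferred from the single-user row, and the table rounds these values.

```nix
let
  hardLimitGB = 3;    # per-container hard memory limit
  systemGB = 0.5;     # assumed host overhead (nginx, systemd, and so on)

  # Worst case: every container sits at its hard limit at the same time.
  ramFor = users: users * hardLimitGB + systemGB;
in
map (n: { users = n; ramGB = ramFor n; }) [ 1 2 3 4 5 ]
```

Evaluated with `nix-instantiate --eval --strict`, this gives 3.5, 6.5, 9.5, 12.5, and 15.5 GB for one to five users, which is why 8GB comfortably covers two users but not three at the hard limit.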

## Storage Layout

```
/var/lib/vscode/
├── dan/
│   ├── workspace/          # Project files (bind mount → container /home/coder/project)
│   └── config/             # VS Code settings, extensions (bind mount → container ~/.local/share/code-server)
├── alice/
│   ├── workspace/
│   └── config/
└── ...
```
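
One way to realise this layout is a set of `systemd.tmpfiles.rules` entries plus matching bind mounts. This is a sketch for a single user: the container name is hypothetical, and UID/GID 1000 is assumed to match the `coder` user inside the image.

```nix
{ ... }:
{
  # Create the per-user directories at boot.
  # Format: type, path, mode, owner, group, age ("-" means never clean up).
  systemd.tmpfiles.rules = [
    "d /var/lib/vscode/dan           0750 1000 1000 -"
    "d /var/lib/vscode/dan/workspace 0750 1000 1000 -"
    "d /var/lib/vscode/dan/config    0750 1000 1000 -"
  ];

  # Fragment: merges with the rest of the (hypothetical) container definition.
  virtualisation.oci-containers.containers."code-server-dan".volumes = [
    "/var/lib/vscode/dan/workspace:/home/coder/project"
    "/var/lib/vscode/dan/config:/home/coder/.local/share/code-server"
  ];
}
```

Under rootless Podman the container's UID 1000 is remapped to a subordinate UID on the host, so the ownership in the tmpfiles rules would need adjusting.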

### Backup Integration

Existing backup service (`modules/backup.nix`) can be extended:

```bash
# Add to backup script
tar czf "$TMP/vscode-workspaces.tar.gz" /var/lib/vscode/
```
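
If the workspace backup ran as its own unit rather than inside `modules/backup.nix`, it might look like the sketch below. The unit name, schedule, and `/var/backups` destination are assumptions; `startAt`, `path`, and `script` are standard `systemd.services` options.

```nix
{ pkgs, ... }:
{
  systemd.services.vscode-workspace-backup = {
    description = "Archive code-server workspaces";
    startAt = "daily";                  # NixOS generates a matching systemd timer
    path = [ pkgs.gnutar pkgs.gzip ];   # tar invokes gzip for the `z` flag
    serviceConfig.Type = "oneshot";
    script = ''
      mkdir -p /var/backups
      # -C keeps archive paths relative (vscode/dan/...), which simplifies restores
      tar czf "/var/backups/vscode-workspaces-$(date +%F).tar.gz" -C /var/lib vscode
    '';
  };
}
```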

## NixOS Implementation

### Module Structure

```
modules/
└── code-server-containers.nix    # New module
```

### Configuration Interface

```nix
services.code-server-multi = {
  enable = true;

  users = {
    dan = {
      port = 8081;
      passwordFile = config.sops.secrets.code-server-dan.path;
      memoryLimit = "3G";  # Hard limit from Resource Planning
      cpuLimit = "1.5";
    };
    alice = {
      port = 8082;
      passwordFile = config.sops.secrets.code-server-alice.path;
    };
  };

  # Shared settings
  baseImage = "codercom/code-server:latest";  # Replace with the custom opencode image (see Image Strategy)
  workspaceBase = "/var/lib/vscode";
};
```
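
The interface above implies a set of option declarations. A minimal sketch of what they could look like follows; the defaults and descriptions are assumptions, chosen so that a user such as `alice` can omit the limits.

```nix
{ lib, ... }:
let
  # Options available under services.code-server-multi.users.<name>
  userOpts = { ... }: {
    options = {
      port = lib.mkOption {
        type = lib.types.port;
        description = "Host loopback port that nginx proxies to.";
      };
      passwordFile = lib.mkOption {
        type = lib.types.path;
        description = "Path to the decrypted sops secret holding the password.";
      };
      memoryLimit = lib.mkOption {
        type = lib.types.str;
        default = "3G";   # assumed default, taken from the hard limit
        description = "Hard memory limit passed to Podman.";
      };
      cpuLimit = lib.mkOption {
        type = lib.types.str;
        default = "1.5";
        description = "CPU quota passed to Podman.";
      };
    };
  };
in
{
  options.services.code-server-multi = {
    enable = lib.mkEnableOption "per-user code-server containers";
    users = lib.mkOption {
      type = lib.types.attrsOf (lib.types.submodule userOpts);
      default = { };
      description = "Users to provision, keyed by subdomain label.";
    };
    baseImage = lib.mkOption {
      type = lib.types.str;
      default = "codercom/code-server:latest";
      description = "Container image used for every user.";
    };
    workspaceBase = lib.mkOption {
      type = lib.types.path;
      default = "/var/lib/vscode";
      description = "Parent directory for per-user bind mounts.";
    };
  };
}
```

Using `attrsOf (submodule ...)` keeps the user name as the attribute key, so the same name can drive the container name, the storage path, and the subdomain.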

### Generated Resources

For each user, the module generates:

1. **Podman container** via `virtualisation.oci-containers`
2. **Storage directories** via `systemd.tmpfiles.rules`
3. **nginx virtual host** (`<user>.code.clarun.xyz`) with WebSocket support
4. **sops secret** reference for password

**DNS requirement**: Wildcard A record `*.code.clarun.xyz` → server IP (configured in Vultr DNS)
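
The generation step could be sketched as a module `config` section that maps over `cfg.users`. This is an illustration, not the actual module: it assumes the `services.code-server-multi` options are declared elsewhere with defaults for the limits, that each `passwordFile` is an env file containing a `PASSWORD=...` line, and that UID/GID 1000 matches the image's `coder` user.

```nix
{ config, lib, ... }:
let
  cfg = config.services.code-server-multi;

  # Build one attribute per user, keyed by `mkName <user>`.
  perUser = mkName: mkValue:
    lib.mapAttrs' (name: u: lib.nameValuePair (mkName name) (mkValue name u)) cfg.users;
in
{
  config = lib.mkIf cfg.enable {
    # 1. Podman container, published on loopback only so nginx is the sole entry point.
    virtualisation.oci-containers.backend = "podman";
    virtualisation.oci-containers.containers =
      perUser (name: "code-server-${name}") (name: u: {
        image = cfg.baseImage;
        ports = [ "127.0.0.1:${toString u.port}:8080" ];  # upstream image listens on 8080
        volumes = [
          "${cfg.workspaceBase}/${name}/workspace:/home/coder/project"
          "${cfg.workspaceBase}/${name}/config:/home/coder/.local/share/code-server"
        ];
        # 4. sops secret reference, read by Podman as an env file (assumed format).
        environmentFiles = [ u.passwordFile ];
        extraOptions = [ "--memory=${u.memoryLimit}" "--cpus=${u.cpuLimit}" ];
      });

    # 2. Storage directories for every user.
    systemd.tmpfiles.rules = lib.concatLists (lib.mapAttrsToList (name: _: [
      "d ${cfg.workspaceBase}/${name} 0750 1000 1000 -"
      "d ${cfg.workspaceBase}/${name}/workspace 0750 1000 1000 -"
      "d ${cfg.workspaceBase}/${name}/config 0750 1000 1000 -"
    ]) cfg.users);

    # 3. nginx virtual host per user.
    services.nginx.virtualHosts =
      perUser (name: "${name}.code.clarun.xyz") (name: u: {
        forceSSL = true;
        useACMEHost = "code.clarun.xyz";
        locations."/" = {
          proxyPass = "http://127.0.0.1:${toString u.port}";
          proxyWebsockets = true;
        };
      });
  };
}
```

Binding the published port to `127.0.0.1` means a container is never reachable directly from the network, only through nginx or an SSH tunnel.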

### nginx Configuration

Per-user virtual hosts generated by module (one per user):

```nix
# Generated for each user (e.g., dan)
services.nginx.virtualHosts."dan.code.clarun.xyz" = {
  forceSSL = true;
  useACMEHost = "code.clarun.xyz";  # Wildcard cert

  locations."/" = {
    proxyPass = "http://127.0.0.1:8081";  # User's port
    proxyWebsockets = true;  # Already sets the Upgrade/Connection headers and HTTP/1.1
    extraConfig = ''
      proxy_set_header Host $host;
      proxy_set_header Accept-Encoding gzip;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
    '';
  };
};

# Wildcard cert for all subdomains
security.acme.certs."code.clarun.xyz" = {
  domain = "code.clarun.xyz";
  extraDomainNames = [ "*.code.clarun.xyz" ];
  dnsProvider = "vultr";  # Requires DNS-01 challenge for wildcard
  credentialsFile = config.sops.secrets.vultr-api-key.path;
};

# Catch-all for unknown subdomains
services.nginx.virtualHosts."*.code.clarun.xyz" = {
  forceSSL = true;  # Without an SSL option, useACMEHost alone leaves this vhost on port 80 only
  useACMEHost = "code.clarun.xyz";
  locations."/" = {
    return = "404";
  };
};
```

**Note**: Wildcard certificates can only be issued via the DNS-01 challenge (HTTP-01 does not support wildcards), so ACME needs a Vultr API key to create the validation records.
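
A few supporting settings are usually needed for the certificate flow to work end to end. The sketch below is an assumption about how they might be wired here: the email address is a placeholder, and the secret is assumed to decrypt to an env file with a `VULTR_API_KEY=...` line, the variable read by the ACME client's Vultr provider.

```nix
{ ... }:
{
  # Requires sops.defaultSopsFile (or a per-secret sopsFile) to be set elsewhere.
  sops.secrets.vultr-api-key = { };

  security.acme.acceptTerms = true;
  security.acme.defaults.email = "admin@example.com";  # placeholder address

  # Certificates are group-owned by `acme`; nginx needs membership to read them.
  users.users.nginx.extraGroups = [ "acme" ];
}
```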

## API Key Management

opencode requires API keys for AI providers (Anthropic, OpenAI). The strategy for managing them in a multi-user environment is phased:

### Phase 1: Shared Keys (MVP)

For 1-5 trusted users, inject shared API keys via environment variables:

```nix
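# Per-user container gets keys from sops-nix
services.code-server-multi.users.dan = {
  # ... other config ...
  # A sops secret's `.path` is a file path, not the key itself, so the keys are
  # passed as an env file (ANTHROPIC_API_KEY=... and OPENAI_API_KEY=... lines)
  # rather than as literal `environment` values.
  environmentFile = config.sops.secrets.opencode-env.path;  # hypothetical option and secret name
};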
```

**Cost control at provider level:**
- Set monthly spend limits on API keys ($50-100/month)
- Create project-specific keys for this use case
- Monitor usage via provider dashboards

- **Pros**: Simple, no additional infrastructure
- **Cons**: Users can see keys via `env`, no per-user tracking
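
One way to get the key values (rather than secret file paths) into the container environment is a sops-nix template rendered as an env file. This is a sketch: the template name `opencode.env` and the container name are hypothetical, while `sops.templates`, `sops.placeholder`, and `environmentFiles` are existing sops-nix and `oci-containers` options.

```nix
{ config, ... }:
{
  sops.secrets.opencode-anthropic = { };
  sops.secrets.opencode-openai = { };

  # Rendered at activation time; placeholders are replaced with the decrypted values.
  sops.templates."opencode.env".content = ''
    ANTHROPIC_API_KEY=${config.sops.placeholder.opencode-anthropic}
    OPENAI_API_KEY=${config.sops.placeholder.opencode-openai}
  '';

  # Fragment: merges with the rest of the (hypothetical) container definition.
  virtualisation.oci-containers.containers."code-server-dan".environmentFiles = [
    config.sops.templates."opencode.env".path
  ];
}
```

The keys stay out of the Nix store this way, although they remain visible to the user inside the container, as noted in the cons.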

### Phase 2: Proxy with BYOK (Future)

If scale or cost becomes an issue:

```
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Container    │────►│ API Proxy    │────►│ AI Provider  │
│ (opencode)   │     │ (host)       │     │              │
│              │     │ - Rate limit │     │              │
│ Base URL:    │     │ - Log usage  │     │              │
│ proxy:8080   │     │ - Add API key│     │              │
└──────────────┘     └──────────────┘     └──────────────┘
```

Options:
- **litellm**: Proxy supporting multiple providers, usage tracking
- **Custom**: Minimal proxy that adds keys and logs requests

**Bring Your Own Key (BYOK)**: Users provide their own API keys, stored in their container's persistent config.

### Decision: Phase 1 for MVP

For initial deployment with 1-5 users:
1. Shared keys injected via sops-nix environment variables
2. Per-key spend limits set at provider level (OpenAI: $50, Anthropic: $50)
3. Trust model: users are known/trusted, not adversarial
4. Re-evaluate when hitting limits or adding untrusted users

## Port Forwarding

### Phase 1: No User-Controlled Ports (MVP)

Users cannot expose their own web apps externally. Dev servers run inside the container and are reachable only through code-server's built-in port proxy (`/proxy/<port>/` on the user's own authenticated subdomain).

**Rationale**: Simplifies security model, avoids wildcard subdomain proliferation, reduces attack surface.

### Phase 2: Platform-Controlled Ports (Future)

If needed, platform team can expose specific user apps:

```
# Per-user app subdomain (requires platform team to configure)
dan-app.code.clarun.xyz → container port 8080

# Or numbered ports per user
dan.code.clarun.xyz:8080 → container port 8080
```

**Design consideration**: Reserve subdomain/port space in DNS and nginx config for future expansion without architectural changes.

## Security Model

### Container Isolation

| Threat | Mitigation |
|--------|------------|
| Filesystem escape | Bind mounts limit visible paths |
| Credential theft | Don't mount ~/.ssh, secrets |
| Host process access | Container namespaces |
| Resource exhaustion | Memory/CPU limits, OOM targets container |
| Network exfil | Possible future: network policy |

### What Containers Don't Prevent

- Malicious code running inside container
- Package supply chain attacks (npm, pip)
- Data exfiltration via allowed network
- Container escape via kernel vulnerability (rare)

### Defense in Depth

1. **Container**: Limits blast radius
2. **No host secrets**: ~/.ssh, AWS creds not mounted
3. **Resource limits**: Runaway processes hit the container's memory/CPU caps (and Podman's PID limit) before they can starve the host
4. **Easy reset**: Nuke container, keep workspace
5. **Backup**: Restore workspace from backup if compromised

## Image Strategy

### Custom Image with opencode (Required)

Since we need opencode pre-installed, a custom image is required:

```dockerfile
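FROM codercom/code-server:latest

# The base image runs as the unprivileged `coder` user; apt needs root
USER root

# Install common language toolchains
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip \
    nodejs npm \
    git \
    && rm -rf /var/lib/apt/lists/*

# Switch back so opencode and the extension are installed for `coder`
USER coder

# Install opencode CLI
RUN curl -fsSL https://opencode.ai/install | bash
# Assumed install location; verify against the installer's output
ENV PATH="/home/coder/.opencode/bin:$PATH"

# Pre-install opencode VS Code extension (from Open VSX)
RUN code-server --install-extension sst-dev.opencode

# Optional: Install Nix for on-demand packages
# RUN curl -L https://nixos.org/nix/install | sh
# ENV PATH="/home/coder/.nix-profile/bin:$PATH"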
```

### Container Contents

| Component | Purpose |
|-----------|---------|
| code-server | VS Code in browser |
| opencode CLI | AI coding agent |
| sst-dev.opencode extension | VS Code integration for opencode |
| Python 3 | Common language |
| Node.js | Common language |
| Git | Version control |

### Image Management

Options for keeping image updated:

1. **Manual rebuild**: Rebuild and redeploy periodically
2. **CI/CD**: Auto-rebuild on Dockerfile changes
3. **Watchtower equivalent**: Auto-pull new tags (risky for stability)

**Decision**: Manual rebuild initially, automate via CI later if needed.

### Extension Pre-Installation

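The opencode extension is available on Open VSX, the registry code-server installs from:
- Registry: [open-vsx.org/extension/sst-dev/opencode](https://open-vsx.org/extension/sst-dev/opencode)
- Install command: `code-server --install-extension sst-dev.opencode`
- Caveat: the `config/` bind mount at `~/.local/share/code-server` hides anything the image installed under that path, including `extensions/`. Either install the extension on first start or keep image-provided extensions in a separate directory via `--extensions-dir`.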

## Rollout Plan

### Phase 1: Single User (SSH Tunnel)

1. Deploy one container for testing
2. Access via SSH tunnel only
3. Validate WebSocket, extensions, terminal
4. Test memory usage under load

### Phase 2: nginx Integration

1. Add nginx reverse proxy route
2. Enable HTTPS via ACME
3. Test from external network
4. Validate PWA install works

### Phase 3: Multi-User

1. Add additional users
2. Upgrade server RAM if needed
3. Test concurrent usage
4. Document onboarding

### Phase 4: Hardening

1. Custom image with Nix (if needed)
2. Network policies (if needed)
3. Automated backup of workspaces
4. Monitoring/alerting

## Open Questions

1. ~~**Domain**: `code.clarun.xyz` or path under existing domain?~~ → Resolved: Subdomain routing (`dan.code.clarun.xyz`)
2. ~~**API keys**: How to provision opencode API keys (OpenAI, Anthropic, etc.) per user?~~ → Resolved: Phase 1 shared keys via sops-nix, provider-level spend limits
3. ~~**Git credentials**: How do users authenticate to git remotes?~~ → Resolved: Deferred - local-only projects initially, add git auth in Phase 2 if needed
4. **Onboarding docs**: What documentation do non-programmers need?

## References

### code-server
- [code-server GitHub](https://github.com/coder/code-server)
- [code-server multi-user blog](https://coder.com/blog/code-server-multiple-users)
- [NixOS oci-containers](https://nixos.wiki/wiki/Podman)

### opencode
- [opencode.ai](https://opencode.ai/)
- [opencode GitHub](https://github.com/sst/opencode)
- [opencode VS Code extension (Open VSX)](https://open-vsx.org/extension/sst-dev/opencode)
- [opencode VS Code extension (MS Marketplace)](https://marketplace.visualstudio.com/items?itemName=sst-dev.opencode)

### Other
- [Tailscale code-server guide](https://tailscale.com/kb/1166/vscode-ipad) (for iPad/PWA patterns)

## Appendix: Alternatives Considered

### VS Code Remote SSH

Users run VS Code locally, SSH to server for compute.

| Pros | Cons |
|------|------|
| Less server RAM (UI on laptop) | Not browser-only |
| Native VS Code experience | Requires local VS Code install |
| No container complexity | Less isolation |
| Better keyboard shortcuts | Higher barrier for non-programmers |

**Why not chosen**: Non-programmer users need zero-install browser access.

### openvscode-server (instead of code-server)

| Factor | code-server | openvscode-server |
|--------|-------------|-------------------|
| Built-in auth | ✅ | ❌ |
| NixOS module | ✅ | ❌ |
| Maintenance | Active | Active |

**Why not chosen**: code-server has built-in auth and better NixOS integration.

### Coder Platform (instead of DIY)

Enterprise platform for provisioning dev environments.

| Pros | Cons |
|------|------|
| Multi-user built-in | Terraform complexity |
| SSO, audit logs | Overkill for 1-5 users |
| Auto-shutdown | Designed for cloud provisioning |

**Why not chosen**: We have existing infrastructure; Coder adds unnecessary complexity.

### Terminal-Only (SSH + tmux + neovim)

| Pros | Cons |
|------|------|
| Minimal resources | High learning curve |
| Power user friendly | Non-programmers excluded |

**Why not chosen**: Must support non-programmer learners with GUI.