ops-jrz1/docs/worklogs/2025-10-22-deployment-generation-31.md
Dan 64246a6615 Deploy Generation 31 with sops-nix secrets management
Successfully deployed ops-jrz1 Matrix platform to production VPS using
extracted modules from ops-base. Validated deployment workflow following
ops-base best practices: boot -> reboot -> verify.

Changes:
- Pin sops-nix to June 2024 version for nixpkgs 24.05 compatibility
- Configure sops secrets for Matrix registration token and ACME email
- Add encrypted secrets.yaml (safe to commit, encrypted with age)
- Document deployment process and lessons learned

All services verified running:
- Matrix homeserver (matrix-continuwuity): conduwuit 0.5.0-rc.8
- nginx: Proxying Matrix and Forgejo
- PostgreSQL 15.10: Database services
- Forgejo 7.0.12: Git platform

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 21:32:23 -07:00

Deployment: Generation 31 - Matrix Platform Migration

Date: 2025-10-22
Status: SUCCESS
Generation: 31
Deployment Time: ~5 minutes (build + reboot)

Summary

Successfully deployed the ops-jrz1 Matrix platform using modules extracted from ops-base. This deployment established the baseline deployment pattern (boot -> reboot -> verify) and validated the sops-nix secrets management integration.

Deployment Method

Following ops-base best practices from worklog research:

# 1. Build and install to boot (safe, rollback-friendly)
rsync -avz --exclude '.git' --exclude 'result' /home/dan/proj/ops-jrz1/ root@45.77.205.49:/root/ops-jrz1/
ssh root@45.77.205.49 'cd /root/ops-jrz1 && nixos-rebuild boot --flake .#ops-jrz1'

# 2. Reboot to test
ssh root@45.77.205.49 'reboot'

# 3. Verify services after reboot (verified all running)
ssh root@45.77.205.49 'systemctl status matrix-continuwuity nginx postgresql forgejo'

# 4. Test API endpoints (the homeserver binds to 127.0.0.1:8008, so query it from the VPS itself)
ssh root@45.77.205.49 'curl -s http://127.0.0.1:8008/_matrix/client/versions'
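
Between steps 2 and 3 the host is briefly unreachable, so the verification step needs to wait for SSH to return. A minimal sketch of a polling helper follows; the `retry` function, its argument order, and the attempt counts are illustrative assumptions, not part of the ops-base workflow.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Hypothetical helper, written in POSIX sh.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_i=1
  while [ "$retry_i" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No sleep after the final failed attempt.
    if [ "$retry_i" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_i=$((retry_i + 1))
  done
  return 1
}

# Example (not run here): wait for SSH after the reboot, then check the units.
# retry 30 5 ssh -o ConnectTimeout=5 root@45.77.205.49 true
# ssh root@45.77.205.49 'systemctl is-active matrix-continuwuity nginx postgresql forgejo'
```

With 30 attempts and a 5-second delay, the wait is bounded at a few minutes, which comfortably covers the ~30 second boot time recorded for this deployment.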

What Works

Core Infrastructure

  • NixOS Generation 31 booted successfully
  • sops-nix decrypting secrets correctly using VPS SSH host key
  • Age encryption working with key: age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q

Services Running

  • Matrix Homeserver (matrix-continuwuity): Running, API responding

    • Version: conduwuit 0.5.0-rc.8
    • Listening on: 127.0.0.1:8008
    • Database: RocksDB schema version 18
    • Registration enabled, federation disabled
  • nginx: Running

    • Proxying to Matrix homeserver
    • ACME certificates configured for clarun.xyz and git.clarun.xyz
    • Note: WebDAV errors expected (legacy feature, can be removed)
  • PostgreSQL 15.10: Running

    • Serving Forgejo database
    • Minor client disconnect logs normal (connection pooling)
  • Forgejo 7.0.12: Running

    • Git service operational
    • Connected to PostgreSQL
    • Available at git.clarun.xyz

Files Successfully Migrated

  • .sops.yaml - sops configuration (creation rules and age recipients; not itself encrypted)
  • secrets/secrets.yaml - Encrypted secrets (committed to git, safe because encrypted)
  • All Matrix platform modules from ops-base

Configuration Highlights

sops-nix Setup

Located in hosts/ops-jrz1.nix:26-38:

sops.defaultSopsFile = ../secrets/secrets.yaml;
sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

sops.secrets.matrix-registration-token = {
  owner = "continuwuity";
  group = "continuwuity";
  mode = "0440";
};

sops.secrets.acme-email = {
  owner = "root";
  mode = "0444";
};
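
One way to confirm that the settings above took effect is to inspect the decrypted files on the VPS. The sketch below assumes sops-nix's default secret location (/run/secrets/<name>) and GNU stat. The `check_secret` helper is hypothetical, and the `root` group for acme-email is an assumption because the configuration does not set it explicitly.

```shell
# check_secret PATH OWNER GROUP MODE
# Succeeds only if PATH exists with the expected owner, group, and octal mode.
# Uses GNU stat (-c); -L follows symlinks in case the path is a link.
check_secret() {
  cs_actual=$(stat -L -c '%U %G %a' "$1" 2>/dev/null) || {
    echo "missing: $1"
    return 1
  }
  # stat prints modes without the leading zero, so "0440" becomes "440".
  cs_expected="$2 $3 ${4#0}"
  if [ "$cs_actual" = "$cs_expected" ]; then
    echo "ok: $1 ($cs_actual)"
  else
    echo "mismatch: $1 has '$cs_actual', expected '$cs_expected'"
    return 1
  fi
}

# Example, run on the VPS as root (not run here):
# check_secret /run/secrets/matrix-registration-token continuwuity continuwuity 0440
# check_secret /run/secrets/acme-email root root 0444
```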

Version Compatibility

Pinned sops-nix to a June 2024 revision to avoid a Go version mismatch with nixpkgs 24.05 (flake.nix:9):

sops-nix = {
  url = "github:Mic92/sops-nix/c2ea1186c0cbfa4d06d406ae50f3e4b085ddc9b3";  # June 2024 version
  inputs.nixpkgs.follows = "nixpkgs";
};
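
A pin like this can drift if someone later runs a lock update with a changed URL, so it may be worth checking what flake.lock actually records. The following is a sketch that assumes jq is installed and that sops-nix is a direct input of the flake; `locked_rev` is a hypothetical helper name.

```shell
# locked_rev INPUT_NAME [LOCK_FILE]
# Prints the revision that flake.lock records for a direct flake input.
# Resolves the node name through the root node instead of guessing it.
locked_rev() {
  jq -r --arg input "$1" '
    .root as $root
    | .nodes[$root].inputs[$input] as $node
    | .nodes[$node].locked.rev
  ' "${2:-flake.lock}"
}

# Example, run from the repository root (not run here):
# [ "$(locked_rev sops-nix)" = "c2ea1186c0cbfa4d06d406ae50f3e4b085ddc9b3" ] && echo "pin intact"
```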

Key Lessons from ops-base Research

  1. nixos-rebuild boot - Install to bootloader, don't activate yet
  2. Reboot - Test new configuration
  3. Verify services - Ensure everything works
  4. nixos-rebuild switch (optional) - Re-activates the configuration on the running system; after a successful boot, the new generation is already the boot default

Rollback: If anything fails, select the previous generation from the GRUB menu, or run nixos-rebuild switch --rollback.
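
After the reboot, it can help to confirm that the running system really is the generation installed by nixos-rebuild boot and not an older one picked from the boot menu. The sketch below compares where two symlinks resolve. The default paths are the standard NixOS locations, and the `booted_matches_profile` helper is a hypothetical name.

```shell
# booted_matches_profile [RUNNING_LINK] [PROFILE_LINK]
# Succeeds when both symlinks resolve to the same store path, i.e. the system
# that is running is the one the system profile points at.
booted_matches_profile() {
  bmp_running=$(readlink -f "${1:-/run/current-system}") || return 1
  bmp_profile=$(readlink -f "${2:-/nix/var/nix/profiles/system}") || return 1
  [ "$bmp_running" = "$bmp_profile" ]
}

# Example, run on the VPS (not run here):
# booted_matches_profile && echo "running the installed generation"
```

A mismatch right after nixos-rebuild boot is expected, since the profile already points at the new generation while the old one is still running; the check is only meaningful after the reboot.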

Secrets Management

  • Encrypted secrets.yaml should be committed to git (it's encrypted with age, safe to track)
  • sops-nix derives the age identity from the SSH host key automatically; ssh-to-age produces the matching public recipient for .sops.yaml
  • Multi-recipient encryption allows both VPS and admin workstation to decrypt
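
The multi-recipient setup can be sketched as a small generator for .sops.yaml. The layout follows the pattern shown in the sops-nix documentation. The `render_sops_yaml` helper, the anchor names `admin` and `host`, and the path regex are illustrative assumptions, and the admin key below is a placeholder, not a real recipient.

```shell
# render_sops_yaml ADMIN_AGE_KEY HOST_AGE_KEY
# Prints a .sops.yaml with two age recipients: an admin workstation and the VPS.
render_sops_yaml() {
  cat <<EOF
keys:
  - &admin $1
  - &host $2
creation_rules:
  - path_regex: secrets/[^/]+\\.yaml\$
    key_groups:
      - age:
          - *admin
          - *host
EOF
}

# Example (not run here): derive the host recipient from its SSH host key,
# write the config, then re-encrypt the secrets file for both recipients.
# host_key=$(ssh-keyscan -t ed25519 45.77.205.49 2>/dev/null | ssh-to-age)
# render_sops_yaml "<admin-age-public-key>" "$host_key" > .sops.yaml
# sops updatekeys secrets/secrets.yaml
```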

Common Pitfalls Avoided

From 46+ ops-base deployments:

  1. Exit code 11 is not always a segfault - Often an intentional exit_group(11) from config validation
  2. SystemCallFilter restrictions - Can block CPU affinity syscalls, needs allowances
  3. LoadCredential patterns - Use for Python scripts reading secrets from environment
  4. ACME debugging - Check journalctl -u acme-*, verify DNS, test staging first
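
The first pitfall comes down to telling an exit code apart from a signal. When a command is run from a shell, a process killed by signal N is reported as status 128+N, so a real SIGSEGV (signal 11) shows up as 139, while a deliberate exit(11) shows up as 11. systemd draws the same distinction in its logs as code=exited versus code=killed or code=dumped. The `classify_status` helper below is a hypothetical illustration, not something taken from ops-base.

```shell
# classify_status STATUS
# Interprets a shell exit status ($?): values above 128 mean the process was
# killed by signal STATUS-128; anything else is an ordinary exit code.
classify_status() {
  if [ "$1" -gt 128 ]; then
    echo "killed by signal $(($1 - 128))"
  else
    echo "exited with code $1"
  fi
}

# Example (not run here):
# some-service --check-config; classify_status $?
```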

Build Statistics

  • 285 derivations built
  • 378 paths fetched (786.52 MiB download, 3.39 GiB unpacked)
  • Boot time: ~30 seconds
  • Service startup: All services up within 2 minutes

Next Steps

  • Monitor mautrix-slack (currently segfaulting, needs investigation)
  • Establish regular deployment workflow (local build + remote deploy)
  • Configure remaining Matrix bridges (WhatsApp, Google Messages)
  • Set up monitoring/alerting

References

  • ops-base worklogs: Reviewed 46+ deployment entries
  • sops-nix docs: Age encryption with SSH host keys
  • NixOS deployment patterns: boot -> reboot -> verify workflow, with an optional switch afterwards