ops-jrz1/docs/worklogs/2025-10-22-deployment-generation-31.md
Dan 64246a6615 Deploy Generation 31 with sops-nix secrets management
Successfully deployed ops-jrz1 Matrix platform to production VPS using
extracted modules from ops-base. Validated deployment workflow following
ops-base best practices: boot -> reboot -> verify.

Changes:
- Pin sops-nix to June 2024 version for nixpkgs 24.05 compatibility
- Configure sops secrets for Matrix registration token and ACME email
- Add encrypted secrets.yaml (safe to commit, encrypted with age)
- Document deployment process and lessons learned

All services verified running:
- Matrix homeserver (matrix-continuwuity): conduwuit 0.5.0-rc.8
- nginx: Proxying Matrix and Forgejo
- PostgreSQL 15.10: Database services
- Forgejo 7.0.12: Git platform

Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-21 21:32:23 -07:00


# Deployment: Generation 31 - Matrix Platform Migration
- **Date:** 2025-10-22
- **Status:** ✅ SUCCESS
- **Generation:** 31
- **Deployment Time:** ~5 minutes (build + reboot)
## Summary
Successfully deployed the ops-jrz1 Matrix platform using modules extracted from ops-base. This deployment established the baseline deployment pattern and validated the sops-nix secrets-management integration.
## Deployment Method
Following ops-base best practices from worklog research:
```bash
# 1. Copy the flake to the VPS, then build and install it as the next boot default (safe, rollback-friendly)
rsync -avz --exclude '.git' --exclude 'result' /home/dan/proj/ops-jrz1/ root@45.77.205.49:/root/ops-jrz1/
ssh root@45.77.205.49 'cd /root/ops-jrz1 && nixos-rebuild boot --flake .#ops-jrz1'
# 2. Reboot into the new generation (the SSH session drops here)
ssh root@45.77.205.49 'reboot'
# 3. Once the host is back, verify services (all were running)
ssh root@45.77.205.49 'systemctl status matrix-continuwuity nginx postgresql forgejo'
# 4. Test the Matrix client API. The homeserver listens only on 127.0.0.1:8008,
#    so query it from the VPS itself rather than via the public IP
ssh root@45.77.205.49 'curl -s http://127.0.0.1:8008/_matrix/client/versions'
```
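Because `reboot` drops the SSH session, step 3 can only run once the VPS is reachable again. A small retry loop is one way to bridge that gap. This is a sketch: the helper name `wait_for` and the retry budget in the example are assumptions, not part of the ops-base workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Succeed as soon as the command exits 0.
    if "$@"; then
      return 0
    fi
    # Don't sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
# Example (assumed budget: 60 tries, 5 s apart):
#   wait_for 60 5 ssh -o ConnectTimeout=5 -o BatchMode=yes root@45.77.205.49 true
```

A short `sleep` before the first attempt avoids reconnecting to the old system before it has actually gone down.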
## What Works ✅
### Core Infrastructure
- **NixOS Generation 31** booted successfully
- **sops-nix** decrypting secrets correctly using VPS SSH host key
- **Age encryption** working with recipient (public key): `age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q`
### Services Running
- **Matrix Homeserver (matrix-continuwuity):** ✅ Running, API responding
  - Version: conduwuit 0.5.0-rc.8
  - Listening on: 127.0.0.1:8008
  - Database: RocksDB schema version 18
  - Registration enabled, federation disabled
- **nginx:** ✅ Running
  - Proxying to Matrix homeserver
  - ACME certificates configured for clarun.xyz and git.clarun.xyz
  - WebDAV errors in the logs are expected (legacy feature; can be removed)
- **PostgreSQL 15.10:** ✅ Running
  - Serving Forgejo database
  - Occasional client-disconnect log messages are normal (connection pooling)
- **Forgejo 7.0.12:** ✅ Running
  - Git service operational
  - Connected to PostgreSQL
  - Available at git.clarun.xyz
### Files Successfully Migrated
- `.sops.yaml` - sops configuration (encryption rules and age recipients; not itself encrypted)
- `secrets/secrets.yaml` - Encrypted secrets (committed to git, safe because encrypted)
- All Matrix platform modules from ops-base
## Configuration Highlights
### sops-nix Setup
Located in `hosts/ops-jrz1.nix:26-38`:
```nix
sops.defaultSopsFile = ../secrets/secrets.yaml;
sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];
sops.secrets.matrix-registration-token = {
owner = "continuwuity";
group = "continuwuity";
mode = "0440";
};
sops.secrets.acme-email = {
owner = "root";
mode = "0444";
};
```
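By default, sops-nix writes each decrypted secret to `/run/secrets/<name>` (the default for `sops.secrets.<name>.path`, which the config above does not override). A quick post-deploy check compares the file's owner, group, and mode against the declaration. The helper name `check_secret` is hypothetical, and it assumes GNU `stat`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm a decrypted secret has the declared owner, group, and mode.
# Usage: check_secret <path> <owner> <group> <octal-mode-without-leading-zero>
check_secret() {
  local path=$1 want="$2:$3:$4" got
  # -L follows symlinks, in case the path is one; %a prints the mode as e.g. 440 (GNU stat).
  got=$(stat -L -c '%U:%G:%a' "$path") || return 1
  if [[ $got != "$want" ]]; then
    echo "mismatch for $path: got $got, want $want" >&2
    return 1
  fi
}
# On the VPS, mirroring the matrix-registration-token declaration above:
#   check_secret /run/secrets/matrix-registration-token continuwuity continuwuity 440
```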
### Version Compatibility
Pinned sops-nix to a June 2024 commit to avoid a Go toolchain version mismatch with nixpkgs 24.05 (`flake.nix:9`):
```nix
sops-nix = {
url = "github:Mic92/sops-nix/c2ea1186c0cbfa4d06d406ae50f3e4b085ddc9b3"; # June 2024 version
inputs.nixpkgs.follows = "nixpkgs";
};
```
## Key Lessons from ops-base Research
### Deployment Pattern (Recommended)
1. **`nixos-rebuild boot`** - Install to bootloader, don't activate yet
2. **Reboot** - Test new configuration
3. **Verify services** - Ensure everything works
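4. **`nixos-rebuild switch`** (optional) - Not required after a successful reboot: `boot` already made the new generation the system profile and the bootloader default

**Rollback:** If anything fails, select the previous generation from the GRUB menu or run `nixos-rebuild switch --rollback`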
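The verify step can double as the rollback trigger. One subtlety: when given several units, `systemctl is-active` exits 0 if *any* of them is active, so its exit code alone does not prove that all four services are up. The sketch below checks every state line it prints instead. `all_active` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl is-active` output (one state per line)
# from stdin and succeed only if every line is exactly "active".
all_active() {
  local state seen=0
  # The `|| [[ -n $state ]]` also handles a final line without a trailing newline.
  while IFS= read -r state || [[ -n $state ]]; do
    seen=1
    [[ $state == active ]] || return 1
  done
  # Empty input means nothing was checked, so treat it as failure.
  ((seen))
}
# Example gate after the reboot (unit names from the deployment steps):
#   ssh root@45.77.205.49 'systemctl is-active matrix-continuwuity nginx postgresql forgejo' \
#     | all_active || echo "verification failed; consider: nixos-rebuild switch --rollback"
```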
### Secrets Management
- Encrypted `secrets.yaml` **should be committed to git** (it's encrypted with age, safe to track)
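- sops-nix derives an age identity from the VPS's SSH ed25519 host key at activation (`sops.age.sshKeyPaths`); `ssh-to-age` produces the matching public recipient for `.sops.yaml`
- Multi-recipient encryption lets both the VPS and the admin workstation decrypt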
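To add the VPS as a recipient, convert the public half of its SSH host key to an age recipient, list it in `.sops.yaml`, and re-encrypt the existing files. The sketch assumes `ssh-keyscan`, `ssh-to-age`, and `sops` are on the PATH. `is_age_recipient` is a hypothetical guard that checks only the shape of the string (prefix, Bech32 alphabet, length), not its checksum.

```bash
#!/usr/bin/env bash
# Hypothetical guard: does the argument look like an age X25519 recipient?
# Checks "age1" + 58 Bech32-alphabet characters; it does NOT verify the checksum.
is_age_recipient() {
  local re='^age1[qpzry9x8gf2tvdw0s3jn54khce6mua7l]{58}$'
  [[ $1 =~ $re ]]
}
# Example workflow (repo root, admin workstation, which must already be a recipient):
#   recipient=$(ssh-keyscan -t ed25519 45.77.205.49 2>/dev/null | ssh-to-age)
#   is_age_recipient "$recipient" || { echo "unexpected: $recipient" >&2; exit 1; }
#   # ...add "$recipient" to the recipients in .sops.yaml, then re-encrypt:
#   sops updatekeys secrets/secrets.yaml
```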
### Common Pitfalls Avoided
From 46+ ops-base deployments:
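1. **Exit code 11 ≠ always segfault** - Often an intentional `exit_group(11)` from config validation, not a crash
2. **SystemCallFilter restrictions** - systemd's `SystemCallFilter=` can block CPU-affinity syscalls (e.g. `sched_setaffinity`); add explicit allowances
3. **LoadCredential patterns** - Use systemd's `LoadCredential=` to pass secrets to Python scripts; each credential appears as a file under `$CREDENTIALS_DIRECTORY`, so the script reads it from there rather than from a plain environment variable
4. **ACME debugging** - Check `journalctl -u acme-*`, verify DNS, and test against the Let's Encrypt staging server first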
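For pitfall 1, systemd records whether a unit's main process exited or was killed. `systemctl show` exposes this as `ExecMainCode` and `ExecMainStatus` (the exit code or signal number). The sketch below assumes the usual Linux `si_code` values for `ExecMainCode`: 1 = exited, 2 = killed by a signal, 3 = killed and dumped core. `classify_exit` is a hypothetical helper, and the unit name `mautrix-slack` in the example is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: interpret systemd's ExecMainCode/ExecMainStatus pair.
# Assumed mapping of ExecMainCode (Linux si_code): 1=CLD_EXITED, 2=CLD_KILLED, 3=CLD_DUMPED.
classify_exit() {
  local code=$1 status=$2
  case $code in
    1) echo "exited with status $status" ;;              # e.g. exit_group(11) from validation
    2) echo "killed by signal $status" ;;                # signal 11 is SIGSEGV on Linux
    3) echo "killed by signal $status (core dumped)" ;;
    *) echo "unknown (code=$code, status=$status)" ;;
  esac
}
# On the VPS:
#   classify_exit "$(systemctl show -p ExecMainCode --value mautrix-slack)" \
#                 "$(systemctl show -p ExecMainStatus --value mautrix-slack)"
```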
## Build Statistics
- **285 derivations built**
- **378 paths fetched** (786.52 MiB download, 3.39 GiB unpacked)
- **Boot time:** ~30 seconds
- **Service startup:** All services up within 2 minutes
## Next Steps
- [ ] Investigate mautrix-slack (currently reported as segfaulting; confirm whether exit code 11 is a real crash, per Common Pitfalls)
- [ ] Establish regular deployment workflow (local build + remote deploy)
- [ ] Configure remaining Matrix bridges (WhatsApp, Google Messages)
- [ ] Set up monitoring/alerting
## References
- ops-base worklogs: Reviewed 46+ deployment entries
- sops-nix docs: Age encryption with SSH host keys
- NixOS deployment patterns: boot -> reboot -> verify workflow (with optional switch)