Commit graph

142 commits

Author SHA1 Message Date
Dan 4531f85386 docs: clarify SSH is pre-configured for git access 2026-01-23 11:53:53 -08:00
Dan 2887ad4122 docs: add persistent services section to SERVER.md
Documents tmux, user systemd, and pm2 options for running
long-lived processes. Notes lingering requirement for systemd.
2026-01-23 11:24:28 -08:00
Dan 3c8f961cdc docs: user-facing docs with SERVER.md symlink pattern
- SERVER.md: symlinked from /etc/user-docs (always current)
- AGENTS.md: user's file, points to SERVER.md, editable
- README.md: welcome doc, copied once
- readme.txt: whaddup cuz

dev-add.sh provisions all four, only overwrites SERVER.md symlink.
2026-01-22 22:04:51 -08:00
Dan a81fd5f299 docs: Forgejo collaboration guide for dev users
Research for ops-jrz1-mh2: how dev users collaborate on git.clarun.xyz

Covers:
- Account setup and SSH access
- Shared repo vs fork+PR models
- Trunk-based workflow
- Troubleshooting common issues
2026-01-22 15:29:16 -08:00
Dan 5e2515d505 Update musiclink flake input 2026-01-22 12:58:08 -08:00
Dan 77925bc22a Update musiclink flake input 2026-01-22 12:23:41 -08:00
Dan 93e694824a Grant devs journal access
Update dev-add to add systemd-journal group and extend check-deploy output.
2026-01-22 11:44:11 -08:00
Dan 9bc0fd88da Update musiclink flake input 2026-01-22 11:26:04 -08:00
Dan b287c0d582 Bind MusicLink health server to localhost 2026-01-22 10:55:59 -08:00
Dan 603b32b7ef Update musiclink flake input 2026-01-22 10:47:54 -08:00
Dan 9737371638 Update musiclink flake input 2026-01-22 09:13:07 -08:00
Dan eb76cc5ad2 Switch MusicLink to Matrix-native config
Replace Matterbridge settings with matrix config options.

Generate TOML with proper room list commas.
2026-01-21 23:10:53 -08:00
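
Note on eb76cc5ad2: the "proper room list commas" fix is about rendering a list of rooms as a TOML array. The log does not show the generator itself (it may well be a Nix expression), so the following is only a minimal Python sketch of the idea. The key name `rooms` and the room IDs are hypothetical.

```python
import json


def toml_string_array(key: str, values: list[str]) -> str:
    """Render `key = ["a", "b"]`: items joined by commas, no trailing comma."""
    # json.dumps yields a double-quoted, escaped string, which is also a valid
    # TOML basic string for plain ASCII identifiers such as room IDs.
    items = ", ".join(json.dumps(v) for v in values)
    return f"{key} = [{items}]"


# Hypothetical room IDs, for illustration only.
rooms = ["!abc:example.org", "!def:example.org"]
line = toml_string_array("rooms", rooms)
print(line)
```

Joining with a separator, rather than appending a comma after each item, avoids both missing and stray commas regardless of how many rooms are configured.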
Dan 15427dddaf Add mautrix-slack config example 2026-01-21 22:54:34 -08:00
Dan ae16db4898 Refresh musiclink integration docs and tooling
Use local musiclink flake input with Go 1.24.

Add matterbridge patch, routing docs, and deploy check script.
2026-01-21 22:52:39 -08:00
Dan 8918b62765 Resolve git access to git.clarun.xyz for musiclink (zr0q)
- Created musiclink repo on Forgejo
- Added dan's devserver SSH key to Forgejo
- Switched musiclink flake input from local path to git+ssh
- Updated musiclink testing room config in modules/musiclink.nix
2026-01-20 20:34:39 -08:00
Dan db02bf8b61 docs: update musiclink worklog with smoke test results 2026-01-20 15:08:15 -08:00
Dan 490b32c9ae docs: add summary of musiclink integration for team 2026-01-20 15:02:54 -08:00
Dan d345bc6cb8 docs: add worklog for musiclink matrix pivot 2026-01-20 15:00:19 -08:00
Dan 73b932ff47 docs: add worklog and final musiclink config fixes 2026-01-20 14:42:00 -08:00
Dan 4adf6723c5 feat: complete musiclink bot integration with verified VM checks 2026-01-20 13:40:47 -08:00
Dan 82fce7f4e4 docs: remove deprecated Emes workflow references 2026-01-20 10:57:06 -08:00
Dan 3236ed5450 Update worklog with beads system-wide install
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 15:14:14 -08:00
Dan 11b901b503 Add beads (bd) system-wide for all users
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 15:07:10 -08:00
Dan ed8e36257f Update worklog with VS Code auth bug findings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 12:08:52 -08:00
Dan ef9c583c3b Add worklog: ops-review completion and bot research
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 21:49:37 -08:00
Dan fec21745ce Update worklog with ops-review fixes and y8le decision
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 20:19:07 -08:00
Dan b1d2674629 Add failure notification and resilience to backup services
- Add backup-b2-failed oneshot for OnFailure notification
- Add onFailure handler to both backup-b2 and backup-b2-check
- Add network-online.target dependency to backup-b2-check
- Add TimeoutStartSec (2h for backup, 1h for check)

Found via ops-review lenses.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 18:56:33 -08:00
Dan d581d7bac4 Add worklog: NixOS 24.11 upgrade with DR preparation 2026-01-10 18:46:38 -08:00
Dan 75515c7e53 Update flake to NixOS 24.11
- nixpkgs: 24.05 (Dec 2024) → 24.11 (Jun 2025)
- sops-nix: unpinned (now follows nixpkgs)
- nixpkgs-unstable: Dec 2025 → Jan 2026

Key version changes:
- PostgreSQL 15.10 → 15.13 (pinned to v15)
- Forgejo 7.0.12 → 7.0.15 LTS
- Matrix-continuwuity 0.5.0-rc → 0.5.1 stable
- maubot 0.4.2 → 0.5.0
- systemd 255 → 256

Build verified, deployment in separate task.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 18:12:33 -08:00
Dan 9c03d2204d Update DR runbook: first restore drill passed
Tested restore of:
- PostgreSQL dumps (forgejo: 112 tables, mautrix_slack: 32 tables)
- Forgejo repositories
- User home directories

Also updated known gaps status (sops key, PostgreSQL pin fixed).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 16:18:22 -08:00
Dan 5a45993046 Mark PostgreSQL pin complete in upgrade checklist 2026-01-10 16:07:56 -08:00
Dan db7b05a46e Pin PostgreSQL to v15 for NixOS 24.11 upgrade
Prevents automatic upgrade to PostgreSQL 16 when upgrading NixOS.
This allows a safer two-step approach: upgrade NixOS first, then
pg_upgrade later.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 16:07:13 -08:00
Dan 42ebc501c3 Document NixOS 24.11 upgrade impact analysis
Key findings:
- PostgreSQL defaults to 16 (must pin to 15)
- Forgejo 7.0→9.0 (review release notes, backup DB)
- conduwuit discontinued (we use continuwuity fork, OK)
- mautrix-slack, nginx, ACME: no breaking changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 15:58:06 -08:00
Dan 85989ccc2a Add offline sops recovery key
Secrets now encrypted to three recipients:
- vultr_vps: server SSH host key (primary)
- admin: workstation key (local editing)
- recovery: offline key at ~/.config/sops/age/recovery.key

If server dies and admin key unavailable, recovery key can
still decrypt secrets to bootstrap restore.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 15:40:31 -08:00
Dan 5db6c0dc7e Update DR runbook: mark backup paths as fixed 2026-01-10 14:37:30 -08:00
Dan 6954fbec9a Add /home and /var/lib/acme to B2 backups
Closes r177. Critical DR gap - user home directories and ACME
certificates were not being backed up.

Excludes common caches that can be rebuilt:
- .cache, .npm/_cacache, .bun/install/cache
- node_modules, .nix-profile, .nix-defexpr
- Trash

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 14:33:01 -08:00
Dan b62f649a28 Add disaster recovery runbook draft
Documents restore procedures for full server loss, partial restore,
and user data recovery scenarios. Includes verification checklists,
time estimates, and break-glass quick reference.

Also documents known gaps (home dirs, ACME, RocksDB consistency)
that need fixing before the runbook is production-ready.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 14:02:01 -08:00
Dan 31d388d21c Add B2 automated backup with restic
- Add services.postgresqlBackup for daily DB dumps (2 AM)
- New modules/backup-b2.nix: restic backup to B2 (3 AM daily)
- Weekly integrity check (Sunday 4 AM)
- Retention: 7 daily, 4 weekly, 6 monthly
- B2 bucket: ops-jrz1-backup with scoped app key

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 13:49:59 -08:00
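
Note on 31d388d21c: the retention policy "7 daily, 4 weekly, 6 monthly" maps onto the keep flags of restic's `forget` command. The sketch below only builds the argument list. How modules/backup-b2.nix actually passes these options is not shown in the log, and the sketch assumes the repository and credentials come from the environment.

```python
def restic_forget_args(daily: int, weekly: int, monthly: int) -> list[str]:
    """Build a `restic forget` command line for a daily/weekly/monthly policy."""
    return [
        "restic", "forget",
        "--keep-daily", str(daily),
        "--keep-weekly", str(weekly),
        "--keep-monthly", str(monthly),
        # --prune also removes data that is no longer referenced.
        "--prune",
    ]


# Values taken from the commit message above.
args = restic_forget_args(daily=7, weekly=4, monthly=6)
print(" ".join(args))
```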
Dan ff34cee51e Sync AI agent sandbox docs to dev-add.sh AGENTS.md
New users will get the Codex sandbox workaround in their home AGENTS.md.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 08:09:25 -08:00
Dan 026f82e697 Document AI agent sandbox conflicts in server-AGENTS.md
Codex CLI seccomp filters block nix daemon access.
Workaround: disable redundant sandbox since server provides isolation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 23:33:19 -08:00
Dan 51e657d43b Add devs group to nix trusted-users
Allows dev users to use nix develop, nix build, etc.
Previously blocked by daemon access restrictions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 23:01:40 -08:00
Dan bde2aad939 Harden dev provisioning scripts (ops-review fixes)
- Remove stderr suppression from ssh-keygen (show errors)
- Add curl timeouts (--connect-timeout 5 --max-time 30)
- Add || true to arithmetic increments for set -e safety

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 20:21:57 -08:00
Dan d9c1848e88 Implement dual-key git access for dev users
- Generate server-side SSH keypair for git access from server
- Upload both laptop key and server key to Forgejo
- Add mandatory key revocation in dev-remove.sh
- Fix: use forgejo@ instead of git@ for SSH URLs
- Keys named username-laptop and username-devserver
- Key comment includes DO-NOT-REUSE warning

Closes ops-jrz1-rfx
2026-01-09 19:35:59 -08:00
Dan 99b187fa5a Document security model: simple Unix isolation 2026-01-09 16:31:11 -08:00
Dan f17604f0ad Add Forgejo admin operations doc 2026-01-09 15:09:09 -08:00
Dan 11bb06a959 Revert "Document Forgejo API administration pattern"
This reverts commit f4be5fa7fc.
2026-01-09 15:08:44 -08:00
Dan f4be5fa7fc Document Forgejo API administration pattern 2026-01-09 15:08:23 -08:00
Dan aca792a51d Add secure password delivery for Forgejo provisioning
Write credentials to ~/.forgejo-credentials (JSON, mode 600) when
creating new Forgejo users. Onboarding message points to file
instead of showing password in terminal output.

Addresses ops-jrz1-ofw.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 10:02:57 -08:00
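
Note on aca792a51d: the point of writing credentials to a mode 600 file is that the password never appears in terminal output and is readable only by its owner. dev-add.sh is a shell script and its code is not shown here, so this is a Python sketch of the same pattern. The JSON field names are assumptions, and the demo writes to a temporary directory rather than to a real home directory.

```python
import json
import os
import tempfile


def write_credentials(path: str, username: str, password: str) -> None:
    """Write credentials as JSON to a file readable only by its owner."""
    # Create the file with mode 0600 from the start, so the password is never
    # briefly exposed under a more permissive default mode.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        # If the file already existed, os.open kept its old mode; tighten it.
        os.fchmod(f.fileno(), 0o600)
        json.dump({"username": username, "password": password}, f)
        f.write("\n")


# Demo with hypothetical values, written to a throwaway directory.
demo_path = os.path.join(tempfile.mkdtemp(), ".forgejo-credentials")
write_credentials(demo_path, "alice", "example-password")
print(oct(os.stat(demo_path).st_mode & 0o777))
```

Passing the mode to `os.open` instead of calling `chmod` afterwards closes the window in which a freshly created file could be world-readable.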
Dan 1575e44ca2 Fix dev-add.sh random password generation, update Forgejo token scope
- Replace openssl rand with /dev/urandom (openssl not in NixOS path)
- Update forgejo-api-token with admin scope for user provisioning

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 17:58:49 -08:00
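
Note on 1575e44ca2: reading /dev/urandom directly removes the dependency on an openssl binary being on the path. The actual shell code in dev-add.sh is not shown in this log; the following Python sketch illustrates one way to turn raw random bytes into a password. The length of 24 and the alphanumeric alphabet are assumptions.

```python
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def random_password(length: int = 24, source: str = "/dev/urandom") -> str:
    """Build a password from kernel randomness, without calling openssl."""
    # Accept only bytes below the largest multiple of len(ALPHABET) that fits
    # in a byte, so every symbol is equally likely (no modulo bias).
    limit = 256 - (256 % len(ALPHABET))
    chars = []
    with open(source, "rb") as rng:
        while len(chars) < length:
            for byte in rng.read(length):
                if byte < limit and len(chars) < length:
                    chars.append(ALPHABET[byte % len(ALPHABET)])
    return "".join(chars)


password = random_password()
print(len(password))
```

In Python code that does not need to mirror the shell approach, the standard `secrets` module is the more usual choice for this job.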
Dan fafc04cb0d Add Forgejo integration to dev user provisioning
- Add programs.ssh.knownHosts for git.clarun.xyz (prevents SSH prompts)
- Expose forgejo-api-token via sops-nix for provisioning
- dev-add.sh: Create Forgejo account + upload SSH key via API
- dev-add.sh: Set up .gitconfig with user.name/email
- dev-remove.sh: Print warning to manually suspend Forgejo account

Addresses ops-jrz1-qts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 17:32:18 -08:00
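
Note on fafc04cb0d: provisioning through the Forgejo API amounts to two authenticated POST requests, one to create the account and one to attach the SSH key. The sketch below builds those requests without sending them. It assumes the admin endpoints `/admin/users` and `/admin/users/{username}/keys` under `/api/v1` and the `token` authorization scheme, as in the Gitea-compatible API; check the instance's own API documentation before relying on it. All user values are hypothetical, and dev-add.sh itself may call the API differently.

```python
import json
import urllib.request

# Host taken from the commit messages; the /api/v1 prefix is an assumption.
BASE_URL = "https://git.clarun.xyz/api/v1"


def api_request(path: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_user_request(token: str, username: str, email: str, password: str):
    """Request that would create a new account (requires an admin token)."""
    payload = {"username": username, "email": email, "password": password}
    return api_request("/admin/users", token, payload)


def add_key_request(token: str, username: str, title: str, public_key: str):
    """Request that would upload an SSH public key for an existing user."""
    payload = {"title": title, "key": public_key}
    return api_request(f"/admin/users/{username}/keys", token, payload)


# Hypothetical values; nothing is sent over the network here.
user_req = create_user_request("TOKEN", "alice", "alice@example.org", "secret")
key_req = add_key_request("TOKEN", "alice", "alice-key", "ssh-ed25519 AAAA... alice")
print(user_req.get_method(), user_req.full_url)
print(key_req.get_method(), key_req.full_url)
```

Sending either request would be a matter of passing it to `urllib.request.urlopen` with a timeout, which the sketch deliberately leaves out.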