113 commits

Author SHA1 Message Date
Dan 9c03d2204d Update DR runbook: first restore drill passed
Tested restore of:
- PostgreSQL dumps (forgejo: 112 tables, mautrix_slack: 32 tables)
- Forgejo repositories
- User home directories

Also updated known gaps status (sops key, PostgreSQL pin fixed).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 16:18:22 -08:00
Dan 5a45993046 Mark PostgreSQL pin complete in upgrade checklist 2026-01-10 16:07:56 -08:00
Dan db7b05a46e Pin PostgreSQL to v15 for NixOS 24.11 upgrade
Prevents automatic upgrade to PostgreSQL 16 when upgrading NixOS.
This allows a safer two-step approach: upgrade NixOS first, then
pg_upgrade later.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 16:07:13 -08:00
Dan 42ebc501c3 Document NixOS 24.11 upgrade impact analysis
Key findings:
- PostgreSQL defaults to 16 (must pin to 15)
- Forgejo 7.0→9.0 (review release notes, backup DB)
- conduwuit discontinued (we use continuwuity fork, OK)
- mautrix-slack, nginx, ACME: no breaking changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 15:58:06 -08:00
Dan 85989ccc2a Add offline sops recovery key
Secrets now encrypted to three recipients:
- vultr_vps: server SSH host key (primary)
- admin: workstation key (local editing)
- recovery: offline key at ~/.config/sops/age/recovery.key

If server dies and admin key unavailable, recovery key can
still decrypt secrets to bootstrap restore.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 15:40:31 -08:00
Dan 5db6c0dc7e Update DR runbook: mark backup paths as fixed 2026-01-10 14:37:30 -08:00
Dan 6954fbec9a Add /home and /var/lib/acme to B2 backups
Closes r177. Critical DR gap - user home directories and ACME
certificates were not being backed up.

Excludes common caches that can be rebuilt:
- .cache, .npm/_cacache, .bun/install/cache
- node_modules, .nix-profile, .nix-defexpr
- Trash

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 14:33:01 -08:00
Dan b62f649a28 Add disaster recovery runbook draft
Documents restore procedures for full server loss, partial restore,
and user data recovery scenarios. Includes verification checklists,
time estimates, and break-glass quick reference.

Also documents known gaps (home dirs, ACME, RocksDB consistency)
that need fixing before the runbook is production-ready.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 14:02:01 -08:00
Dan 31d388d21c Add B2 automated backup with restic
- Add services.postgresqlBackup for daily DB dumps (2 AM)
- New modules/backup-b2.nix: restic backup to B2 (3 AM daily)
- Weekly integrity check (Sunday 4 AM)
- Retention: 7 daily, 4 weekly, 6 monthly
- B2 bucket: ops-jrz1-backup with scoped app key

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 13:49:59 -08:00
Dan ff34cee51e Sync AI agent sandbox docs to dev-add.sh AGENTS.md
New users will get the Codex sandbox workaround in their home AGENTS.md.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 08:09:25 -08:00
Dan 026f82e697 Document AI agent sandbox conflicts in server-AGENTS.md
Codex CLI seccomp filters block nix daemon access.
Workaround: disable redundant sandbox since server provides isolation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 23:33:19 -08:00
Dan 51e657d43b Add devs group to nix trusted-users
Allows dev users to use nix develop, nix build, etc.
Previously blocked by daemon access restrictions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 23:01:40 -08:00
Dan bde2aad939 Harden dev provisioning scripts (ops-review fixes)
- Remove stderr suppression from ssh-keygen (show errors)
- Add curl timeouts (--connect-timeout 5 --max-time 30)
- Add || true to arithmetic increments for set -e safety

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 20:21:57 -08:00
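
Note on bde2aad939: the `|| true` fix is easiest to see in a small bash sketch. It assumes the scripts use `((var++))`-style increments, which the commit message does not confirm; `count` and `fetch` are hypothetical names, not taken from the provisioning scripts.

```shell
#!/usr/bin/env bash
set -e

count=0
# Post-increment evaluates to the old value (0), so the arithmetic command
# exits with status 1. Without `|| true`, `set -e` would abort the script here.
((count++)) || true

# On later increments the old value is non-zero, so the status is already 0;
# `|| true` is harmless and keeps the pattern uniform.
((count++)) || true

echo "count=$count"   # prints "count=2"

# Hypothetical wrapper using the timeout flags named in the commit.
# It is defined but not called, so the sketch needs no network access.
fetch() {
  curl --connect-timeout 5 --max-time 30 -fsS "$1"
}
```

An alternative that sidesteps the exit status is `count=$((count + 1))`, since a plain assignment always succeeds.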
Dan d9c1848e88 Implement dual-key git access for dev users
- Generate server-side SSH keypair for git access from server
- Upload both laptop key and server key to Forgejo
- Add mandatory key revocation in dev-remove.sh
- Fix: use forgejo@ instead of git@ for SSH URLs
- Keys named username-laptop and username-devserver
- Key comment includes DO-NOT-REUSE warning

Closes ops-jrz1-rfx
2026-01-09 19:35:59 -08:00
Dan 99b187fa5a Document security model: simple Unix isolation 2026-01-09 16:31:11 -08:00
Dan f17604f0ad Add Forgejo admin operations doc 2026-01-09 15:09:09 -08:00
Dan 11bb06a959 Revert "Document Forgejo API administration pattern"
This reverts commit f4be5fa7fc.
2026-01-09 15:08:44 -08:00
Dan f4be5fa7fc Document Forgejo API administration pattern 2026-01-09 15:08:23 -08:00
Dan aca792a51d Add secure password delivery for Forgejo provisioning
Write credentials to ~/.forgejo-credentials (JSON, mode 600) when
creating new Forgejo users. Onboarding message points to file
instead of showing password in terminal output.

Addresses ops-jrz1-ofw.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 10:02:57 -08:00
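
Note on aca792a51d: one way to write a mode-600 credentials file is sketched below. The helper name, its arguments, and the JSON field names are assumptions for illustration; the commit only states that the file is JSON with mode 600.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Assumes the username and password contain no
# characters that need JSON escaping (quotes, backslashes, control chars).
write_credentials() {
  local file="$1" user="$2" password="$3"
  (
    # Restrictive umask in a subshell: a newly created file never has
    # group/other permissions, even briefly, and the caller's umask is untouched.
    umask 077
    printf '{"username": "%s", "password": "%s"}\n' "$user" "$password" > "$file"
  )
  # Enforce the mode even if the file already existed with looser permissions.
  chmod 600 "$file"
}
```

Setting the umask before the write matters more than the final `chmod`: it closes the window in which a freshly created file could be world-readable.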
Dan 1575e44ca2 Fix dev-add.sh random password generation, update Forgejo token scope
- Replace openssl rand with /dev/urandom (openssl not in NixOS path)
- Update forgejo-api-token with admin scope for user provisioning

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 17:58:49 -08:00
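
Note on 1575e44ca2: a common way to generate a random password from /dev/urandom with only coreutils is sketched below. The function name, character set, and default length are assumptions; the commit does not show the actual command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emits an alphanumeric password of the requested length.
gen_password() {
  local length="${1:-24}"
  # tr drops every byte outside the allowed set; head stops the stream once
  # enough characters have been collected. LC_ALL=C makes tr treat the input
  # as raw bytes rather than locale-dependent characters.
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
}

password=$(gen_password 24)
echo "generated ${#password} characters"
```

If the calling script enables `set -o pipefail`, note that `tr` is terminated by SIGPIPE when `head` exits, so the pipeline would report a failure there.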
Dan fafc04cb0d Add Forgejo integration to dev user provisioning
- Add programs.ssh.knownHosts for git.clarun.xyz (prevents SSH prompts)
- Expose forgejo-api-token via sops-nix for provisioning
- dev-add.sh: Create Forgejo account + upload SSH key via API
- dev-add.sh: Set up .gitconfig with user.name/email
- dev-remove.sh: Print warning to manually suspend Forgejo account

Addresses ops-jrz1-qts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 17:32:18 -08:00
Dan ba949239a5 Remove obsolete slack-oauth-token null placeholder
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 17:06:43 -08:00
Dan 6e890396f4 Add Forgejo admin credentials to sops
- forgejo-admin-password: dan user password
- forgejo-api-token: API token for automation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 16:58:05 -08:00
Dan ad1fbf1c8c Consolidate scattered networking, environment, systemd keys
Fixes 3x statix W20 warnings. No functional change.
- networking: Moved firewall into main block
- environment: Consolidated systemPackages, localBinInPath, shellInit
- systemd: Consolidated slices, services, timers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 12:49:57 -08:00
Dan 278017efe3 Consolidate scattered sops.* keys into single block
Fixes statix W20 warning. No functional change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 12:42:57 -08:00
Dan 99927712c5 Add VM integration test and shellcheck linting to flake checks
- VM test boots a VM and verifies PostgreSQL, conduwuit, dnsmasq, nginx
- Shellcheck runs on all shell scripts (errors and warnings)
- Fix unused variables in sanitize-files.sh
- Use initialHashedPassword for root in VM config

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 11:04:00 -08:00
Dan 92d7646d52 Migrate Slack tokens to sops-nix, improve egress rate limits
- Remove beads from VPS deployment (kept locally for dev workflow)
- Add slack-bot-token and slack-app-token secrets with devs group access
- Remove dead acme-email secret reference
- Increase egress limits from 30/min to 150/min (burst 60→300)
- Change egress blocking from REJECT to DROP for better app behavior
- Add egress-status script for user self-diagnosis
- Update dev-slack-direct.md with new /run/secrets access patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 11:14:19 -08:00
Dan df2cb13f9b Remove redundant olm permission from VM config
VM imports configuration.nix which already has the permission.
Clarified comments explaining why both flake.nix and configuration.nix
need the permission (different pkgs sources).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 22:53:07 -08:00
Dan 2aa005b300 Pin beads and opencode flake inputs to commit hashes
Prevents unexpected breakage from upstream changes.
To update: nix flake update beads opencode

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 20:56:25 -08:00
Dan 80ac34fc5c Make dev-add.sh idempotent
Safe to re-run: updates SSH key and config if user exists,
creates new user if not. Matches NixOS declarative model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 20:35:10 -08:00
Dan cbda7aee2b Fix dev-add.sh to check file readability, not just existence
Change [ -f /etc/slack-dev.env ] to [ -r ... ] so users not in
devs group don't get permission denied errors on login.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:44:36 -08:00
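
Note on cbda7aee2b: the difference between `-f` and `-r` can be sketched as follows. `source_if_readable` is a hypothetical helper name; the commit describes an inline test, not a function.

```shell
#!/usr/bin/env bash
# -f is true whenever the path is a regular file, even if the current user
# cannot read it, so sourcing can still fail with "Permission denied".
# -r is true only if the file exists AND is readable by the current user.
source_if_readable() {
  local env_file="$1"
  if [ -r "$env_file" ]; then
    # shellcheck disable=SC1090
    . "$env_file"
  fi
}
```

For a user outside the group that owns the file, the `-r` test is false and the function returns quietly, which is the login behaviour the commit describes.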
Dan 812ffb9802 Add --dry-run flag to dev-remove.sh
Preview mode shows what would be removed without making changes.
Skips confirmation prompt and outputs cyan-colored dry-run messages.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:40:21 -08:00
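
Note on 812ffb9802: a dry-run mode is often built around a small wrapper like the one below. This is a sketch under assumptions: `DRY_RUN` and `run` are hypothetical names, and the actual dev-remove.sh may be structured differently.

```shell
#!/usr/bin/env bash
DRY_RUN=0
if [ "${1:-}" = "--dry-run" ]; then
  DRY_RUN=1
fi

# Either executes the command or prints it in cyan with a [dry-run] prefix.
run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    printf '\033[36m[dry-run]\033[0m %s\n' "$*"
  else
    "$@"
  fi
}
```

Destructive steps are then written as `run userdel -r "$user"`, so the same code path serves both the preview and the real removal.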
Dan 2dd5684a8b Remove unused Nix lambda patterns (deadnix cleanup) 2026-01-05 18:23:54 -08:00
Dan d3151b39ed Add mosh alternative to dev onboarding doc 2026-01-05 17:38:42 -08:00
Dan 7ea56904d4 Add mosh for mobile shell access
- mosh package in systemPackages
- UDP ports 60000-60010 for mosh sessions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:30:57 -08:00
Dan bcfdf962f3 Disable security modules pending fixes, patch ssh-hardening
ssh-hardening.nix had fatal bugs:
- UsePAM=false breaks NixOS SSH auth
- Protocol=2 deprecated, crashes modern sshd
- AllowUsers defaulted to ["admin"], locks out all users

Partial fixes applied but module still unsafe to enable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:09:07 -08:00
Dan a25abda825 Add Unix social tools section to dev onboarding doc
Documents who, w, finger, write, wall, ytalk and .plan files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 15:34:55 -08:00
Dan 7832e74110 Add classic Unix social tools and fortune on login
- bsd-finger, ytalk, fortune in systemPackages
- Fortune displays on interactive shell login via programs.bash.interactiveShellInit
- Avoids breaking nix copy/rsync/scp (loginShellInit was wrong approach)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 15:32:29 -08:00
Dan 7519c88134 Sync beads 2026-01-05 09:15:25 -08:00
Dan 955b6e91b4 Fix killswitch paths in watchdog scripts, remove replaceStrings workaround 2026-01-05 09:12:46 -08:00
Dan 22f405f995 Add dev tools checks to smoke test (bun, zig) 2026-01-04 17:09:56 -08:00
Dan 39a161ce79 Sync beads 2026-01-04 16:45:26 -08:00
Dan c236deb480 Add zig to AGENTS.md available tools 2026-01-04 16:43:44 -08:00
Dan e1e9e2d635 Add zig to system packages 2026-01-04 16:38:11 -08:00
Dan 1158f3a37b Add bun as preferred JS package manager for faster installs 2026-01-04 13:49:56 -08:00
Dan a2c994b1d1 Set COLORTERM for truecolor terminals in SSH sessions 2026-01-04 10:02:25 -08:00
Dan 79d278ba61 Add terminfo for ghostty and kitty terminals
Source ghostty.terminfo from nixpkgs-unstable since it's not
available in nixos-24.05 stable.
2026-01-03 18:02:40 -08:00
Dan 74cf842afd Improve dev onboarding: devs group, npm setup, AGENTS.md
- Add users.groups.devs for shared resources
- dev-add: check devs group exists before creating user
- dev-add: use .profile for login shell PATH setup
- dev-add: configure npm prefix and .npm-global directory
- dev-add: create AGENTS.md with friendly capability guide
- Update onboarding message with npm install examples
- Add docs/server-AGENTS.md for reference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 17:11:03 -08:00
Dan bd49ea001a Add documentation for adding dev tools
Covers four methods: system-wide, per-user nix profile,
per-project devShell, and external flakes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 11:00:15 -08:00
Dan bc81b4ec15 Rename learner to dev across codebase
- scripts/learner-*.sh → scripts/dev-*.sh
- docs/learner-*.md → docs/dev-*.md
- tests/test-learner-env.sh → tests/test-dev-env.sh
- Update all internal references

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-03 10:42:34 -08:00