Commit graph

124 commits

Author SHA1 Message Date
Dan 73b932ff47 docs: add worklog and final musiclink config fixes 2026-01-20 14:42:00 -08:00
Dan 4adf6723c5 feat: complete musiclink bot integration with verified VM checks 2026-01-20 13:40:47 -08:00
Dan 82fce7f4e4 docs: remove deprecated Emes workflow references 2026-01-20 10:57:06 -08:00
Dan 3236ed5450 Update worklog with beads system-wide install
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 15:14:14 -08:00
Dan 11b901b503 Add beads (bd) system-wide for all users
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 15:07:10 -08:00
Dan ed8e36257f Update worklog with VS Code auth bug findings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 12:08:52 -08:00
Dan ef9c583c3b Add worklog: ops-review completion and bot research
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 21:49:37 -08:00
Dan fec21745ce Update worklog with ops-review fixes and y8le decision
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 20:19:07 -08:00
Dan b1d2674629 Add failure notification and resilience to backup services
- Add backup-b2-failed oneshot for OnFailure notification
- Add onFailure handler to both backup-b2 and backup-b2-check
- Add network-online.target dependency to backup-b2-check
- Add TimeoutStartSec (2h for backup, 1h for check)

Found via ops-review lenses.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 18:56:33 -08:00
Dan d581d7bac4 Add worklog: NixOS 24.11 upgrade with DR preparation 2026-01-10 18:46:38 -08:00
Dan 75515c7e53 Update flake to NixOS 24.11
- nixpkgs: 24.05 (Dec 2024) → 24.11 (Jun 2025)
- sops-nix: unpinned (now follows nixpkgs)
- nixpkgs-unstable: Dec 2025 → Jan 2026

Key version changes:
- PostgreSQL 15.10 → 15.13 (pinned to v15)
- Forgejo 7.0.12 → 7.0.15 LTS
- Matrix-continuwuity 0.5.0-rc → 0.5.1 stable
- maubot 0.4.2 → 0.5.0
- systemd 255 → 256

Build verified, deployment in separate task.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 18:12:33 -08:00
Dan 9c03d2204d Update DR runbook: first restore drill passed
Tested restore of:
- PostgreSQL dumps (forgejo: 112 tables, mautrix_slack: 32 tables)
- Forgejo repositories
- User home directories

Also updated known gaps status (sops key, PostgreSQL pin fixed).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 16:18:22 -08:00
Dan 5a45993046 Mark PostgreSQL pin complete in upgrade checklist 2026-01-10 16:07:56 -08:00
Dan db7b05a46e Pin PostgreSQL to v15 for NixOS 24.11 upgrade
Prevents automatic upgrade to PostgreSQL 16 when upgrading NixOS.
This allows a safer two-step approach: upgrade NixOS first, then
pg_upgrade later.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 16:07:13 -08:00
Dan 42ebc501c3 Document NixOS 24.11 upgrade impact analysis
Key findings:
- PostgreSQL defaults to 16 (must pin to 15)
- Forgejo 7.0→9.0 (review release notes, backup DB)
- conduwuit discontinued (we use continuwuity fork, OK)
- mautrix-slack, nginx, ACME: no breaking changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 15:58:06 -08:00
Dan 85989ccc2a Add offline sops recovery key
Secrets now encrypted to three recipients:
- vultr_vps: server SSH host key (primary)
- admin: workstation key (local editing)
- recovery: offline key at ~/.config/sops/age/recovery.key

If server dies and admin key unavailable, recovery key can
still decrypt secrets to bootstrap restore.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 15:40:31 -08:00
Dan 5db6c0dc7e Update DR runbook: mark backup paths as fixed 2026-01-10 14:37:30 -08:00
Dan 6954fbec9a Add /home and /var/lib/acme to B2 backups
Closes r177. Critical DR gap - user home directories and ACME
certificates were not being backed up.

Excludes common caches that can be rebuilt:
- .cache, .npm/_cacache, .bun/install/cache
- node_modules, .nix-profile, .nix-defexpr
- Trash

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 14:33:01 -08:00
Dan b62f649a28 Add disaster recovery runbook draft
Documents restore procedures for full server loss, partial restore,
and user data recovery scenarios. Includes verification checklists,
time estimates, and break-glass quick reference.

Also documents known gaps (home dirs, ACME, RocksDB consistency)
that need fixing before the runbook is production-ready.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 14:02:01 -08:00
Dan 31d388d21c Add B2 automated backup with restic
- Add services.postgresqlBackup for daily DB dumps (2 AM)
- New modules/backup-b2.nix: restic backup to B2 (3 AM daily)
- Weekly integrity check (Sunday 4 AM)
- Retention: 7 daily, 4 weekly, 6 monthly
- B2 bucket: ops-jrz1-backup with scoped app key

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 13:49:59 -08:00
Dan ff34cee51e Sync AI agent sandbox docs to dev-add.sh AGENTS.md
New users will get the Codex sandbox workaround in their home AGENTS.md.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 08:09:25 -08:00
Dan 026f82e697 Document AI agent sandbox conflicts in server-AGENTS.md
Codex CLI seccomp filters block nix daemon access.
Workaround: disable redundant sandbox since server provides isolation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 23:33:19 -08:00
Dan 51e657d43b Add devs group to nix trusted-users
Allows dev users to use nix develop, nix build, etc.
Previously blocked by daemon access restrictions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 23:01:40 -08:00
Dan bde2aad939 Harden dev provisioning scripts (ops-review fixes)
- Remove stderr suppression from ssh-keygen (show errors)
- Add curl timeouts (--connect-timeout 5 --max-time 30)
- Add || true to arithmetic increments for set -e safety

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 20:21:57 -08:00
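The "|| true" fix for arithmetic increments in bde2aad939 addresses a common bash pitfall. The sketch below is a minimal illustration of it, not code from the provisioning scripts; the loop and the variable name are assumptions.

```shell
#!/usr/bin/env bash
set -e

count=0
for item in a b c; do
  # ((count++)) evaluates to the OLD value of count. When that value is 0,
  # the command returns exit status 1, which would abort the script under
  # set -e. Appending || true keeps the increment and discards the status.
  ((count++)) || true
done

echo "processed $count items"
```

An assignment such as count=$((count + 1)) avoids the issue as well, because an assignment returns status 0 regardless of the computed value.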
Dan d9c1848e88 Implement dual-key git access for dev users
- Generate server-side SSH keypair for git access from server
- Upload both laptop key and server key to Forgejo
- Add mandatory key revocation in dev-remove.sh
- Fix: use forgejo@ instead of git@ for SSH URLs
- Keys named username-laptop and username-devserver
- Key comment includes DO-NOT-REUSE warning

Closes ops-jrz1-rfx
2026-01-09 19:35:59 -08:00
Dan 99b187fa5a Document security model: simple Unix isolation 2026-01-09 16:31:11 -08:00
Dan f17604f0ad Add Forgejo admin operations doc 2026-01-09 15:09:09 -08:00
Dan 11bb06a959 Revert "Document Forgejo API administration pattern"
This reverts commit f4be5fa7fc.
2026-01-09 15:08:44 -08:00
Dan f4be5fa7fc Document Forgejo API administration pattern 2026-01-09 15:08:23 -08:00
Dan aca792a51d Add secure password delivery for Forgejo provisioning
Write credentials to ~/.forgejo-credentials (JSON, mode 600) when
creating new Forgejo users. Onboarding message points to file
instead of showing password in terminal output.

Addresses ops-jrz1-ofw.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 10:02:57 -08:00
Dan 1575e44ca2 Fix dev-add.sh random password generation, update Forgejo token scope
- Replace openssl rand with /dev/urandom (openssl not in NixOS path)
- Update forgejo-api-token with admin scope for user provisioning

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 17:58:49 -08:00
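The move from openssl rand to /dev/urandom in 1575e44ca2 could look roughly like the sketch below. The actual command in dev-add.sh is not shown in this log, so the alphabet, the 24-character length, and the variable name are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Illustrative only: length and alphabet are assumed, not taken from dev-add.sh.
# LC_ALL=C makes tr treat the input as raw bytes rather than multibyte text.
# tr -dc deletes every byte that is NOT in the given set; head stops after 24.
password=$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 24)

echo "generated a ${#password}-character password"
```

If a script enables pipefail, tr is terminated by SIGPIPE once head closes the pipe, so the pipeline would report a non-zero status that needs handling.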
Dan fafc04cb0d Add Forgejo integration to dev user provisioning
- Add programs.ssh.knownHosts for git.clarun.xyz (prevents SSH prompts)
- Expose forgejo-api-token via sops-nix for provisioning
- dev-add.sh: Create Forgejo account + upload SSH key via API
- dev-add.sh: Set up .gitconfig with user.name/email
- dev-remove.sh: Print warning to manually suspend Forgejo account

Addresses ops-jrz1-qts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 17:32:18 -08:00
Dan ba949239a5 Remove obsolete slack-oauth-token null placeholder
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 17:06:43 -08:00
Dan 6e890396f4 Add Forgejo admin credentials to sops
- forgejo-admin-password: dan user password
- forgejo-api-token: API token for automation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 16:58:05 -08:00
Dan ad1fbf1c8c Consolidate scattered networking, environment, systemd keys
Fixes 3x statix W20 warnings. No functional change.
- networking: Moved firewall into main block
- environment: Consolidated systemPackages, localBinInPath, shellInit
- systemd: Consolidated slices, services, timers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 12:49:57 -08:00
Dan 278017efe3 Consolidate scattered sops.* keys into single block
Fixes statix W20 warning. No functional change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 12:42:57 -08:00
Dan 99927712c5 Add VM integration test and shellcheck linting to flake checks
- VM test boots a VM and verifies PostgreSQL, conduwuit, dnsmasq, nginx
- Shellcheck runs on all shell scripts (errors and warnings)
- Fix unused variables in sanitize-files.sh
- Use initialHashedPassword for root in VM config

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 11:04:00 -08:00
Dan 92d7646d52 Migrate Slack tokens to sops-nix, improve egress rate limits
- Remove beads from VPS deployment (kept locally for dev workflow)
- Add slack-bot-token and slack-app-token secrets with devs group access
- Remove dead acme-email secret reference
- Increase egress limits from 30/min to 150/min (burst 60→300)
- Change egress blocking from REJECT to DROP for better app behavior
- Add egress-status script for user self-diagnosis
- Update dev-slack-direct.md with new /run/secrets access patterns

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 11:14:19 -08:00
Dan df2cb13f9b Remove redundant olm permission from VM config
VM imports configuration.nix which already has the permission.
Clarified comments explaining why both flake.nix and configuration.nix
need the permission (different pkgs sources).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 22:53:07 -08:00
Dan 2aa005b300 Pin beads and opencode flake inputs to commit hashes
Prevents unexpected breakage from upstream changes.
To update: nix flake update beads opencode

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 20:56:25 -08:00
Dan 80ac34fc5c Make dev-add.sh idempotent
Safe to re-run: updates SSH key and config if user exists,
creates new user if not. Matches NixOS declarative model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 20:35:10 -08:00
Dan cbda7aee2b Fix dev-add.sh to check file readability, not just existence
Change [ -f /etc/slack-dev.env ] to [ -r ... ] so users not in
devs group don't get permission denied errors on login.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:44:36 -08:00
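The -f to -r change in cbda7aee2b can be sketched as follows. Only the /etc/slack-dev.env path comes from the commit message; the helper name source_if_readable is hypothetical and the surrounding login snippet is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source a file only if the current user can read it.
source_if_readable() {
  # -f is true for any existing regular file, even one this user cannot
  # open. -r is true only when the file exists AND is readable, so users
  # outside the devs group skip the file instead of hitting
  # "permission denied" at login.
  if [ -r "$1" ]; then
    . "$1"
  fi
}

source_if_readable /etc/slack-dev.env
```

The function returns status 0 when the file is missing or unreadable, so it is safe to call from a login script that runs under set -e.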
Dan 812ffb9802 Add --dry-run flag to dev-remove.sh
Preview mode shows what would be removed without making changes.
Skips confirmation prompt and outputs cyan-colored dry-run messages.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 19:40:21 -08:00
Dan 2dd5684a8b Remove unused Nix lambda patterns (deadnix cleanup) 2026-01-05 18:23:54 -08:00
Dan d3151b39ed Add mosh alternative to dev onboarding doc 2026-01-05 17:38:42 -08:00
Dan 7ea56904d4 Add mosh for mobile shell access
- mosh package in systemPackages
- UDP ports 60000-60010 for mosh sessions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:30:57 -08:00
Dan bcfdf962f3 Disable security modules pending fixes, patch ssh-hardening
ssh-hardening.nix had fatal bugs:
- UsePAM=false breaks NixOS SSH auth
- Protocol=2 deprecated, crashes modern sshd
- AllowUsers defaulted to ["admin"], locks out all users

Partial fixes applied but module still unsafe to enable.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 17:09:07 -08:00
Dan a25abda825 Add Unix social tools section to dev onboarding doc
Documents who, w, finger, write, wall, ytalk and .plan files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 15:34:55 -08:00
Dan 7832e74110 Add classic Unix social tools and fortune on login
- bsd-finger, ytalk, fortune in systemPackages
- Fortune displays on interactive shell login via programs.bash.interactiveShellInit
- Avoids breaking nix copy/rsync/scp (loginShellInit was wrong approach)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 15:32:29 -08:00
Dan 7519c88134 Sync beads 2026-01-05 09:15:25 -08:00