# ops-jrz1 Development Guidelines

Auto-generated from all feature plans. Last updated: 2025-10-22

## Active Technologies

- Nix 2.x, NixOS 24.05+, Bash 5.x (for scripts) (001-extract-matrix-platform)
- mautrix-slack (Go), PostgreSQL 15.10, sops-nix (002-slack-bridge-integration)
- Matrix homeserver: conduwuit (clarun.xyz)
- Secrets management: sops-nix with age encryption

## Project Structure

```
.
├── hosts/                          # NixOS host configurations
│   └── ops-jrz1.nix                # VPS configuration (45.77.205.49)
├── modules/                        # NixOS modules
│   ├── dev-services.nix            # PostgreSQL, Forgejo, bridge coordination
│   ├── mautrix-slack.nix           # Slack bridge module
│   └── matrix-continuwuity.nix     # Matrix homeserver
├── secrets/                        # sops-encrypted secrets
│   └── secrets.yaml                # Encrypted credentials (age)
├── specs/                          # Feature specifications
│   ├── 001-extract-matrix-platform/
│   └── 002-slack-bridge-integration/
│       ├── spec.md                 # Feature specification
│       ├── plan.md                 # Implementation plan
│       ├── research.md             # Technical research findings
│       ├── data-model.md           # Data model & state machines
│       ├── quickstart.md           # Deployment runbook
│       └── contracts/              # Configuration schemas
├── docs/                           # Documentation
│   ├── platform-vision.md          # North star document
│   └── worklogs/                   # Deployment logs
└── .specify/                       # Spec-kit framework files
```

## Commands

### Deployment

```bash
# Deploy configuration to VPS
nixos-rebuild switch --flake .#ops-jrz1 \
  --target-host root@45.77.205.49 \
  --build-host localhost

# Deploy to staging
nixos-rebuild switch --flake .#ops-jrz1-staging \
  --target-host root@45.77.205.49 \
  --build-host localhost
```

### Bridge Management

```bash
# Check bridge status
ssh root@45.77.205.49 'systemctl status mautrix-slack'

# View bridge logs
ssh root@45.77.205.49 'journalctl -u mautrix-slack -f'

# Check Socket Mode connection
ssh root@45.77.205.49 'journalctl -u mautrix-slack -n 20 | grep -i socket'

# Query bridge database
ssh root@45.77.205.49 'sudo -u mautrix_slack psql mautrix_slack -c "SELECT * FROM portal;"'
```

### Secrets Management

```bash
# Edit encrypted secrets
sops secrets/secrets.yaml

# View decrypted secrets (never commit output)
sops -d secrets/secrets.yaml

# Add new secret
sops secrets/secrets.yaml
# (Edit in your $EDITOR, auto-encrypts on save)
```

### Matrix Server

```bash
# Check Matrix homeserver
ssh root@45.77.205.49 'systemctl status matrix-continuwuity'

# Check that the client API responds (local request, does not exercise federation)
ssh root@45.77.205.49 'curl -s http://127.0.0.1:8008/_matrix/client/versions | jq .'
```

### Database

```bash
# List databases
ssh root@45.77.205.49 'sudo -u postgres psql -l'

# Check bridge database
ssh root@45.77.205.49 'sudo -u postgres psql mautrix_slack -c "\dt"'

# Backup bridge database (dump is written to the local machine)
ssh root@45.77.205.49 'sudo -u postgres pg_dump mautrix_slack' > backup.sql
```

## Code Style

- Nix 2.x, NixOS 24.05+, Bash 5.x: Follow standard conventions
- NixOS modules: Use nixpkgs module pattern (options, config, mkIf)
- Configuration: Declarative over imperative
- Secrets: Never hardcode, use sops-nix or interactive login
- Logging: Use appropriate levels (debug for troubleshooting, info for production)

## Development Patterns

### Slack Bridge (002-slack-bridge-integration)

- **Authentication**: Interactive login via Matrix chat (`login app` command)
- **Socket Mode**: WebSocket connection, no public endpoint needed
- **Portal Creation**: Automatic based on activity (no manual channel mapping)
- **Secrets**: Stored in bridge database after authentication (not in NixOS config)
- **Token Requirements**: Bot token (xoxb-) + app-level token (xapp-)

### Secrets Management

- **Encryption**: Age encryption via SSH host key (/etc/ssh/ssh_host_ed25519_key)
- **Storage**: secrets/secrets.yaml (encrypted, safe to commit)
- **Runtime**: Decrypted to /run/secrets/ (tmpfs, cleared on reboot)
- **Permissions**: 0440 for service-specific secrets, owned by service user

### Deployment Workflow

1. Make configuration changes locally
2. Commit to git
3. Deploy via nixos-rebuild
4. Verify service status and logs
5. Document in docs/worklogs/
6. Test functionality
7. Monitor for stability

## Git Workflow

This project uses **Trunk-Based Development** for simplified collaboration and deployment.

### Branch Strategy

- **main**: Single long-lived branch, always deployable
- **Feature branches**: Short-lived (hours to days), naming: `###-feature-name`
- **No long-lived branches**: Feature branches merge or delete quickly

### Feature Development Workflow

```bash
# 1. Start feature from latest main
git checkout main
git pull origin main
git checkout -b 003-feature-name

# 2. Develop with frequent commits
# Make changes, commit often with clear messages

# 3. Keep feature branch in sync with main (if feature takes >1 day)
git checkout main
git pull origin main
git checkout 003-feature-name
git rebase main

# 4. When feature complete, merge to main
git checkout main
git merge 003-feature-name  # Fast-forward merge preferred

# 5. Tag release if deploying
git tag -a v0.3.0 -m "Release notes..."
git push origin main --tags

# 6. Delete feature branch
git branch -d 003-feature-name
```

### Release Tagging

- **Version scheme**: v0.MINOR.PATCH (semver-like)
- **When to tag**: After completing and merging a feature
- **Tag format**: Annotated tags with comprehensive release notes
- **Example**:

```bash
git tag -a v0.3.0 -m "Release v0.3.0: Feature Description

- Key changes
- Architecture updates
- Known issues
"
```

### Branch Naming Convention

- Format: `###-short-description`
- Examples: `002-slack-bridge-integration`, `003-monitoring-setup`
- Number matches spec directory in `specs/###-feature-name/`

### Commit Guidelines

- Clear, concise commit messages
- No emojis or marketing language
- Focus on "what" and "why", not "how"
- Group related changes in a single commit
- Example: "Fix bridge homeserver URL to use IPv4 (127.0.0.1) instead of localhost"

### Main Branch Protection

- Always keep main deployable
- Test before merging to main
- Document breaking changes in commit message
- Tag releases for deployment milestones
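
The `###-short-description` branch format described under Branch Naming Convention can be checked mechanically before pushing. The following is a minimal sketch, assuming a three-digit prefix followed by lowercase alphanumeric words separated by single hyphens. The function name `is_valid_branch_name` is hypothetical and not part of this repository.

```bash
# Hypothetical helper: exits 0 if the name matches ###-short-description.
# Assumption: three digits, then lowercase alphanumeric words joined by hyphens.
is_valid_branch_name() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{3}-[a-z0-9]+(-[a-z0-9]+)*$'
}

# Example use against the current branch (run inside a git checkout):
# branch="$(git rev-parse --abbrev-ref HEAD)"
# is_valid_branch_name "$branch" || echo "branch '$branch' does not match ###-short-description" >&2
```

A check like this could live in a pre-push hook, but `main` itself would need to be exempted since it intentionally does not follow the feature-branch format.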
## Recent Changes

- 001-extract-matrix-platform: Added Nix 2.x, NixOS 24.05+, Bash 5.x (for scripts)
- 002-slack-bridge-integration: Deployed mautrix-slack bridge with Socket Mode (2025-10-26)
  - Phase 0-1: Research and design complete
  - Phase 2: Infrastructure deployed and operational
  - Status: Bidirectional message flow working (Slack ↔ Matrix)
  - ~50 Slack channels synced to Matrix rooms

## Known Issues

- olm-3.2.16 marked insecure (permitted via nixpkgs.config.permittedInsecurePackages)
- Fresh database required after conduwuit version upgrades (wipe /var/lib/matrix-continuwuity/db/)

## Resolved Issues

- ✅ conduwuit debug logging (reverted to "info" 2025-10-26)
- ✅ Manual sender_localpart fix (automated in mautrix-slack.nix 2025-10-26)

## Testing Guidelines

- Test message latency: Should be <5 seconds (FR-001, FR-002)
- Test reactions, edits, file attachments
- Monitor health indicators: connection_status, last_successful_message, error_count
- Stability target: 99% uptime over 7-day period

## Configuration Notes

### mautrix-slack Registration File Fix (RESOLVED)

**Issue:** The bridge's registration generator (`-g` flag) creates a random `sender_localpart` instead of using the configured `bot.username` value.

**Root Cause:** mautrix-slack generates registration independently of `config.yaml` settings.

**Solution:** ✅ Automated fix implemented in `modules/mautrix-slack.nix` (lines 339-341)

The module now automatically patches the sender_localpart during registration generation:

```nix
# In ExecStartPre, after registration generation:
${pkgs.gnused}/bin/sed -i "s/^sender_localpart: .*/sender_localpart: ${cfg.appservice.senderLocalpart}/" "$REG_PATH"
```

**Status:** No manual intervention required on fresh deploys. The fix is applied automatically during service startup.

**Verification:** Tested 2025-10-26 - registration file correctly generated with `sender_localpart: slackbot` matching configuration.
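
To see what the `sed` substitution in the registration fix does, it can be run against a throwaway file. The following is a minimal sketch, assuming a registration file with a top-level `sender_localpart:` key. The function name `patch_sender_localpart` and the sample file contents are illustrative and not taken from the module.

```bash
# Hypothetical standalone version of the ExecStartPre patch step.
# Rewrites the sender_localpart line of a registration file.
patch_sender_localpart() {
  reg_path="$1"
  localpart="$2"
  tmp="$(mktemp)"
  # Same substitution as in the module; written through a temp file
  # (instead of sed -i) so it behaves the same on GNU and BSD sed.
  sed "s/^sender_localpart: .*/sender_localpart: ${localpart}/" "$reg_path" > "$tmp" \
    && cat "$tmp" > "$reg_path"
  rm -f "$tmp"
}

# Sample registration file (contents are made up for illustration)
reg="$(mktemp)"
printf 'id: slack\nsender_localpart: Xq3randomvalue\nurl: http://127.0.0.1:29319\n' > "$reg"
patch_sender_localpart "$reg" "slackbot"
grep '^sender_localpart' "$reg"
```

Because the localpart is interpolated into the `sed` expression, a value containing `/` or `&` would need escaping. Matrix localparts normally avoid both, so the sketch leaves that case out.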

---

## QA Testing Checklist

### Core Features (✅ Tested & Working)

- [x] Bidirectional text messaging (Slack ↔ Matrix)
- [x] Channel discovery and room creation (~50 channels synced)
- [x] Socket Mode WebSocket connection
- [x] Bot authentication with Matrix homeserver
- [x] Bridge startup and recovery after restart

### Features Requiring QA Testing (⚠️ Untested)

- [ ] **File Attachments**
  - Upload file in Slack → verify appears in Matrix
  - Upload file in Matrix → verify appears in Slack
  - Test various file types (images, PDFs, archives)
  - Test large files (>10MB)
- [ ] **Emoji Reactions**
  - Add reaction in Slack → verify appears in Matrix
  - Add reaction in Matrix → verify appears in Slack
  - Remove reaction → verify syncs
- [ ] **Message Edits**
  - Edit message in Slack → verify updates in Matrix
  - Edit message in Matrix → verify updates in Slack
- [ ] **Message Deletion**
  - Delete message in Slack → verify removes from Matrix
  - Delete message in Matrix → verify removes from Slack
- [ ] **Thread Replies**
  - Reply in Slack thread → verify threading in Matrix
  - Reply in Matrix thread → verify threading in Slack
- [ ] **User Profile Sync**
  - Change Slack display name → verify updates Matrix puppet
  - Change Slack avatar → verify updates Matrix puppet
- [ ] **Error Handling**
  - Network interruption recovery
  - Matrix homeserver restart handling
  - Slack WebSocket reconnection
  - Invalid token handling
- [ ] **Performance**
  - High-volume channel (>100 messages/hour)
  - Large file transfer times
  - Message latency under load

### Test Commands

```bash
# Monitor bridge during testing
ssh root@45.77.205.49 'journalctl -u mautrix-slack -f'

# Check for errors
ssh root@45.77.205.49 'journalctl -u mautrix-slack --since "1 hour ago" | grep -E "ERR|WRN|FTL"'

# Verify message flow
# Test in #vlads-pad or similar channel
# Send from Slack, verify in Matrix room
# Send from Matrix room, verify in Slack
```

---

## Future Infrastructure Needs

### Monitoring & Alerting (Not Implemented)
**Health Checks Needed:**

- Bridge WebSocket connection status
- Matrix homeserver availability
- Message processing latency
- Database connection health
- Error rate thresholds

**Potential Solutions:**

```bash
# Option 1: Simple systemd monitoring
# ("alert" is a placeholder for whatever notification command is chosen)
systemctl is-active --quiet mautrix-slack || alert

# Option 2: Prometheus + Alertmanager
# - Export bridge metrics (if available)
# - Alert on service down, high error rate, message lag

# Option 3: Uptime monitoring
# - External ping to Matrix homeserver
# - Check /_matrix/client/versions endpoint
# - Alert on HTTP errors or timeout
```

**Metrics to Track:**

- Bridge uptime percentage
- Messages processed (Slack → Matrix, Matrix → Slack)
- WebSocket reconnection events
- Database query performance
- Error counts by type

**Alert Conditions:**

- Bridge down for >5 minutes
- No messages processed in >15 minutes (if active channels exist)
- Error rate >5% of total messages
- Database connection failures
- Disk space <10% free

### Backup Strategy (Not Implemented)

**Critical Data:**

- Matrix RocksDB: `/var/lib/matrix-continuwuity/db/` (66M)
- Bridge PostgreSQL: `mautrix_slack` database (172K)
- Registration files: `/var/lib/matrix-appservices/*.yaml`
- Secrets: sops-encrypted `secrets/secrets.yaml` (in git)

**Backup Approach:**

```bash
# Daily database backups (written to /root/backups/ on the VPS)
ssh root@45.77.205.49 'tar czf /root/backups/matrix-$(date +%Y%m%d).tar.gz /var/lib/matrix-continuwuity/db/'
ssh root@45.77.205.49 'sudo -u postgres pg_dump mautrix_slack > /root/backups/bridge-$(date +%Y%m%d).sql'

# Retention: 7 daily, 4 weekly, 12 monthly
# Store off-VPS (rsync to backup server or cloud storage)
```

**Recovery Procedure:**

1. Deploy NixOS configuration
2. Restore database backups
3. Restore registration files
4. Re-authenticate with Slack (new tokens via `login app`)
5. Verify message flow

**Note:** Matrix database can be wiped and rebuilt from Slack if needed (current architecture treats Matrix as ephemeral view layer).
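
The daily part of the retention policy in the backup approach could be handled by a small pruning step. The following is a minimal sketch, assuming backups are named `bridge-YYYYMMDD.sql` as in the backup commands. The function name `prune_daily_backups` is hypothetical, and weekly and monthly retention are not handled here.

```bash
# Hypothetical helper: keep the newest $2 bridge backups in directory $1
# and delete the older ones. Relies on YYYYMMDD names sorting chronologically.
prune_daily_backups() {
  dir="$1"
  keep="$2"
  total="$(find "$dir" -maxdepth 1 -type f -name 'bridge-????????.sql' | wc -l)"
  excess=$((total - keep))
  # Nothing to do when there are no more than $keep backups.
  [ "$excess" -gt 0 ] || return 0
  # Sort oldest-first and remove only the excess entries.
  find "$dir" -maxdepth 1 -type f -name 'bridge-????????.sql' | sort | head -n "$excess" |
    while IFS= read -r f; do
      rm -f -- "$f"
    done
}

# Example: prune_daily_backups /root/backups 7
```

Pruning by count rather than by file age means that a run of failed backups never causes the last good dumps to be deleted.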

---

## Current Architecture State (2025-10-26)

### Deployed Services

```
┌─────────────────────────────────────────────────────┐
│  clarun.xyz (45.77.205.49)                          │
│                                                     │
│  ┌─────────────────────────────────────────────┐    │
│  │ nginx :443 (HTTPS)                          │    │
│  │   - Matrix Client-Server API                │    │
│  │   - Forgejo (git.clarun.xyz)                │    │
│  └────────────┬────────────────────────────────┘    │
│               │                                     │
│               ├─→ conduwuit :8008 (127.0.0.1)       │
│               │   - Matrix homeserver               │
│               │   - RocksDB schema v18              │
│               │   - 66M database                    │
│               │                                     │
│               └─→ Forgejo :3000 (127.0.0.1)         │
│                                                     │
│  ┌─────────────────────────────────────────────┐    │
│  │ mautrix-slack :29319 (127.0.0.1)            │    │
│  │   - Socket Mode WebSocket to Slack          │    │
│  │   - PostgreSQL backend (172K)               │    │
│  │   - ~50 portal rooms                        │    │
│  └────────────┬────────────────────────────────┘    │
│               │                                     │
│               └─→ PostgreSQL :5432 (unix socket)    │
│                                                     │
└─────────────────────────────────────────────────────┘
          │
          └─→ Slack API (Socket Mode WebSocket)
              - Workspace: chochacho
              - Bot token: xoxb-...
              - App token: xapp-...
```

### Critical Networking Details

- **All internal services use IPv4 (127.0.0.1)**, NOT "localhost"
- Reason: `localhost` can resolve to IPv6 `[::1]`, but the services bind IPv4-only
- Fixed in: nginx proxy_pass, bridge homeserverUrl configuration

### Service Dependencies

```
postgresql.service
  └─→ mautrix-slack.service
        └─→ matrix-continuwuity.service
              └─→ nginx.service
```

### Data Flow

1. **Slack → Matrix:**
   - Slack pushes event via Socket Mode WebSocket
   - Bridge receives, transforms to Matrix event
   - Bridge POSTs to conduwuit appservice endpoint
   - conduwuit distributes to Matrix rooms
   - Element clients receive via /sync
2. **Matrix → Slack:**
   - Element client sends message via conduwuit
   - conduwuit forwards to bridge appservice endpoint
   - Bridge transforms to Slack API call
   - Bridge POSTs to Slack API (bot token)
   - Appears in Slack channel

### Security Model

- **Secrets:** Managed via sops-nix, deployed to `/run/secrets/`
- **Bridge tokens:**
  - `as_token`: Bridge authenticates to Matrix
  - `hs_token`: Matrix authenticates to bridge
- **Slack tokens:**
  - `xoxb-`: Bot API calls
  - `xapp-`: Socket Mode connection
- **No public bridge endpoint:** Socket Mode eliminates webhook requirement

### Operational Notes

- Matrix database disposable (can rebuild from Slack)
- Bridge config fully declarative; the sender_localpart fix is applied automatically by `modules/mautrix-slack.nix`
- Fresh database recommended after conduwuit version upgrades
- conduwuit debug logging was reverted to "info" on 2025-10-26 (see Resolved Issues)
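
The IPv4 rule under Critical Networking Details lends itself to a quick lint over rendered configuration files. The following is a minimal sketch, assuming the files are plain text containing `http://` or `https://` URLs. The function name `find_localhost_urls` is hypothetical.

```bash
# Hypothetical lint: print lines (with line numbers) that use a
# "localhost" URL instead of 127.0.0.1. Exit status is grep's:
# 0 if any match was found, 1 if the file is clean.
find_localhost_urls() {
  grep -nE 'https?://localhost([^[:alnum:].-]|$)' "$1"
}

# Example: find_localhost_urls /path/to/rendered/config.yaml
```

The trailing character class keeps hostnames such as `localhost.example.com` from being flagged while still catching `localhost:8008`, `localhost/` and `localhost;`.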