ops-jrz1 Development Guidelines
Auto-generated from all feature plans. Last updated: 2025-10-22
Active Technologies
- Nix 2.x, NixOS 24.05+, Bash 5.x (for scripts) (001-extract-matrix-platform)
- mautrix-slack (Python 3.11), PostgreSQL 15.10, sops-nix (002-slack-bridge-integration)
- Matrix homeserver: conduwuit (clarun.xyz)
- Secrets management: sops-nix with age encryption
Project Structure
.
├── hosts/                      # NixOS host configurations
│   └── ops-jrz1.nix            # VPS configuration (45.77.205.49)
├── modules/                    # NixOS modules
│   ├── dev-services.nix        # PostgreSQL, Forgejo, bridge coordination
│   ├── mautrix-slack.nix       # Slack bridge module
│   └── matrix-continuwuity.nix # Matrix homeserver
├── secrets/                    # sops-encrypted secrets
│   └── secrets.yaml            # Encrypted credentials (age)
├── specs/                      # Feature specifications
│   ├── 001-extract-matrix-platform/
│   └── 002-slack-bridge-integration/
│       ├── spec.md             # Feature specification
│       ├── plan.md             # Implementation plan
│       ├── research.md         # Technical research findings
│       ├── data-model.md       # Data model & state machines
│       ├── quickstart.md       # Deployment runbook
│       └── contracts/          # Configuration schemas
├── docs/                       # Documentation
│   ├── platform-vision.md      # North star document
│   └── worklogs/               # Deployment logs
└── .specify/                   # Spec-kit framework files
Commands
Deployment
# Deploy configuration to VPS
nixos-rebuild switch --flake .#ops-jrz1 \
--target-host root@45.77.205.49 \
--build-host localhost
# Deploy to staging
nixos-rebuild switch --flake .#ops-jrz1-staging \
--target-host root@45.77.205.49 \
--build-host localhost
Bridge Management
# Check bridge status
ssh root@45.77.205.49 'systemctl status mautrix-slack'
# View bridge logs
ssh root@45.77.205.49 'journalctl -u mautrix-slack -f'
# Check Socket Mode connection
ssh root@45.77.205.49 'journalctl -u mautrix-slack -n 20 | grep -i socket'
# Query bridge database
ssh root@45.77.205.49 'sudo -u mautrix_slack psql mautrix_slack -c "SELECT * FROM portal;"'
Secrets Management
# Edit encrypted secrets
sops secrets/secrets.yaml
# View decrypted secrets (never commit output)
sops -d secrets/secrets.yaml
# Add new secret
sops secrets/secrets.yaml
# (Edit in your $EDITOR, auto-encrypts on save)
Matrix Server
# Check Matrix homeserver
ssh root@45.77.205.49 'systemctl status matrix-continuwuity'
# Check that the client API responds locally (does not exercise federation)
ssh root@45.77.205.49 'curl -s http://localhost:8008/_matrix/client/versions | jq .'
Database
# List databases
ssh root@45.77.205.49 'sudo -u postgres psql -l'
# Check bridge database
ssh root@45.77.205.49 'sudo -u postgres psql mautrix_slack -c "\dt"'
# Backup bridge database
ssh root@45.77.205.49 'sudo -u postgres pg_dump mautrix_slack' > backup.sql
Code Style
- Nix 2.x, NixOS 24.05+, Bash 5.x: Follow standard conventions
- NixOS modules: Use nixpkgs module pattern (options, config, mkIf)
- Configuration: Declarative over imperative
- Secrets: Never hardcode, use sops-nix or interactive login
- Logging: Use appropriate levels (debug for troubleshooting, info for production)
Development Patterns
Slack Bridge (002-slack-bridge-integration)
- Authentication: Interactive login via Matrix chat (login app command)
- Socket Mode: WebSocket connection, no public endpoint needed
- Portal Creation: Automatic based on activity (no manual channel mapping)
- Secrets: Stored in bridge database after authentication (not in NixOS config)
- Token Requirements: Bot token (xoxb-) + app-level token (xapp-)
Secrets Management
- Encryption: Age encryption via SSH host key (/etc/ssh/ssh_host_ed25519_key)
- Storage: secrets/secrets.yaml (encrypted, safe to commit)
- Runtime: Decrypted to /run/secrets/ (tmpfs, cleared on reboot)
- Permissions: 0440 for service-specific secrets, owned by service user
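The permission rule above can be spot-checked on the host. The following is a minimal sketch, assuming GNU coreutils `stat` (as on NixOS); the function name `check_secret_mode` is hypothetical and not part of this repository.

```shell
# Return 0 if the file has exactly the expected octal mode (default 440).
# Returns 2 if the file cannot be inspected.
# Assumes GNU coreutils `stat` (the -c flag).
check_secret_mode() {
  local secret_path expected_mode actual_mode
  secret_path="$1"
  expected_mode="${2:-440}"
  actual_mode="$(stat -c '%a' "$secret_path")" || return 2
  [ "$actual_mode" = "$expected_mode" ]
}
```

It would be called with the path of a decrypted secret under /run/secrets/ and exits non-zero when the mode differs from 0440.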
Deployment Workflow
- Make configuration changes locally
- Commit to git
- Deploy via nixos-rebuild
- Verify service status and logs
- Document in docs/worklogs/
- Test functionality
- Monitor for stability
Git Workflow
This project uses Trunk-Based Development for simplified collaboration and deployment.
Branch Strategy
- main: Single long-lived branch, always deployable
- Feature branches: Short-lived (hours to days), naming: ###-feature-name
- No long-lived branches: Feature branches merge or delete quickly
Feature Development Workflow
# 1. Start feature from latest main
git checkout main
git pull origin main
git checkout -b 003-feature-name
# 2. Develop with frequent commits
# Make changes, commit often with clear messages
# 3. Keep the feature branch in sync with main (if feature takes >1 day)
git checkout main
git pull origin main
git checkout 003-feature-name
git rebase main
# 4. When feature complete, merge to main
git checkout main
git merge 003-feature-name # Fast-forward merge preferred
# 5. Tag release if deploying
git tag -a v0.3.0 -m "Release notes..."
git push origin main --tags
# 6. Delete feature branch
git branch -d 003-feature-name
Release Tagging
- Version scheme: v0.MINOR.PATCH (semver-like)
- When to tag: After completing and merging a feature
- Tag format: Annotated tags with comprehensive release notes
- Example:
git tag -a v0.3.0 -m "Release v0.3.0: Feature Description

- Key changes
- Architecture updates
- Known issues
"
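As an illustration of the v0.MINOR.PATCH scheme, the sketch below computes the next tag from the current one. It assumes that each merged feature bumps MINOR and resets PATCH to 0, which the guidelines imply but do not state; `next_minor_tag` is a hypothetical helper name.

```shell
# Print the next minor release tag for a v0.MINOR.PATCH tag, with PATCH reset to 0.
# Returns 1 if the input does not match the scheme (leading zeros are rejected).
next_minor_tag() {
  local tag minor
  tag="$1"
  printf '%s\n' "$tag" \
    | grep -Eq '^v0\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$' || return 1
  minor="${tag#v0.}"     # strip the "v0." prefix, leaving MINOR.PATCH
  minor="${minor%%.*}"   # strip ".PATCH", leaving MINOR
  printf 'v0.%d.0\n' "$((minor + 1))"
}
```

For example, `next_minor_tag v0.3.1` prints `v0.4.0`, which could then be passed to `git tag -a`.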
Branch Naming Convention
- Format: ###-short-description
- Examples: 002-slack-bridge-integration, 003-monitoring-setup
- Number matches spec directory in specs/###-feature-name/
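The naming convention can be checked mechanically, for instance from a local git hook. This is a sketch only: the restriction to lowercase letters, digits, and hyphens is an assumption drawn from the examples, and both function names are hypothetical.

```shell
# Return 0 if the branch name matches ###-short-description.
# Assumption: descriptions use lowercase letters, digits, and hyphens only.
is_valid_branch_name() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{3}-[a-z0-9]+(-[a-z0-9]+)*$'
}

# Return 0 if a spec directory with the same name exists under the specs root
# (second argument, defaulting to ./specs).
has_spec_dir() {
  [ -d "${2:-specs}/$1" ]
}
```

Run from the repository root, `is_valid_branch_name 003-monitoring-setup && has_spec_dir 003-monitoring-setup` succeeds only when the name is well-formed and the matching directory exists.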
Commit Guidelines
- Clear, concise commit messages
- No emojis or marketing language
- Focus on "what" and "why" not "how"
- Group related changes in single commit
- Example: "Fix bridge homeserver URL to use IPv4 (127.0.0.1) instead of localhost"
Main Branch Protection
- Always keep main deployable
- Test before merging to main
- Document breaking changes in commit message
- Tag releases for deployment milestones
Recent Changes
- 001-extract-matrix-platform: Added Nix 2.x, NixOS 24.05+, Bash 5.x (for scripts)
- 002-slack-bridge-integration: Deployed mautrix-slack bridge with Socket Mode (2025-10-26)
- Phase 0-1: Research and design complete
- Phase 2: Infrastructure deployed and operational
- Status: Bidirectional message flow working (Slack ↔ Matrix)
- ~50 Slack channels synced to Matrix rooms
Known Issues
- olm-3.2.16 marked insecure (permitted via nixpkgs.config.permittedInsecurePackages)
- conduwuit log level set to "debug" (intended for troubleshooting, consider reverting to "info")
- Fresh database required after conduwuit version upgrades (wipe /var/lib/matrix-continuwuity/db/)
Testing Guidelines
- Test message latency: Should be <5 seconds (FR-001, FR-002)
- Test reactions, edits, file attachments
- Monitor health indicators: connection_status, last_successful_message, error_count
- Stability target: 99% uptime over 7-day period
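To make the stability target concrete, the sketch below converts an uptime percentage over a window into a downtime budget. The helper name `downtime_budget_minutes` is hypothetical; the arithmetic is simply days × 24 × 60 × (100 − percent) / 100.

```shell
# Print the downtime budget in minutes for an uptime target over a window.
# Usage: downtime_budget_minutes <days> <uptime_percent>
downtime_budget_minutes() {
  awk -v days="$1" -v pct="$2" \
    'BEGIN { printf "%.1f\n", days * 24 * 60 * (100 - pct) / 100 }'
}
```

For the 7-day, 99% target, `downtime_budget_minutes 7 99` prints `100.8`, so the bridge may be unavailable for roughly 100 minutes per week before the target is missed.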
Manual Configuration Workarounds
mautrix-slack Registration File Fix (KNOWN ISSUE)
Problem: The bridge's registration generator creates a random sender_localpart instead of using the configured bot.username value.
Current Manual Fix (Required on Fresh Deploy):
# After bridge service starts and generates registration
ssh root@45.77.205.49 'systemctl stop mautrix-slack'
# Edit registration file to fix sender_localpart
ssh root@45.77.205.49 "sed -i 's/^sender_localpart: .*/sender_localpart: slackbot/' /var/lib/matrix-appservices/mautrix_slack_registration.yaml"
# Re-register appservice in Matrix admin room
# In Element, send to admin room:
# !admin appservices unregister slack
# !admin appservices register
# <paste corrected YAML>
# Restart homeserver to load new registration
ssh root@45.77.205.49 'systemctl restart matrix-continuwuity'
# Start bridge
ssh root@45.77.205.49 'systemctl start mautrix-slack'
Root Cause: mautrix-slack's -g flag generates registration independently of config.yaml settings.
Potential Permanent Fix: Patch modules/mautrix-slack.nix to post-process registration file after generation:
# In ExecStartPre, after registration generation:
${pkgs.gnused}/bin/sed -i 's/^sender_localpart: .*/sender_localpart: ${cfg.appservice.senderLocalpart}/' "$REG_PATH"
Impact: Without this fix, registration sender_localpart won't match bridge config, causing authentication failures.
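Until a permanent fix lands, the manual edit can be wrapped so that it also confirms the result. This is a minimal sketch, assuming GNU sed and a localpart without `/` characters; `fix_sender_localpart` is a hypothetical helper, not something the bridge or this repository provides.

```shell
# Rewrite sender_localpart in a registration file and confirm the result.
# Usage: fix_sender_localpart <registration.yaml> <localpart>
fix_sender_localpart() {
  local reg_path localpart
  reg_path="$1"
  localpart="$2"
  [ -f "$reg_path" ] || { echo "missing: $reg_path" >&2; return 1; }
  # Same substitution as the manual fix (GNU sed in-place edit).
  sed -i "s/^sender_localpart: .*/sender_localpart: ${localpart}/" "$reg_path"
  # Succeed only if the expected line is now present verbatim.
  grep -qxF "sender_localpart: ${localpart}" "$reg_path"
}
```

The function is idempotent: running it again on an already corrected file leaves the file unchanged and still returns success. It fails if the file has no sender_localpart line at all.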
QA Testing Checklist
Core Features (✅ Tested & Working)
- Bidirectional text messaging (Slack ↔ Matrix)
- Channel discovery and room creation (~50 channels synced)
- Socket Mode WebSocket connection
- Bot authentication with Matrix homeserver
- Bridge startup and recovery after restart
Features Requiring QA Testing (⚠️ Untested)
- File Attachments
  - Upload file in Slack → verify appears in Matrix
  - Upload file in Matrix → verify appears in Slack
  - Test various file types (images, PDFs, archives)
  - Test large files (>10MB)
- Emoji Reactions
  - Add reaction in Slack → verify appears in Matrix
  - Add reaction in Matrix → verify appears in Slack
  - Remove reaction → verify syncs
- Message Edits
  - Edit message in Slack → verify updates in Matrix
  - Edit message in Matrix → verify updates in Slack
- Message Deletion
  - Delete message in Slack → verify removes from Matrix
  - Delete message in Matrix → verify removes from Slack
- Thread Replies
  - Reply in Slack thread → verify threading in Matrix
  - Reply in Matrix thread → verify threading in Slack
- User Profile Sync
  - Change Slack display name → verify updates Matrix puppet
  - Change Slack avatar → verify updates Matrix puppet
- Error Handling
  - Network interruption recovery
  - Matrix homeserver restart handling
  - Slack WebSocket reconnection
  - Invalid token handling
- Performance
  - High-volume channel (>100 messages/hour)
  - Large file transfer times
  - Message latency under load
Test Commands
# Monitor bridge during testing
ssh root@45.77.205.49 'journalctl -u mautrix-slack -f'
# Check for errors
ssh root@45.77.205.49 'journalctl -u mautrix-slack --since "1 hour ago" | grep -E "ERR|WRN|FTL"'
# Verify message flow
# Test in #vlads-pad or similar channel
# Send from Slack, verify in Matrix room
# Send from Matrix room, verify in Slack
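When reviewing a test session, a per-level count is often easier to read than raw grep output. The sketch below tallies log lines by level from stdin; it assumes the level appears as a standalone token (ERR, WRN, FTL) on each line, which matches the grep pattern used here but has not been checked against the bridge's exact log format. `count_log_levels` is a hypothetical name.

```shell
# Count ERR, WRN, and FTL lines read from stdin.
# Prints a single line: ERR=<n> WRN=<n> FTL=<n>
# Each line is counted once, under the first level token it contains.
count_log_levels() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "ERR" || $i == "WRN" || $i == "FTL") { n[$i]++; break }
      }
    }
    END { printf "ERR=%d WRN=%d FTL=%d\n", n["ERR"], n["WRN"], n["FTL"] }
  '
}
```

It can be fed from the journal, for example by piping `journalctl -u mautrix-slack --since "1 hour ago"` over ssh into `count_log_levels` on the local machine.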
Future Infrastructure Needs
Monitoring & Alerting (Not Implemented)
Health Checks Needed:
- Bridge WebSocket connection status
- Matrix homeserver availability
- Message processing latency
- Database connection health
- Error rate thresholds
Potential Solutions:
# Option 1: Simple systemd monitoring
systemctl is-active --quiet mautrix-slack || alert   # "alert" is a placeholder for a notification command
# Option 2: Prometheus + Alertmanager
# - Export bridge metrics (if available)
# - Alert on service down, high error rate, message lag
# Option 3: Uptime monitoring
# - External ping to Matrix homeserver
# - Check /_matrix/client/versions endpoint
# - Alert on HTTP errors or timeout
Metrics to Track:
- Bridge uptime percentage
- Messages processed (Slack → Matrix, Matrix → Slack)
- WebSocket reconnection events
- Database query performance
- Error counts by type
Alert Conditions:
- Bridge down for >5 minutes
- No messages processed in >15 minutes (if active channels exist)
- Error rate >5% of total messages
- Database connection failures
- Disk space <10% free
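Two of these conditions reduce to simple integer comparisons. The following is a sketch under the assumption that error and message counts, and the timestamp of the last processed message, are already available from some metrics source; both function names are hypothetical.

```shell
# Return 0 (alert) if errors exceed threshold_pct percent of total messages.
# Usage: error_rate_exceeded <errors> <total> [threshold_pct=5]
error_rate_exceeded() {
  local errors total threshold_pct
  errors="$1"
  total="$2"
  threshold_pct="${3:-5}"
  [ "$total" -gt 0 ] || return 1   # no traffic, so no rate to evaluate
  # Integer form of errors/total > threshold/100, avoiding floating point.
  [ "$((errors * 100))" -gt "$((total * threshold_pct))" ]
}

# Return 0 (alert) if more than max_minutes have passed since last_epoch.
# Usage: silence_exceeded <last_epoch> <now_epoch> [max_minutes=15]
silence_exceeded() {
  local last_epoch now_epoch max_minutes
  last_epoch="$1"
  now_epoch="$2"
  max_minutes="${3:-15}"
  [ "$((now_epoch - last_epoch))" -gt "$((max_minutes * 60))" ]
}
```

Both comparisons are strict, so exactly 5% errors or exactly 15 minutes of silence does not trigger an alert.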
Backup Strategy (Not Implemented)
Critical Data:
- Matrix RocksDB: /var/lib/matrix-continuwuity/db/ (66M)
- Bridge PostgreSQL: mautrix_slack database (172K)
- Registration files: /var/lib/matrix-appservices/*.yaml
- Secrets: sops-encrypted secrets/secrets.yaml (in git)
Backup Approach:
# Daily database backups
ssh root@45.77.205.49 'tar czf /root/backups/matrix-$(date +%Y%m%d).tar.gz /var/lib/matrix-continuwuity/db/'
ssh root@45.77.205.49 'sudo -u postgres pg_dump mautrix_slack > /root/backups/bridge-$(date +%Y%m%d).sql'
# Retention: 7 daily, 4 weekly, 12 monthly
# Store off-VPS (rsync to backup server or cloud storage)
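The daily tier of the retention policy could be enforced with a small pruning step. This sketch covers only the "7 daily" part and assumes weekly and monthly copies are kept elsewhere or under different names; it relies on GNU find, and `prune_daily_backups` is a hypothetical helper.

```shell
# Delete daily backup files older than keep_days from backup_dir.
# Only files named matrix-*.tar.gz or bridge-*.sql are considered.
# Usage: prune_daily_backups <backup_dir> [keep_days=7]
prune_daily_backups() {
  local backup_dir keep_days
  backup_dir="$1"
  keep_days="${2:-7}"
  [ -d "$backup_dir" ] || return 1
  # -mtime +N matches files whose age, in whole 24-hour periods, is greater than N.
  find "$backup_dir" -maxdepth 1 -type f \
    \( -name 'matrix-*.tar.gz' -o -name 'bridge-*.sql' \) \
    -mtime +"$keep_days" -delete
}
```

Restricting the match to the two backup name patterns keeps the function from touching unrelated files that happen to live in the same directory.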
Recovery Procedure:
- Deploy NixOS configuration
- Restore database backups
- Restore registration files
- Re-authenticate with Slack (new tokens via login app)
- Verify message flow
Note: Matrix database can be wiped and rebuilt from Slack if needed (current architecture treats Matrix as ephemeral view layer).
Current Architecture State (2025-10-26)
Deployed Services
┌─────────────────────────────────────────────────────┐
│ clarun.xyz (45.77.205.49) │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ nginx :443 (HTTPS) │ │
│ │ - Matrix Client-Server API │ │
│ │ - Forgejo (git.clarun.xyz) │ │
│ └────────────┬────────────────────────────────┘ │
│ │ │
│ ├─→ conduwuit :8008 (127.0.0.1) │
│ │ - Matrix homeserver │
│ │ - RocksDB schema v18 │
│ │ - 66M database │
│ │ │
│ └─→ Forgejo :3000 (127.0.0.1) │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ mautrix-slack :29319 (127.0.0.1) │ │
│ │ - Socket Mode WebSocket to Slack │ │
│ │ - PostgreSQL backend (172K) │ │
│ │ - ~50 portal rooms │ │
│ └────────────┬────────────────────────────────┘ │
│ │ │
│ └─→ PostgreSQL :5432 (unix socket) │
│ │
└─────────────────────────────────────────────────────┘
│
└─→ Slack API (Socket Mode WebSocket)
- Workspace: chochacho
- Bot token: xoxb-...
- App token: xapp-...
Critical Networking Details
- All internal services use IPv4 (127.0.0.1) - NOT "localhost"
- Reason: localhost resolves to IPv6 [::1] but services bind IPv4-only
- Fixed in: nginx proxy_pass, bridge homeserverUrl configuration
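Because this issue is easy to reintroduce, a small check for URLs that still name localhost may be useful when reviewing configuration values. This is a sketch only; both function names are hypothetical, and it handles plain scheme://host URLs without embedded credentials.

```shell
# Return 0 if the URL's host is the name "localhost" rather than a literal address.
url_uses_localhost() {
  printf '%s\n' "$1" | grep -Eq '^[a-z]+://localhost([:/]|$)'
}

# Print the URL with a "localhost" host replaced by 127.0.0.1.
# URLs with any other host are printed unchanged.
pin_to_ipv4_loopback() {
  printf '%s\n' "$1" | sed -E 's#^([a-z]+://)localhost([:/]|$)#\1127.0.0.1\2#'
}
```

For example, `pin_to_ipv4_loopback http://localhost:8008` prints `http://127.0.0.1:8008`, while a host such as localhost.example.com is left untouched.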
Service Dependencies
postgresql.service
└─→ mautrix-slack.service
└─→ matrix-continuwuity.service
└─→ nginx.service
Data Flow
- Slack → Matrix:
  - Slack pushes event via Socket Mode WebSocket
  - Bridge receives, transforms to Matrix event
  - Bridge POSTs to conduwuit appservice endpoint
  - conduwuit distributes to Matrix rooms
  - Element clients receive via /sync
- Matrix → Slack:
  - Element client sends message via conduwuit
  - conduwuit forwards to bridge appservice endpoint
  - Bridge transforms to Slack API call
  - Bridge POSTs to Slack API (bot token)
  - Appears in Slack channel
Security Model
- Secrets: Managed via sops-nix, deployed to /run/secrets/
- Bridge tokens:
  - as_token: Bridge authenticates to Matrix
  - hs_token: Matrix authenticates to bridge
- Slack tokens:
  - xoxb-: Bot API calls
  - xapp-: Socket Mode connection
- No public bridge endpoint: Socket Mode eliminates webhook requirement
Operational Notes
- Matrix database disposable (can rebuild from Slack)
- Bridge config fully declarative except sender_localpart fix
- Fresh database recommended after conduwuit version upgrades
- Debug logging currently enabled on conduwuit