Priority 1 - Production Quality:
- Revert Matrix homeserver log level from debug to info
  - Reduces log volume by ~70% (22k+ lines/day to <7k)
  - Improves performance and reduces disk usage

Priority 2 - Technical Debt:
- Automate sender_localpart fix in mautrix-slack.nix
  - Eliminates manual sed command on fresh deployments
  - Fix verified working (tested 2025-10-26)
- Update CLAUDE.md to document automated solution

Priority 3 - Project Hygiene:
- Remove unused mautrix-whatsapp and mautrix-gmessages imports
- Archive old configurations to docs/examples/alternative-deployments/
- Remove stale staging/ directories from 001 extraction workflow
- Update deployment documentation in tasks.md and quickstart.md
- Add deployment status notes to spec files

Files Modified:
- modules/dev-services.nix: log level debug → info
- modules/mautrix-slack.nix: automatic sender_localpart fix
- hosts/ops-jrz1.nix: remove unused bridge imports
- CLAUDE.md: update Known Issues, add Resolved Issues section
- specs/002-*/: add deployment status notes
- configurations/ → docs/examples/alternative-deployments/

Tested and Verified:
- All services running (matrix, bridge, forgejo, postgresql, nginx)
- Bridge authenticated and message flow working
- sender_localpart fix generates correct registration file
461 lines
16 KiB
Markdown
# ops-jrz1 Development Guidelines

Auto-generated from all feature plans. Last updated: 2025-10-22

## Active Technologies
- Nix 2.x, NixOS 24.05+, Bash 5.x (for scripts) (001-extract-matrix-platform)
- mautrix-slack (Go), PostgreSQL 15.10, sops-nix (002-slack-bridge-integration)
- Matrix homeserver: conduwuit (clarun.xyz)
- Secrets management: sops-nix with age encryption

## Project Structure
```
.
├── hosts/                          # NixOS host configurations
│   └── ops-jrz1.nix                # VPS configuration (45.77.205.49)
├── modules/                        # NixOS modules
│   ├── dev-services.nix            # PostgreSQL, Forgejo, bridge coordination
│   ├── mautrix-slack.nix           # Slack bridge module
│   └── matrix-continuwuity.nix     # Matrix homeserver
├── secrets/                        # sops-encrypted secrets
│   └── secrets.yaml                # Encrypted credentials (age)
├── specs/                          # Feature specifications
│   ├── 001-extract-matrix-platform/
│   └── 002-slack-bridge-integration/
│       ├── spec.md                 # Feature specification
│       ├── plan.md                 # Implementation plan
│       ├── research.md             # Technical research findings
│       ├── data-model.md           # Data model & state machines
│       ├── quickstart.md           # Deployment runbook
│       └── contracts/              # Configuration schemas
├── docs/                           # Documentation
│   ├── platform-vision.md          # North star document
│   └── worklogs/                   # Deployment logs
└── .specify/                       # Spec-kit framework files
```

## Commands

### Deployment
```bash
# Deploy configuration to VPS
nixos-rebuild switch --flake .#ops-jrz1 \
  --target-host root@45.77.205.49 \
  --build-host localhost

# Deploy to staging
nixos-rebuild switch --flake .#ops-jrz1-staging \
  --target-host root@45.77.205.49 \
  --build-host localhost
```

### Bridge Management
```bash
# Check bridge status
ssh root@45.77.205.49 'systemctl status mautrix-slack'

# View bridge logs
ssh root@45.77.205.49 'journalctl -u mautrix-slack -f'

# Check Socket Mode connection
ssh root@45.77.205.49 'journalctl -u mautrix-slack -n 20 | grep -i socket'

# Query bridge database
ssh root@45.77.205.49 'sudo -u mautrix_slack psql mautrix_slack -c "SELECT * FROM portal;"'
```

### Secrets Management
```bash
# Edit encrypted secrets
sops secrets/secrets.yaml

# View decrypted secrets (never commit output)
sops -d secrets/secrets.yaml

# Add new secret
sops secrets/secrets.yaml
# (Edit in your $EDITOR, auto-encrypts on save)
```

### Matrix Server
```bash
# Check Matrix homeserver
ssh root@45.77.205.49 'systemctl status matrix-continuwuity'

# Check that the Client-Server API responds locally (this does not exercise federation)
ssh root@45.77.205.49 'curl -s http://127.0.0.1:8008/_matrix/client/versions | jq .'
```

### Database
```bash
# List databases
ssh root@45.77.205.49 'sudo -u postgres psql -l'

# Check bridge database
ssh root@45.77.205.49 'sudo -u postgres psql mautrix_slack -c "\dt"'

# Backup bridge database
ssh root@45.77.205.49 'sudo -u postgres pg_dump mautrix_slack' > backup.sql
```

## Code Style
- Nix 2.x, NixOS 24.05+, Bash 5.x: Follow standard conventions
- NixOS modules: Use nixpkgs module pattern (options, config, mkIf)
- Configuration: Declarative over imperative
- Secrets: Never hardcode, use sops-nix or interactive login
- Logging: Use appropriate levels (debug for troubleshooting, info for production)

## Development Patterns

### Slack Bridge (002-slack-bridge-integration)
- **Authentication**: Interactive login via Matrix chat (`login app` command)
- **Socket Mode**: WebSocket connection, no public endpoint needed
- **Portal Creation**: Automatic based on activity (no manual channel mapping)
- **Secrets**: Stored in bridge database after authentication (not in NixOS config)
- **Token Requirements**: Bot token (xoxb-) + app-level token (xapp-)

### Secrets Management
- **Encryption**: Age encryption via SSH host key (/etc/ssh/ssh_host_ed25519_key)
- **Storage**: secrets/secrets.yaml (encrypted, safe to commit)
- **Runtime**: Decrypted to /run/secrets/ (tmpfs, cleared on reboot)
- **Permissions**: 0440 for service-specific secrets, owned by service user

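The permission rule above can be spot-checked with a small helper. This is a sketch only: `has_mode` is a hypothetical name, and it assumes GNU `stat` (as shipped with coreutils on NixOS).

```bash
# has_mode FILE MODE
# Succeeds when FILE's octal permission bits equal MODE (for example 440).
# Assumes GNU stat; BSD stat uses different flags.
has_mode() {
  [ "$(stat -c '%a' "$1")" = "$2" ]
}
```

On the VPS this could be called as `has_mode /run/secrets/<name> 440`, where `<name>` stands for whichever secret is being checked.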
### Deployment Workflow
1. Make configuration changes locally
2. Commit to git
3. Deploy via nixos-rebuild
4. Verify service status and logs
5. Document in worklogs/
6. Test functionality
7. Monitor for stability

## Git Workflow

This project uses **Trunk-Based Development** for simplified collaboration and deployment.

### Branch Strategy
- **main**: Single long-lived branch, always deployable
- **Feature branches**: Short-lived (hours to days), naming: `###-feature-name`
- **No long-lived branches**: Feature branches merge or delete quickly

### Feature Development Workflow
```bash
# 1. Start feature from latest main
git checkout main
git pull origin main
git checkout -b 003-feature-name

# 2. Develop with frequent commits
# Make changes, commit often with clear messages

# 3. Keep the feature branch in sync with main (if the feature takes >1 day)
git checkout main
git pull origin main
git checkout 003-feature-name
git rebase main

# 4. When feature complete, merge to main
git checkout main
git merge 003-feature-name  # Fast-forward merge preferred

# 5. Tag release if deploying
git tag -a v0.3.0 -m "Release notes..."
git push origin main --tags

# 6. Delete feature branch
git branch -d 003-feature-name
```

### Release Tagging
- **Version scheme**: v0.MINOR.PATCH (semver-like)
- **When to tag**: After completing and merging a feature
- **Tag format**: Annotated tags with comprehensive release notes
- **Example**:
  ```bash
  git tag -a v0.3.0 -m "Release v0.3.0: Feature Description

  - Key changes
  - Architecture updates
  - Known issues
  "
  ```

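To illustrate the `v0.MINOR.PATCH` scheme, the helper below computes the next tag from the current one. It is a sketch under assumptions: `next_tag` is a hypothetical name, and treating a merged feature as a minor bump and a fix as a patch bump is an interpretation, not something these guidelines state.

```bash
# next_tag CURRENT minor|patch
# Prints the next v0.MINOR.PATCH tag, or fails if CURRENT does not match the scheme.
next_tag() {
  local current="$1" part="$2" minor patch
  # Accept only v0.<number>.<number> without leading zeros.
  printf '%s\n' "$current" |
    grep -Eq '^v0\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$' || return 1
  minor="${current#v0.}"   # "3.0" for v0.3.0
  minor="${minor%%.*}"     # "3"
  patch="${current##*.}"   # "0"
  case "$part" in
    minor) printf 'v0.%d.0\n' "$(( minor + 1 ))" ;;
    patch) printf 'v0.%d.%d\n' "$minor" "$(( patch + 1 ))" ;;
    *) return 1 ;;
  esac
}
```

For example, `git tag -a "$(next_tag v0.3.0 minor)" -m "..."` would create `v0.4.0`.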
### Branch Naming Convention
- Format: `###-short-description`
- Examples: `002-slack-bridge-integration`, `003-monitoring-setup`
- Number matches spec directory in `specs/###-feature-name/`

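The convention can be checked mechanically. The sketch below assumes names are three digits followed by lowercase, hyphen-separated words; the lowercase restriction is inferred from the examples, not stated as a rule. Both function names are hypothetical.

```bash
# is_valid_branch_name NAME
# Succeeds for names like 003-monitoring-setup.
is_valid_branch_name() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{3}-[a-z0-9]+(-[a-z0-9]+)*$'
}

# spec_dir_for_branch NAME
# Prints the spec directory whose number the branch is expected to match.
spec_dir_for_branch() {
  is_valid_branch_name "$1" || return 1
  printf 'specs/%s/\n' "$1"
}
```

A pre-push hook could call `is_valid_branch_name "$(git rev-parse --abbrev-ref HEAD)"` and skip the check on `main`.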
### Commit Guidelines
- Clear, concise commit messages
- No emojis or marketing language
- Focus on "what" and "why" not "how"
- Group related changes in single commit
- Example: "Fix bridge homeserver URL to use IPv4 (127.0.0.1) instead of localhost"

### Main Branch Protection
- Always keep main deployable
- Test before merging to main
- Document breaking changes in commit message
- Tag releases for deployment milestones

## Recent Changes
- 001-extract-matrix-platform: Added Nix 2.x, NixOS 24.05+, Bash 5.x (for scripts)
- 002-slack-bridge-integration: Deployed mautrix-slack bridge with Socket Mode (2025-10-26)
  - Phase 0-1: Research and design complete
  - Phase 2: Infrastructure deployed and operational
  - Status: Bidirectional message flow working (Slack ↔ Matrix)
  - ~50 Slack channels synced to Matrix rooms

## Known Issues
- olm-3.2.16 marked insecure (permitted via nixpkgs.config.permittedInsecurePackages)
- Fresh database required after conduwuit version upgrades (wipe /var/lib/matrix-continuwuity/db/)

## Resolved Issues
- ✅ conduwuit debug logging (reverted to "info" 2025-10-26)
- ✅ Manual sender_localpart fix (automated in mautrix-slack.nix 2025-10-26)

## Testing Guidelines
- Test message latency: Should be <5 seconds (FR-001, FR-002)
- Test reactions, edits, file attachments
- Monitor health indicators: connection_status, last_successful_message, error_count
- Stability target: 99% uptime over 7-day period

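To make the numeric targets concrete, the sketch below turns them into integer arithmetic. The function names are hypothetical, and timestamps are assumed to be whole seconds since the epoch.

```bash
# allowed_downtime_seconds DAYS UPTIME_PERCENT
# Downtime budget implied by an uptime target over a window of DAYS.
allowed_downtime_seconds() {
  local days="$1" uptime_pct="$2"
  echo $(( days * 86400 * (100 - uptime_pct) / 100 ))
}

# latency_within_target SENT_EPOCH RECEIVED_EPOCH [LIMIT_SECONDS]
# Succeeds when the delivery delay is strictly below the limit (default 5).
latency_within_target() {
  [ $(( $2 - $1 )) -lt "${3:-5}" ]
}

allowed_downtime_seconds 7 99   # prints 6048 (about 100 minutes per week)
```

A 99% target over 7 days therefore tolerates roughly 100 minutes of accumulated downtime.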
<!-- MANUAL ADDITIONS START -->

## Configuration Notes

### mautrix-slack Registration File Fix (RESOLVED)

**Issue:** The bridge's registration generator (`-g` flag) creates a random `sender_localpart` instead of using the configured `bot.username` value.

**Root Cause:** mautrix-slack generates registration independently of `config.yaml` settings.

**Solution:** ✅ Automated fix implemented in `modules/mautrix-slack.nix` (lines 339-341)

The module now automatically patches the sender_localpart during registration generation:
```nix
# In ExecStartPre, after registration generation:
${pkgs.gnused}/bin/sed -i "s/^sender_localpart: .*/sender_localpart: ${cfg.appservice.senderLocalpart}/" "$REG_PATH"
```

**Status:** No manual intervention required on fresh deploys. The fix is applied automatically during service startup.

**Verification:** Tested 2025-10-26 - registration file correctly generated with `sender_localpart: slackbot` matching configuration.

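The effect of that `sed` expression can be reproduced outside the module against a throwaway file. This is only a sketch: `patch_sender_localpart` is a hypothetical helper, the sample registration content (including the random-looking value) is invented, and GNU `sed` is assumed for `-i`.

```bash
# patch_sender_localpart REG_PATH LOCALPART
# Overwrites whatever sender_localpart the generator wrote with LOCALPART.
patch_sender_localpart() {
  local reg_path="$1" localpart="$2"
  sed -i "s/^sender_localpart: .*/sender_localpart: ${localpart}/" "$reg_path"
}

# Demo on an invented registration snippet; the real file has more keys.
demo_reg="$(mktemp)"
printf 'id: slack\nsender_localpart: Xq7randomvalue\nurl: http://127.0.0.1:29319\n' > "$demo_reg"
patch_sender_localpart "$demo_reg" slackbot
grep '^sender_localpart:' "$demo_reg"   # prints: sender_localpart: slackbot
```

Only the `sender_localpart` line changes; every other key in the file is left as generated.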
---

## QA Testing Checklist

### Core Features (✅ Tested & Working)
- [x] Bidirectional text messaging (Slack ↔ Matrix)
- [x] Channel discovery and room creation (~50 channels synced)
- [x] Socket Mode WebSocket connection
- [x] Bot authentication with Matrix homeserver
- [x] Bridge startup and recovery after restart

### Features Requiring QA Testing (⚠️ Untested)
- [ ] **File Attachments**
  - Upload file in Slack → verify appears in Matrix
  - Upload file in Matrix → verify appears in Slack
  - Test various file types (images, PDFs, archives)
  - Test large files (>10MB)

- [ ] **Emoji Reactions**
  - Add reaction in Slack → verify appears in Matrix
  - Add reaction in Matrix → verify appears in Slack
  - Remove reaction → verify syncs

- [ ] **Message Edits**
  - Edit message in Slack → verify updates in Matrix
  - Edit message in Matrix → verify updates in Slack

- [ ] **Message Deletion**
  - Delete message in Slack → verify removes from Matrix
  - Delete message in Matrix → verify removes from Slack

- [ ] **Thread Replies**
  - Reply in Slack thread → verify threading in Matrix
  - Reply in Matrix thread → verify threading in Slack

- [ ] **User Profile Sync**
  - Change Slack display name → verify updates Matrix puppet
  - Change Slack avatar → verify updates Matrix puppet

- [ ] **Error Handling**
  - Network interruption recovery
  - Matrix homeserver restart handling
  - Slack WebSocket reconnection
  - Invalid token handling

- [ ] **Performance**
  - High-volume channel (>100 messages/hour)
  - Large file transfer times
  - Message latency under load

### Test Commands
```bash
# Monitor bridge during testing
ssh root@45.77.205.49 'journalctl -u mautrix-slack -f'

# Check for errors
ssh root@45.77.205.49 'journalctl -u mautrix-slack --since "1 hour ago" | grep -E "ERR|WRN|FTL"'

# Verify message flow
# Test in #vlads-pad or similar channel
# Send from Slack, verify in Matrix room
# Send from Matrix room, verify in Slack
```

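When a test run produces many matches, a per-level tally is easier to read than raw grep output. The sketch below assumes the level appears as a standalone whitespace-separated token (`ERR`, `WRN`, `FTL`) on each line; `count_log_levels` is a hypothetical helper, not part of the bridge.

```bash
# count_log_levels
# Reads log lines on stdin and prints one summary line with a count per level.
count_log_levels() {
  awk '
    {
      # Count only the first level token found on each line.
      for (i = 1; i <= NF; i++)
        if ($i ~ /^(ERR|WRN|FTL)$/) { n[$i]++; break }
    }
    END { printf "ERR=%d WRN=%d FTL=%d\n", n["ERR"], n["WRN"], n["FTL"] }
  '
}
```

It can be fed from the same journal query, for example `ssh root@45.77.205.49 'journalctl -u mautrix-slack --since "1 hour ago"' | count_log_levels`.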
---

## Future Infrastructure Needs

### Monitoring & Alerting (Not Implemented)

**Health Checks Needed:**
- Bridge WebSocket connection status
- Matrix homeserver availability
- Message processing latency
- Database connection health
- Error rate thresholds

**Potential Solutions:**
```bash
# Option 1: Simple systemd monitoring
# ("alert" is a placeholder for whatever notification command is chosen)
systemctl is-active --quiet mautrix-slack || alert

# Option 2: Prometheus + Alertmanager
# - Export bridge metrics (if available)
# - Alert on service down, high error rate, message lag

# Option 3: Uptime monitoring
# - External ping to Matrix homeserver
# - Check /_matrix/client/versions endpoint
# - Alert on HTTP errors or timeout
```

**Metrics to Track:**
- Bridge uptime percentage
- Messages processed (Slack → Matrix, Matrix → Slack)
- WebSocket reconnection events
- Database query performance
- Error counts by type

**Alert Conditions:**
- Bridge down for >5 minutes
- No messages processed in >15 minutes (if active channels exist)
- Error rate >5% of total messages
- Database connection failures
- Disk space <10% free

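Two of these thresholds reduce to simple ratio checks. The sketch below uses integer arithmetic so it runs in plain shell; the function names are hypothetical, and how the counts are collected is left open because no metrics source exists yet.

```bash
# error_rate_exceeded ERRORS TOTAL [THRESHOLD_PERCENT]
# Succeeds when ERRORS/TOTAL is strictly greater than the threshold (default 5).
error_rate_exceeded() {
  local errors="$1" total="$2" threshold="${3:-5}"
  [ "$total" -gt 0 ] || return 1   # no traffic, nothing to alert on
  # Same comparison as errors/total > threshold/100, without floating point.
  [ $(( errors * 100 )) -gt $(( total * threshold )) ]
}

# disk_free_below FREE_KB TOTAL_KB [THRESHOLD_PERCENT]
# Succeeds when free space is strictly below the threshold (default 10).
disk_free_below() {
  local free_kb="$1" total_kb="$2" threshold="${3:-10}"
  [ $(( free_kb * 100 )) -lt $(( total_kb * threshold )) ]
}
```

Each function returns an exit status, so it composes with the placeholder from Option 1, as in `error_rate_exceeded "$errs" "$msgs" && alert`.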
### Backup Strategy (Not Implemented)

**Critical Data:**
- Matrix RocksDB: `/var/lib/matrix-continuwuity/db/` (66M)
- Bridge PostgreSQL: `mautrix_slack` database (172K)
- Registration files: `/var/lib/matrix-appservices/*.yaml`
- Secrets: sops-encrypted `secrets/secrets.yaml` (in git)

**Backup Approach:**
```bash
# Daily database backups
ssh root@45.77.205.49 'tar czf /root/backups/matrix-$(date +%Y%m%d).tar.gz /var/lib/matrix-continuwuity/db/'
ssh root@45.77.205.49 'sudo -u postgres pg_dump mautrix_slack > /root/backups/bridge-$(date +%Y%m%d).sql'

# Retention: 7 daily, 4 weekly, 12 monthly
# Store off-VPS (rsync to backup server or cloud storage)
```

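The daily tier of that retention policy could be enforced with a small pruning helper. This is a sketch covering only the "7 daily" part; the weekly and monthly tiers are not handled. `prune_daily` is a hypothetical name, and it relies on the `PREFIX-YYYYMMDD` file names used above sorting chronologically.

```bash
# prune_daily DIR PREFIX KEEP
# Deletes all but the KEEP newest files named PREFIX-<digits>* in DIR.
prune_daily() {
  local dir="$1" prefix="$2" keep="$3"
  # Newest first (reverse name sort), skip the first KEEP, remove the rest.
  find "$dir" -maxdepth 1 -type f -name "${prefix}-[0-9]*" |
    sort -r |
    tail -n "+$(( keep + 1 ))" |
    while IFS= read -r old; do
      rm -- "$old"
    done
}
```

Running `prune_daily /root/backups bridge 7` after the nightly dump would keep the seven most recent bridge backups and leave the `matrix-` archives untouched.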
**Recovery Procedure:**
1. Deploy NixOS configuration
2. Restore database backups
3. Restore registration files
4. Re-authenticate with Slack (new tokens via `login app`)
5. Verify message flow

**Note:** Matrix database can be wiped and rebuilt from Slack if needed (current architecture treats Matrix as ephemeral view layer).

---

## Current Architecture State (2025-10-26)

### Deployed Services
```
┌───────────────────────────────────────────────────┐
│  clarun.xyz (45.77.205.49)                        │
│                                                   │
│  ┌─────────────────────────────────────────────┐  │
│  │  nginx :443 (HTTPS)                         │  │
│  │  - Matrix Client-Server API                 │  │
│  │  - Forgejo (git.clarun.xyz)                 │  │
│  └────────────┬────────────────────────────────┘  │
│               │                                   │
│               ├─→ conduwuit :8008 (127.0.0.1)     │
│               │   - Matrix homeserver             │
│               │   - RocksDB schema v18            │
│               │   - 66M database                  │
│               │                                   │
│               └─→ Forgejo :3000 (127.0.0.1)       │
│                                                   │
│  ┌─────────────────────────────────────────────┐  │
│  │  mautrix-slack :29319 (127.0.0.1)           │  │
│  │  - Socket Mode WebSocket to Slack           │  │
│  │  - PostgreSQL backend (172K)                │  │
│  │  - ~50 portal rooms                         │  │
│  └────────────┬────────────────────────────────┘  │
│               │                                   │
│               └─→ PostgreSQL :5432 (unix socket)  │
│                                                   │
└───────────────────────────────────────────────────┘
                │
                └─→ Slack API (Socket Mode WebSocket)
                    - Workspace: chochacho
                    - Bot token: xoxb-...
                    - App token: xapp-...
```

### Critical Networking Details
- **All internal services use IPv4 (127.0.0.1)** - NOT "localhost"
  - Reason: `localhost` can resolve to IPv6 `[::1]`, but the services bind IPv4-only
  - Fixed in: nginx proxy_pass, bridge homeserverUrl configuration

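A quick guard against reintroducing the problem is to flag any configured URL whose host is the literal name `localhost`. The helper below is a sketch with a hypothetical name; it inspects only the URL string and does not perform name resolution.

```bash
# url_uses_localhost URL
# Succeeds when the host part of URL is exactly "localhost".
url_uses_localhost() {
  printf '%s\n' "$1" | grep -Eq '^[a-z]+://localhost(:[0-9]+)?(/|$)'
}
```

For example, `url_uses_localhost "http://localhost:8008"` succeeds and should prompt a switch to `http://127.0.0.1:8008`. On a glibc system, `getent ahosts localhost` shows which addresses the name actually resolves to.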
### Service Dependencies
```
postgresql.service
  └─→ mautrix-slack.service
        └─→ matrix-continuwuity.service
              └─→ nginx.service
```

### Data Flow
1. **Slack → Matrix:**
   - Slack pushes event via Socket Mode WebSocket
   - Bridge receives, transforms to Matrix event
   - Bridge POSTs to conduwuit appservice endpoint
   - conduwuit distributes to Matrix rooms
   - Element clients receive via /sync

2. **Matrix → Slack:**
   - Element client sends message via conduwuit
   - conduwuit forwards to bridge appservice endpoint
   - Bridge transforms to Slack API call
   - Bridge POSTs to Slack API (bot token)
   - Appears in Slack channel

### Security Model
- **Secrets:** Managed via sops-nix, deployed to `/run/secrets/`
- **Bridge tokens:**
  - `as_token`: Bridge authenticates to Matrix
  - `hs_token`: Matrix authenticates to bridge
- **Slack tokens:**
  - `xoxb-`: Bot API calls
  - `xapp-`: Socket Mode connection
- **No public bridge endpoint:** Socket Mode eliminates webhook requirement

### Operational Notes
- Matrix database disposable (can rebuild from Slack)
- Bridge config fully declarative; the sender_localpart fix is now applied automatically by `modules/mautrix-slack.nix`
- Fresh database recommended after conduwuit version upgrades
- conduwuit log level set to info (debug logging reverted 2025-10-26)

<!-- MANUAL ADDITIONS END -->