ops-jrz1/specs/002-slack-bridge-integration/research.md
Dan ca379311b8 Add Slack bridge integration feature specification
Includes spec, plan, research, data model, contracts, and quickstart guide
for mautrix-slack Socket Mode bridge deployment.
2025-10-26 14:36:44 -07:00


# Phase 0: Research Technical Foundations
**Feature**: 002-slack-bridge-integration
**Research Date**: 2025-10-22
**Status**: Complete
## Executive Summary
This document consolidates research on five critical technical areas for implementing the Slack↔Matrix bridge using mautrix-slack with Socket Mode on NixOS.
**Key Decisions**:
- ✅ Use Socket Mode (WebSocket) - no public endpoint needed
- ✅ Use App Login (official OAuth) for production stability
- ✅ Require 29 bot scopes + 1 app-level scope (`connections:write`)
- ✅ Use sops-nix flat key structure for Slack credentials
- ✅ Use automatic portal creation (no manual channel mapping)
- ✅ Leverage existing NixOS module, add secrets integration
---
## 1. Slack Socket Mode
### What is Socket Mode?
Socket Mode is Slack's **WebSocket-based protocol** (RFC 6455) that enables real-time event delivery without requiring a public HTTP endpoint.
**Connection Architecture**:
1. Application calls `apps.connections.open` API with app-level token (xapp-)
2. Slack responds with unique WebSocket URL: `wss://wss.slack.com/link/?ticket=...`
3. Application receives events over WebSocket (Events API, interactivity)
4. Application sends responses via standard Web API (HTTPS)
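Steps 1 and 2 of the connection architecture can be sketched with the standard library alone. This is a minimal illustration, not the bridge's own code: the helper names `build_open_request` and `parse_open_response` are hypothetical, and the request is only constructed, never sent.

```python
import json
import urllib.request

SLACK_OPEN_URL = "https://slack.com/api/apps.connections.open"


def build_open_request(app_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Slack for a WebSocket URL."""
    if not app_token.startswith("xapp-"):
        raise ValueError("apps.connections.open needs an app-level token (xapp-...)")
    return urllib.request.Request(
        SLACK_OPEN_URL,
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {app_token}"},
    )


def parse_open_response(body: bytes) -> str:
    """Return the wss:// URL from the JSON response, or raise on an error payload."""
    payload = json.loads(body)
    if not payload.get("ok"):
        raise RuntimeError(f"Slack returned an error: {payload.get('error', 'unknown')}")
    url = payload["url"]
    if not url.startswith("wss://"):
        raise RuntimeError("expected a wss:// URL in the response")
    return url
```

Because the returned URL rotates, a client would call this pair again on every reconnect rather than caching the result.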
**Key Characteristics**:
- No public endpoint required (ideal for behind-firewall deployments)
- WebSocket URLs rotate dynamically (not static)
- Up to 10 concurrent connections allowed
- Events may be distributed across connections
- Rate limit: **1 WebSocket URL fetch per minute** (critical for reconnection)
### Token Requirements
**Two tokens required**:
| Token Type | Format | Purpose | Scope Required |
|------------|--------|---------|----------------|
| App-Level Token | `xapp-...` | Establish WebSocket connection | `connections:write` |
| Bot Token | `xoxb-...` | Perform API operations | 29+ bot scopes |
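Since the two tokens are easy to swap when pasting them into the login prompt, a prefix check is a cheap safeguard. The sketch below relies only on the prefixes in the table; `classify_slack_token` is a hypothetical helper name, and the error message deliberately omits the token value.

```python
def classify_slack_token(token: str) -> str:
    """Classify a Slack token by its documented prefix."""
    if token.startswith("xapp-"):
        return "app-level"
    if token.startswith("xoxb-"):
        return "bot"
    # Do not echo the token itself: it may be a real secret of another type.
    raise ValueError("unrecognized Slack token prefix (expected xapp- or xoxb-)")
```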
**Authentication Flow**:
1. Open Matrix DM with bridge bot (`@slackbot:clarun.xyz`)
2. Send command: `login app`
3. Provide both tokens when prompted
4. Bridge stores credentials in database, establishes Socket Mode connection
### Limitations and Trade-offs
**Technical Constraints**:
- WebSocket connections refresh every few hours (automatic reconnection)
- Backend container recycling causes occasional disconnects
- Rate-limited reconnections (1 request/minute maximum)
- Long-lived stateful connections (challenging to scale horizontally)
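The reconnection rate limit implies that a client should space out its WebSocket URL fetches. A minimal sketch of such a throttle follows, assuming the 60-second interval stated above; the class name `UrlFetchThrottle` and its injectable clock are illustrative choices, not part of mautrix-slack.

```python
import time


class UrlFetchThrottle:
    """Allow at most one WebSocket URL fetch per `min_interval` seconds."""

    def __init__(self, min_interval: float = 60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock
        self._last_fetch = None  # time of the most recent permitted fetch

    def seconds_until_allowed(self) -> float:
        """Return how long the caller must wait before the next fetch."""
        if self._last_fetch is None:
            return 0.0
        elapsed = self._clock() - self._last_fetch
        return max(0.0, self.min_interval - elapsed)

    def try_acquire(self) -> bool:
        """Record a fetch and return True if one is allowed now, else False."""
        if self.seconds_until_allowed() > 0:
            return False
        self._last_fetch = self._clock()
        return True
```

A reconnect loop would sleep for `seconds_until_allowed()` before requesting a new URL, so a burst of disconnects does not turn into a burst of API calls.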
**Production Considerations**:
- ❌ Cannot publish to Slack Marketplace (HTTP required)
- ⚠️ Slack recommends HTTP for highest reliability
- ✅ Socket Mode recommended for: development, local testing, behind-firewall environments
**Why Socket Mode for ops-jrz1**:
1. VPS is private infrastructure (no public webhook complexity)
2. Small team use case (2-5 engineers, moderate message volume)
3. Security model favors minimal external exposure
4. Trade-off of slightly lower reliability is acceptable for non-critical team comms
### References
- [Socket Mode overview](https://docs.slack.dev/apis/events-api/using-socket-mode)
- [HTTP vs Socket Mode comparison](https://docs.slack.dev/apis/events-api/comparing-http-socket-mode)
- [mautrix-slack authentication](https://docs.mau.fi/bridges/go/slack/authentication.html)
---
## 2. Slack API Scopes
### Required Bot Token Scopes (29 total)
From [mautrix-slack app manifest](https://github.com/mautrix/slack/blob/main/app-manifest.yaml):
**Message Operations**:
- `chat:write` - Send messages as bot
- `chat:write.public` - Send to public channels without membership
- `chat:write.customize` - Customize bot username/avatar (for ghosting)
**Channel Access** (public channels):
- `channels:read`, `channels:history` - List and view messages
- `channels:write.invites`, `channels:write.topic` - Manage channels
**Private Channels** (groups):
- `groups:read`, `groups:history`, `groups:write`
- `groups:write.invites`, `groups:write.topic`
**Direct Messages**:
- `im:read`, `im:history`, `im:write`, `im:write.topic`
- `mpim:read`, `mpim:history`, `mpim:write`, `mpim:write.topic` (group DMs)
**User & Workspace**:
- `users:read`, `users.profile:read`, `users:read.email`
- `team:read`
**Rich Content**:
- `files:read`, `files:write`
- `reactions:read`, `reactions:write`
- `pins:read`, `pins:write`
- `emoji:read`
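A missing scope tends to surface later as a feature that silently fails, so it can help to compare the granted scopes against the required set up front. The sketch below assumes the granted scopes are available as a comma-separated string (Slack Web API responses carry one in the `x-oauth-scopes` header); the required set is abbreviated for illustration and `missing_scopes` is a hypothetical helper.

```python
# Abbreviated for illustration; the full set comes from the app manifest.
REQUIRED_BOT_SCOPES = frozenset({
    "chat:write",
    "channels:read",
    "channels:history",
    "users:read",
    "reactions:read",
    "files:read",
})


def missing_scopes(granted_header: str, required=REQUIRED_BOT_SCOPES) -> list[str]:
    """Return the required scopes that are absent from a comma-separated list."""
    granted = {scope.strip() for scope in granted_header.split(",") if scope.strip()}
    return sorted(set(required) - granted)
```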
### Required App-Level Token Scopes (1 total)
- `connections:write` - Establish Socket Mode WebSocket connections
### Event Subscriptions (46 events)
The bridge subscribes to events including:
- Workspace: `app_uninstalled`, `team_domain_change`
- Channels: `channel_archive`, `channel_created`, `channel_deleted`, `channel_rename`, etc.
- Messages: `message.channels`, `message.groups`, `message.im`, `message.mpim`
- Interactions: `reaction_added`, `reaction_removed`, `pin_added`, `file_shared`, etc.
### Security Best Practices
**Principle of Least Privilege**:
- Use all 29 scopes from mautrix-slack manifest (required for full functionality)
- Consider removing `conversations.connect:write` if not using Slack Connect
**Token Storage**:
- ✅ Production: Use sops-nix encrypted secrets
- ✅ Never commit tokens to version control
- ✅ Use 0440 permissions (readable by the service user and group only)
**Monitoring**:
- Enable IP allowlisting for token usage (Slack API feature)
- Monitor token usage via Slack app management dashboard
- Log all API calls for audit purposes
### References
- [Permission Scopes Reference](https://api.slack.com/scopes)
- [mautrix-slack app manifest](https://github.com/mautrix/slack/blob/main/app-manifest.yaml)
---
## 3. mautrix-slack Configuration
### Current Module Structure
**Location**: `/home/dan/proj/ops-jrz1/modules/mautrix-slack.nix`
**Configuration Generation** (two-stage):
1. **Root stage**: Creates directory structure (`/var/lib/mautrix_slack/config`)
2. **User stage**: Generates config from example template using `-e` flag, merges overrides
**Module Architecture**:
```nix
# Key configuration sections exposed:
matrix = {
  homeserverUrl = "http://127.0.0.1:8008";
  serverName = "clarun.xyz";
};
database = {
  type = "postgres";
  uri = "postgresql:///mautrix_slack?host=/run/postgresql";
  maxOpenConnections = 32;
  maxIdleConnections = 4;
};
appservice = {
  hostname = "127.0.0.1";
  port = 29319;
  id = "slack";
  senderLocalpart = "slackbot";
  userPrefix = "slack_";
};
bridge = {
  commandPrefix = "!slack";
  permissions = { "clarun.xyz" = "user"; };
};
encryption = {
  enable = true;   # Allow E2EE
  default = false; # Don't enable by default
};
logging.level = "info";
```
**Missing from Module Options**:
- Slack-specific configuration (workspace, tokens)
- Socket Mode settings (bot token, app token injection)
- Channel mapping configuration
**Current Issue**: Module configured for "delpadtech" workspace, exits with code 11.
### Socket Mode Configuration Requirements
Based on mautrix patterns, Socket Mode credentials are likely configured via:
**Option A: Interactive login** (current mautrix-slack approach)
- No config needed initially
- Bridge prompts for tokens via Matrix chat
- Stores in database after first login
**Option B: Declarative config** (would require module enhancement)
```yaml
slack:
  bot_token: "${BOT_TOKEN}"  # From environment or secrets
  app_token: "${APP_TOKEN}"  # From environment or secrets
```
**Decision**: Use **interactive login** approach (Option A) to avoid module modifications. Tokens provided via `login app` command in Matrix.
### Database Configuration
**Current Setup** (working correctly):
```nix
database = {
  type = "postgres";
  uri = "postgresql:///mautrix_slack?host=/run/postgresql";
};
```
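The URI has an empty authority and passes `host` as a query parameter; in libpq-style URIs a `host` value that starts with `/` names a Unix socket directory rather than a network host. A small sketch that makes this explicit, using a hypothetical helper `describe_db_uri`:

```python
from urllib.parse import parse_qs, urlsplit


def describe_db_uri(uri: str) -> dict:
    """Split a PostgreSQL URI into database name, network host, and socket dir."""
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    return {
        "database": parts.path.lstrip("/"),
        "network_host": parts.hostname,  # None when the authority is empty
        "socket_dir": query.get("host", [None])[0],
    }
```

For the URI above, `network_host` is `None` and `socket_dir` is `/run/postgresql`, which matches a local socket connection with no TCP listener involved.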
**Provisioning** (from `modules/dev-services.nix`):
```nix
services.postgresql = {
  ensureDatabases = [ "mautrix_slack" ];
  ensureUsers = [{
    name = "mautrix_slack";
    ensureDBOwnership = true;
  }];
};
```
✅ No database configuration issues detected.
### Matrix Homeserver Integration
**Appservice Registration**:
- Generated at: `/var/lib/matrix-appservices/mautrix_slack_registration.yaml`
- Contains: `id`, `url`, `as_token`, `hs_token`, `namespaces`
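Before handing the registration to the homeserver, it is cheap to confirm that the generated file carries every field listed above. The sketch below assumes the YAML has already been parsed into a dictionary by whatever YAML library is available; `missing_registration_keys` is a hypothetical helper.

```python
REQUIRED_REGISTRATION_KEYS = ("id", "url", "as_token", "hs_token", "namespaces")


def missing_registration_keys(registration: dict) -> list[str]:
    """Return the expected top-level registration keys that are absent."""
    return [key for key in REQUIRED_REGISTRATION_KEYS if key not in registration]
```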
**Missing Step**: Registration file must be loaded into conduwuit homeserver.
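**Required Action**: Register the appservice with the homeserver. conduwuit's documentation describes registration from the admin room (`!admin appservices register` followed by the registration YAML in a code block). The configuration-file form below is unverified and should be confirmed against the deployed conduwuit version: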
```toml
[[appservices]]
registration = "/var/lib/matrix-appservices/mautrix_slack_registration.yaml"
```
### Exit Code 11 Root Cause Analysis
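**Exit code 11 is not the same as signal 11 (SIGSEGV)**: a process that exits with status 11 terminated normally, whereas a segmentation fault is a signal termination, which systemd reports as `code=killed` or `code=dumped` rather than `code=exited`. Check the journal to see which case applies. A plain exit status points toward a deliberate exit, such as a startup configuration check failing, which is consistent with causes 1 and 2 below.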
**Most likely causes** (ranked by probability):
1. **Missing Slack credentials** (95% likely)
- Module generates config without tokens
- Bridge crashes trying to connect with invalid/missing credentials
2. **Incomplete configuration** (80% likely)
- Example config has required fields not set
- Bridge code doesn't validate, crashes on access
3. **olm-3.2.16 library issues** (40% likely)
- Insecure package error requires `permittedInsecurePackages` allowance
- Already addressed in production config (commit 0cbbb19)
4. **SystemD security restrictions** (20% likely)
- Security hardening can cause segfaults with Go binaries
- May need temporary relaxation (as done for mautrix-gmessages)
**Validation Steps**:
1. Enable debug logging: `logging.level = "debug"`
2. Check logs: `journalctl -u mautrix-slack -n 100`
3. Temporarily disable security hardening
4. Verify database connectivity
5. Test with minimal config (no credentials - should fail gracefully)
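When running the bridge binary by hand during validation, it is worth recording whether it exited with a status or was killed by a signal, since the two call for different debugging paths. A minimal sketch using `subprocess`, which reports signal terminations on POSIX as a negative return code; `describe_returncode` and `run_and_describe` are hypothetical helper names.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Distinguish a normal exit status from termination by a signal."""
    if returncode < 0:
        # subprocess encodes "killed by signal N" as -N on POSIX systems.
        return f"killed by signal {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_describe(argv: list[str]) -> str:
    """Run a command to completion and describe how it terminated."""
    return describe_returncode(subprocess.run(argv).returncode)


if __name__ == "__main__":
    # Stand-in child process that exits with status 11 on purpose.
    print(run_and_describe([sys.executable, "-c", "import sys; sys.exit(11)"]))
```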
### References
- [mautrix-slack GitHub](https://github.com/mautrix/slack)
- [mautrix docs](https://docs.mau.fi/bridges/go/slack/)
- Project file: `/home/dan/proj/ops-jrz1/modules/mautrix-slack.nix`
---
## 4. sops-nix Secrets Management
### Current Secrets Infrastructure
**Encryption**: Age encryption via SSH host key conversion
**File**: `/home/dan/proj/ops-jrz1/secrets/secrets.yaml`
```yaml
matrix-registration-token: "..."
acme-email: "dlei@duck.com"
slack-oauth-token: "" # Placeholder (empty)
slack-app-token: "" # Placeholder (empty)
```
**Age Configuration** (`.sops.yaml`):
```yaml
keys:
  - &vultr_vps age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q
  - &admin age18ue40q4fw8uggdlfag7jf5nrawvfvsnv93nurschhuynus200yjsd775v3
creation_rules:
  - path_regex: secrets/secrets\.yaml$
    key_groups:
      - age:
          - *vultr_vps  # VPS can decrypt via /etc/ssh/ssh_host_ed25519_key
          - *admin      # Admin workstation can decrypt/edit
```
**Status**: ✅ Working correctly in production (Generation 31, deployed 2025-10-22)
### Secret Lifecycle
```
1. System Boot
2. sops-nix activation script runs
3. Reads /etc/ssh/ssh_host_ed25519_key
4. Converts to age key (age1vux...)
5. Decrypts secrets/secrets.yaml
6. Extracts individual keys
7. Writes to /run/secrets/<key-name>
8. Sets ownership and permissions
9. Services start (can now read secrets)
```
### Pattern for Slack Tokens
**Step 1: Update secrets.yaml**
```yaml
slack-oauth-token: "xoxb-YOUR-ACTUAL-TOKEN"
slack-app-token: "xapp-YOUR-ACTUAL-TOKEN"
```
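Make this change through sops rather than editing the file directly: `sops secrets/secrets.yaml` opens the decrypted file in `$EDITOR` and re-encrypts it on save.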
**Step 2: Declare in hosts/ops-jrz1.nix**
```nix
sops.secrets.slack-oauth-token = {
  owner = "mautrix_slack";
  group = "mautrix_slack";
  mode = "0440";
};
sops.secrets.slack-app-token = {
  owner = "mautrix_slack";
  group = "mautrix_slack";
  mode = "0440";
};
```
**Step 3: Reference in Service** (two patterns)
**Pattern A: LoadCredential** (systemd credentials)
```nix
systemd.services.mautrix-slack.serviceConfig = {
  LoadCredential = [
    "slack-oauth-token:/run/secrets/slack-oauth-token"
    "slack-app-token:/run/secrets/slack-app-token"
  ];
};
# Service reads from: ${CREDENTIALS_DIRECTORY}/slack-oauth-token
```
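On the consuming side, a process started with `LoadCredential` finds its files under the directory named by the `CREDENTIALS_DIRECTORY` environment variable. A minimal sketch of a reader follows, assuming the credential names from Pattern A; `read_credential` is a hypothetical helper and is not how mautrix-slack itself loads tokens.

```python
import os
from pathlib import Path


def read_credential(name: str, environ=os.environ) -> str:
    """Read one systemd credential from $CREDENTIALS_DIRECTORY."""
    directory = environ.get("CREDENTIALS_DIRECTORY")
    if not directory:
        raise RuntimeError(
            "CREDENTIALS_DIRECTORY is not set; was the service started with LoadCredential?"
        )
    # Strip the trailing newline that editors commonly leave in secret files.
    return (Path(directory) / name).read_text().strip()
```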
**Pattern B: Direct file reference**
```nix
services.mautrix-slack = {
  oauthTokenFile = "/run/secrets/slack-oauth-token";
  appTokenFile = "/run/secrets/slack-app-token";
};
```
**Decision**: Use **interactive login approach** - tokens provided via Matrix chat, not config files. Secrets will be stored in bridge database, not referenced in NixOS config. This simplifies deployment and matches mautrix-slack's intended workflow.
### File Permissions Best Practices
```
-r--r----- (0440): Service-specific secrets (only service user + group can read)
-r--r--r-- (0444): Broadly readable secrets (e.g., email addresses)
-r-------- (0400): Root-only secrets (maximum security)
```
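These modes can be checked after deployment instead of being assumed. The sketch below treats a mode as acceptable when it grants no permission bits beyond an allowed ceiling, so a stricter 0400 file also passes a 0440 check; `secret_mode_ok` is a hypothetical helper.

```python
import os
import stat


def secret_mode_ok(path: str, allowed_mode: int = 0o440) -> bool:
    """Return True if the file grants no permission bits beyond `allowed_mode`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & ~allowed_mode == 0
```

For example, `secret_mode_ok("/run/secrets/slack-app-token")` would return `False` if the file had been left world-readable.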
**Security guarantees**:
- ✅ Secrets never enter the Nix store, which is world-readable
- ✅ Secrets only in `/run/secrets/` (tmpfs, RAM-only)
- ✅ Secrets cleared on reboot
- ✅ Encrypted at rest in git (safe to commit secrets.yaml)
### References
- [sops-nix GitHub](https://github.com/Mic92/sops-nix)
- [Michael Stapelberg's Blog](https://michael.stapelberg.ch/posts/2025-08-24-nixos-sops-nix/) (2025-08-24)
- Project file: `/home/dan/proj/ops-jrz1/secrets/secrets.yaml`
---
## 5. Channel Bridging Patterns
### How Channel Mapping Works
mautrix-slack uses **automatic portal creation** rather than manual channel mapping:
**Portal Creation Triggers**:
1. **Initial login**: Bridge creates portals for recent conversations (controlled by `conversation_count`)
2. **Receiving messages**: Portal auto-created when message arrives in new channel
3. **Bot membership**: Channels where Slack bot is invited are automatically bridged
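The idea behind automatic portal creation is a get-or-create lookup keyed by Slack channel ID. The toy model below illustrates only that idea; it is not mautrix-slack's implementation, and the class name `PortalRegistry` and the room ID format are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PortalRegistry:
    """Toy model: one Matrix room per Slack channel, created on first use."""

    portals: dict[str, str] = field(default_factory=dict)

    def room_for(self, channel_id: str) -> str:
        """Return the room for a channel, creating the mapping if it is new."""
        if channel_id not in self.portals:
            # Hypothetical room ID format, for illustration only.
            self.portals[channel_id] = f"!portal-{len(self.portals) + 1}:clarun.xyz"
        return self.portals[channel_id]
```

Every trigger listed above (initial sync, incoming message, bot invitation) would reduce to a `room_for` call in this model, which is why no manual mapping step is needed.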
**Portal Types Supported**:
- Public/private channels (including Slack Connect channels)
- Group DMs (multi-party direct messages)
- 1:1 Direct messages
**Shared Portals**: Multiple Matrix users can interact with the same Slack channel through a shared Matrix room.
### Configuration vs Runtime Management
**Configuration-based** (`conversation_count` in config.yaml):
- Controls how many recent conversations sync on initial login
- Only affects initial synchronization
- Separate settings for channels, group DMs, direct messages
**Runtime Management** (automatic):
- No manual channel mapping required
- Portal creation happens dynamically
- No explicit `open <channel-id>` command needed
- To interact with a new channel, simply send/receive a message in Slack
**Bot Commands** (via Matrix DM with `@slackbot:clarun.xyz`):
- `help` - Display available commands
- `login app` - Authenticate with Slack app credentials
- `login token <token> <cookie>` - Authenticate with user account (unofficial)
### Adding/Removing Channels
**Adding Channels**: ✅ **Runtime (no restart)**
- Receive a message in the channel → portal auto-created
- Invite Slack bot to channel (app login mode) → portal auto-created
**Removing Channels**: ⚠️ **Not explicitly documented**
- Likely has `delete-portal` command (based on other mautrix bridges)
- Would be sent from within the Matrix portal room
**Modifying Configuration**:
- Changes to `conversation_count` require bridge restart
- However, setting only affects initial sync, not ongoing operation
### Archived Channel Handling
⚠️ **Not explicitly documented**
Expected behavior:
- Matrix portal remains but becomes inactive
- No new messages flow (Slack channel is read-only)
- Historical messages remain accessible
**Recommendation**: Test this scenario in pilot deployment to document actual behavior.
### Gradual Rollout Strategy
**Phase 1: Single Test Channel** (Week 1-2)
- Set `conversation_count` low (5-10)
- Start with one channel: `#dev-platform` or `#test`
- Verify automatic portal creation, bidirectional messaging, reactions, files
**Phase 2: Small User Group** (Week 3-4)
- 3-5 team members authenticate
- Test shared portal functionality
- Monitor performance and reliability
**Phase 3: Organic Expansion** (Week 5+)
- Don't pre-configure channel lists
- Let automatic portal creation handle it based on usage
- Users get portals only for channels they actively use
**Configuration Strategy**:
```yaml
bridge:
  conversation_count: 10  # Start small, expand organically
```
**Advantages**:
- No manual channel mapping to maintain
- Scales naturally with usage
- Easy to expand without configuration changes
- Users only see channels they interact with
### Key Limitations
- ⚠️ No traditional message backfill (history before bridge setup)
- ⚠️ Name changes not fully supported
- ⚠️ Being added to conversations only partially supported
- ⚠️ No documented manual `open <channel-id>` command
### References
- [mautrix-slack docs](https://docs.mau.fi/bridges/go/slack/)
- [ROADMAP.md](https://github.com/mautrix/slack/blob/main/ROADMAP.md)
- Support room: #slack:maunium.net
---
## 6. Implementation Decisions
### Critical Path Decisions
| Decision Point | Choice | Rationale |
|----------------|--------|-----------|
| **Connection Method** | Socket Mode (WebSocket) | No public endpoint needed, matches security model |
| **Authentication** | App Login (official OAuth) | Production stability, clear audit trail |
| **Token Management** | Interactive login via Matrix | Matches mautrix-slack workflow, simplifies config |
| **Secrets Storage** | sops-nix (existing pattern) | Already working in production (Gen 31) |
| **Channel Bridging** | Automatic portal creation | No manual mapping, scales with usage |
| **Initial Scope** | Single test channel | Validate before expanding |
| **Workspace** | chochacho (production) | Real workspace with admin rights |
### Risks and Mitigations
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Exit code 11 continues | High | High | Debug logging, relax systemd hardening, validate credentials |
| Socket Mode disconnects | Medium | Low | Automatic reconnection, monitor health indicators |
| Token expiration | Low | Medium | Clear error messages, documented re-authentication |
| Performance issues | Low | Medium | Start with 1 channel, monitor before expanding |
| Slack API rate limits | Low | Low | Respect rate limits, implement backoff |
### Open Questions for Implementation
1. **Exact cause of exit code 11**: Requires deployment with debug logging
2. **Matrix appservice registration**: Need to integrate with conduwuit config
3. **Actual `conversation_count` value**: Determine optimal setting for initial sync
4. **Archived channel behavior**: Document through testing
5. **Permission mapping**: Slack roles → Matrix power levels (verify in practice)
---
## 7. Next Steps
**Immediate** (Phase 1):
1. ✅ Create `data-model.md` (entities, relationships, state machines)
2. ✅ Create `contracts/bridge-config.yaml` (configuration schema)
3. ✅ Create `contracts/secrets-schema.yaml` (secrets structure)
4. ✅ Create `contracts/channel-mapping.yaml` (portal configuration)
5. ✅ Create `quickstart.md` (deployment runbook)
6. ✅ Update `.claude/CLAUDE.md` (agent context)
**Then** (Phase 2):
- Run `/speckit.tasks` to generate implementation task breakdown
- Begin actual implementation based on plan.md
---
## Document History
- **2025-10-22**: Initial research completed (5 research agents)
- **Phase 0 Status**: ✅ Complete
- **Next Phase**: Phase 1 (Design)