ops-jrz1/specs/002-slack-bridge-integration/research.md
Dan ca379311b8 Add Slack bridge integration feature specification
Includes spec, plan, research, data model, contracts, and quickstart guide
for mautrix-slack Socket Mode bridge deployment.
2025-10-26 14:36:44 -07:00

Phase 0: Research Technical Foundations

Feature: 002-slack-bridge-integration
Research Date: 2025-10-22
Status: Complete

Executive Summary

This document consolidates research on five critical technical areas for implementing the Slack↔Matrix bridge using mautrix-slack with Socket Mode on NixOS.

Key Decisions:

  • Use Socket Mode (WebSocket) - no public endpoint needed
  • Use App Login (official OAuth) for production stability
  • Require 29 bot scopes + 1 app-level scope (connections:write)
  • Use sops-nix flat key structure for Slack credentials
  • Use automatic portal creation (no manual channel mapping)
  • Leverage existing NixOS module, add secrets integration

1. Slack Socket Mode

What is Socket Mode?

Socket Mode is Slack's WebSocket-based protocol (RFC 6455) that enables real-time event delivery without requiring a public HTTP endpoint.

Connection Architecture:

  1. Application calls apps.connections.open API with app-level token (xapp-)
  2. Slack responds with unique WebSocket URL: wss://wss.slack.com/link/?ticket=...
  3. Application receives events over WebSocket (Events API, interactivity)
  4. Application sends responses via standard Web API (HTTPS)
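As a rough illustration of steps 1 and 2, the sketch below builds the apps.connections.open request and pulls the WebSocket URL out of a response body. It only constructs and parses; no network call is made. The helper names (build_open_request, extract_ws_url) are invented for this example, and the bridge performs the equivalent work internally.

```python
import json
import urllib.request

OPEN_URL = "https://slack.com/api/apps.connections.open"


def build_open_request(app_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Slack for a WebSocket URL."""
    if not app_token.startswith("xapp-"):
        raise ValueError("apps.connections.open needs an app-level (xapp-) token")
    return urllib.request.Request(
        OPEN_URL,
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {app_token}"},
    )


def extract_ws_url(body: str) -> str:
    """Return the wss:// URL from a JSON response body, or raise on failure."""
    reply = json.loads(body)
    if not reply.get("ok"):
        raise RuntimeError(f"Slack refused the request: {reply.get('error')}")
    return reply["url"]
```

Because the returned URL is short-lived, a client would call this again on every reconnect rather than caching the result.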

Key Characteristics:

  • No public endpoint required (ideal for behind-firewall deployments)
  • WebSocket URLs rotate dynamically (not static)
  • Up to 10 concurrent connections allowed
  • Events may be distributed across connections
  • Rate limit: 1 WebSocket URL fetch per minute (critical for reconnection)
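Events delivered over the socket arrive wrapped in an envelope that carries an envelope_id, and the client acknowledges each one by echoing that id back. The sketch below shows only that acknowledgement step; build_ack is a hypothetical helper name, and the sample messages in the tests are simplified.

```python
import json
from typing import Optional


def build_ack(raw_message: str) -> Optional[str]:
    """Return the JSON ack for a Socket Mode envelope, or None if no ack is due."""
    message = json.loads(raw_message)
    envelope_id = message.get("envelope_id")
    if envelope_id is None:
        # Control frames such as "hello" carry no envelope_id, so nothing is sent.
        return None
    return json.dumps({"envelope_id": envelope_id})
```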

Token Requirements

Two tokens required:

| Token Type | Format | Purpose | Scope Required |
|---|---|---|---|
| App-Level Token | xapp-... | Establish WebSocket connection | connections:write |
| Bot Token | xoxb-... | Perform API operations | 29+ bot scopes |
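Since the two tokens are easy to swap when pasting them into the login prompt, a small pre-flight check on the prefixes can catch mistakes early. This is a sketch only: classify_token and check_token_pair are hypothetical helpers, and a matching prefix says nothing about whether the token is valid or correctly scoped.

```python
def classify_token(token: str) -> str:
    """Map a Slack token to its role based on its prefix."""
    if token.startswith("xapp-"):
        return "app-level"
    if token.startswith("xoxb-"):
        return "bot"
    raise ValueError("expected a token starting with xapp- or xoxb-")


def check_token_pair(app_token: str, bot_token: str) -> None:
    """Raise ValueError if the tokens are swapped or of the wrong kind."""
    if classify_token(app_token) != "app-level":
        raise ValueError("first token must be the app-level (xapp-) token")
    if classify_token(bot_token) != "bot":
        raise ValueError("second token must be the bot (xoxb-) token")
```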

Authentication Flow:

  1. Open Matrix DM with bridge bot (@slackbot:clarun.xyz)
  2. Send command: login app
  3. Provide both tokens when prompted
  4. Bridge stores credentials in database, establishes Socket Mode connection

Limitations and Trade-offs

Technical Constraints:

  • WebSocket connections refresh every few hours (automatic reconnection)
  • Backend container recycling causes occasional disconnects
  • Rate-limited reconnections (1 request/minute maximum)
  • Long-lived stateful connections (challenging to scale horizontally)
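The interaction between reconnection and the one-request-per-minute limit can be sketched as a delay calculation: exponential backoff with jitter, floored so that a new WebSocket URL is never requested sooner than the limit allows. The base, cap, and jitter values are assumptions chosen for illustration; the bridge's actual reconnect policy is not documented here.

```python
import random

MIN_FETCH_INTERVAL = 60.0  # seconds; the 1-per-minute URL fetch limit noted above


def next_reconnect_delay(attempt: int, since_last_fetch: float,
                         base: float = 2.0, cap: float = 300.0) -> float:
    """Seconds to wait before requesting a fresh WebSocket URL.

    attempt counts consecutive failures (0 for the first retry);
    since_last_fetch is the time elapsed since the previous URL request.
    """
    backoff = min(cap, base * (2 ** attempt))
    jittered = backoff * random.uniform(0.5, 1.0)
    # Never ask for a new URL sooner than the rate limit allows.
    rate_limit_wait = max(0.0, MIN_FETCH_INTERVAL - since_last_fetch)
    return max(jittered, rate_limit_wait)
```

With these defaults, a disconnect immediately after a URL fetch waits the full 60 seconds, while a disconnect hours into a session retries within a couple of seconds.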

Production Considerations:

  • Cannot publish to Slack Marketplace (HTTP required)
  • ⚠️ Slack recommends HTTP for highest reliability
  • Socket Mode recommended for: development, local testing, behind-firewall environments

Why Socket Mode for ops-jrz1:

  1. VPS is private infrastructure (no public webhook complexity)
  2. Small team use case (2-5 engineers, moderate message volume)
  3. Security model favors minimal external exposure
  4. Trade-off of slightly lower reliability is acceptable for non-critical team comms


2. Slack API Scopes

Required Bot Token Scopes (29 total)

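From mautrix-slack app manifest (the groups below enumerate 31 scope names, so the "29" figure should be rechecked against the manifest):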

Message Operations:

  • chat:write - Send messages as bot
  • chat:write.public - Send to public channels without membership
  • chat:write.customize - Customize bot username/avatar (for ghosting)

Channel Access (public channels):

  • channels:read, channels:history - List and view messages
  • channels:write.invites, channels:write.topic - Manage channels

Private Channels (groups):

  • groups:read, groups:history, groups:write
  • groups:write.invites, groups:write.topic

Direct Messages:

  • im:read, im:history, im:write, im:write.topic
  • mpim:read, mpim:history, mpim:write, mpim:write.topic (group DMs)

User & Workspace:

  • users:read, users.profile:read, users:read.email
  • team:read

Rich Content:

  • files:read, files:write
  • reactions:read, reactions:write
  • pins:read, pins:write
  • emoji:read
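To confirm that an installed bot token actually carries the scopes listed above, the granted scopes can be compared against the required set. The sketch assumes the granted scopes are available as a comma-separated string (Slack Web API responses are understood to expose one in an x-oauth-scopes header; treat that as an assumption to verify). REQUIRED_SAMPLE is a deliberately partial set for illustration.

```python
from typing import Set

# Partial sample only; fill in the full list from the manifest.
REQUIRED_SAMPLE = {"chat:write", "channels:read", "channels:history", "reactions:write"}


def missing_scopes(required: Set[str], granted_csv: str) -> Set[str]:
    """Return the required scopes that are absent from a comma-separated list."""
    granted = {scope.strip() for scope in granted_csv.split(",") if scope.strip()}
    return required - granted
```

An empty result means every required scope is present; anything else names the scopes to add before reinstalling the app.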

Required App-Level Token Scopes (1 total)

  • connections:write - Establish Socket Mode WebSocket connections

Event Subscriptions (46 events)

The bridge subscribes to events including:

  • Workspace: app_uninstalled, team_domain_change
  • Channels: channel_archive, channel_created, channel_deleted, channel_rename, etc.
  • Messages: message.channels, message.groups, message.im, message.mpim
  • Interactions: reaction_added, reaction_removed, pin_added, file_shared, etc.

Security Best Practices

Principle of Least Privilege:

  • Use all 29 scopes from mautrix-slack manifest (required for full functionality)
  • If the manifest requests conversations.connect:write (it does not appear in the scope list above), consider removing it when Slack Connect is not in use

Token Storage:

  • Production: Use sops-nix encrypted secrets
  • Never commit tokens to version control
  • Use 0440 permissions (readable by the service user and its group only)

Monitoring:

  • Enable IP allowlisting for token usage (Slack API feature)
  • Monitor token usage via Slack app management dashboard
  • Log all API calls for audit purposes


3. mautrix-slack Configuration

Current Module Structure

Location: /home/dan/proj/ops-jrz1/modules/mautrix-slack.nix

Configuration Generation (two-stage):

  1. Root stage: Creates directory structure (/var/lib/mautrix_slack/config)
  2. User stage: Generates config from example template using -e flag, merges overrides

Module Architecture:

# Key configuration sections exposed:
matrix = {
  homeserverUrl = "http://127.0.0.1:8008";
  serverName = "clarun.xyz";
};

database = {
  type = "postgres";
  uri = "postgresql:///mautrix_slack?host=/run/postgresql";
  maxOpenConnections = 32;
  maxIdleConnections = 4;
};

appservice = {
  hostname = "127.0.0.1";
  port = 29319;
  id = "slack";
  senderLocalpart = "slackbot";
  userPrefix = "slack_";
};

bridge = {
  commandPrefix = "!slack";
  permissions = { "clarun.xyz" = "user"; };
};

encryption = {
  enable = true;   # Allow E2EE
  default = false; # Don't enable by default
};

logging.level = "info";

Missing from Module Options:

  • Slack-specific configuration (workspace, tokens)
  • Socket Mode settings (bot token, app token injection)
  • Channel mapping configuration

Current Issue: Module configured for "delpadtech" workspace, exits with code 11.

Socket Mode Configuration Requirements

Based on mautrix patterns, Socket Mode credentials are likely configured via:

Option A: Interactive login (current mautrix-slack approach)

  • No config needed initially
  • Bridge prompts for tokens via Matrix chat
  • Stores in database after first login

Option B: Declarative config (would require module enhancement)

slack:
  bot_token: "${BOT_TOKEN}"  # From environment or secrets
  app_token: "${APP_TOKEN}"  # From environment or secrets

Decision: Use interactive login approach (Option A) to avoid module modifications. Tokens provided via login app command in Matrix.

Database Configuration

Current Setup (working correctly):

database = {
  type = "postgres";
  uri = "postgresql:///mautrix_slack?host=/run/postgresql";
};

Provisioning (from modules/dev-services.nix):

services.postgresql = {
  ensureDatabases = [ "mautrix_slack" ];
  ensureUsers = [{
    name = "mautrix_slack";
    ensureDBOwnership = true;
  }];
};

No database configuration issues detected.
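The URI above has an empty authority and passes the host as a query parameter, which selects a Unix socket directory rather than a TCP host (and, on a typical setup, peer authentication as the service user). A small sketch that makes this visible by parsing the URI; describe_dsn is a hypothetical helper, not part of the bridge.

```python
from urllib.parse import parse_qs, urlsplit


def describe_dsn(uri: str) -> dict:
    """Split a PostgreSQL URI into database name and connection target."""
    parts = urlsplit(uri)
    params = parse_qs(parts.query)
    host = params.get("host", [parts.hostname or ""])[0]
    return {
        "database": parts.path.lstrip("/"),
        "host": host,
        # libpq-style clients treat a host beginning with "/" as a socket directory.
        "unix_socket": host.startswith("/"),
    }
```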

Matrix Homeserver Integration

Appservice Registration:

  • Generated at: /var/lib/matrix-appservices/mautrix_slack_registration.yaml
  • Contains: id, url, as_token, hs_token, namespaces

Missing Step: Registration file must be loaded into conduwuit homeserver.

Required Action: Add to Matrix server configuration:

[[appservices]]
registration = "/var/lib/matrix-appservices/mautrix_slack_registration.yaml"

Exit Code 11 Root Cause Analysis

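Exit code 11 is not the same thing as SIGSEGV. SIGSEGV is signal 11; a process killed by it is reported by systemd as code=killed (or code=dumped) with signal=SEGV, and by a shell as status 139. A report of code=exited, status=11 means the bridge itself chose to exit with that status, which points toward a failed startup or configuration check rather than a segmentation fault. Confirm which form systemctl status mautrix-slack shows before relying on the ranking below.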

Most likely causes (ranked by probability):

  1. Missing Slack credentials (95% likely)

    • Module generates config without tokens
    • Bridge crashes trying to connect with invalid/missing credentials
  2. Incomplete configuration (80% likely)

    • Example config has required fields not set
    • Bridge code doesn't validate, crashes on access
  3. olm-3.2.16 library issues (40% likely)

    • Insecure package error requires permittedInsecurePackages allowance
    • Already addressed in production config (commit 0cbbb19)
  4. SystemD security restrictions (20% likely)

    • Security hardening can cause segfaults with Go binaries
    • May need temporary relaxation (as done for mautrix-gmessages)

Validation Steps:

  1. Enable debug logging: logging.level = "debug"
  2. Check logs: journalctl -u mautrix-slack -n 100
  3. Temporarily disable security hardening
  4. Verify database connectivity
  5. Test with minimal config (no credentials - should fail gracefully)
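When reproducing the failure from a script, it helps to distinguish a process that exited with status 11 from one killed by signal 11 (SIGSEGV). The sketch below follows Python's subprocess convention (a negative return code means death by signal) and the shell convention of 128 + N; describe_returncode is a hypothetical helper.

```python
import signal


def _signal_name(number: int) -> str:
    """Best-effort signal name; falls back to the bare number."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"signal {number}"


def describe_returncode(returncode: int) -> str:
    """Explain a return code: negative = killed by signal, >128 = shell-encoded."""
    if returncode < 0:
        return f"killed by {_signal_name(-returncode)}"
    if returncode > 128:
        # Shell convention: 128 + N reports death by signal N.
        return f"shell-reported {_signal_name(returncode - 128)}"
    return f"exited normally with status {returncode}"
```

A result of "exited normally with status 11" would suggest looking at configuration and credentials first; "killed by SIGSEGV" would shift attention to the library and hardening hypotheses.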


4. sops-nix Secrets Management

Current Secrets Infrastructure

Encryption: Age encryption via SSH host key conversion

File: /home/dan/proj/ops-jrz1/secrets/secrets.yaml

matrix-registration-token: "..."
acme-email: "dlei@duck.com"
slack-oauth-token: ""  # Placeholder (empty)
slack-app-token: ""    # Placeholder (empty)

Age Configuration (.sops.yaml):

keys:
  - &vultr_vps age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q
  - &admin age18ue40q4fw8uggdlfag7jf5nrawvfvsnv93nurschhuynus200yjsd775v3

creation_rules:
  - path_regex: secrets/secrets\.yaml$
    key_groups:
      - age:
          - *vultr_vps  # VPS can decrypt via /etc/ssh/ssh_host_ed25519_key
          - *admin      # Admin workstation can decrypt/edit
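The path_regex in the creation rule decides which files are encrypted for these two recipients. As a quick sanity check of what the pattern covers, the sketch below applies the same expression with Python's re module; sops uses Go's regexp engine, but for a pattern this simple the two are assumed to behave identically.

```python
import re

PATH_REGEX = r"secrets/secrets\.yaml$"


def rule_applies(path: str) -> bool:
    """True if the creation rule's path_regex matches the given file path."""
    return re.search(PATH_REGEX, path) is not None
```

Note that the pattern is anchored only at the end, so any path ending in secrets/secrets.yaml matches, while a sibling file such as secrets/other.yaml does not.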

Status: Working correctly in production (Generation 31, deployed 2025-10-22)

Secret Lifecycle

System Boot
    ↓
sops-nix activation script runs
    ↓
Reads /etc/ssh/ssh_host_ed25519_key
    ↓
Converts to age key (age1vux...)
    ↓
Decrypts secrets/secrets.yaml
    ↓
Extracts individual keys
    ↓
Writes to /run/secrets/<key-name>
    ↓
Sets ownership and permissions
    ↓
Services start (can now read secrets)

Pattern for Slack Tokens

Step 1: Update secrets.yaml

slack-oauth-token: "xoxb-YOUR-ACTUAL-TOKEN"
slack-app-token: "xapp-YOUR-ACTUAL-TOKEN"

Edit with: sops secrets/secrets.yaml (opens the decrypted file in $EDITOR and re-encrypts it on save)

Step 2: Declare in hosts/ops-jrz1.nix

sops.secrets.slack-oauth-token = {
  owner = "mautrix_slack";
  group = "mautrix_slack";
  mode = "0440";
};

sops.secrets.slack-app-token = {
  owner = "mautrix_slack";
  group = "mautrix_slack";
  mode = "0440";
};

Step 3: Reference in Service (two patterns)

Pattern A: LoadCredential (systemd credentials)

systemd.services.mautrix-slack.serviceConfig = {
  LoadCredential = [
    "slack-oauth-token:/run/secrets/slack-oauth-token"
    "slack-app-token:/run/secrets/slack-app-token"
  ];
};
# Service reads from: ${CREDENTIALS_DIRECTORY}/slack-oauth-token
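If Pattern A were adopted, the consuming process would look for each credential under the directory systemd exposes in CREDENTIALS_DIRECTORY. A minimal sketch of that lookup, with a fallback to the sops-nix path for running outside systemd; read_secret is a hypothetical helper and is not something mautrix-slack provides.

```python
import os
from pathlib import Path


def read_secret(name: str, fallback_dir: str = "/run/secrets") -> str:
    """Read a secret from systemd's credentials directory, else a fallback dir."""
    cred_dir = os.environ.get("CREDENTIALS_DIRECTORY")
    base = Path(cred_dir) if cred_dir else Path(fallback_dir)
    # Strip the trailing newline that editors commonly append.
    return (base / name).read_text().strip()
```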

Pattern B: Direct file reference (hypothetical options; the module does not currently expose them)

services.mautrix-slack = {
  oauthTokenFile = "/run/secrets/slack-oauth-token";
  appTokenFile = "/run/secrets/slack-app-token";
};

Decision: Use interactive login approach - tokens provided via Matrix chat, not config files. Secrets will be stored in bridge database, not referenced in NixOS config. This simplifies deployment and matches mautrix-slack's intended workflow.

File Permissions Best Practices

-r--r----- (0440): Service-specific secrets (only service user + group can read)
-r--r--r-- (0444): Broadly readable secrets (e.g., email addresses)
-r-------- (0400): Owner-only secrets (root-only when owned by root; maximum security)
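The octal modes and their ls-style renderings above can be cross-checked with Python's stat module. The sketch below renders a mode for a regular file and reports whether users outside the owner and group could read it; both helper names are invented for this example.

```python
import stat


def render_mode(octal_mode: int) -> str:
    """Render permission bits for a regular file the way `ls -l` does."""
    return stat.filemode(stat.S_IFREG | octal_mode)


def world_readable(octal_mode: int) -> bool:
    """True if users outside the owner and group can read the file."""
    return bool(octal_mode & stat.S_IROTH)
```

For token files, world_readable should be False, which holds for 0440 and 0400 but not for 0444.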

Security guarantees:

  • Decrypted secrets never enter the Nix store (which is world-readable)
  • Secrets only in /run/secrets/ (tmpfs, RAM-only)
  • Secrets cleared on reboot
  • Encrypted at rest in git (safe to commit secrets.yaml)


5. Channel Bridging Patterns

How Channel Mapping Works

mautrix-slack uses automatic portal creation rather than manual channel mapping:

Portal Creation Triggers:

  1. Initial login: Bridge creates portals for recent conversations (controlled by conversation_count)
  2. Receiving messages: Portal auto-created when message arrives in new channel
  3. Bot membership: Channels where Slack bot is invited are automatically bridged

Portal Types Supported:

  • Public/private channels (including Slack Connect channels)
  • Group DMs (multi-party direct messages)
  • 1:1 Direct messages

Shared Portals: Multiple Matrix users can interact with the same Slack channel through a shared Matrix room.
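The on-demand, shared-room behaviour described above can be modelled as a lookup table keyed by Slack channel. This is a toy model meant to clarify the concept, not the bridge's implementation: the class names, the room ID format, and the example.org server name are all invented.

```python
from dataclasses import dataclass, field


@dataclass
class Portal:
    slack_channel_id: str
    matrix_room_id: str
    members: set = field(default_factory=set)


class PortalRegistry:
    """Toy model: one shared Matrix room per Slack channel, created on demand."""

    def __init__(self) -> None:
        self._portals: dict = {}

    def on_slack_message(self, channel_id: str, matrix_user: str) -> Portal:
        portal = self._portals.get(channel_id)
        if portal is None:
            # First message seen for this channel: create the portal now.
            room_id = f"!portal-{len(self._portals) + 1}:example.org"
            portal = Portal(channel_id, room_id)
            self._portals[channel_id] = portal
        # Every Matrix user of the channel joins the same room.
        portal.members.add(matrix_user)
        return portal
```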

Configuration vs Runtime Management

Configuration-based (conversation_count in config.yaml):

  • Controls how many recent conversations sync on initial login
  • Only affects initial synchronization
  • Separate settings for channels, group DMs, direct messages
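One way to picture the effect of conversation_count is as a cap on a recency-sorted list at login time. The sketch below assumes each conversation is a dict with an id and a last_activity timestamp; that data shape and the function name are illustrative assumptions, and special values (such as negative counts) are not modelled.

```python
def select_initial_conversations(conversations: list, limit: int) -> list:
    """Pick the IDs of the `limit` most recently active conversations."""
    if limit < 0:
        raise ValueError("negative limits are not modelled in this sketch")
    ranked = sorted(conversations, key=lambda c: c["last_activity"], reverse=True)
    return [c["id"] for c in ranked[:limit]]
```

Conversations left out by the cap are not lost; under the automatic portal creation described in this section, they would get a portal the next time a message arrives.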

Runtime Management (automatic):

  • No manual channel mapping required
  • Portal creation happens dynamically
  • No explicit open <channel-id> command needed
  • To interact with a new channel, simply send/receive a message in Slack

Bot Commands (via Matrix DM with @slackbot:clarun.xyz):

  • help - Display available commands
  • login app - Authenticate with Slack app credentials
  • login token <token> <cookie> - Authenticate with user account (unofficial)

Adding/Removing Channels

Adding Channels: Runtime (no restart)

  • Receive a message in the channel → portal auto-created
  • Invite Slack bot to channel (app login mode) → portal auto-created

Removing Channels: ⚠️ Not explicitly documented

  • Likely has delete-portal command (based on other mautrix bridges)
  • Would be sent from within the Matrix portal room

Modifying Configuration:

  • Changes to conversation_count require bridge restart
  • However, setting only affects initial sync, not ongoing operation

Archived Channel Handling

⚠️ Not explicitly documented

Expected behavior:

  • Matrix portal remains but becomes inactive
  • No new messages flow (Slack channel is read-only)
  • Historical messages remain accessible

Recommendation: Test this scenario in pilot deployment to document actual behavior.

Gradual Rollout Strategy

Phase 1: Single Test Channel (Week 1-2)

  • Set conversation_count low (5-10)
  • Start with one channel: #dev-platform or #test
  • Verify automatic portal creation, bidirectional messaging, reactions, files

Phase 2: Small User Group (Week 3-4)

  • 3-5 team members authenticate
  • Test shared portal functionality
  • Monitor performance and reliability

Phase 3: Organic Expansion (Week 5+)

  • Don't pre-configure channel lists
  • Let automatic portal creation handle it based on usage
  • Users get portals only for channels they actively use

Configuration Strategy:

bridge:
  conversation_count: 10  # Start small, expand organically

Advantages:

  • No manual channel mapping to maintain
  • Scales naturally with usage
  • Easy to expand without configuration changes
  • Users only see channels they interact with

Key Limitations

  • ⚠️ No traditional message backfill (history before bridge setup)
  • ⚠️ Name changes not fully supported
  • ⚠️ Being added to conversations only partially supported
  • ⚠️ No documented manual open <channel-id> command


6. Implementation Decisions

Critical Path Decisions

| Decision Point | Choice | Rationale |
|---|---|---|
| Connection Method | Socket Mode (WebSocket) | No public endpoint needed, matches security model |
| Authentication | App Login (official OAuth) | Production stability, clear audit trail |
| Token Management | Interactive login via Matrix | Matches mautrix-slack workflow, simplifies config |
| Secrets Storage | sops-nix (existing pattern) | Already working in production (Gen 31) |
| Channel Bridging | Automatic portal creation | No manual mapping, scales with usage |
| Initial Scope | Single test channel | Validate before expanding |
| Workspace | chochacho (production) | Real workspace with admin rights |

Risks and Mitigations

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Exit code 11 continues | High | High | Debug logging, relax systemd hardening, validate credentials |
| Socket Mode disconnects | Medium | Low | Automatic reconnection, monitor health indicators |
| Token expiration | Low | Medium | Clear error messages, documented re-authentication |
| Performance issues | Low | Medium | Start with 1 channel, monitor before expanding |
| Slack API rate limits | Low | Low | Respect rate limits, implement backoff |
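For the rate-limit mitigation, Slack's Web API signals throttling with HTTP 429 and a Retry-After header giving the wait in seconds. A minimal sketch of honouring that header follows; retry_delay and its default fallback are assumptions for illustration, and the bridge is expected to handle this itself.

```python
from typing import Optional


def retry_delay(status: int, headers: dict, default: float = 1.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if no retry is needed."""
    if status != 429:
        return None
    # HTTP header names are case-insensitive; normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        # Header missing or unparsable: fall back to a short fixed wait.
        return default
```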

Open Questions for Implementation

  1. Exact cause of exit code 11: Requires deployment with debug logging
  2. Matrix appservice registration: Need to integrate with conduwuit config
  3. Actual conversation_count value: Determine optimal setting for initial sync
  4. Archived channel behavior: Document through testing
  5. Permission mapping: Slack roles → Matrix power levels (verify in practice)

7. Next Steps

Immediate (Phase 1):

  1. Create data-model.md (entities, relationships, state machines)
  2. Create contracts/bridge-config.yaml (configuration schema)
  3. Create contracts/secrets-schema.yaml (secrets structure)
  4. Create contracts/channel-mapping.yaml (portal configuration)
  5. Create quickstart.md (deployment runbook)
  6. Update .claude/CLAUDE.md (agent context)

Then (Phase 2):

  • Run /speckit.tasks to generate implementation task breakdown
  • Begin actual implementation based on plan.md

Document History

  • 2025-10-22: Initial research completed (5 research agents)
  • Phase 0 Status: Complete
  • Next Phase: Phase 1 (Design)