musiclink/docs/approach/2026-01-21-matrix-native-routing.md

Approach: Matrix-Native MusicLink Routing

The Vector (Strategy)

  • Core Philosophy: Minimally invasive integration with safety rails (explicit allowlist, shadow mode, persistence) that preserves current MusicLink behavior.
  • Key Technical Decisions:
    • Decision 1: Matterbridge vs Matrix SDK -> Matrix SDK because correct room/thread routing requires direct room-aware event handling.
    • Decision 2: SDK choice -> mautrix-go because the codebase is Go and it offers mature Matrix support (including threads and state handling).
    • Decision 3: E2EE support -> Not supported in v1; bot will refuse/skip encrypted rooms and log a clear warning.
    • Decision 4: Threading semantics -> Reply in-thread when the event references a thread, and always anchor replies with m.in_reply_to for compatibility.
    • Decision 5: Sync token persistence -> Required, stored locally in a lightweight state store (e.g., SQLite in the data directory).
    • Decision 6: Parallel validation -> Shadow mode (read + compute + log only) to avoid double-posting.
    • Decision 7: Allowlist/join policy -> Join only allowlisted rooms; ignore or reject invites to rooms that are not allowlisted.
    • Decision 8: Dedup/idempotency -> Persist processed event IDs in the state store with a bounded TTL to prevent double replies after restarts.
    • Decision 9: Rate limiting -> Outbound queue with retry/backoff honoring retry_after_ms to avoid 429 storms.
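
The threading rule in Decision 4 can be sketched as a small helper that builds the `m.relates_to` object for an outgoing reply. This is a minimal sketch using only the standard library, not the mautrix-go types; the names `RelatesTo`, `ThreadRootOf`, `BuildReplyRelation`, and `RelationJSON` are hypothetical. It assumes the bot replies to the specific triggering event, so `is_falling_back` is omitted, which the Matrix spec treats as false.

```go
package main

import "encoding/json"

// InReplyTo mirrors the "m.in_reply_to" object from the Matrix spec.
type InReplyTo struct {
	EventID string `json:"event_id"`
}

// RelatesTo mirrors the subset of "m.relates_to" needed for replies and
// threads. Hypothetical type; a real SDK would provide its own.
type RelatesTo struct {
	RelType   string     `json:"rel_type,omitempty"`
	EventID   string     `json:"event_id,omitempty"`
	InReplyTo *InReplyTo `json:"m.in_reply_to,omitempty"`
}

// ThreadRootOf returns the thread root referenced by an incoming event's
// relation, or "" when the event is not part of a thread.
func ThreadRootOf(rel *RelatesTo) string {
	if rel != nil && rel.RelType == "m.thread" {
		return rel.EventID
	}
	return ""
}

// BuildReplyRelation always anchors the reply with m.in_reply_to and adds
// the m.thread relation only when the triggering event is in a thread.
func BuildReplyRelation(triggerID string, incoming *RelatesTo) RelatesTo {
	out := RelatesTo{InReplyTo: &InReplyTo{EventID: triggerID}}
	if root := ThreadRootOf(incoming); root != "" {
		out.RelType = "m.thread"
		out.EventID = root
	}
	return out
}

// RelationJSON renders the relation as it would appear under "m.relates_to".
func RelationJSON(rel RelatesTo) string {
	b, err := json.Marshal(rel)
	if err != nil {
		return ""
	}
	return string(b)
}
```

Deriving the thread root from the incoming relation rather than from any cached state keeps the reply in the thread the user actually wrote in, which is the property the "wrong room/thread" failure mode is concerned with.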

The Architecture

  • New Components:
    • Matrix client module (sync, event filtering, reply posting).
    • State store for sync tokens and event dedupe (SQLite).
    • Outbound send queue with backoff.
  • Modified Components:
    • Config schema (matrix enabled/server/access token/user id/rooms/state store path).
    • Message handling entrypoint to accept Matrix events.
    • Logging/metrics for sync health and send failures.
  • Data Model Changes:
    • Expanded matrix settings with rooms and stateStorePath.
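
For the outbound send queue with backoff listed under New Components, the delay calculation might look like the sketch below. It assumes the homeserver's 429 response carries an `M_LIMIT_EXCEEDED` error body with `retry_after_ms`, and that a server-supplied hint takes precedence over the local exponential backoff. The function names and the base/max parameters are hypothetical, and the queue itself is not shown.

```go
package main

import "encoding/json"

// ParseRetryAfterMs extracts retry_after_ms from a rate-limit error body.
// It returns 0 when the body is not an M_LIMIT_EXCEEDED error or carries
// no usable hint.
func ParseRetryAfterMs(body []byte) int64 {
	var e struct {
		ErrCode      string `json:"errcode"`
		RetryAfterMs int64  `json:"retry_after_ms"`
	}
	if err := json.Unmarshal(body, &e); err != nil || e.ErrCode != "M_LIMIT_EXCEEDED" {
		return 0
	}
	if e.RetryAfterMs < 0 {
		return 0
	}
	return e.RetryAfterMs
}

// NextDelayMs returns how long to wait before the next send attempt.
// A positive server hint wins; otherwise the delay doubles per attempt
// starting at baseMs and is capped at maxMs.
func NextDelayMs(attempt int, retryAfterMs, baseMs, maxMs int64) int64 {
	if retryAfterMs > 0 {
		return retryAfterMs
	}
	delay := baseMs
	for i := 0; i < attempt && delay < maxMs; i++ {
		delay *= 2
	}
	if delay > maxMs {
		delay = maxMs
	}
	return delay
}
```

With a 500 ms base and a 30 s cap, attempts 0 through 3 wait 500, 1000, 2000, and 4000 ms, and later attempts settle at the cap unless the server supplies its own value.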

The Risks (Blast Radius)

  • Known Unknowns: How different Matrix clients render the threaded replies the SDK produces; how many target rooms are encrypted and therefore unreachable without E2EE support.
  • Failure Modes:
    • Reply posted to wrong room/thread due to malformed relations.
    • Event loops from self-messages or duplicate sync deliveries.
    • Missed messages if sync token store is corrupted or reset.
    • Silent failures if encrypted rooms are allowlisted.
    • Token leakage or expiration without clear operational guidance.
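
Several of these failure modes (self-message loops, duplicate sync deliveries, encrypted rooms on the allowlist) can be guarded by a single pre-processing filter. The following is a minimal sketch under stated assumptions: the `Event` struct, `Dedupe`, and `ShouldHandle` are hypothetical names, timestamps are plain Unix seconds, and an in-memory map stands in for the SQLite-backed store the design calls for.

```go
package main

// Event is a hypothetical, reduced view of an incoming Matrix event.
type Event struct {
	ID, Sender, RoomID string
	Encrypted          bool // true when the event arrived encrypted
}

// Dedupe remembers processed event IDs for a bounded TTL.
// In-memory stand-in for the persistent state store.
type Dedupe struct {
	ttlSec int64
	seen   map[string]int64 // event ID -> Unix time first seen
}

func NewDedupe(ttlSec int64) *Dedupe {
	return &Dedupe{ttlSec: ttlSec, seen: map[string]int64{}}
}

// MarkIfNew purges expired entries, then records eventID and reports
// whether it had not been seen within the TTL.
func (d *Dedupe) MarkIfNew(eventID string, nowUnix int64) bool {
	for id, at := range d.seen {
		if nowUnix-at >= d.ttlSec {
			delete(d.seen, id)
		}
	}
	if _, dup := d.seen[eventID]; dup {
		return false
	}
	d.seen[eventID] = nowUnix
	return true
}

// ShouldHandle returns false plus a reason when an event must be skipped.
// The reason lets the caller log a clear warning instead of failing silently.
func ShouldHandle(ev Event, selfUserID string, allow map[string]bool, d *Dedupe, nowUnix int64) (bool, string) {
	switch {
	case ev.Sender == selfUserID:
		return false, "self"
	case !allow[ev.RoomID]:
		return false, "room-not-allowlisted"
	case ev.Encrypted:
		return false, "encrypted"
	case !d.MarkIfNew(ev.ID, nowUnix):
		return false, "duplicate"
	}
	return true, ""
}
```

The dedupe check runs last so that skipped events do not consume space in the store. The bounded TTL is a trade-off: an event replayed after its entry expires would be processed again.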

The Plan Outline (High Level)

  1. Phase 1: Implement Matrix-native mode in shadow-only operation with persistence, filtering, and observability.
  2. Phase 2: Enable active posting, roll out to a canary subset of rooms, then retire Matterbridge routing.
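
One way to make the two phases a configuration change rather than a code change is to route every reply through a dispatcher that chooses between a live sender and a log-only sender. This is a rough sketch with hypothetical names (`Reply`, `Sender`, `Recorder`, `Dispatcher`); it assumes a global shadow flag for Phase 1 and a per-room canary set for Phase 2.

```go
package main

// Reply is a hypothetical outbound message.
type Reply struct {
	RoomID, Body string
}

// Sender abstracts "post this reply"; the live implementation would call
// the Matrix client, the dry implementation would only log.
type Sender interface {
	Send(r Reply) error
}

// Recorder is a trivial Sender that remembers what it was asked to send.
// It stands in for both the live and the log-only sender in this sketch.
type Recorder struct {
	Sent []Reply
}

func (rec *Recorder) Send(r Reply) error {
	rec.Sent = append(rec.Sent, r)
	return nil
}

// Dispatcher decides whether a computed reply is actually posted.
type Dispatcher struct {
	Shadow bool            // Phase 1: never post, only record
	Canary map[string]bool // Phase 2: rooms where posting is enabled
	Live   Sender
	Dry    Sender
}

// Dispatch sends via Live only when shadow mode is off and the room is in
// the canary set; everything else goes to Dry, so nothing is double-posted.
func (d *Dispatcher) Dispatch(r Reply) error {
	if d.Shadow || !d.Canary[r.RoomID] {
		return d.Dry.Send(r)
	}
	return d.Live.Send(r)
}
```

Because the dry path receives the same `Reply` values the live path would, shadow-mode logs can be compared directly against what the existing Matterbridge route posted.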