musiclink/docs/approach/2026-01-21-matrix-native-routing.md

# Approach: Matrix-Native MusicLink Routing
## The Vector (Strategy)
* **Core Philosophy**: Minimally invasive integration with safety rails (explicit allowlist, shadow mode, persistence) that preserves current MusicLink behavior.
* **Key Technical Decisions**:
  * Decision 1: Matterbridge vs Matrix SDK -> **Matrix SDK** because correct room/thread routing requires direct, room-aware event handling.
  * Decision 2: SDK choice -> **mautrix-go** because the codebase is Go and it offers mature Matrix support (including threads and state handling).
  * Decision 3: E2EE support -> **Not supported in v1**; the bot skips encrypted rooms and logs a clear warning.
  * Decision 4: Threading semantics -> **Reply in-thread when the event references a thread**, and always anchor replies with `m.in_reply_to` for compatibility.
  * Decision 5: Sync token persistence -> **Required**, stored locally in a lightweight state store (e.g., SQLite in the data directory).
  * Decision 6: Parallel validation -> **Shadow mode** (read + compute + log only) to avoid double-posting while the existing routing is still live.
  * Decision 7: Allowlist/join policy -> **Join only allowlisted rooms**; ignore or leave non-allowlisted invites.
  * Decision 8: Dedup/idempotency -> **Persist processed event IDs** in the state store with a bounded TTL to prevent double replies after restarts.
  * Decision 9: Rate limiting -> **Outbound queue with retry/backoff** honoring `retry_after_ms` to avoid 429 storms.
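
As an illustration of Decision 4, the following is a minimal sketch of how the relation on an outgoing reply might be built. It uses plain structs and `encoding/json` rather than mautrix-go types, so `RelatesTo`, `InReplyTo`, and `replyRelation` are hypothetical names, not SDK API. The sketch treats the bot's message as a genuine reply to the triggering event and therefore leaves `is_falling_back` unset; whether that is the right choice for the target clients is an assumption to verify.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InReplyTo mirrors the `m.in_reply_to` object inside `m.relates_to`.
type InReplyTo struct {
	EventID string `json:"event_id"`
}

// RelatesTo mirrors the subset of `m.relates_to` this sketch needs.
// (Hypothetical type; a real implementation would likely use the SDK's own.)
type RelatesTo struct {
	RelType   string     `json:"rel_type,omitempty"`
	EventID   string     `json:"event_id,omitempty"`
	InReplyTo *InReplyTo `json:"m.in_reply_to,omitempty"`
}

// replyRelation builds the relation for a reply to triggerID.
// threadRootID is empty when the triggering event is not part of a thread.
func replyRelation(triggerID, threadRootID string) RelatesTo {
	// Always anchor the reply on the triggering event.
	rel := RelatesTo{InReplyTo: &InReplyTo{EventID: triggerID}}
	if threadRootID != "" {
		// Stay inside the thread the trigger belongs to.
		rel.RelType = "m.thread"
		rel.EventID = threadRootID
	}
	return rel
}

func main() {
	content := map[string]any{
		"msgtype":      "m.text",
		"body":         "(reply text)",
		"m.relates_to": replyRelation("$trigger", "$root"),
	}
	out, err := json.MarshalIndent(content, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the relation logic in one pure function makes the "wrong room/thread" failure mode easier to unit test independently of the sync loop.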
## The Architecture
* **New Components**:
  * Matrix client module (sync, event filtering, reply posting).
  * State store for sync tokens and event dedupe (SQLite).
  * Outbound send queue with backoff.
* **Modified Components**:
  * Config schema (matrix enabled/server/access token/user id/rooms/state store path).
  * Message handling entrypoint to accept Matrix events.
  * Logging/metrics for sync health and send failures.
* **Data Model Changes**:
  * Expanded `matrix` settings with `rooms` and `stateStorePath`.
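
The expanded `matrix` settings could look roughly like the sketch below. Only `rooms` and `stateStorePath` are named in this document; the remaining key names, the use of JSON as the config format, and the `MatrixConfig`, `Validate`, and `RoomAllowed` identifiers are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// MatrixConfig is a hypothetical shape for the `matrix` settings block.
// Key names other than `rooms` and `stateStorePath` are assumed.
type MatrixConfig struct {
	Enabled        bool     `json:"enabled"`
	Server         string   `json:"server"`
	AccessToken    string   `json:"accessToken"`
	UserID         string   `json:"userId"`
	Rooms          []string `json:"rooms"` // explicit allowlist of room IDs
	StateStorePath string   `json:"stateStorePath"`
}

// Validate rejects configs that would start the Matrix client half-configured.
func (c MatrixConfig) Validate() error {
	if !c.Enabled {
		return nil // nothing to check when Matrix mode is off
	}
	switch {
	case c.Server == "":
		return errors.New("matrix.server is required")
	case c.AccessToken == "":
		return errors.New("matrix.accessToken is required")
	case !strings.HasPrefix(c.UserID, "@"):
		return errors.New("matrix.userId must be a full user ID starting with @")
	case len(c.Rooms) == 0:
		return errors.New("matrix.rooms must list at least one allowlisted room")
	case c.StateStorePath == "":
		return errors.New("matrix.stateStorePath is required")
	}
	return nil
}

// RoomAllowed reports whether roomID is on the allowlist.
func (c MatrixConfig) RoomAllowed(roomID string) bool {
	for _, r := range c.Rooms {
		if r == roomID {
			return true
		}
	}
	return false
}

func main() {
	raw := []byte(`{
		"enabled": true,
		"server": "https://matrix.example.org",
		"accessToken": "REDACTED",
		"userId": "@musiclink:example.org",
		"rooms": ["!abc:example.org"],
		"stateStorePath": "data/matrix-state.db"
	}`)
	var cfg MatrixConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	fmt.Println("valid:", cfg.Validate() == nil)
	fmt.Println("allowed:", cfg.RoomAllowed("!abc:example.org"))
}
```

Failing fast on an empty allowlist keeps the join policy explicit: a bot with Matrix enabled but no rooms configured would otherwise sync and do nothing.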
## The Risks (Blast Radius)
* **Known Unknowns**: Matrix SDK threading behavior across clients; limits of non-E2EE support in target rooms.
* **Failure Modes**:
  * Reply posted to wrong room/thread due to malformed relations.
  * Event loops from self-messages or duplicate sync deliveries.
  * Missed messages if sync token store is corrupted or reset.
  * Silent failures if encrypted rooms are allowlisted.
  * Token leakage or expiration without clear operational guidance.
## The Plan Outline (High Level)
1. **Phase 1**: Implement Matrix-native mode in shadow-only operation with persistence, filtering, and observability.
2. **Phase 2**: Enable active posting, canary to a subset of rooms, then retire Matterbridge routing.
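
One way to make the two phases a configuration change rather than a code change is to put a gate in front of the outbound sender. The sketch below is illustrative only: `Sender`, `Gate`, and `memorySender` are hypothetical names, and the real send path would go through the Matrix client and the outbound queue.

```go
package main

import "fmt"

// Sender is the minimal outbound interface assumed by this sketch.
type Sender interface {
	Send(roomID, body string) error
}

// memorySender records sends; it stands in for the real Matrix sender.
type memorySender struct {
	sent []string
}

func (m *memorySender) Send(roomID, body string) error {
	m.sent = append(m.sent, roomID+": "+body)
	return nil
}

// Gate posts only to rooms listed in ActiveRooms and logs everything else.
// An empty ActiveRooms map is pure shadow mode (Phase 1); adding a few rooms
// gives a canary rollout (Phase 2).
type Gate struct {
	ActiveRooms map[string]bool
	Next        Sender
	Logf        func(format string, args ...any)
}

func (g *Gate) Send(roomID, body string) error {
	if !g.ActiveRooms[roomID] {
		// Shadow path: compute and log, but do not post.
		g.Logf("shadow: would send to %s: %q\n", roomID, body)
		return nil
	}
	return g.Next.Send(roomID, body)
}

func main() {
	real := &memorySender{}
	gate := &Gate{
		ActiveRooms: map[string]bool{"!canary:example.org": true},
		Next:        real,
		Logf:        func(format string, args ...any) { fmt.Printf(format, args...) },
	}
	_ = gate.Send("!other:example.org", "links for this track")
	_ = gate.Send("!canary:example.org", "links for this track")
	fmt.Println("actually sent:", len(real.sent))
}
```

Because shadow sends return `nil`, the rest of the pipeline (dedupe, metrics) behaves the same in both phases, which is what makes the shadow logs comparable to live behavior.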