musiclink/docs/reviews/bot-transport.md

2 KiB
Raw Blame History

Code Review: Bot Transport Layers

Scope

  • internal/bot/
  • internal/matrixbot/

Findings

Strengths

  • Matterbridge bot has resilient reconnect loop with exponential backoff and ping keepalive.
  • Matrix bot implements room allowlist, self-message filtering, shadow mode, and thread-aware replies.
  • Matrix bot persists sync tokens + dedupe state; avoids reprocessing on restarts.
  • Rate-limit handling respects M_LIMIT_EXCEEDED with retry/backoff.
  • Health endpoint exposes useful counters and sync timestamps.

⚠️ Issues / Opportunities

  1. Matterbridge message channel is unbounded for handlers

    • messages buffered at 100; when full, messages drop. No metrics for drops or backpressure handling.
    • Consider exposing drop count or increasing buffer if needed.
  2. Matrix queue drops without marking dedupe

    • When send queue is full, response is dropped but the event remains unmarked in dedupe store, so a later replay could re-trigger.
    • Consider marking as processed on drop (or explicit retry queue sizing).
  3. Matrix encrypted room handling is partial

    • m.room.encryption events mark rooms as encrypted, but only after join. Encrypted rooms already in allowlist could still process early timeline events before encryption event arrives.
    • Consider checking encryption state via /state or guarding on first sync for allowlisted rooms.
  4. Matrix send loop has no shutdown drain

    • sendLoop exits on context cancel without draining queued sends; may lose last responses on shutdown.
    • Consider best-effort drain with timeout or log queue length on shutdown.
  5. State store cleanup timer tied to MarkEventProcessed

    • Cleanup only runs when processing messages, so long idle periods may keep stale entries. Low impact but worth noting.

Notes

  • The Matrix bot appropriately avoids E2EE rooms and invites, but could log a clear startup warning when any allowlisted room is encrypted.
  • Matterbridge bot closes connection on ping failure but doesnt propagate to run loop; reconnect relies on read loop exiting.