Add docs, ignore local dev config

This commit is contained in:
Dan 2025-12-08 16:31:40 -08:00
parent acfee9fea9
commit b40b88bb7f
4 changed files with 624 additions and 0 deletions

6
.gitignore vendored
View file

@ -55,3 +55,9 @@ venv/
# Worklogs (may contain sensitive troubleshooting info) # Worklogs (may contain sensitive troubleshooting info)
docs/worklogs/ docs/worklogs/
# Local dev config
.claude/
.opencode/
.envrc
.skills

90
SETUP-VERIFICATION.md Normal file
View file

@ -0,0 +1,90 @@
# Repository Setup Verification
Verify this repository is configured correctly for AI agent workflows.
## Checklist
### 1. Environment (direnv)
**Check `.envrc` exists with:**
```bash
# API keys from sops
use_api_keys
# AI Agent Skills
if [[ -f .skills ]]; then
source ~/proj/skills/bin/use-skills.sh
load_skills_from_manifest
fi
```
**Verify:** `direnv status` shows allowed, no errors on shell entry.
### 2. Skills Manifest
**Check `.skills` exists.** Choose skills based on project needs:
| Skill | Purpose | When to Use |
|-------|---------|-------------|
| `worklog` | Create org-mode worklogs in `docs/worklogs/` | Most projects - documents work sessions |
| `orch` | Multi-model consensus/analysis via external AIs | Research, architecture decisions, second opinions |
| `niri-window-capture` | Screenshot any window across workspaces | UI work, visual debugging, documentation |
| `update-claude` | Check/install Claude Code updates | Development machines |
**Common configurations:**
Minimal (documentation only):
```
worklog
```
Standard (most projects):
```
worklog
orch
```
Full:
```
worklog
orch
niri-window-capture
```
**Verify:** `.claude/skills/` contains symlinks that resolve correctly.
### 3. Beads Issue Tracking (Optional)
For repos that need local issue tracking:
**Check `.beads/` directory exists** with:
- `config.json`
- Git hooks configured
**If missing and desired:** Run `bd setup claude --project`
**Skip if:** Using external issue tracker (GitHub Issues, Gitea, etc.) as primary.
### 4. API Keys
**Verify keys are available** (after direnv loads):
```bash
echo $ANTHROPIC_API_KEY | head -c 10
echo $OPENAI_API_KEY | head -c 10
```
Should show first 10 chars (not empty).
## Actions
1. Report status of each item (✓ ok / ✗ missing / ⚠ partial)
2. Ask which skills configuration is appropriate if `.skills` is missing
3. Ask about beads setup if `.beads/` is missing
4. Fix any issues using the patterns above
5. Run `direnv reload` after any `.envrc` or `.skills` changes
6. Summarize what was checked and any changes made
## Reference
Full setup documentation: `~/proj/dotfiles/docs/new-repo-setup.md`
Available skills: `~/proj/skills/skills/`

View file

@ -0,0 +1,485 @@
# Maubot Sync Not Delivering Events to Instagram Bot
**Status:** OPEN
**Severity:** HIGH
**Date Opened:** 2025-11-01
**Reporter:** Claude (session continuation)
**Affected Component:** maubot Matrix client sync loop
**Related Worklog:** [docs/worklogs/2025-11-01-maubot-instagram-bot-deployment-and-sync-investigation.org](../worklogs/2025-11-01-maubot-instagram-bot-deployment-and-sync-investigation.org)
## Problem Statement
The Instagram bot plugin deployed to maubot appears correctly configured at all layers (service, client, instance, room membership) but is not receiving Matrix events. Messages sent to the bot's configured room (`!dT40EUcemb8e6bPiig:clarun.xyz`) do not trigger the plugin's event handler, and the client's `next_batch` token remains stuck at 40267 despite multiple messages sent.
## Symptoms
1. **No event handler execution**: Plugin's `handle_message()` method never called
2. **Stuck next_batch**: Client sync token remains at 40267 across multiple checks
3. **Silent failure**: No errors in maubot service logs
4. **Successful initialization**: Plugin logs "SocialMediaBot initialized" on startup
5. **Correct configuration**: All database verification queries show expected state
## Environment
- **Server:** clarun.xyz (45.77.205.49)
- **OS:** NixOS (unstable)
- **Matrix Homeserver:** conduwuit (http://127.0.0.1:8008)
- **Maubot Version:** (from nixpkgs-unstable)
- **Bot User:** @instagram-bot:clarun.xyz
- **Test Room:** !dT40EUcemb8e6bPiig:clarun.xyz
- **Plugin:** sna-instagram-bot v1.0.0 (6946 bytes)
- **Database:** /var/lib/maubot/bot.db (SQLite)
## Evidence Collected
### Database State Verification
#### Client Configuration
```sql
SELECT id, homeserver, user_id, sync, autojoin, enabled,
length(access_token) as token_len, next_batch
FROM client WHERE id = 'instagram-bot';
```
**Result:**
- `id`: instagram-bot
- `homeserver`: http://127.0.0.1:8008
- `user_id`: @instagram-bot:clarun.xyz
- `sync`: 1 (enabled)
- `autojoin`: 1 (enabled)
- `enabled`: 1 (enabled)
- `access_token`: 32 bytes (valid)
- `next_batch`: 40267 (NOT UPDATING)
**Verified twice with 5-second gap:** next_batch remained at 40267.
#### Instance Configuration
```sql
SELECT id, enabled, config FROM instance WHERE id = 'instagram-bot-1';
```
**Result:**
- `id`: instagram-bot-1
- `enabled`: 1
- `config`: `{"enabled": true, "max_file_size": 50000000, "allowed_rooms": ["!dT40EUcemb8e6bPiig:clarun.xyz"]}`
Configuration correct with DM room ID in allowed_rooms list.
#### Room Membership
```sql
SELECT * FROM room_state WHERE room_id = '!dT40EUcemb8e6bPiig:clarun.xyz';
```
**Result:** Bot confirmed in room.
### Service Logs
#### Maubot Service Status
```bash
ssh root@45.77.205.49 'systemctl status maubot'
```
- Service: active (running)
- No errors reported
#### Startup Logs
```bash
ssh root@45.77.205.49 'journalctl -u maubot -n 50'
```
**Observed:**
- "SocialMediaBot initialized" message appears
- No initialization errors
- No configuration errors
- No authentication errors
#### Live Monitoring During Message Test
```bash
ssh root@45.77.205.49 'journalctl -u maubot -f'
```
**Test:** User sends message to room `!dT40EUcemb8e6bPiig:clarun.xyz`
**Result:** No log entries appear. Expected "Received message in room..." (INFO level) did not occur.
### Homeserver Verification
#### Conduwuit Accessibility
```bash
ssh root@45.77.205.49 'curl -s http://127.0.0.1:8008/_matrix/client/versions | jq .'
```
**Result:** Successfully returns Matrix client API versions. Homeserver responsive.
#### Other Matrix Functionality
- Slack bridge (mautrix-slack) functioning normally
- Bidirectional message flow working
- ~50 portal rooms syncing successfully
- Element client can send/receive messages normally
**Conclusion:** Conduwuit homeserver operational; issue specific to maubot client.
### Web UI Verification
**Access:** SSH tunnel to http://localhost:29316
**Observations:**
- Bot instance shows as "started"
- Client shows as enabled
- Configuration visible and correct
- No error indicators in UI
### Code Verification
#### Event Handler Registration
```python
@event.on(EventType.ROOM_MESSAGE)
async def handle_message(self, event: MessageEvent) -> None:
self.log.info(f"Received message in room {event.room_id}")
# ... rest of handler
```
**Analysis:** Decorator syntax correct, matches maubot patterns used in other plugins.
#### Config Class Registration
```python
@classmethod
def get_config_class(cls):
return Config
```
**Analysis:** Present and correct (added during session to fix config loading).
## What Has Been Ruled Out
### ❌ Configuration Issues
- [x] Config not loaded (fixed with get_config_class())
- [x] Null config in event handler (fixed with null checks)
- [x] Empty allowed_rooms (updated to include test room)
- [x] Wrong room ID (verified against Element client URL)
- [x] Bot not in room (database shows room membership)
### ❌ Authentication Issues
- [x] Invalid access token (32 bytes, confirmed in database)
- [x] Homeserver unreachable (curl test successful)
- [x] Client disabled (database shows enabled=1)
- [x] Sync disabled (database shows sync=1)
### ❌ Code Issues
- [x] Plugin crashes on startup (logs show successful init)
- [x] Event handler syntax error (no Python errors in logs)
- [x] Missing dependencies (plugin loads successfully)
- [x] Plugin not deployed (correct file size and timestamp)
### ❌ Service Issues
- [x] Maubot service not running (systemctl shows active)
- [x] Service crashes on message (no crash logs observed)
- [x] Database locked (other queries succeed)
- [x] File permissions (service has correct access)
### ❌ Matrix Server Issues
- [x] Homeserver down (curl test succeeds, Slack bridge works)
- [x] Room doesn't exist (Element shows room, user can send messages)
- [x] Encryption preventing delivery (room not encrypted)
## Hypotheses
### Hypothesis 1: Maubot Client Sync Loop Not Starting
**Likelihood:** MEDIUM
**Evidence For:**
- next_batch not updating
- No sync activity in logs
- No events delivered despite correct config
**Evidence Against:**
- No errors in logs
- Service shows as running
- Plugin initialization succeeds
**Test:** Check maubot internal state for sync loop status (no known command for this).
### Hypothesis 2: Appservice Registration Issue
**Likelihood:** LOW
**Evidence For:**
- Events might not be routed to this specific bot
- Other appservices (Slack bridge) working suggests different registration
**Evidence Against:**
- Maubot manages registration automatically
- No errors about appservice registration
- Plugin loaded successfully
**Test:** Manually verify appservice registration file exists and is correct.
### Hypothesis 3: Race Condition in Client Initialization
**Likelihood:** MEDIUM
**Evidence For:**
- Sync might start before plugin fully ready
- next_batch stuck at initialization value
**Evidence Against:**
- Multiple service restarts didn't fix
- Configuration updates didn't trigger re-sync
**Test:** Delete client and recreate via web UI (fresh start).
### Hypothesis 4: Maubot Bug with Multiple Clients
**Likelihood:** LOW
**Evidence For:**
- No other maubot bots tested on this server
- Could be resource contention or state corruption
**Evidence Against:**
- Only one maubot instance running
- Database not corrupted (queries succeed)
**Test:** Create minimal test plugin to verify maubot sync working at all.
### Hypothesis 5: Conduwuit Appservice Event Filtering
**Likelihood:** LOW-MEDIUM
**Evidence For:**
- Events might not be sent to all appservices
- Could be namespacing or user_id filtering issue
**Evidence Against:**
- Slack bridge receives events fine
- No errors in conduwuit logs about failed delivery
**Test:** Check conduwuit logs for appservice event delivery attempts.
### Hypothesis 6: Client Sync Token Corruption
**Likelihood:** MEDIUM
**Evidence For:**
- next_batch stuck at specific value (40267)
- Might be invalid token causing sync to fail silently
**Evidence Against:**
- No errors about invalid sync token
- Value looks reasonable (numeric, not obviously corrupt)
**Test:** Reset next_batch to null or empty string, force fresh sync.
## Investigation Steps (Ordered by Priority)
### Step 1: Test with Minimal Plugin ⚡ HIGH PRIORITY
**Goal:** Determine if maubot sync works at all on this server.
**Actions:**
1. Create minimal test plugin:
```python
from maubot import Plugin
from maubot.handlers import event
from mautrix.types import EventType, MessageEvent
class TestBot(Plugin):
@event.on(EventType.ROOM_MESSAGE)
async def handle(self, evt: MessageEvent) -> None:
self.log.info(f"TEST BOT GOT MESSAGE: {evt.content.body}")
await evt.respond("pong")
```
2. Deploy as new instance
3. Send test message
4. Check if handler executes
**Expected Outcome:**
- If minimal bot works: Issue specific to Instagram bot plugin
- If minimal bot fails: Systemic maubot sync issue
### Step 2: Reset Client next_batch Token ⚡ HIGH PRIORITY
**Goal:** Force fresh sync from current state.
**Actions:**
```bash
ssh root@45.77.205.49 "sqlite3 /var/lib/maubot/bot.db \"UPDATE client SET next_batch = '' WHERE id = 'instagram-bot';\""
ssh root@45.77.205.49 'systemctl restart maubot'
```
**Expected Outcome:**
- next_batch should update to new value after sync
- Events might start delivering
**Risk:** Might cause duplicate events if sync was partially working.
### Step 3: Recreate Bot Client via Web UI
**Goal:** Fresh client state, eliminate corruption.
**Actions:**
1. SSH tunnel: `ssh -L 29316:localhost:29316 root@45.77.205.49`
2. Access http://localhost:29316
3. Delete existing client "instagram-bot"
4. Create new client with same credentials
5. Update instance to use new client
6. Test message delivery
**Expected Outcome:**
- Fresh client initialization might fix sync startup
- New access token might help if token was issue
### Step 4: Check Conduwuit Logs for Event Delivery
**Goal:** Verify homeserver attempting to deliver events to maubot appservice.
**Actions:**
```bash
ssh root@45.77.205.49 'journalctl -u matrix-continuwuity --since "5 minutes ago" | grep -i maubot'
ssh root@45.77.205.49 'journalctl -u matrix-continuwuity --since "5 minutes ago" | grep -i appservice'
```
**Look for:**
- POST requests to maubot appservice endpoint
- Event delivery attempts
- Errors or timeouts
**Expected Outcome:**
- If delivery attempts present: Maubot not processing them
- If no delivery attempts: Conduwuit not routing to maubot
### Step 5: Verify Appservice Registration
**Goal:** Ensure maubot properly registered as appservice with conduwuit.
**Actions:**
```bash
# Check maubot registration file
ssh root@45.77.205.49 'cat /var/lib/maubot/registration.yaml'
# Check conduwuit knows about it
ssh root@45.77.205.49 'cat /var/lib/matrix-continuwuity/continuwuity.toml' | grep -A 10 appservice
```
**Look for:**
- Registration file exists with correct namespace
- Tokens match between maubot and conduwuit config
- URL accessible from conduwuit
### Step 6: Enable Maubot Debug Logging
**Goal:** Get more visibility into sync loop operation.
**Actions:**
1. Edit maubot config to enable debug logging
2. Restart service
3. Monitor logs during test message
4. Look for sync loop activity
**Expected Information:**
- Sync request/response cycle
- Event processing pipeline
- Any silent failures
### Step 7: Review Maubot Source Code
**Goal:** Understand sync loop initialization and failure modes.
**Actions:**
1. Find maubot package in nix store: `nix-store -q --references $(which maubot) | grep maubot`
2. Read client sync implementation
3. Look for conditions that prevent sync from starting
4. Check for silent failure paths
**Expected Outcome:**
- Identify potential bug or misconfiguration
- Find undocumented requirements
### Step 8: Compare with Working Bot
**Goal:** Identify configuration differences.
**Actions:**
If Slack bridge bot uses maubot (it doesn't, but if another working bot exists):
1. Compare database client entries
2. Compare initialization logs
3. Compare registration files
4. Identify what working bot does differently
## Workarounds
### Workaround 1: Direct Matrix Client Library
**Status:** NOT ATTEMPTED
Instead of using maubot, implement bot using matrix-nio or mautrix-python directly.
**Pros:**
- Full control over sync loop
- Better debugging visibility
- No maubot abstraction layer
**Cons:**
- More code to write
- Lose maubot plugin infrastructure
- Need to implement own config management
### Workaround 2: Webhook-Based Bot
**Status:** NOT ATTEMPTED
Use Matrix webhook events instead of client-server sync.
**Pros:**
- No sync loop needed
- Push-based event delivery
- Potentially more reliable
**Cons:**
- Requires appservice registration
- More complex setup
- Different architecture
## Related Issues
- None currently tracked (first occurrence of this issue)
## Resources
### Documentation
- Matrix Client-Server API: https://spec.matrix.org/latest/client-server-api/
- Maubot plugin development: https://docs.mau.fi/maubot/dev/getting-started.html
- mautrix-python sync protocol: https://github.com/mautrix/python
### Similar Issues (External)
- Search maubot GitHub issues for "sync not working"
- Search Matrix spec issues for client sync failures
### Internal References
- Worklog: [docs/worklogs/2025-11-01-maubot-instagram-bot-deployment-and-sync-investigation.org](../worklogs/2025-11-01-maubot-instagram-bot-deployment-and-sync-investigation.org)
- Maubot module: [modules/maubot.nix](../../modules/maubot.nix)
- Plugin source: ~/proj/sna/instagram_bot.py
- Previous maubot worklog: [docs/worklogs/2025-10-26-maubot-planning-session.org](../worklogs/2025-10-26-maubot-planning-session.org)
## Timeline
- **2025-11-01 ~12:00**: Deployed Instagram bot plugin to maubot
- **2025-11-01 ~12:30**: Fixed config loading bugs (get_config_class, null checks)
- **2025-11-01 ~13:00**: Updated allowed_rooms configuration via database
- **2025-11-01 ~13:30**: Added INFO logging to plugin
- **2025-11-01 ~14:00**: Discovered sync not delivering events (next_batch stuck)
- **2025-11-01 ~15:00**: Verified all configuration via database and web UI
- **2025-11-01 ~15:30**: Created worklog and this investigation document
## Next Actions
**Immediate (Today):**
1. ⚡ Step 1: Deploy minimal test plugin
2. ⚡ Step 2: Reset next_batch token
3. ⚡ Step 3: Recreate client via web UI
**Short-term (This Week):**
4. Step 4: Check conduwuit event delivery logs
5. Step 5: Verify appservice registration
6. Step 6: Enable maubot debug logging
**Long-term (If Unresolved):**
7. Step 7: Review maubot source code
8. Step 8: Consider direct Matrix client library implementation
9. Report bug to maubot project if confirmed
## Contact
**Assignee:** TBD (user decision on next steps)
**Reviewer:** TBD
**Related Stakeholders:** Matrix platform users awaiting Instagram bot functionality
---
**Last Updated:** 2025-11-01 15:42 PST
**Next Review:** After attempting Steps 1-3

View file

@ -0,0 +1,43 @@
# Security Blast Radius Assessment
## Matrix Homeserver (conduwuit)
- **Exposure**: Full read/write access to all Matrix rooms, including those bridged to Slack. Attackers can impersonate any bridged or bot user and inject content that propagates back into Slack.
- **Secrets at Risk**: sops-nix decrypted secrets (registration tokens, admin credentials) if compromised in-memory or via logs.
- **Recovery Actions**: Rotate Matrix registration shared secrets and bot credentials, rebuild the homeserver from a known-good state, purge compromised accounts, revoke Slack tokens, audit room history.
## Maubot Host & Plugins
- **Exposure**: Remote code execution via malicious plugin or compromised management UI enables takeover of Maubot service, access to SQLite plugin state, and control of Matrix bot accounts. Impact extends to every bot-enabled room and associated Slack channels.
- **Secrets at Risk**: Maubot admin credentials, plugin-specific API keys, Matrix access tokens stored in sops-nix.
- **Recovery Actions**: Disable Maubot service, remove untrusted plugins, rotate admin password and Matrix tokens, redeploy trusted builds, review logs for malicious activity.
## Go Bot Services (matrix-bot-sdk-go)
- **Exposure**: Compromise of the Go bot binary or config lets attackers act as that Matrix user, echoing into Slack rooms via the bridge. Depending on service permissions, it may allow host-level pivot if the binary runs with elevated rights.
- **Secrets at Risk**: Matrix access tokens and third-party API keys stored in sops-nix.
- **Recovery Actions**: Stop service, rotate affected tokens, reissue signed binaries, inspect journald logs for exploit traces, redeploy with least-privilege systemd unit.
## mautrix-slack Bridge
- **Exposure**: Leaked Slack bot/app tokens grant broad Slack API control—impersonating users, reading channels, manipulating integrations. Matrix rooms mirrored into Slack may also be poisoned with fake messages.
- **Secrets at Risk**: Slack bot token, app-level token, database contents (ghost user mappings) if storage compromised.
- **Recovery Actions**: Revoke Slack tokens, reauthorize bridge, rotate Matrix appservice secrets, audit Slack workspace activity, redeploy bridge service.
## Host-Level Compromise (ops-jrz1 VPS)
- **Exposure**: Attackers with root own the entire stack: homeserver, Maubot, Go bots, secrets, deployment tooling. They can exfiltrate age keys, tamper with Nix closures, or install persistence.
- **Secrets at Risk**: All sops-encrypted secrets (Slack tokens, registration secrets, admin creds), SSH keys, age private keys.
- **Recovery Actions**: Reprovision VPS from clean image, rotate every downstream secret, regenerate age key pair, re-encrypt sops files, re-run deployment, audit Git/worklogs for evidence of tampering.
## Secrets Management (sops-nix)
- **Exposure**: Loss of age private key or accidental commit of decrypted files reveals all managed secrets, cascading to every service.
- **Recovery Actions**: Generate new age key pair, re-encrypt secrets, rotate dependent credentials across Matrix, Slack, and bot services.
## Mitigation Summary
- Keep Maubot management UI bound to localhost with SSH tunneling; enforce strong passwords in sops-nix.
- Run each service under dedicated, least-privilege systemd users; apply firewall rules to limit network exposure.
- Monitor journald for WARN/ERROR across Maubot, Go bot, and bridge services; push critical alerts into the Matrix admin room.
- Enforce `git status`/validation scripts to ensure decrypted secrets never persist; educate contributors on incident response procedures.
## Slack Workspace Immediate Response (Manager)
- **Cut Access**: From the Slack admin console, immediately suspend or revoke the bridge app and Matrix bot users (Settings & administration → Manage apps → [App] → Revoke access).
- **Notify Users**: Post in the designated incident channel (e.g., #general or #ops-alerts) that the bot has been disabled while the team investigates.
- **Contact Ops**: Ping the ops/security on-call via the documented channel so token rotation and investigation can begin.
(Full remediation and recovery steps are handled by the ops team in coordination with the manager.)