ops-jrz1/specs/002-slack-bridge-integration/tasks.md
Dan f25a8b06ef Production hardening and technical debt cleanup
Priority 1 - Production Quality:
- Revert Matrix homeserver log level from debug to info
- Reduces log volume by ~70% (22k+ lines/day to <7k)
- Improves performance and reduces disk usage

Priority 2 - Technical Debt:
- Automate sender_localpart fix in mautrix-slack.nix
- Eliminates manual sed command on fresh deployments
- Fix verified working (tested 2025-10-26)
- Update CLAUDE.md to document automated solution

Priority 3 - Project Hygiene:
- Remove unused mautrix-whatsapp and mautrix-gmessages imports
- Archive old configurations to docs/examples/alternative-deployments/
- Remove stale staging/ directories from 001 extraction workflow
- Update deployment documentation in tasks.md and quickstart.md
- Add deployment status notes to spec files

Files Modified:
- modules/dev-services.nix: log level debug → info
- modules/mautrix-slack.nix: automatic sender_localpart fix
- hosts/ops-jrz1.nix: remove unused bridge imports
- CLAUDE.md: update Known Issues, add Resolved Issues section
- specs/002-*/: add deployment status notes
- configurations/ → docs/examples/alternative-deployments/

Tested and Verified:
- All services running (matrix, bridge, forgejo, postgresql, nginx)
- Bridge authenticated and message flow working
- sender_localpart fix generates correct registration file
2025-10-26 15:59:05 -07:00


# Tasks: Matrix-Slack Bridge Integration
**⚠️ DEPLOYMENT STATUS**: This feature was deployed successfully on 2025-10-26 following a manual troubleshooting process rather than this task list. For the actual deployment path taken, see `docs/worklogs/2025-10-26-slack-bridge-deployment-complete.org`. This task list represents the original planned approach and is preserved for reference.
**Input**: Design documents from `/specs/002-slack-bridge-integration/`
**Prerequisites**: plan.md, spec.md, research.md, data-model.md, contracts/, quickstart.md
**Tests**: Manual integration testing only (no automated test tasks per spec.md testing strategy)
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
## Format: `[ID] [P?] [Story] Description`
- **[P]**: Can run in parallel (different files, no dependencies)
- **[Story]**: Which user story this task belongs to (e.g., US1, US2, US3, US4)
- Include exact file paths in descriptions
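
As a rough illustration of the task format above, the following sketch splits one checklist line into its parts. The function name `parse_task` and the output layout are hypothetical, not something this repository provides; the pattern assumes the `- [ ]` / `- [X]` checkbox prefix used throughout this file.

```bash
# Hypothetical helper: parse "- [ ] T004 [P] [US1] Description".
# Prints the task ID, the parallel flag, the story label ("-" if absent),
# and whether the checkbox is ticked. Returns 1 if the line is not a task.
parse_task() {
  local line=$1
  local re='^- \[([ Xx])\] (T[0-9]+)( \[P\])?( \[(US[0-9]+)\])? (.*)$'
  if [[ $line =~ $re ]]; then
    local checked=no parallel=no story=${BASH_REMATCH[5]:--}
    [[ ${BASH_REMATCH[1]} != ' ' ]] && checked=yes
    [[ -n ${BASH_REMATCH[3]} ]] && parallel=yes
    printf '%s parallel=%s story=%s checked=%s\n' \
      "${BASH_REMATCH[2]}" "$parallel" "$story" "$checked"
  else
    return 1
  fi
}
```

Feeding it each line that starts with `- [` would give a quick overview of which tasks are open, parallelizable, or tied to a story.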
## Path Conventions
- Infrastructure configuration project (NixOS modules)
- Primary files: `modules/mautrix-slack.nix`, `hosts/ops-jrz1.nix`, `secrets/secrets.yaml`
- Documentation: `docs/worklogs/*.org`
---
## Phase 1: Setup (Shared Infrastructure)
**Purpose**: Slack app configuration and secrets preparation (external prerequisites)
- [ ] T001 Create Slack app using mautrix-slack app manifest from https://github.com/mautrix/slack/blob/main/app-manifest.yaml
- [ ] T002 Enable Socket Mode in Slack app settings and generate app-level token (xapp-) with connections:write scope
- [ ] T003 Install Slack app to chochacho workspace and copy bot token (xoxb-)
- [ ] T004 [P] Document app setup process in docs/worklogs/2025-10-22-slack-app-setup.org
- [ ] T005 Verify Slack app has all 29 required bot scopes per research.md section 2
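
Before pasting tokens anywhere, a quick shape check can catch a mixed-up app-level and bot token. This is a minimal sketch that relies only on the `xapp-` and `xoxb-` prefixes named in T002 and T003; the function name `slack_token_kind` is hypothetical, and the check says nothing about whether a token is actually valid.

```bash
# Hypothetical helper: classify a Slack token by prefix only.
# xapp- = app-level token (T002), xoxb- = bot token (T003).
slack_token_kind() {
  case $1 in
    xapp-?*) echo app-level ;;
    xoxb-?*) echo bot ;;
    *)       echo unknown; return 1 ;;
  esac
}
```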
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Core infrastructure that MUST be complete before ANY user story can be implemented
**⚠️ CRITICAL**: No user story work can begin until this phase is complete
- [ ] T006 Add slack-oauth-token and slack-app-token to secrets/secrets.yaml using sops secrets/secrets.yaml
- [ ] T007 Verify secrets decrypt correctly on VPS using ssh root@45.77.205.49 'sops -d /path/to/secrets.yaml'
- [X] T008 Update modules/mautrix-slack.nix to change workspace from "delpadtech" to "chochacho"
- [X] T009 Verify PostgreSQL database mautrix_slack exists using ssh root@45.77.205.49 'sudo -u postgres psql -l | grep mautrix_slack'
- [X] T010 Verify Matrix homeserver conduwuit is running on port 8008 using ssh root@45.77.205.49 'curl -s http://localhost:8008/_matrix/client/versions'
**Checkpoint**: Foundation ready - user story implementation can now begin in parallel
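
The verification commands from T009 and T010 can be grouped into one preflight step. The sketch below is an assumption about how that might look, not an existing script: `remote`, `preflight`, `VPS`, and `DRY_RUN` are hypothetical names, and by default it only prints the commands instead of running them.

```bash
# Hypothetical preflight wrapper around the T009 and T010 checks.
VPS=${VPS:-root@45.77.205.49}   # host as used in the task list
DRY_RUN=${DRY_RUN:-1}           # 1 = print commands only, 0 = run over ssh

remote() {
  if [ "$DRY_RUN" = 1 ]; then
    printf "ssh %s '%s'\n" "$VPS" "$1"
  else
    ssh "$VPS" "$1"
  fi
}

preflight() {
  # Stop at the first failing check.
  remote 'sudo -u postgres psql -l | grep mautrix_slack' &&
  remote 'curl -s http://localhost:8008/_matrix/client/versions'
}
```

Calling `preflight` prints both commands for review; `DRY_RUN=0 preflight` would run them against the VPS.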
---
## Phase 3: User Story 1 - Slack to Matrix Message Delivery (Priority: P1) 🎯 MVP
**Goal**: Relay messages from Slack to Matrix within 5 seconds with sender identity preserved
**Independent Test**: Send test message in Slack #dev-platform, verify appears in Matrix room within 5 seconds with correct sender
### Implementation for User Story 1
- [ ] T011 [US1] Update hosts/ops-jrz1.nix to enable services.mautrix-slack with homeserverUrl http://127.0.0.1:8008 and serverName clarun.xyz
- [ ] T012 [US1] Configure database connection in hosts/ops-jrz1.nix with uri postgresql:///mautrix_slack?host=/run/postgresql
- [ ] T013 [US1] Set logging.level to "debug" in hosts/ops-jrz1.nix for initial deployment troubleshooting
- [ ] T014 [US1] Deploy configuration to VPS using nixos-rebuild switch --flake .#ops-jrz1 --target-host root@45.77.205.49 --build-host localhost
- [ ] T015 [US1] Verify mautrix-slack service started using ssh root@45.77.205.49 'systemctl status mautrix-slack'
- [ ] T016 [US1] Check service logs for startup errors using ssh root@45.77.205.49 'journalctl -u mautrix-slack -n 50'
- [ ] T017 [US1] Verify appservice registration file created at /var/lib/matrix-appservices/mautrix_slack_registration.yaml
- [ ] T018 [US1] Add appservice registration to Matrix homeserver configuration (conduwuit continuwuity.toml)
- [ ] T019 [US1] Restart Matrix homeserver to load appservice using ssh root@45.77.205.49 'systemctl restart matrix-continuwuity'
- [ ] T020 [US1] Open Matrix DM with @slackbot:clarun.xyz and verify bot responds
- [ ] T021 [US1] Authenticate bridge by sending "login app" command and providing bot token and app token
- [ ] T022 [US1] Verify Socket Mode connection established in logs using ssh root@45.77.205.49 'journalctl -u mautrix-slack -f | grep -i socket'
- [ ] T023 [US1] Accept invitation to Matrix room for #dev-platform channel
- [ ] T024 [US1] Test Slack→Matrix relay by posting "Test message from Slack" in #dev-platform and verifying it appears in Matrix within 5 seconds
- [ ] T025 [US1] Verify sender identity preserved (message shows from ghost user @slack_USERID:clarun.xyz)
- [ ] T026 [US1] Test emoji preservation by posting message with emoji in Slack and verifying emoji appears in Matrix
- [ ] T027 [US1] Test multi-line message formatting by posting multi-line message in Slack and verifying line breaks preserved in Matrix
- [ ] T028 [US1] Test file attachment by uploading file to Slack and verifying link appears in Matrix
- [ ] T029 [US1] Document US1 validation results in docs/worklogs/2025-10-22-us1-slack-to-matrix-validation.org
**Checkpoint**: At this point, User Story 1 should be fully functional and testable independently
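
The 5-second target in T024 is easier to record consistently if the send and receive times are noted as Unix timestamps (for example from `date +%s`). A minimal sketch, assuming whole-second timestamps and treating exactly 5 seconds as a pass; the function name `within_target` is hypothetical.

```bash
# Hypothetical helper: check a measured relay latency against the target.
# Usage: within_target <sent_epoch> <seen_epoch> [limit_seconds]
within_target() {
  local sent=$1 seen=$2 limit=${3:-5}
  local delta=$(( seen - sent ))
  echo "latency=${delta}s"
  # Succeeds only for a non-negative latency at or under the limit.
  [ "$delta" -ge 0 ] && [ "$delta" -le "$limit" ]
}
```

The printed latency can be copied straight into the validation worklog.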
---
## Phase 4: User Story 2 - Matrix to Slack Message Delivery (Priority: P1)
**Goal**: Relay messages from Matrix to Slack within 5 seconds with Matrix username preserved
**Independent Test**: Send test message from Matrix room, verify appears in Slack channel within 5 seconds with Matrix username
### Implementation for User Story 2
- [ ] T030 [US2] Test Matrix→Slack relay by posting "Test message from Matrix" in bridged Matrix room and verifying it appears in Slack within 5 seconds
- [ ] T031 [US2] Verify Matrix username appears in Slack message (bridge bot posts with sender attribution)
- [ ] T032 [US2] Test markdown formatting by posting message with **bold** and *italic* in Matrix and verifying formatting converted in Slack
- [ ] T033 [US2] Test user mention by mentioning another Matrix user and verifying translated to Slack @username format
- [ ] T034 [US2] Test attachment posting from Matrix and verify link appears in Slack
- [ ] T035 [US2] Verify bidirectional message flow works simultaneously (send messages from both sides)
- [ ] T036 [US2] Document US2 validation results in docs/worklogs/2025-10-22-us2-matrix-to-slack-validation.org
**Checkpoint**: At this point, User Stories 1 AND 2 should both work independently (full bidirectional communication)
---
## Phase 5: User Story 3 - Bridge Service Reliability (Priority: P2)
**Goal**: Bridge starts automatically on boot and recovers from failures without manual intervention
**Independent Test**: Reboot server and verify bridge auto-starts within 2 minutes, or simulate network failure and verify auto-recovery
### Implementation for User Story 3
- [ ] T037 [US3] Verify systemd service has Restart=always in modules/mautrix-slack.nix serviceConfig
- [ ] T038 [US3] Verify service has After=network-online.target postgresql.service matrix-continuwuity.service in systemd dependencies
- [ ] T039 [US3] Test auto-start by rebooting VPS using ssh root@45.77.205.49 'reboot'
- [ ] T040 [US3] Verify bridge service started automatically within 2 minutes using systemctl status mautrix-slack
- [ ] T041 [US3] Verify messages relay successfully after reboot without manual intervention
- [ ] T042 [US3] Test connection recovery by simulating Slack API outage (temporarily revoke token, then restore)
- [ ] T043 [US3] Verify bridge reconnects automatically after token restored without manual restart
- [ ] T044 [US3] Test Matrix homeserver recovery by restarting conduwuit using ssh root@45.77.205.49 'systemctl restart matrix-continuwuity'
- [ ] T045 [US3] Verify bridge re-establishes connection to Matrix automatically
- [ ] T046 [US3] Test configuration error handling by temporarily breaking config and verifying clear diagnostic message in logs
- [ ] T047 [US3] Verify health indicators logged (connection status, last message timestamp, error count) using journalctl -u mautrix-slack --since "1 hour ago"
- [ ] T048 [US3] Document US3 validation results including recovery times in docs/worklogs/2025-10-22-us3-reliability-validation.org
**Checkpoint**: All P1 and P2 user stories should now be independently functional
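
The "auto-starts within 2 minutes" check in T040 amounts to polling until a command succeeds or a deadline passes. The sketch below shows one way to do that; `wait_until` is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper: poll a command until it succeeds or time runs out.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command...>
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))   # SECONDS is a bash built-in counter
  while ! "$@"; do
    (( SECONDS >= deadline )) && return 1
    sleep "$interval"
  done
  return 0
}
```

After a reboot, something like `wait_until 120 5 ssh root@45.77.205.49 'systemctl is-active --quiet mautrix-slack'` would succeed once the service reports active and fail if it is still down at the deadline. The same helper could time the recovery steps in T043 and T045.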
---
## Phase 6: User Story 4 - Bridge Configuration Management (Priority: P3)
**Goal**: Administrators can configure channel bridges declaratively without code or manual restarts
**Independent Test**: Add new channel to configuration, reload config, verify new bridge established automatically
### Implementation for User Story 4
- [ ] T049 [US4] Review automatic portal creation behavior from research.md section 5 (channels auto-bridge on activity)
- [ ] T050 [US4] Document conversation_count configuration parameter in hosts/ops-jrz1.nix (controls initial sync count)
- [ ] T051 [US4] Test adding new channel by inviting Slack bot to #general using /invite @Matrix Bridge in Slack
- [ ] T052 [US4] Verify portal auto-created when message sent in #general without configuration change
- [ ] T053 [US4] Accept Matrix room invitation for #general and verify messages relay
- [ ] T054 [US4] Test removing channel bridge by kicking bot from Slack channel and verifying portal becomes inactive
- [ ] T055 [US4] Document active bridges by querying database using ssh root@45.77.205.49 'sudo -u mautrix_slack psql mautrix_slack -c "SELECT * FROM portal;"'
- [ ] T056 [US4] Test configuration error handling by setting invalid conversation_count value and verifying error message
- [ ] T057 [US4] Document channel management workflow in docs/worklogs/2025-10-22-us4-channel-management.org
- [ ] T058 [US4] Update CLAUDE.md with channel management patterns and common commands
**Checkpoint**: All user stories should now be independently functional
---
## Phase 7: Polish & Cross-Cutting Concerns
**Purpose**: Production readiness and documentation
- [ ] T059 [P] Change logging.level from "debug" to "info" in hosts/ops-jrz1.nix for production
- [ ] T060 [P] Create comprehensive deployment worklog in docs/worklogs/2025-10-22-slack-bridge-deployment-complete.org
- [ ] T061 [P] Update platform-vision.md to mark Milestone 1 (Working Slack Bridge) as complete
- [ ] T062 Validate all success criteria from spec.md SC-001 through SC-008
- [ ] T063 Run through quickstart.md steps to verify deployment guide accuracy
- [ ] T064 [P] Create backup of bridge database using ssh root@45.77.205.49 'sudo -u postgres pg_dump mautrix_slack > mautrix_slack_backup.sql'
- [ ] T065 [P] Document monitoring commands and health check procedures in CLAUDE.md
- [ ] T066 Monitor bridge stability for 7 days and collect uptime metrics for SC-003 validation
- [ ] T067 [P] Create troubleshooting guide for common issues (exit code 11, Socket Mode disconnects, auth failures)
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies - can start immediately (external Slack app configuration)
- **Foundational (Phase 2)**: Depends on Setup completion - BLOCKS all user stories
- **User Stories (Phase 3-6)**: All depend on Foundational phase completion
  - US1 and US2 are both P1 priority but US2 depends on US1 being tested first (builds on Slack→Matrix foundation)
  - US3 (P2) can be tested after US1+US2 work
  - US4 (P3) can be implemented after core messaging validated
- **Polish (Phase 7)**: Depends on all desired user stories being complete
### User Story Dependencies
- **User Story 1 (P1)**: Can start after Foundational (Phase 2) - No dependencies on other stories
- **User Story 2 (P1)**: Can start after US1 validated - Builds on Slack→Matrix relay working
- **User Story 3 (P2)**: Can start after US1+US2 - Tests existing bridge reliability
- **User Story 4 (P3)**: Can start after US1+US2 - Tests channel management on working bridge
### Within Each User Story
- US1: Setup → Deploy → Authenticate → Test Slack→Matrix → Validate
- US2: Test Matrix→Slack → Validate (uses infrastructure from US1)
- US3: Test auto-start → Test recovery → Monitor health indicators
- US4: Test auto portal creation → Test removal → Document management
### Parallel Opportunities
- Phase 1 (T001-T005): All can run in parallel (different Slack app configuration steps)
- Phase 2: Most tasks sequential (dependencies on secrets, config, services)
- User Stories: Cannot truly parallelize due to shared bridge instance and sequential validation needs
- Phase 7 polish tasks: Most marked [P] can run in parallel (different files/documentation)
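
The ordering constraints described in this section can be written as prerequisite pairs and sorted with `tsort` from coreutils. This is only an illustration of the dependency graph as stated above; the function name `phase_order` and the short node names are hypothetical.

```bash
# Hypothetical helper: print one valid execution order for the phases.
# Each input line reads "prerequisite dependent".
phase_order() {
  tsort <<'EOF'
Setup Foundational
Foundational US1
US1 US2
US2 US3
US2 US4
US3 Polish
US4 Polish
EOF
}
```

US3 and US4 have no ordering between them, so `tsort` may print them in either order; everything else comes out in a fixed sequence from Setup to Polish.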
---
## Parallel Example: Phase 1 (Slack App Setup)
```bash
# All Slack app configuration tasks can proceed in parallel:
Task: "Create Slack app using manifest"
Task: "Document app setup process"
Task: "Verify scopes"
```
---
## Parallel Example: Phase 7 (Polish)
```bash
# Documentation and monitoring tasks can run in parallel:
Task: "Change logging level to info"
Task: "Create deployment worklog"
Task: "Update platform-vision.md"
Task: "Create database backup"
Task: "Document monitoring commands"
Task: "Create troubleshooting guide"
```
---
## Implementation Strategy
### MVP First (User Stories 1 + 2)
1. Complete Phase 1: Slack App Setup (external)
2. Complete Phase 2: Foundational (CRITICAL - blocks all stories)
3. Complete Phase 3: User Story 1 (Slack→Matrix)
4. **VALIDATE US1**: Test independently, verify <5 second latency, verify sender identity
5. Complete Phase 4: User Story 2 (Matrix→Slack)
6. **VALIDATE US2**: Test independently, verify bidirectional flow works
7. **STOP and VALIDATE MVP**: Full bidirectional messaging working
8. Deploy/demo if ready
### Incremental Delivery
1. Complete Setup + Foundational → Foundation ready
2. Add User Story 1 → Test independently → MVP partial (read-only Slack via Matrix)
3. Add User Story 2 → Test independently → MVP complete (full bidirectional)
4. Add User Story 3 → Test independently → Production ready (auto-recovery)
5. Add User Story 4 → Test independently → Admin friendly (easy channel management)
6. Each story adds value without breaking previous stories
### Single-Person Sequential Strategy
Because this is infrastructure configuration with a single bridge instance, work through the phases sequentially:
1. Complete Setup (Phase 1) - Slack app external setup
2. Complete Foundational (Phase 2) - Core infrastructure
3. Implement User Story 1 (Phase 3) - Validate thoroughly before proceeding
4. Implement User Story 2 (Phase 4) - Builds on US1, validate bidirectional
5. Implement User Story 3 (Phase 5) - Test reliability features
6. Implement User Story 4 (Phase 6) - Test channel management
7. Polish (Phase 7) - Production hardening and documentation
---
## Notes
- [P] tasks = different files, no dependencies
- [Story] label maps task to specific user story for traceability
- Each user story should be independently completable and testable
- Manual testing throughout (no automated test suite per spec.md)
- Commit after each task or logical group
- Stop at any checkpoint to validate story independently
- Infrastructure project: tasks are configuration updates, not traditional code
- Bridge uses interactive authentication (tokens via Matrix chat, not NixOS config)
- Automatic portal creation means no static channel mapping configuration needed
- Health monitoring via systemd journal logs (basic indicators per FR-011a)
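
For the journal-based health monitoring mentioned above, a rough error count over a time window is one basic indicator. The sketch below assumes that problem lines contain the word "error" in some letter case, which is a guess about the bridge's log format rather than a documented guarantee; `count_errors` is a hypothetical name.

```bash
# Hypothetical helper: count log lines on stdin that mention "error".
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function usable under "set -e".
count_errors() {
  grep -ci 'error' || true
}
```

It could be fed from the command already used in T047, for example `ssh root@45.77.205.49 'journalctl -u mautrix-slack --since "1 hour ago"' | count_errors`.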