# Research: Extract Matrix Platform Modules

**Date**: 2025-10-11
**Feature**: Extract Matrix Platform Modules as Public Template
**Status**: Completed

## Overview

This document captures technical decisions and research findings for extracting Matrix platform modules from ops-base and publishing them as a public template. All decisions are informed by the RFC multi-model consensus validation (gemini-2.5-pro, gpt-5-codex, qwen3-coder).

## Decision 1: Sanitization Strategy

### Decision

**Hybrid approach**: Automated script for bulk replacements + manual validation checklist

### Rationale

1. **Safety**: Multiple validation layers reduce the risk of missed sensitive data
2. **Efficiency**: The automated script handles ~90% of repetitive replacements
3. **Accuracy**: Manual review catches context-specific issues (comments, documentation)
4. **Auditability**: The checklist provides proof of thorough review
5. **RFC Consensus**: All three models recommended this approach

### Implementation

```bash
# scripts/sanitize-files.sh

# Automated replacements:
- clarun.xyz     → example.com
- talu.uno       → matrix.example.org
- 192.168.1.x    → 10.0.0.x (RFC 1918)
- 45.77.205.49   → 203.0.113.10 (TEST-NET-3, RFC 5737)
- /home/dan      → /home/user
- jrz1           → matrix
- @admin:clarun  → @admin:example

# Manual validation checklist:
- Grep for known sensitive patterns
- Review all comments for personal context
- Scan git commit messages if preserved
- Check for hardcoded tokens/secrets
- Verify REPLACE_ME comments added
- Run gitleaks for automated secret detection
```

### Alternatives Considered

- **git-filter-repo**: Rejected - creates modified history; we want fresh history
- **Fully manual**: Rejected - too error-prone for 3,400+ lines
- **Fully automated**: Rejected - can't detect context-specific personal info

### References

- RFC Section: "Enhanced Sanitization Process" (lines 305-375)
- gitleaks documentation: https://github.com/gitleaks/gitleaks
- NixOS security best practices

---

## Decision 2: Worklog Extraction Process

### Decision

**LLM-assisted selective extraction** with manual review and organization

### Rationale

1. **Volume**: 300KB+ of worklogs is too much for purely manual extraction
2. **Quality**: An LLM can identify and extract architectural patterns effectively
3. **Structure**: Human organization ensures the docs follow a consistent template
4. **Sanitization**: Manual review removes personal debugging context
5. **Precedent**: The same approach was used successfully to create the RFC

### Implementation

```
# Process for each worklog file:

1. Identify extractable content:
   - Architectural decisions (KEEP)
   - Pattern documentation (KEEP)
   - Problem-solving approaches (KEEP, sanitize)
   - Personal debugging sessions (SKIP)
   - Time-stamped logs (SKIP)
   - IP/domain-specific troubleshooting (SANITIZE)

2. Extract with LLM:
   - Input: worklog file + target doc structure
   - Output: Draft documentation section
   - Prompt: "Extract technical patterns, remove personal context"

3. Manual review:
   - Verify accuracy against source code
   - Remove remaining personal references
   - Ensure consistency with other docs
   - Add cross-references and examples

4. Target mapping:
   - Socket Mode pattern → docs/bridges/slack-setup.md
   - Config generation   → docs/patterns/config-generation.md
   - Admin room setup    → docs/patterns/admin-room-setup.md
   - sops-nix workflow   → docs/secrets-management.md
```

### Alternatives Considered

- **Full manual extraction**: Rejected - too time-consuming, inconsistent
- **Direct copy-paste**: Rejected - contains personal info, lacks structure
- **Automated extraction only**: Rejected - loses nuance, creates poor docs

### Target Documents (from worklogs)

| Worklog Source | Target Documentation |
|----------------|---------------------|
| mautrix-slack-bridge-implementation-gmessages-pattern.org | docs/patterns/config-generation.md |
| mautrix-slack-socket-mode-oauth-scopes-blocker.org | docs/bridges/slack-setup.md |
| conduwuit-admin-room-discovery-password-reset.org | docs/patterns/admin-room-setup.md |
| sops-nix-secrets-management-rfc.md | docs/secrets-management.md |
| (various debugging logs) | docs/troubleshooting.md (optional) |

---

## Decision 3: CI/CD Implementation

### Decision

**GitHub Actions with nix flake check + gitleaks**, no Cachix for v1.0

### Rationale

1. **Simplicity**: GitHub Actions is native to the platform; no additional services
2. **Security**: gitleaks scans every push and pull request for secrets
3. **Validation**: nix flake check verifies that all configs evaluate, and CI explicitly builds the example configurations
4. **Cost**: Free for public repos; no Cachix subscription needed
5. **RFC Alignment**: Matches the RFC's automated validation requirements

### Implementation

```yaml
# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cachix/install-nix-action@v25
      - name: Run nix flake check
        run: nix flake check --all-systems
      - name: Build example configurations
        run: |
          nix build .#nixosConfigurations.example-vps.config.system.build.toplevel
          nix build .#nixosConfigurations.example-dev.config.system.build.toplevel

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so gitleaks can scan every commit
      - name: Run gitleaks
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

### Alternatives Considered

- **Cachix integration**: Deferred to v1.1 - adds complexity; build times are acceptable without it
- **GitLab CI**: Rejected - GitHub is the primary platform for the NixOS community
- **Local-only validation**: Rejected - PR validation is critical for community contributions

### Performance Targets

- Total CI time: <5 minutes per commit
- nix flake check: <2 minutes
- gitleaks scan: <30 seconds
- Build examples: <3 minutes

The `validate` and `security` jobs run in parallel, so total CI time is bounded by the `validate` job (flake check + example builds).

---

## Decision 4: Sync Workflow Design

### Decision

**Git tags + sync-log.md file** with quarterly calendar reminders

### Rationale

1. **Traceability**: Git tags mark which ops-base state each template version was synced from
2. **Documentation**: sync-log.md records what changed and why
3. **Simplicity**: No complex tooling; standard git workflow
4. **Discoverability**: sync-log.md is visible in the repo for transparency
5. **RFC Consensus**: A documented workflow reduces the risk of lapsed sync discipline

### Implementation

```bash
# scripts/sync-to-template.sh workflow:

1. Identify changes in ops-base since the last sync (run inside ops-base;
   the newest sync-* tag in the template ends with the ops-base commit it came from):
   git log "$(git -C ../nixos-matrix-platform-template tag -l 'sync-*' --sort=-v:refname | head -1 | cut -d'-' -f3)..HEAD"

2. Review changes for applicability:
   - Bug fixes: SYNC
   - New features: SYNC if tested
   - Security fixes: SYNC (priority)
   - Personal config: SKIP
   - Worklogs: SKIP

3. Apply sanitization to selected changes

4. Validate in template:
   nix flake check
   gitleaks detect --no-git

5. Update sync-log.md:
   ## Sync 2025-10-11 (from ops-base commit abc123)
   - Fixed: Matrix registration token validation
   - Added: WhatsApp bridge reconnection logic
   - Security: Updated sops-nix to v0.16.0

6. Commit with tag (push the branch and the tag separately;
   `git push --tags` alone does not update the branch):
   git commit -m "Sync improvements from ops-base (2025-10-11)"
   git tag "sync-20251011-abc123"
   git push
   git push --tags

# sync-log.md format:

# Sync Log: ops-base → nixos-matrix-platform-template

## Sync 2025-10-11 (ops-base: abc123)

**Changes**:
- [BUGFIX] Matrix registration token validation
- [FEATURE] WhatsApp bridge reconnection
- [SECURITY] sops-nix v0.16.0 upgrade

**Skipped**:
- Personal config changes in comm-talu-uno.nix
```

### Alternatives Considered

- **Automated sync via git subtree**: Rejected - sanitization can't be fully automated
- **Manual documentation only**: Rejected - easy to forget, no traceability
- **Separate tracking tool**: Rejected - over-engineered for a quarterly cadence

### Quarterly Sync Schedule

- Q1 (January): Major sync after holiday break
- Q2 (April): Feature updates and spring cleaning
- Q3 (July): Mid-year maintenance
- Q4 (October): Pre-holiday stability sync

---

## Decision 5: Testing Strategy

### Decision

**Build validation + selective VPS integration testing** (Phase 3 of RFC)

### Rationale

1. **Cost-effective**: Full VPS tests run only at major milestones
2. **Fast feedback**: nix flake check catches most issues (an estimated 90%) within minutes, before any deployment
3. **Real validation**: At least one VPS deployment before v1.0 publication
4. **Community testing**: Beta testers cover a wider range of environments
5. **Risk management**: Balances thoroughness against time and cost

### Implementation

```
# Testing levels:

1. Every commit (CI):
   - nix flake check (all configs)
   - gitleaks scan
   - Build example-vps.nix
   - Build example-dev.nix

2. Before PR merge:
   - All CI checks pass
   - Manual review of sanitization
   - Documentation accuracy check

3. Before v1.0 publication (Phase 3):
   - Deploy example-vps.nix to a fresh Vultr VPS
   - Deploy example-dev.nix to a fresh Vultr VPS
   - Test all user stories end-to-end
   - Verify Matrix server responds
   - Test bridge setup guides
   - Validate secrets management workflow
   - Community beta testing (3-5 testers)

4. Post-publication:
   - Monitor GitHub issues for deployment problems
   - Track success metrics (SC-007, SC-008)
```

### VPS Test Checklist (Phase 3)

- [ ] Fresh NixOS VPS provisioned
- [ ] Clone template repository
- [ ] Follow getting-started.md (time it; target <30 min)
- [ ] Customize example-vps.nix with test domain
- [ ] Deploy with nixos-rebuild
- [ ] Verify Matrix API responds (curl `/_matrix/client/versions`)
- [ ] Create test user with registration token
- [ ] Test Element Web login
- [ ] Follow slack-setup.md (if testing bridges)
- [ ] Verify CI runs on sample PR
- [ ] Document any issues or unclear docs

### Alternatives Considered

- **No integration testing**: Rejected - too risky for v1.0 publication
- **Automated VPS tests**: Deferred to v1.1 - complex setup; manual testing is adequate for v1.0
- **Continuous VPS testing**: Rejected - expensive and unnecessary for a template

---

## Decision 6: Repository Initialization Approach

### Decision

**Manual repository creation** on GitHub with direct file commits (not git-filter-repo)

### Rationale

1. **Clean history**: A fresh `git init` ensures no ops-base history is carried over
2. **Safety**: The manual process allows review at each step
3. **Simplicity**: Direct commits are easier than git-filter-repo for a one-time extraction
4. **Transparency**: A clear commit history shows the sanitization process

### Implementation

```bash
# Repository creation workflow:

1. Create empty GitHub repo: nixos-matrix-platform-template
2. Clone to local machine
3. Copy sanitized files from staging directory
4. Review each file before adding
5. Create initial commit structure:
   - Commit 1: Add README, LICENSE, .gitignore
   - Commit 2: Add modules/
   - Commit 3: Add configurations/
   - Commit 4: Add docs/
   - Commit 5: Add .github/workflows/
   - Commit 6: Add examples/ and scripts/
6. Run final validation
7. Push to GitHub
8. Enable GitHub Discussions
9. Add repository description and topics
```

### No git-filter-repo

- Not needed - we're creating a fresh repo, not rewriting history
- ops-base history stays in ops-base
- The template has a clean, purposeful commit history

---

## Technology Stack Summary

### Core Technologies

- **Nix/NixOS**: 24.05+ (pinned via flake.lock)
- **nixpkgs**: Pinned to a tested commit for reproducibility
- **sops-nix**: v0.15.0+ (secrets management)
- **age**: Latest (encryption backend for sops-nix)

### Validation & Security

- **gitleaks**: v8.18.0+ (secret scanning)
- **nix flake check**: Built-in Nix validation

### CI/CD

- **GitHub Actions**: Native CI/CD platform
- **cachix/install-nix-action@v25**: Nix installation for CI
- **gitleaks/gitleaks-action@v2**: Secret scanning action

### Development Tools

- **Bash**: 5.x (sanitization/sync scripts)
- **git**: 2.x (version control, tagging)
- **SSH**: For VPS deployment testing

### Services (Matrix Platform - in template)

- **matrix-continuwuity**: Matrix homeserver (from nixpkgs-unstable)
- **mautrix-slack**: Slack bridge
- **mautrix-whatsapp**: WhatsApp bridge
- **mautrix-gmessages**: Google Messages bridge
- **forgejo**: Git service
- **nginx**: Reverse proxy
- **postgresql**: Database for bridges/Forgejo
- **fail2ban**: Intrusion prevention
- **sops**: Secrets encryption CLI

---

## Best Practices Applied

### From NixOS Community

1. **Pinned dependencies**: Use flake.lock for reproducibility
2. **Module options**: Provide configurable options with sensible defaults
3. **Security hardening**: Apply systemd security features
4. **Documentation**: Comprehensive examples and guides

### From Infrastructure-as-Code

1. **Immutability**: Fresh git history, no rewrites
2. **Validation**: Multiple layers (syntax, build, secrets, integration)
3. **Idempotency**: All scripts can be run multiple times safely
4. **Auditability**: Clear commit messages, sync logs, checklists

### From Open Source Projects

1. **Governance files**: CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT
2. **Issue templates**: Structured bug reports and feature requests
3. **CI/CD**: Automated checks on every PR
4. **Semantic versioning**: v1.0.0 for the initial stable release

---

## Risk Mitigations

### Secret Leakage (Critical Risk)

- **Mitigation 1**: Automated gitleaks scan (CI + local)
- **Mitigation 2**: Manual review checklist
- **Mitigation 3**: Fresh git history (no ops-base commits)
- **Mitigation 4**: Community beta review before publication
- **Residual Risk**: Low (multi-layer validation)

### Template Divergence (Medium Risk)

- **Mitigation 1**: Documented sync workflow (scripts/sync-to-template.sh)
- **Mitigation 2**: Quarterly calendar reminders
- **Mitigation 3**: Git tags + sync-log.md for tracking
- **Mitigation 4**: RFC-validated process
- **Residual Risk**: Medium (requires discipline)

### Breaking Dependencies (Medium Risk)

- **Mitigation 1**: Pinned nixpkgs version
- **Mitigation 2**: CI tests on every commit
- **Mitigation 3**: Integration testing before major releases
- **Mitigation 4**: Version compatibility matrix in README
- **Residual Risk**: Low (pinning prevents surprises)

### Poor Documentation (Medium Risk)

- **Mitigation 1**: Extract from 300KB+ of tested worklogs
- **Mitigation 2**: Community beta testing for clarity
- **Mitigation 3**: User story acceptance criteria
- **Mitigation 4**: Quick start guide (5-minute target)
- **Residual Risk**: Low (comprehensive extraction + validation)

---

## Success Metrics Tracking

### How to Measure (post-publication)

- **SC-001** (30 min deployment): Time beta testers during Phase 3
- **SC-002** (builds pass): CI badge in README, track failures
- **SC-003** (zero secrets): gitleaks CI status, manual audits
- **SC-004** (8 modules extracted): Checklist of modules present and building
  - matrix-continuwuity, mautrix-slack, mautrix-whatsapp, mautrix-gmessages, dev-services, fail2ban, ssh-hardening, matrix-secrets
- **SC-005** (documentation complete): Checklist of required docs present
  - getting-started.md, architecture.md, secrets-management.md, slack-setup.md, whatsapp-setup.md, gmessages-setup.md, config-generation.md, admin-room-setup.md
- **SC-006** (CI runs): GitHub Actions badge, monitor runs
- **SC-007** (10 stars in 3 months): GitHub stars count
- **SC-008** (3 issues/PRs): GitHub insights
- **SC-009** (zero incidents): Monitor issues, no secret reports
- **SC-010** (quarterly sync): Track sync-log.md entries

---

## Open Questions Resolved

All open questions from the spec were marked "None - resolved in RFC". Additional implementation questions resolved here:

1. **Q: Should we use pre-commit hooks?**
   - A: Yes, but optional for users. Include a .pre-commit-config.yaml example
2. **Q: What NixOS version to target?**
   - A: 24.05+ as the minimum; test on both the current stable release and unstable
3. **Q: Should we include Cachix in CI?**
   - A: Not for v1.0 (added complexity); consider for v1.1 if builds are slow
4. **Q: How to handle user questions/support?**
   - A: GitHub Discussions for Q&A, Issues for bugs only (per CONTRIBUTING.md)
5. **Q: Should we create a Matrix room for support?**
   - A: Yes, mentioned in README (#nixos-matrix-template:matrix.org) - dogfooding

---

## Next Steps

1. ✅ Research completed - all decisions documented
2. → Proceed to Phase 1: Design & Contracts
   - Create data-model.md (entities, relationships)
   - Create contracts/ (sanitization rules, CI contracts)
   - Create quickstart.md (developer onboarding)
   - Update agent context (.specify/memory/AGENT_FILE.md)
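
---

## Appendix: Sanitization Script Sketch

The Decision 1 replacement list maps naturally onto a small `sed` filter. The sketch below is a hypothetical version of `scripts/sanitize-files.sh`, not the script used in ops-base. It assumes a `sed` that supports `-E` (GNU or BSD). The `check_clean` helper, the `LEAK_PATTERN` regex, and the rule that leaves `/home/daniel`-style paths untouched are illustrative additions. It covers only the automated half of the hybrid approach: the manual checklist and gitleaks still apply.

```bash
#!/usr/bin/env bash
# scripts/sanitize-files.sh (hypothetical sketch, not the actual ops-base script)
#
# Rewrites each FILE in place with the Decision 1 bulk replacements, then
# greps the same files for known sensitive patterns. It does not replace
# the manual checklist or the gitleaks scan.
#
# Usage: scripts/sanitize-files.sh FILE...
set -eu

# Replacement rules, applied in order. clarun.xyz is rewritten before
# @admin:clarun so that "@admin:clarun.xyz" becomes "@admin:example.com".
# The /home/dan rule only fires when "dan" is not followed by a letter,
# digit or underscore, so "/home/daniel" is left alone.
sanitize_stream() {
  sed -E \
    -e 's/clarun\.xyz/example.com/g' \
    -e 's/talu\.uno/matrix.example.org/g' \
    -e 's/192\.168\.1\.([0-9]{1,3})/10.0.0.\1/g' \
    -e 's/45\.77\.205\.49/203.0.113.10/g' \
    -e 's#/home/dan([^[:alnum:]_]|$)#/home/user\1#g' \
    -e 's/jrz1/matrix/g' \
    -e 's/@admin:clarun/@admin:example/g'
}

# Rewrite one file in place. Writing back through cat (instead of mv)
# keeps the original file's permissions.
sanitize_file() {
  tmp=$(mktemp)
  sanitize_stream <"$1" >"$tmp"
  cat "$tmp" >"$1"
  rm -f "$tmp"
}

# Hypothetical extended regex of patterns that must not survive sanitization.
LEAK_PATTERN='clarun|talu\.uno|192\.168\.1\.|45\.77\.205\.49|/home/dan|jrz1'

# Succeed only if no file contains a known pattern; matches are printed
# with line numbers for manual review. grep exits 0 on a match, 1 on no
# match and >1 on error, so only status 1 counts as clean.
check_clean() {
  status=0
  grep -En "$LEAK_PATTERN" "$@" || status=$?
  [ "$status" -eq 1 ]
}

# Do nothing without arguments, so the functions can also be sourced.
if [ "$#" -gt 0 ]; then
  for f in "$@"; do
    sanitize_file "$f"
  done
  if ! check_clean "$@"; then
    echo "sanitize-files: patterns above need manual review" >&2
    exit 1
  fi
fi
```

For example, `find staging -type f \( -name '*.nix' -o -name '*.md' \) -exec scripts/sanitize-files.sh {} +` would rewrite every Nix and Markdown file under a (hypothetical) `staging/` directory and print any leftover matches. Because every replacement target no longer matches its own rule, running the script twice is harmless, in line with the idempotency practice above.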