Research: Extract Matrix Platform Modules
Date: 2025-10-11
Feature: Extract Matrix Platform Modules as Public Template
Status: Completed
Overview
This document captures technical decisions and research findings for extracting Matrix platform modules from ops-base and publishing them as a public template. All decisions are informed by the RFC's multi-model consensus validation (gemini-2.5-pro, gpt-5-codex, qwen3-coder).
Decision 1: Sanitization Strategy
Decision
Hybrid approach: Automated script for bulk replacements + manual validation checklist
Rationale
- Safety: Multiple validation layers reduce risk of missed sensitive data
- Efficiency: Automated script handles 90% of repetitive replacements
- Accuracy: Manual review catches context-specific issues (comments, documentation)
- Auditability: Checklist provides proof of thorough review
- RFC Consensus: All three models recommended this approach
Implementation
# scripts/sanitize-files.sh
# Automated replacements:
- clarun.xyz → example.com
- talu.uno → matrix.example.org
- 192.168.1.x → 10.0.0.x (RFC 1918)
- 45.77.205.49 → 203.0.113.10 (TEST-NET-3)
- /home/dan → /home/user
- jrz1 → matrix
- @admin:clarun → @admin:example
# Manual validation checklist:
- Grep for known sensitive patterns
- Review all comments for personal context
- Scan git commit messages if preserved
- Check for hardcoded tokens/secrets
- Verify REPLACE_ME comments added
- Run gitleaks for automated secret detection
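The automated replacements listed above could be sketched as a single `sed` pass. This is a minimal illustration, not the actual contents of scripts/sanitize-files.sh: the function names `sanitize_stream` and `sanitize_file` are hypothetical, and the rule order (domains first, then the bare `@admin:clarun` form) is an assumption.

```bash
#!/usr/bin/env bash
# Sketch of the automated replacement pass. Function names and rule order
# are assumptions; only the seven mappings come from the list above.

# sanitize_stream: read text on stdin, write the sanitized text to stdout.
sanitize_stream() {
  sed -E \
    -e 's/clarun\.xyz/example.com/g' \
    -e 's/talu\.uno/matrix.example.org/g' \
    -e 's/192\.168\.1\.([0-9]{1,3})/10.0.0.\1/g' \
    -e 's/45\.77\.205\.49/203.0.113.10/g' \
    -e 's#/home/dan#/home/user#g' \
    -e 's/jrz1/matrix/g' \
    -e 's/@admin:clarun/@admin:example/g'
}

# sanitize_file SRC DEST: hypothetical wrapper that writes a sanitized copy
# and leaves SRC untouched, so the result can be diffed during manual review.
sanitize_file() {
  sanitize_stream < "$1" > "$2"
}
```

None of the replacement strings contains a source pattern, so running the pass twice gives the same result as running it once. The rules are plain substring matches: `/home/dan` would also rewrite a path such as `/home/daniel`, which is one reason the manual checklist stays in place.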
Alternatives Considered
- git-filter-repo: Rejected - creates modified history, we want fresh history
- Fully manual: Rejected - too error-prone for 3,400+ lines
- Fully automated: Rejected - can't detect context-specific personal info
References
- RFC Section: "Enhanced Sanitization Process" (lines 305-375)
- gitleaks documentation: https://github.com/gitleaks/gitleaks
- NixOS security best practices
Decision 2: Worklog Extraction Process
Decision
LLM-assisted selective extraction with manual review and organization
Rationale
- Volume: 300KB+ of worklogs too large for purely manual extraction
- Quality: LLM can identify and extract architectural patterns effectively
- Structure: Human organization ensures docs follow consistent template
- Sanitization: Manual review removes personal debugging context
- Precedent: Successfully used for RFC creation
Implementation
# Process for each worklog file:
1. Identify extractable content:
   - Architectural decisions (KEEP)
   - Pattern documentation (KEEP)
   - Problem-solving approaches (KEEP, sanitize)
   - Personal debugging sessions (SKIP)
   - Time-stamped logs (SKIP)
   - IP/domain-specific troubleshooting (SANITIZE)
2. Extract with LLM:
   - Input: worklog file + target doc structure
   - Output: Draft documentation section
   - Prompt: "Extract technical patterns, remove personal context"
3. Manual review:
   - Verify accuracy against source code
   - Remove remaining personal references
   - Ensure consistency with other docs
   - Add cross-references and examples
4. Target mapping:
   - Socket Mode pattern → docs/bridges/slack-setup.md
   - Config generation → docs/patterns/config-generation.md
   - Admin room setup → docs/patterns/admin-room-setup.md
   - sops-nix workflow → docs/secrets-management.md
Alternatives Considered
- Full manual extraction: Rejected - too time-consuming, inconsistent
- Direct copy-paste: Rejected - contains personal info, lacks structure
- Automated extraction only: Rejected - loses nuance, creates poor docs
Target Documents (from worklogs)
| Worklog Source | Target Documentation |
|---|---|
| mautrix-slack-bridge-implementation-gmessages-pattern.org | docs/patterns/config-generation.md |
| mautrix-slack-socket-mode-oauth-scopes-blocker.org | docs/bridges/slack-setup.md |
| conduwuit-admin-room-discovery-password-reset.org | docs/patterns/admin-room-setup.md |
| sops-nix-secrets-management-rfc.md | docs/secrets-management.md |
| (various debugging logs) | docs/troubleshooting.md (optional) |
Decision 3: CI/CD Implementation
Decision
GitHub Actions with nix flake check + gitleaks, no Cachix for v1.0
Rationale
- Simplicity: GitHub Actions native to platform, no additional services
- Security: gitleaks catches secrets in every PR/commit
- Validation: nix flake check ensures all configs build
- Cost: Free for public repos, no Cachix subscription needed
- RFC Alignment: Matches RFC automated validation requirements
Implementation
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cachix/install-nix-action@v25
      - name: Run nix flake check
        run: nix flake check --all-systems
      - name: Build example configurations
        run: |
          nix build .#nixosConfigurations.example-vps.config.system.build.toplevel
          nix build .#nixosConfigurations.example-dev.config.system.build.toplevel
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Run gitleaks
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Alternatives Considered
- Cachix integration: Deferred to v1.1 - adds complexity, build times acceptable without it
- GitLab CI: Rejected - GitHub is primary platform for NixOS community
- Local-only validation: Rejected - PR validation critical for community contributions
Performance Targets
- Total CI time: <5 minutes per commit
- nix flake check: <2 minutes
- gitleaks scan: <30 seconds
- Build examples: <3 minutes
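The stage targets above can be checked with a small timing wrapper. The sketch below is illustrative only: the stage names and the `budget_for`, `within_budget`, and `timed` helpers are hypothetical, and the budgets are the targets converted to seconds. Note that the flake check and build budgets add up to 120 + 180 = 300 seconds, which sits on the 5-minute total rather than under it, so both steps need to beat their own targets for the total to hold.

```bash
#!/usr/bin/env bash
# Sketch: compare measured stage durations against the targets listed above.
# Stage names and helper names are assumptions made for illustration.

# budget_for STAGE: print the budget in seconds for a CI stage.
budget_for() {
  case "$1" in
    flake_check)    echo 120 ;;  # <2 minutes
    gitleaks)       echo 30  ;;  # <30 seconds
    build_examples) echo 180 ;;  # <3 minutes
    total)          echo 300 ;;  # <5 minutes
    *) echo "unknown stage: $1" >&2; return 2 ;;
  esac
}

# within_budget STAGE ELAPSED_SECONDS: succeed only if strictly under budget.
within_budget() {
  local limit
  limit=$(budget_for "$1") || return 2
  [ "$2" -lt "$limit" ]
}

# timed STAGE CMD...: run CMD, then report elapsed wall-clock seconds.
timed() {
  local stage start elapsed
  stage=$1; shift
  start=$(date +%s)
  "$@" || return $?
  elapsed=$(( $(date +%s) - start ))
  if within_budget "$stage" "$elapsed"; then
    echo "$stage: ${elapsed}s (within budget)"
  else
    echo "$stage: ${elapsed}s (over budget)" >&2
    return 1
  fi
}
```

A local run might look like `timed flake_check nix flake check --all-systems`. Timings on a developer machine will differ from those on GitHub-hosted runners, so this is only a rough early signal.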
Decision 4: Sync Workflow Design
Decision
Git tags + sync-log.md file with quarterly calendar reminders
Rationale
- Traceability: Git tags mark template versions synced with ops-base state
- Documentation: sync-log.md records what changed and why
- Simplicity: No complex tooling, standard git workflow
- Discoverability: sync-log.md visible in repo for transparency
- RFC Consensus: Documented workflow reduces sync discipline risk
Implementation
# scripts/sync-to-template.sh workflow:
1. Identify changes in ops-base since last sync:
   git log --since="$(git -C ../nixos-matrix-platform-template tag -l 'sync-*' --sort=-v:refname | head -1 | cut -d'-' -f2)"
2. Review changes for applicability:
   - Bug fixes: SYNC
   - New features: SYNC if tested
   - Security fixes: SYNC (priority)
   - Personal config: SKIP
   - Worklogs: SKIP
3. Apply sanitization to selected changes
4. Validate in template:
   nix flake check
   gitleaks detect --no-git
5. Update sync-log.md:
   ## Sync 2025-10-11 (from ops-base commit abc123)
   - Fixed: Matrix registration token validation
   - Added: WhatsApp bridge reconnection logic
   - Security: Updated sops-nix to v0.16.0
6. Commit with tag:
   git commit -m "Sync improvements from ops-base (2025-10-11)"
   git tag "sync-20251011-abc123"
   git push --tags

# sync-log.md format:
# Sync Log: ops-base → nixos-matrix-platform-template
## Sync 2025-10-11 (ops-base: abc123)
**Changes**:
- [BUGFIX] Matrix registration token validation
- [FEATURE] WhatsApp bridge reconnection
- [SECURITY] sops-nix v0.16.0 upgrade
**Skipped**:
- Personal config changes in comm-talu-uno.nix
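Because each sync tag records both a date and an ops-base commit, step 1 of the workflow could also range from the recorded commit instead of filtering by date. The sketch below assumes the tag layout `sync-<YYYYMMDD>-<commit>` seen in the example tag `sync-20251011-abc123`. All function names are hypothetical and are not taken from scripts/sync-to-template.sh.

```bash
#!/usr/bin/env bash
# Sketch: derive the last synced ops-base commit from the newest sync tag.
# Assumed tag layout: sync-<YYYYMMDD>-<ops-base commit>

# sync_tag_date TAG: print the date field of a sync tag.
sync_tag_date() { printf '%s\n' "$1" | cut -d'-' -f2; }

# sync_tag_commit TAG: print the ops-base commit field of a sync tag.
sync_tag_commit() { printf '%s\n' "$1" | cut -d'-' -f3; }

# make_sync_tag DATE COMMIT: build a tag name in the same layout.
make_sync_tag() { printf 'sync-%s-%s\n' "$1" "$2"; }

# latest_sync_tag TEMPLATE_DIR: newest sync-* tag in the template repository.
latest_sync_tag() {
  git -C "$1" tag -l 'sync-*' --sort=-v:refname | head -n 1
}

# changes_since_last_sync TEMPLATE_DIR: run from the ops-base checkout.
# Lists ops-base commits made after the commit recorded in the newest tag.
changes_since_last_sync() {
  local tag commit
  tag=$(latest_sync_tag "$1")
  [ -n "$tag" ] || { echo "no sync-* tag found in $1" >&2; return 1; }
  commit=$(sync_tag_commit "$tag")
  git log --oneline "${commit}..HEAD"
}
```

A commit range does not depend on commit timestamps, so it also covers commits whose dates fall before the sync date but that landed in ops-base after the sync. From the ops-base checkout, the call would be `changes_since_last_sync ../nixos-matrix-platform-template`.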
Alternatives Considered
- Automated sync via git subtree: Rejected - sanitization can't be automated
- Manual documentation only: Rejected - easy to forget, no traceability
- Separate tracking tool: Rejected - over-engineered for quarterly cadence
Quarterly Sync Schedule
- Q1 (January): Major sync after holiday break
- Q2 (April): Feature updates and spring cleaning
- Q3 (July): Mid-year maintenance
- Q4 (October): Pre-holiday stability sync
Decision 5: Testing Strategy
Decision
Build validation + selective VPS integration testing (Phase 3 of RFC)
Rationale
- Cost-effective: Full VPS test only at major milestones
- Fast feedback: nix flake check catches roughly 90% of issues within minutes, before any VPS is provisioned
- Real validation: At least one VPS deployment before v1.0 publication
- Community testing: Beta testers provide diverse environment testing
- Risk management: Balances thoroughness with time/cost
Implementation
# Testing levels:
1. Every commit (CI):
   - nix flake check (all configs)
   - gitleaks scan
   - Build example-vps.nix
   - Build example-dev.nix
2. Before PR merge:
   - All CI checks pass
   - Manual review of sanitization
   - Documentation accuracy check
3. Before v1.0 publication (Phase 3):
   - Deploy example-vps.nix to fresh Vultr VPS
   - Deploy example-dev.nix to fresh Vultr VPS
   - Test all user stories end-to-end
   - Verify Matrix server responds
   - Test bridge setup guides
   - Validate secrets management workflow
   - Community beta testing (3-5 testers)
4. Post-publication:
   - Monitor GitHub issues for deployment problems
   - Track success metrics (SC-007, SC-008)
VPS Test Checklist (Phase 3)
- Fresh NixOS VPS provisioned
- Clone template repository
- Follow getting-started.md (time it - should be <30 min)
- Customize example-vps.nix with test domain
- Deploy with nixos-rebuild
- Verify Matrix API responds (curl /_matrix/client/versions)
- Create test user with registration token
- Test Element Web login
- Follow slack-setup.md (if testing bridges)
- Verify CI runs on sample PR
- Document any issues or unclear docs
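The "Verify Matrix API responds" item in the checklist above can be scripted as a small smoke test. This is a sketch under assumptions: the helper names are hypothetical, the server is assumed to be reachable over HTTPS on the given host name, and the response check only looks for the `versions` key that the endpoint returns rather than parsing the JSON.

```bash
#!/usr/bin/env bash
# Sketch: smoke-test a freshly deployed homeserver. Helper names are
# assumptions; the endpoint path comes from the checklist above.

# versions_url HOST: build the client versions URL for a homeserver host.
versions_url() {
  printf 'https://%s/_matrix/client/versions\n' "$1"
}

# looks_like_versions_response BODY: succeed if BODY mentions a versions key.
looks_like_versions_response() {
  printf '%s' "$1" | grep -q '"versions"'
}

# check_homeserver HOST: fetch the endpoint and inspect the body.
# curl -f turns HTTP error statuses into a non-zero exit code.
check_homeserver() {
  local body
  body=$(curl -fsS --max-time 10 "$(versions_url "$1")") || return 1
  looks_like_versions_response "$body"
}
```

During Phase 3 this could be run as `check_homeserver matrix.example.org && echo "homeserver OK"`, with the host replaced by the test domain used for the deployment.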
Alternatives Considered
- No integration testing: Rejected - too risky for v1.0 publication
- Automated VPS tests: Deferred to v1.1 - complex setup, manual adequate for v1.0
- Continuous VPS testing: Rejected - expensive, unnecessary for template
Decision 6: Repository Initialization Approach
Decision
Manual repository creation on GitHub with direct file commits (not git-filter-repo)
Rationale
- Clean history: Fresh git init ensures no ops-base history
- Safety: Manual process allows review at each step
- Simplicity: Direct commits easier than git-filter-repo for one-time extraction
- Transparency: Clear commit history shows sanitization process
Implementation
# Repository creation workflow:
1. Create empty GitHub repo: nixos-matrix-platform-template
2. Clone to local machine
3. Copy sanitized files from staging directory
4. Review each file before adding
5. Create initial commit structure:
   - Commit 1: Add README, LICENSE, .gitignore
   - Commit 2: Add modules/
   - Commit 3: Add configurations/
   - Commit 4: Add docs/
   - Commit 5: Add .github/workflows/
   - Commit 6: Add examples/ and scripts/
6. Run final validation
7. Push to GitHub
8. Enable GitHub Discussions
9. Add repository description and tags
No git-filter-repo
- Not needed - we're creating a fresh repo, not rewriting history
- ops-base history stays in ops-base
- Template has clean, purposeful commit history
Technology Stack Summary
Core Technologies
- Nix/NixOS: 24.05+ (pinned via flake.lock)
- nixpkgs: Pinned to tested commit for reproducibility
- sops-nix: v0.15.0+ (secrets management)
- age: Latest (encryption backend for sops-nix)
Validation & Security
- gitleaks: v8.18.0+ (secret scanning)
- nix flake check: Built-in Nix validation
CI/CD
- GitHub Actions: Native CI/CD platform
- cachix/install-nix-action@v25: Nix installation for CI
- gitleaks/gitleaks-action@v2: Secret scanning action
Development Tools
- Bash: 5.x (sanitization/sync scripts)
- git: 2.x (version control, tagging)
- SSH: For VPS deployment testing
Services (Matrix Platform - in template)
- matrix-continuwuity: Matrix homeserver (from nixpkgs-unstable)
- mautrix-slack: Slack bridge
- mautrix-whatsapp: WhatsApp bridge
- mautrix-gmessages: Google Messages bridge
- forgejo: Git service
- nginx: Reverse proxy
- postgresql: Database for bridges/Forgejo
- fail2ban: Intrusion prevention
- sops: Secrets encryption CLI
Best Practices Applied
From NixOS Community
- Pinned dependencies: Use flake.lock for reproducibility
- Module options: Provide configurable options with sensible defaults
- Security hardening: Apply systemd security features
- Documentation: Comprehensive examples and guides
From Infrastructure-as-Code
- Immutability: Fresh git history, no rewrites
- Validation: Multiple layers (syntax, build, secrets, integration)
- Idempotency: All scripts can be run multiple times safely
- Auditability: Clear commit messages, sync logs, checklists
From Open Source Projects
- Governance files: CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md
- Issue templates: Structured bug reports and feature requests
- CI/CD: Automated checks on every PR
- Semantic versioning: v1.0.0 for initial stable release
Risk Mitigations
Secret Leakage (Critical Risk)
- Mitigation 1: Automated gitleaks scan (CI + local)
- Mitigation 2: Manual review checklist
- Mitigation 3: Fresh git history (no ops-base commits)
- Mitigation 4: Community beta review before publication
- Residual Risk: Low (multi-layer validation)
Template Divergence (Medium Risk)
- Mitigation 1: Documented sync workflow (scripts/sync-to-template.sh)
- Mitigation 2: Quarterly calendar reminders
- Mitigation 3: Git tags + sync-log.md for tracking
- Mitigation 4: RFC-validated process
- Residual Risk: Medium (requires discipline)
Breaking Dependencies (Medium Risk)
- Mitigation 1: Pinned nixpkgs version
- Mitigation 2: CI tests on every commit
- Mitigation 3: Integration testing before major releases
- Mitigation 4: Version compatibility matrix in README
- Residual Risk: Low (pinning prevents surprises)
Poor Documentation (Medium Risk)
- Mitigation 1: Extract from 300KB+ tested worklogs
- Mitigation 2: Community beta testing for clarity
- Mitigation 3: User story acceptance criteria
- Mitigation 4: Quick start guide (5-minute target)
- Residual Risk: Low (comprehensive extraction + validation)
Success Metrics Tracking
How to Measure (post-publication)
- SC-001 (30 min deployment): Time beta testers during Phase 3
- SC-002 (builds pass): CI badge in README, track failures
- SC-003 (zero secrets): gitleaks CI status, manual audits
- SC-004 (8 modules extracted): Checklist of modules present and building: matrix-continuwuity, mautrix-slack, mautrix-whatsapp, mautrix-gmessages, dev-services, fail2ban, ssh-hardening, matrix-secrets
- SC-005 (documentation complete): Checklist of required docs present: getting-started.md, architecture.md, secrets-management.md, slack-setup.md, whatsapp-setup.md, gmessages-setup.md, config-generation.md, admin-room-setup.md
- SC-006 (CI runs): GitHub Actions badge, monitor runs
- SC-007 (10 stars in 3 months): GitHub stars count
- SC-008 (3 issues/PRs): GitHub insights
- SC-009 (zero incidents): Monitor issues, no secret reports
- SC-010 (quarterly sync): Track sync-log.md entries
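The SC-005 documentation checklist is easy to automate. The sketch below reports any required doc that is missing from a checkout. The paths for slack-setup.md, config-generation.md, admin-room-setup.md, and secrets-management.md follow the target mapping in Decision 2. The locations of the other four files are assumptions, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report which SC-005 documents are missing from a checkout.
# Paths marked "assumed" are guesses at where those files would live.

# required_docs: print the expected documentation paths, one per line.
required_docs() {
  cat <<'EOF'
docs/getting-started.md
docs/architecture.md
docs/secrets-management.md
docs/bridges/slack-setup.md
docs/bridges/whatsapp-setup.md
docs/bridges/gmessages-setup.md
docs/patterns/config-generation.md
docs/patterns/admin-room-setup.md
EOF
}
# Assumed locations: getting-started.md, architecture.md,
# whatsapp-setup.md, gmessages-setup.md.

# missing_docs ROOT: print each required doc that is absent under ROOT.
# Exit status is 0 when nothing is missing, 1 otherwise.
missing_docs() {
  local root status path
  root=$1
  status=0
  while IFS= read -r path; do
    if [ ! -f "$root/$path" ]; then
      echo "$path"
      status=1
    fi
  done <<EOF
$(required_docs)
EOF
  return "$status"
}
```

Running `missing_docs .` from the repository root prints nothing and exits 0 once all eight documents exist, so it could back a checklist item or a CI step.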
Open Questions Resolved
All open questions from spec were marked "None - resolved in RFC". Additional implementation questions resolved here:
- Q: Should we use pre-commit hooks?
  - A: Yes, but optional for users. Include .pre-commit-config.yaml example
- Q: What NixOS version to target?
  - A: 24.05+ (current stable), test on both stable and unstable
- Q: Should we include Cachix in CI?
  - A: Not for v1.0 (added complexity), consider for v1.1 if builds slow
- Q: How to handle user questions/support?
  - A: GitHub Discussions for Q&A, Issues for bugs only (per CONTRIBUTING.md)
- Q: Should we create a Matrix room for support?
  - A: Yes, mentioned in README (#nixos-matrix-template:matrix.org) - dogfooding
Next Steps
- ✅ Research completed - all decisions documented
- → Proceed to Phase 1: Design & Contracts
  - Create data-model.md (entities, relationships)
  - Create contracts/ (sanitization rules, CI contracts)
  - Create quickstart.md (developer onboarding)
  - Update agent context (.specify/memory/AGENT_FILE.md)