# Research: Extract Matrix Platform Modules

**Date**: 2025-10-11

**Feature**: Extract Matrix Platform Modules as Public Template

**Status**: Completed

## Overview

This document captures the technical decisions and research findings for extracting the Matrix platform modules from ops-base and publishing them as a public template. All decisions are informed by the RFC's multi-model consensus validation (gemini-2.5-pro, gpt-5-codex, qwen3-coder).

## Decision 1: Sanitization Strategy

### Decision

**Hybrid approach**: Automated script for bulk replacements + manual validation checklist

### Rationale

1. **Safety**: Multiple validation layers reduce the risk of missing sensitive data
2. **Efficiency**: The automated script handles roughly 90% of the repetitive replacements
3. **Accuracy**: Manual review catches context-specific issues (comments, documentation)
4. **Auditability**: The checklist provides proof of a thorough review
5. **RFC Consensus**: All three models recommended this approach

### Implementation

```bash
# scripts/sanitize-files.sh

# Automated replacements:
#   clarun.xyz     → example.com
#   talu.uno       → matrix.example.org
#   192.168.1.x    → 10.0.0.x (RFC 1918)
#   45.77.205.49   → 203.0.113.10 (TEST-NET-3, RFC 5737)
#   /home/dan      → /home/user
#   jrz1           → matrix
#   @admin:clarun  → @admin:example

# Manual validation checklist:
#   - Grep for known sensitive patterns
#   - Review all comments for personal context
#   - Scan git commit messages if preserved
#   - Check for hardcoded tokens/secrets
#   - Verify REPLACE_ME comments added
#   - Run gitleaks for automated secret detection
```
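
The replacement map above can be expressed as a single `sed` pass, followed by a check for leftover values. This is a minimal sketch and not the contents of scripts/sanitize-files.sh. The function names `sanitize_stream`, `sanitize_tree`, and `check_clean` are hypothetical, and the rule order is an assumption: the full domain is rewritten before the bare `@admin:clarun` ID. `check_clean` only greps for the original values, so it complements the gitleaks scan rather than replacing it.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers illustrating the documented replacement map.
# Assumes a sed and grep that support -E (POSIX extended regular expressions).

# Rewrite the known personal values read from stdin and write the result to stdout.
# Rule order matters: clarun.xyz is rewritten before the bare @admin:clarun ID.
sanitize_stream() {
  sed -E \
    -e 's/clarun\.xyz/example.com/g' \
    -e 's/talu\.uno/matrix.example.org/g' \
    -e 's/192\.168\.1\.([0-9]{1,3})/10.0.0.\1/g' \
    -e 's/45\.77\.205\.49/203.0.113.10/g' \
    -e 's#/home/dan([^[:alnum:]_]|$)#/home/user\1#g' \
    -e 's/jrz1/matrix/g' \
    -e 's/@admin:clarun/@admin:example/g'
}

# Apply sanitize_stream in place to every regular file under a staging directory,
# skipping anything inside a .git directory.
sanitize_tree() {
  local dir=$1 file tmp
  while IFS= read -r -d '' file; do
    tmp=$(mktemp)
    sanitize_stream <"$file" >"$tmp"
    cat "$tmp" >"$file"   # overwrite the contents but keep the original file mode
    rm -f "$tmp"
  done < <(find "$dir" -type f ! -path '*/.git/*' -print0)
}

# Return non-zero (and print the matching lines) if any original value remains.
check_clean() {
  local dir=$1
  if grep -rnE --exclude-dir=.git \
    'clarun|talu\.uno|45\.77\.205\.49|192\.168\.1\.|/home/dan([^[:alnum:]_]|$)|jrz1' "$dir"; then
    echo "check_clean: sensitive patterns remain under $dir" >&2
    return 1
  fi
}
```

For example, `sanitize_tree staging/ && check_clean staging/ && gitleaks detect --no-git --source staging/` covers the automated rules plus the grep and gitleaks items on the checklist. The comment, commit-message, and REPLACE_ME checks still need a human reviewer.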

### Alternatives Considered

- **git-filter-repo**: Rejected - creates modified history; we want fresh history
- **Fully manual**: Rejected - too error-prone for 3,400+ lines
- **Fully automated**: Rejected - can't detect context-specific personal info

### References

- RFC Section: "Enhanced Sanitization Process" (lines 305-375)
- gitleaks documentation: https://github.com/gitleaks/gitleaks
- NixOS security best practices

---

## Decision 2: Worklog Extraction Process

### Decision

**LLM-assisted selective extraction** with manual review and organization

### Rationale

1. **Volume**: 300KB+ of worklogs is too much for purely manual extraction
2. **Quality**: An LLM can identify and extract architectural patterns effectively
3. **Structure**: Human organization ensures the docs follow a consistent template
4. **Sanitization**: Manual review removes personal debugging context
5. **Precedent**: The same approach was used successfully to create the RFC

### Implementation

```
# Process for each worklog file:

1. Identify extractable content:
   - Architectural decisions (KEEP)
   - Pattern documentation (KEEP)
   - Problem-solving approaches (KEEP, sanitize)
   - Personal debugging sessions (SKIP)
   - Time-stamped logs (SKIP)
   - IP/domain-specific troubleshooting (SANITIZE)

2. Extract with LLM:
   - Input: worklog file + target doc structure
   - Output: draft documentation section
   - Prompt: "Extract technical patterns, remove personal context"

3. Manual review:
   - Verify accuracy against source code
   - Remove remaining personal references
   - Ensure consistency with other docs
   - Add cross-references and examples

4. Target mapping:
   - Socket Mode pattern → docs/bridges/slack-setup.md
   - Config generation → docs/patterns/config-generation.md
   - Admin room setup → docs/patterns/admin-room-setup.md
   - sops-nix workflow → docs/secrets-management.md
```

### Alternatives Considered

- **Full manual extraction**: Rejected - too time-consuming, inconsistent
- **Direct copy-paste**: Rejected - contains personal info, lacks structure
- **Automated extraction only**: Rejected - loses nuance, creates poor docs

### Target Documents (from worklogs)

| Worklog Source | Target Documentation |
|----------------|----------------------|
| mautrix-slack-bridge-implementation-gmessages-pattern.org | docs/patterns/config-generation.md |
| mautrix-slack-socket-mode-oauth-scopes-blocker.org | docs/bridges/slack-setup.md |
| conduwuit-admin-room-discovery-password-reset.org | docs/patterns/admin-room-setup.md |
| sops-nix-secrets-management-rfc.md | docs/secrets-management.md |
| (various debugging logs) | docs/troubleshooting.md (optional) |

---

## Decision 3: CI/CD Implementation

### Decision

**GitHub Actions with nix flake check + gitleaks**, no Cachix for v1.0

### Rationale

1. **Simplicity**: GitHub Actions is native to the platform; no additional services are needed
2. **Security**: gitleaks scans every push and pull request for secrets
3. **Validation**: nix flake check evaluates all flake outputs, and the explicit `nix build` step confirms that the example configurations actually build
4. **Cost**: Free for public repos; no Cachix subscription needed
5. **RFC Alignment**: Matches the RFC's automated validation requirements

### Implementation

```yaml
# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: cachix/install-nix-action@v25
      - name: Run nix flake check
        run: nix flake check --all-systems
      - name: Build example configurations
        run: |
          nix build .#nixosConfigurations.example-vps.config.system.build.toplevel
          nix build .#nixosConfigurations.example-dev.config.system.build.toplevel

  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so gitleaks can scan the pushed commit range
      - name: Run gitleaks
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

### Alternatives Considered

- **Cachix integration**: Deferred to v1.1 - adds complexity; build times are acceptable without it
- **GitLab CI**: Rejected - GitHub is the primary platform for the NixOS community
- **Local-only validation**: Rejected - PR validation is critical for community contributions

### Performance Targets

- Total CI time: <5 minutes per commit (the `validate` and `security` jobs run in parallel, so this is bounded by `validate`)
- nix flake check: <2 minutes
- gitleaks scan: <30 seconds
- Build examples: <3 minutes
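
To check these targets locally before relying on CI, you can wrap each step in a simple timer. This is a minimal sketch that assumes bash and the whole-second resolution of `date +%s`. `run_with_budget`, `build_examples`, and `ci_local` are hypothetical helpers. The budgets are the targets listed above, and the two example builds share the 3-minute budget, so they run as a single step.

```bash
#!/usr/bin/env bash
# Sketch: time a command against a budget in seconds (hypothetical helpers).

# Run "$@", print its duration, and return the command's own failure status
# if it fails, or 1 if it succeeded but exceeded the budget.
run_with_budget() {
  local label=$1 budget=$2 start elapsed status=0
  shift 2
  start=$(date +%s)
  "$@" || status=$?
  elapsed=$(( $(date +%s) - start ))
  printf '%s: %ss (budget %ss)\n' "$label" "$elapsed" "$budget"
  if (( status != 0 )); then
    return "$status"
  fi
  (( elapsed <= budget ))
}

# Same two builds as the CI "Build example configurations" step.
build_examples() {
  nix build .#nixosConfigurations.example-vps.config.system.build.toplevel &&
    nix build .#nixosConfigurations.example-dev.config.system.build.toplevel
}

# Mirror the CI jobs locally against the performance targets; stops at the first miss.
ci_local() {
  run_with_budget "nix flake check" 120 nix flake check --all-systems &&
    run_with_budget "build examples" 180 build_examples &&
    run_with_budget "gitleaks scan" 30 gitleaks detect --no-git
}
```

Running `ci_local` from the repository root reports each step's duration. A step that repeatedly exceeds its budget is the kind of slowdown that would justify revisiting the Cachix deferral.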

---

## Decision 4: Sync Workflow Design

### Decision

**Git tags + sync-log.md file** with quarterly calendar reminders

### Rationale

1. **Traceability**: Git tags record which ops-base state each template version was synced from
2. **Documentation**: sync-log.md records what changed and why
3. **Simplicity**: No complex tooling, standard git workflow
4. **Discoverability**: sync-log.md is visible in the repo for transparency
5. **RFC Consensus**: A documented workflow reduces the risk of sync discipline lapsing

### Implementation

```bash
# scripts/sync-to-template.sh workflow (starts in the ops-base checkout)

# 1. Identify changes in ops-base since the last sync. Tags are named
#    sync-YYYYMMDD-<ops-base commit>, so the newest tag records the last synced commit.
last_tag=$(git -C ../nixos-matrix-platform-template tag -l 'sync-*' --sort=-v:refname | head -n 1)
last_commit=${last_tag##*-}
git log --oneline "${last_commit}..HEAD"

# 2. Review changes for applicability:
#    - Bug fixes: SYNC
#    - New features: SYNC if tested
#    - Security fixes: SYNC (priority)
#    - Personal config: SKIP
#    - Worklogs: SKIP

# 3. Apply sanitization to the selected changes.

# 4. Validate in the template:
cd ../nixos-matrix-platform-template
nix flake check
gitleaks detect --no-git

# 5. Update sync-log.md. The file starts with the title
#    "# Sync Log: ops-base → nixos-matrix-platform-template"; each sync appends an entry:
cat >>sync-log.md <<'EOF'

## Sync 2025-10-11 (ops-base: abc123)

**Changes**:
- [BUGFIX] Matrix registration token validation
- [FEATURE] WhatsApp bridge reconnection
- [SECURITY] sops-nix v0.16.0 upgrade

**Skipped**:
- Personal config changes in comm-talu-uno.nix
EOF

# 6. Commit, tag, and push (`git push --tags` alone pushes only tags, not the branch):
git add sync-log.md   # plus the synced files
git commit -m "Sync improvements from ops-base (2025-10-11)"
git tag "sync-20251011-abc123"
git push
git push origin "sync-20251011-abc123"
```
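
The tag convention in steps 1 and 6 (`sync-YYYYMMDD-<ops-base commit>`) means the last synced commit can be recovered from the template repository alone. Below is a minimal sketch of helpers built on that convention. `sync_tag_name`, `last_synced_commit`, and `append_sync_entry` are hypothetical names, not functions from scripts/sync-to-template.sh. The 7-character short hash is also an assumption.

```bash
#!/usr/bin/env bash
# Sketch: helpers for the sync-YYYYMMDD-<commit> tag convention (hypothetical names).

# Build a tag name from a YYYYMMDD day and an ops-base commit hash.
sync_tag_name() {
  local day=$1 commit=$2
  printf 'sync-%s-%s\n' "$day" "${commit:0:7}"
}

# Print the ops-base commit recorded in the newest sync-* tag of a template repo.
# Version sort orders tags by their YYYYMMDD part; two syncs on the same day
# would be ordered by hash, not by time. Fails if there are no sync tags.
last_synced_commit() {
  local repo=$1 tag
  tag=$(git -C "$repo" tag -l 'sync-*' --sort=-v:refname | head -n 1)
  [[ -n $tag ]] || return 1
  printf '%s\n' "${tag##*-}"
}

# Append a sync-log.md entry; the change lines are read from stdin.
append_sync_entry() {
  local log=$1 day=$2 commit=$3
  {
    printf '\n## Sync %s (ops-base: %s)\n\n**Changes**:\n' "$day" "$commit"
    cat
  } >>"$log"
}
```

For example, running `git log --oneline "$(last_synced_commit ../nixos-matrix-platform-template)..HEAD"` inside ops-base lists the candidate changes for step 2.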

### Alternatives Considered

- **Automated sync via git subtree**: Rejected - sanitization can't be fully automated (see Decision 1)
- **Manual documentation only**: Rejected - easy to forget, no traceability
- **Separate tracking tool**: Rejected - over-engineered for a quarterly cadence

### Quarterly Sync Schedule

- Q1 (January): Major sync after holiday break
- Q2 (April): Feature updates and spring cleaning
- Q3 (July): Mid-year maintenance
- Q4 (October): Pre-holiday stability sync
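
The calendar reminders can be backed by a check that flags when the newest sync is more than one quarter old. This is a minimal sketch that assumes GNU `date` (for `-d`) and treats a quarter as 92 days. `sync_overdue` is a hypothetical helper. It takes the YYYYMMDD part of the newest `sync-*` tag and today's date in the same format.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a quarterly sync is overdue (hypothetical helpers).
# Requires GNU date for `date -d`.

# Convert YYYYMMDD to seconds since the epoch (UTC midnight, so no DST effects).
yyyymmdd_to_epoch() {
  local d=$1
  date -u -d "${d:0:4}-${d:4:2}-${d:6:2}" +%s
}

# Succeed (exit 0) if more than max_days have passed between last_sync and today.
sync_overdue() {
  local last_sync=$1 today=$2 max_days=${3:-92} days
  days=$(( ( $(yyyymmdd_to_epoch "$today") - $(yyyymmdd_to_epoch "$last_sync") ) / 86400 ))
  (( days > max_days ))
}
```

Combined with the newest tag, for example `sync_overdue "$(git tag -l 'sync-*' --sort=-v:refname | head -n 1 | cut -d- -f2)" "$(date +%Y%m%d)"`, this can serve as a reminder alongside the calendar entries.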

---

## Decision 5: Testing Strategy

### Decision

**Build validation + selective VPS integration testing** (Phase 3 of RFC)

### Rationale

1. **Cost-effective**: Full VPS tests run only at major milestones
2. **Fast feedback**: nix flake check catches most issues (an estimated 90%) within minutes
3. **Real validation**: At least one VPS deployment happens before v1.0 publication
4. **Community testing**: Beta testers provide testing across diverse environments
5. **Risk management**: Balances thoroughness against time and cost

### Implementation

```
# Testing levels:

1. Every commit (CI):
   - nix flake check (all configs)
   - gitleaks scan
   - Build example-vps.nix
   - Build example-dev.nix

2. Before PR merge:
   - All CI checks pass
   - Manual review of sanitization
   - Documentation accuracy check

3. Before v1.0 publication (Phase 3):
   - Deploy example-vps.nix to fresh Vultr VPS
   - Deploy example-dev.nix to fresh Vultr VPS
   - Test all user stories end-to-end
   - Verify Matrix server responds
   - Test bridge setup guides
   - Validate secrets management workflow
   - Community beta testing (3-5 testers)

4. Post-publication:
   - Monitor GitHub issues for deployment problems
   - Track success metrics (SC-007, SC-008)
```

### VPS Test Checklist (Phase 3)

- [ ] Fresh NixOS VPS provisioned
- [ ] Clone template repository
- [ ] Follow getting-started.md (time it; it should take <30 min, per SC-001)
- [ ] Customize example-vps.nix with test domain
- [ ] Deploy with nixos-rebuild
- [ ] Verify Matrix API responds (`curl /_matrix/client/versions`)
- [ ] Create test user with registration token
- [ ] Test Element Web login
- [ ] Follow slack-setup.md (if testing bridges)
- [ ] Verify CI runs on sample PR
- [ ] Document any issues or unclear docs
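
The "Verify Matrix API responds" step can be scripted so that it gives the same pass/fail answer on every test VPS. This is a minimal sketch that assumes `curl` is installed and the homeserver is reachable over HTTPS at the test domain. `matrix_versions_ok` and `check_matrix_api` are hypothetical helpers. The client-server API's `GET /_matrix/client/versions` endpoint returns a JSON object with a `versions` array, and this check only looks for that key rather than fully parsing the JSON.

```bash
#!/usr/bin/env bash
# Sketch: smoke-test a Matrix homeserver after nixos-rebuild (hypothetical helpers).

# Succeed if the JSON on stdin looks like a /_matrix/client/versions response.
# A plain grep avoids depending on jq; it is a heuristic, not a JSON parser.
matrix_versions_ok() {
  grep -q '"versions"[[:space:]]*:[[:space:]]*\['
}

# Query the versions endpoint under a base URL such as https://matrix.example.org
# (trailing slash stripped); fails on HTTP errors, timeouts, or an unexpected body.
check_matrix_api() {
  local base_url=${1%/}
  curl -fsS --max-time 10 "$base_url/_matrix/client/versions" | matrix_versions_ok
}
```

On the test VPS, `check_matrix_api https://matrix.example.org && echo ok` covers the checklist item; substitute the test domain configured in example-vps.nix.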

### Alternatives Considered

- **No integration testing**: Rejected - too risky for v1.0 publication
- **Automated VPS tests**: Deferred to v1.1 - complex setup; manual testing is adequate for v1.0
- **Continuous VPS testing**: Rejected - expensive, unnecessary for a template

---

## Decision 6: Repository Initialization Approach

### Decision

**Manual repository creation** on GitHub with direct file commits (not git-filter-repo)

### Rationale

1. **Clean history**: A fresh `git init` ensures no ops-base history is carried over
2. **Safety**: The manual process allows review at each step
3. **Simplicity**: Direct commits are easier than git-filter-repo for a one-time extraction
4. **Transparency**: A clear commit history shows the sanitization process

### Implementation

```bash
# Repository creation workflow:
#  1. Create empty GitHub repo: nixos-matrix-platform-template
#  2. Clone to local machine
#  3. Copy sanitized files from staging directory
#  4. Review each file before adding
#  5. Create initial commit structure:
#     - Commit 1: Add README, LICENSE, .gitignore
#     - Commit 2: Add modules/
#     - Commit 3: Add configurations/
#     - Commit 4: Add docs/
#     - Commit 5: Add .github/workflows/
#     - Commit 6: Add examples/ and scripts/
#  6. Run final validation
#  7. Push to GitHub
#  8. Enable GitHub Discussions
#  9. Add repository description and tags
```
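
Steps 3 and 5 can be scripted so that each commit stages exactly one group of paths. This is a minimal sketch with several assumptions: the sanitized files sit in a local staging directory (without its own .git) laid out like the final repository, the README file is named README.md, and the empty repository from steps 1-2 is already cloned. `init_template_repo` is a hypothetical helper. It does not replace the per-file review in step 4, and it stops before the push, so validation and publishing stay manual.

```bash
#!/usr/bin/env bash
# Sketch: build the six-commit initial history from a sanitized staging directory.
# init_template_repo is hypothetical; it does not push or change GitHub settings.

init_template_repo() {
  local staging=$1 repo=$2 i
  local -a paths
  local -a messages=(
    "Add README, LICENSE, .gitignore"
    "Add modules/"
    "Add configurations/"
    "Add docs/"
    "Add .github/workflows/"
    "Add examples/ and scripts/"
  )
  # Paths for each commit, space-separated, in the same order as messages.
  local -a groups=(
    "README.md LICENSE .gitignore"
    "modules"
    "configurations"
    "docs"
    ".github/workflows"
    "examples scripts"
  )

  # Copy everything, including dotfiles, into the clone's working tree.
  cp -R "$staging"/. "$repo"/ || return 1

  for i in "${!groups[@]}"; do
    read -r -a paths <<<"${groups[$i]}"
    git -C "$repo" add -- "${paths[@]}" || return 1
    git -C "$repo" commit -q -m "${messages[$i]}" || return 1
  done
}
```

Afterwards, `git log --oneline` in the clone should list the six commits in order. Steps 6-9 (final validation, push, Discussions, description and tags) remain manual.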

### No git-filter-repo

- Not needed - we're creating a fresh repo, not rewriting history
- ops-base history stays in ops-base
- The template has a clean, purposeful commit history

---

## Technology Stack Summary

### Core Technologies

- **Nix/NixOS**: 24.05+ (pinned via flake.lock)
- **nixpkgs**: Pinned to a tested commit for reproducibility
- **sops-nix**: v0.15.0+ (secrets management)
- **age**: Latest (encryption backend for sops-nix)

### Validation & Security

- **gitleaks**: v8.18.0+ (secret scanning)
- **nix flake check**: Built-in Nix validation

### CI/CD

- **GitHub Actions**: Native CI/CD platform
- **cachix/install-nix-action@v25**: Nix installation for CI
- **gitleaks/gitleaks-action@v2**: Secret scanning action

### Development Tools

- **Bash**: 5.x (sanitization/sync scripts)
- **git**: 2.x (version control, tagging)
- **SSH**: For VPS deployment testing
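
You can check the minimum versions above before running the scripts. This is a minimal sketch that assumes a `sort` with `-V` (version sort, as in GNU coreutils). `version_ge` and `check_tools` are hypothetical helpers. The parsing of `gitleaks version` and `git --version` output is an assumption: the helper strips a leading `v` and keeps only the version field.

```bash
#!/usr/bin/env bash
# Sketch: verify local tool versions against the stack minimums (hypothetical helpers).

# Succeed if version $1 >= version $2 (a leading "v" is ignored). Uses sort -V.
version_ge() {
  local have=${1#v} want=${2#v}
  [[ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" == "$want" ]]
}

# Report every tool below its minimum; return 1 if any check failed.
check_tools() {
  local ok=0 gl
  if (( BASH_VERSINFO[0] < 5 )); then
    echo "bash 5.x required, found $BASH_VERSION" >&2; ok=1
  fi
  # `git --version` prints e.g. "git version 2.43.0"; keep the third field.
  version_ge "$(git --version | awk '{print $3}')" 2.0 || { echo "git 2.x required" >&2; ok=1; }
  if command -v gitleaks >/dev/null; then
    gl=$(gitleaks version)
    version_ge "$gl" 8.18.0 || { echo "gitleaks >= 8.18.0 required, found $gl" >&2; ok=1; }
  else
    echo "gitleaks not found" >&2; ok=1
  fi
  return "$ok"
}
```

Calling `check_tools` at the top of the sanitization and sync scripts would surface a missing or outdated gitleaks before any files are touched.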

### Services (Matrix Platform - in template)

- **matrix-continuwuity**: Matrix homeserver (from nixpkgs-unstable)
- **mautrix-slack**: Slack bridge
- **mautrix-whatsapp**: WhatsApp bridge
- **mautrix-gmessages**: Google Messages bridge
- **forgejo**: Git service
- **nginx**: Reverse proxy
- **postgresql**: Database for bridges/Forgejo
- **fail2ban**: Intrusion prevention
- **sops**: Secrets encryption CLI

---

## Best Practices Applied

### From NixOS Community

1. **Pinned dependencies**: Use flake.lock for reproducibility
2. **Module options**: Provide configurable options with sensible defaults
3. **Security hardening**: Apply systemd security features
4. **Documentation**: Comprehensive examples and guides

### From Infrastructure-as-Code

1. **Immutability**: Fresh git history, no rewrites
2. **Validation**: Multiple layers (syntax, build, secrets, integration)
3. **Idempotency**: All scripts can be run multiple times safely
4. **Auditability**: Clear commit messages, sync logs, checklists

### From Open Source Projects

1. **Governance files**: CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md
2. **Issue templates**: Structured bug reports and feature requests
3. **CI/CD**: Automated checks on every PR
4. **Semantic versioning**: v1.0.0 for the initial stable release

---

## Risk Mitigations

### Secret Leakage (Critical Risk)

- **Mitigation 1**: Automated gitleaks scan (CI + local)
- **Mitigation 2**: Manual review checklist
- **Mitigation 3**: Fresh git history (no ops-base commits)
- **Mitigation 4**: Community beta review before publication
- **Residual Risk**: Low (multi-layer validation)

### Template Divergence (Medium Risk)

- **Mitigation 1**: Documented sync workflow (scripts/sync-to-template.sh)
- **Mitigation 2**: Quarterly calendar reminders
- **Mitigation 3**: Git tags + sync-log.md for tracking
- **Mitigation 4**: RFC-validated process
- **Residual Risk**: Medium (requires discipline)

### Breaking Dependencies (Medium Risk)

- **Mitigation 1**: Pinned nixpkgs version
- **Mitigation 2**: CI tests on every commit
- **Mitigation 3**: Integration testing before major releases
- **Mitigation 4**: Version compatibility matrix in README
- **Residual Risk**: Low (pinning prevents surprises)

### Poor Documentation (Medium Risk)

- **Mitigation 1**: Extract from 300KB+ of tested worklogs
- **Mitigation 2**: Community beta testing for clarity
- **Mitigation 3**: User story acceptance criteria
- **Mitigation 4**: Quick start guide (5-minute target)
- **Residual Risk**: Low (comprehensive extraction + validation)

---

## Success Metrics Tracking

### How to Measure (post-publication)

- **SC-001** (30 min deployment): Time beta testers during Phase 3
- **SC-002** (builds pass): CI badge in README, track failures
- **SC-003** (zero secrets): gitleaks CI status, manual audits
- **SC-004** (8 modules extracted): Checklist of modules present and building: matrix-continuwuity, mautrix-slack, mautrix-whatsapp, mautrix-gmessages, dev-services, fail2ban, ssh-hardening, matrix-secrets
- **SC-005** (documentation complete): Checklist of required docs present: getting-started.md, architecture.md, secrets-management.md, slack-setup.md, whatsapp-setup.md, gmessages-setup.md, config-generation.md, admin-room-setup.md
- **SC-006** (CI runs): GitHub Actions badge, monitor runs
- **SC-007** (10 stars in 3 months): GitHub stars count
- **SC-008** (3 issues/PRs): GitHub insights
- **SC-009** (zero incidents): Monitor issues for secret-leak reports
- **SC-010** (quarterly sync): Track sync-log.md entries
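
SC-004 and SC-005 are presence checks, so they can be scripted against the repository tree. This is a minimal sketch, and `missing_paths` is a hypothetical helper. The paths in the arrays are partly assumptions. Only docs/secrets-management.md, docs/bridges/slack-setup.md, docs/patterns/config-generation.md, and docs/patterns/admin-room-setup.md appear in the worklog mapping in Decision 2. The `modules/<name>.nix` layout and the other doc locations are guesses to adjust to the real tree. Presence alone does not cover the "building" half of SC-004, which the CI build job handles.

```bash
#!/usr/bin/env bash
# Sketch: report required files that are missing (hypothetical helper and paths).

# Print each path (relative to root) that does not exist; fail if any are missing.
missing_paths() {
  local root=$1 path status=0
  shift
  for path in "$@"; do
    if [[ ! -e "$root/$path" ]]; then
      printf 'missing: %s\n' "$path"
      status=1
    fi
  done
  return "$status"
}

# SC-004 modules (assumed layout: modules/<name>.nix).
sc004_modules=(
  modules/matrix-continuwuity.nix modules/mautrix-slack.nix
  modules/mautrix-whatsapp.nix modules/mautrix-gmessages.nix
  modules/dev-services.nix modules/fail2ban.nix
  modules/ssh-hardening.nix modules/matrix-secrets.nix
)

# SC-005 docs (locations partly assumed, as noted above).
sc005_docs=(
  docs/getting-started.md docs/architecture.md docs/secrets-management.md
  docs/bridges/slack-setup.md docs/bridges/whatsapp-setup.md
  docs/bridges/gmessages-setup.md docs/patterns/config-generation.md
  docs/patterns/admin-room-setup.md
)
```

From the repository root, `missing_paths . "${sc004_modules[@]}" "${sc005_docs[@]}"` prints nothing and succeeds once both checklists are satisfied.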

---

## Open Questions Resolved

All open questions from the spec were marked "None - resolved in RFC". The following additional implementation questions are resolved here:

1. **Q: Should we use pre-commit hooks?**
   - A: Yes, but optional for users. Include an example .pre-commit-config.yaml

2. **Q: What NixOS version to target?**
   - A: 24.05+ (minimum supported release); test on both stable and unstable

3. **Q: Should we include Cachix in CI?**
   - A: Not for v1.0 (added complexity); consider it for v1.1 if builds are slow

4. **Q: How to handle user questions/support?**
   - A: GitHub Discussions for Q&A, Issues for bugs only (per CONTRIBUTING.md)

5. **Q: Should we create a Matrix room for support?**
   - A: Yes, mentioned in README (#nixos-matrix-template:matrix.org) - dogfooding

---

## Next Steps

1. ✅ Research completed - all decisions documented
2. → Proceed to Phase 1: Design & Contracts
   - Create data-model.md (entities, relationships)
   - Create contracts/ (sanitization rules, CI contracts)
   - Create quickstart.md (developer onboarding)
   - Update agent context (.specify/memory/AGENT_FILE.md)