ops-jrz1 Platform Vision
Status: North Star Document
Last Updated: 2025-10-22
Maintainers: dan (primary), team (shared responsibility)
Executive Summary
ops-jrz1 is a self-hosted collaborative development platform for small engineering teams (2-5 engineers). It provides communication bridging (Matrix ↔ Slack), code hosting (Forgejo), and declarative deployment infrastructure (NixOS) with a focus on sustainability over speed and quality over quick wins.
Core Philosophy
Build It Right Over Time
- Avoid technical debt
- Declarative and reproducible (NixOS)
- Self-documenting
- Sustainable for a small team
- Clear patterns for contributions
Presentable State First
- Working demo-able features
- Clear documentation
- Inviting for new engineers
- Professional appearance
Current State (Generation 31+)
Operational Services
- ✅ Matrix homeserver (conduwuit 0.5.0-rc.8) on clarun.xyz
- ✅ Forgejo (7.0.12) at git.clarun.xyz
- ✅ nginx reverse proxy with TLS (Let's Encrypt)
- ✅ PostgreSQL 15.10 (Forgejo database)
- ✅ sops-nix secrets management
- ✅ Self-hosted infrastructure configuration (ops-jrz1 repo on Forgejo)
Security Posture
- ✅ SSH key-only authentication
- ✅ Secrets encrypted with age/sops-nix
- ✅ Services isolated on localhost (Matrix, PostgreSQL)
- ✅ Firewall (only SSH, HTTP, HTTPS exposed)
- ✅ Security validation completed for Generation 31 (see Security Test Report)
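The firewall is one layer; the "isolated on localhost" claim can also be double-checked by looking at which addresses services actually bind to. A minimal sketch, assuming the listener list is collected separately (for example transcribed from ss -tln output, already normalized to plain IP addresses). The function name is hypothetical, and it does not inspect the nixos-fw rules themselves.

```python
import ipaddress
from typing import Iterable, List, Tuple

# Ports this document expects to be reachable from outside: SSH, HTTP, HTTPS.
ALLOWED_PUBLIC_PORTS = {22, 80, 443}


def unexpected_exposures(listeners: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (address, port) listeners that are neither loopback-only nor on an allowed port.

    Addresses must be plain IPs such as "0.0.0.0", "::", "127.0.0.1" or "::1".
    """
    flagged = []
    for addr, port in listeners:
        if ipaddress.ip_address(addr).is_loopback:
            continue  # localhost-only services (Matrix, PostgreSQL) are expected
        if port not in ALLOWED_PUBLIC_PORTS:
            flagged.append((addr, port))
    return flagged


# Hypothetical snapshot: SSH public, PostgreSQL on loopback, a stray public port.
print(unexpected_exposures([("0.0.0.0", 22), ("127.0.0.1", 5432), ("0.0.0.0", 8080)]))
# [('0.0.0.0', 8080)]
```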
Incomplete/Blocked
- ⚠️ mautrix-slack bridge (exit code 11, needs configuration)
- ⚠️ mautrix-whatsapp (configured but not tested)
- ⚠️ mautrix-gmessages (configured but not tested)
- ⚠️ No deployment pattern for team projects yet
Target "Presentable MVP"
Definition of Presentable
When we can say: "Here's a working platform you can use and contribute to"
Criteria:
- Slack bridge works bidirectionally
- One example project successfully deployed
- Clear onboarding documentation
- Stable and tested (not constantly broken)
- Professional presentation (docs, architecture clarity)
Milestone 1: Working Slack Bridge
Goal: Engineers in Slack can see it's alive and useful
Success Metric: Send "Hello from Matrix!" message that appears in Slack via bridge
Tasks:
- Update workspace config (delpadtech → chochacho)
- Create Slack app in chochacho workspace
- Configure Slack credentials (app token, bot token) in sops-nix
- Debug exit code 11 issue
- Test bidirectional messaging (Slack ↔ Matrix)
- Document setup in worklog
Impact: Highly visible proof of concept, validates core architecture
Priority: HIGH - Unblocks team communication and collaboration
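The success metric can be exercised from the Matrix side with a single client-server API call (PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId}). A minimal sketch, assuming the client API is reachable at https://clarun.xyz and that a bot account with an access token has joined the Matrix room that portals the target Slack channel; the function name, token and room ID are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request
import uuid
from typing import Optional


def build_matrix_message_request(homeserver: str, access_token: str, room_id: str,
                                 text: str, txn_id: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a Matrix client-server API request posting an m.text message."""
    txn_id = txn_id or uuid.uuid4().hex  # unique per message so retries are de-duplicated
    path = "/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        urllib.parse.quote(room_id, safe=""),  # room IDs contain '!' and ':'
        urllib.parse.quote(txn_id, safe=""),
    )
    body = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        homeserver.rstrip("/") + path,
        data=body,
        method="PUT",
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
    )


# Hypothetical values: substitute a real bot token and the bridged room's ID.
req = build_matrix_message_request("https://clarun.xyz", "<access-token>",
                                   "!roomid:clarun.xyz", "Hello from Matrix!")
# urllib.request.urlopen(req) would send it; the JSON reply contains the event_id.
```

If the message then shows up in the Slack channel, the Matrix to Slack direction works; the reverse direction still has to be checked by posting from Slack.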
Milestone 2: Example Project Pattern
Goal: Clear template for "how to add a project"
Success Metric: Engineer can clone template repo, modify, and deploy a simple bot
Deliverables:
- Example project: "chochacho-hello-bot" (responds to !hello in Matrix)
- Project structure: Nix flake + NixOS module pattern
- Documentation: docs/project-template.md
- Template repository on Forgejo
Impact: Makes platform "joinable" - clear contribution path
Priority: MEDIUM - Required before onboarding engineers
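The core of chochacho-hello-bot can be kept independent of whichever Matrix client library ends up being used: a pure function that maps an incoming message body to an optional reply. A minimal sketch; the function name, the "!" prefix handling and the reply text are hypothetical choices, and the surrounding event loop (including ignoring the bot's own messages) is left to the chosen library.

```python
from typing import Optional

COMMAND_PREFIX = "!"


def handle_command(body: str, sender: str) -> Optional[str]:
    """Return a reply for a recognized command, or None to stay silent."""
    text = body.strip()
    if not text.startswith(COMMAND_PREFIX):
        return None  # ordinary chat, not a command
    command, _, _args = text[len(COMMAND_PREFIX):].partition(" ")
    if command.lower() == "hello":
        return f"Hello, {sender}!"
    return None  # unknown commands are ignored rather than answered with an error


print(handle_command("!hello", "@alice:clarun.xyz"))  # Hello, @alice:clarun.xyz!
```

Keeping this logic pure makes it testable without a homeserver, which suits the "forces proper packaging (quality gate)" goal: the Nix build can run these tests.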
Milestone 3: Platform Documentation
Goal: New engineer can understand and use the platform
Deliverables:
- docs/architecture.md - How the platform works
- docs/onboarding.md - How to join as an engineer
- docs/deployment.md - How to deploy projects
- README.md - Overview and navigation
Impact: Presentability factor, shows maturity and thoughtfulness
Priority: MEDIUM - Can iterate as engineers join
Architecture Principles
Communication Layer
Primary: Slack (chochacho workspace)
Hub: Matrix homeserver bridges to Slack
Direction: Bidirectional (Slack ↔ Matrix)
Current Focus: Slack bridge only (not WhatsApp, Google Messages, etc.)
User Experience: Engineers stay in Slack, Matrix runs behind the scenes to unify communication
Code Hosting
Primary: Self-hosted Forgejo at git.clarun.xyz
Flexibility: Projects can also reference external repos (GitHub, etc.)
Model:
- ops-jrz1 repository: Platform infrastructure (NixOS config)
- Project repositories: Individual team projects
- Clear separation: Infrastructure vs applications
Deployment Philosophy
Chosen Approach: NixOS-Native (Strict Declarative)
Pattern: Project as NixOS Module
# Example project structure
project-name/
├── flake.nix # Nix flake (how to build)
├── default.nix # Derivation (package definition)
├── module.nix # NixOS service module
├── src/ # Project code
└── README.md # Deployment instructions
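A project can be checked against this layout before review. A minimal sketch that reports which of the expected entries are missing; the function name is hypothetical, and it only checks that the files exist, not that flake.nix or module.nix are valid.

```python
from pathlib import Path
from typing import List

# Entries taken from the example project structure above.
REQUIRED_FILES = ("flake.nix", "default.nix", "module.nix", "README.md")
REQUIRED_DIRS = ("src",)


def missing_template_entries(project_dir: str) -> List[str]:
    """List the template entries that a project directory is missing."""
    root = Path(project_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty result means the project matches the template; a reviewer, or later a Forgejo Actions job, could run it on each new project repository.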
Deployment Workflow:
- Engineer develops project locally (with Nix)
- Project added to ops-jrz1 as import or flake reference
- Push to Forgejo (project repo or ops-jrz1 update)
- Admin reviews change (pull request optional)
- nixos-rebuild switch deploys to production
- Rollback available via NixOS generations
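The deploy and rollback steps of this workflow reduce to two nixos-rebuild invocations. A minimal sketch that builds the argument lists, assuming the ops-jrz1 flake exposes its host as a nixosConfigurations attribute; ".#ops-jrz1" is a hypothetical attribute name, and the function name is illustrative.

```python
from typing import List


def rebuild_command(action: str, flake_ref: str = ".#ops-jrz1") -> List[str]:
    """Return the nixos-rebuild argv for one workflow step.

    ".#ops-jrz1" is a hypothetical flake output; use the attribute the
    ops-jrz1 flake actually defines under nixosConfigurations.
    """
    if action == "rollback":
        # Reactivates the previous system generation; nothing is rebuilt.
        return ["nixos-rebuild", "switch", "--rollback"]
    if action in ("build-vm", "switch"):
        return ["nixos-rebuild", action, "--flake", flake_ref]
    raise ValueError(f"unsupported action: {action!r}")


print(rebuild_command("switch"))  # ['nixos-rebuild', 'switch', '--flake', '.#ops-jrz1']
```

Run on the VPS with subprocess.run(rebuild_command("switch"), check=True). Rollback takes no flake reference because it only switches back to an already-built generation, which is what makes it fast.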
Benefits:
- ✅ Declarative and reproducible
- ✅ Built-in rollback (generation management)
- ✅ Consistent with existing ops-jrz1 pattern
- ✅ Forces proper packaging (quality gate)
- ✅ No additional deployment systems to maintain
Trade-offs:
- ❌ Requires NixOS knowledge (acceptable: team can learn)
- ❌ Less "instant" than webhook deployment (acceptable: "no deployment urgency")
- ❌ Admin approval step (beneficial: quality control)
Alternative Considered: Hybrid model (platform in NixOS, projects flexible)
- Deferred: Can relax strictness later if needed
- Starting strict enforces quality and consistency
Multi-Engineer Access Model
Level 1: Communication Only
- Slack workspace access (chochacho)
- Can participate in bridged conversations
- No infrastructure access needed
Level 2: Code Contributor
- Forgejo account (pattern established)
- SSH key uploaded to Forgejo
- Can push to project repositories
- Can submit pull requests
Level 3: Deployer
- Can trigger deployments (merge to main?)
- May have SSH access for debugging
- Permissions to restart services
Level 4: Admin
- SSH root access to VPS
- Can modify ops-jrz1 NixOS config
- Secrets management access (sops-nix keys)
- Infrastructure decision authority
Target Distribution (2-5 engineers):
- Level 1: All engineers
- Level 2: All engineers (default)
- Level 3: 2-3 trusted engineers
- Level 4: 1-2 admins (primary: dan)
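The four levels can be encoded as an ordered permission check. A minimal sketch, assuming each level includes the permissions of the levels below it (the model above implies but does not state this); the action names are hypothetical labels for the bullets above.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    COMMUNICATION = 1
    CONTRIBUTOR = 2
    DEPLOYER = 3
    ADMIN = 4


# Minimum level per action; names are illustrative, not an existing API.
REQUIRED_LEVEL = {
    "chat": AccessLevel.COMMUNICATION,
    "push_project": AccessLevel.CONTRIBUTOR,
    "open_pull_request": AccessLevel.CONTRIBUTOR,
    "deploy": AccessLevel.DEPLOYER,
    "restart_service": AccessLevel.DEPLOYER,
    "edit_ops_config": AccessLevel.ADMIN,
    "decrypt_secrets": AccessLevel.ADMIN,
}


def can(level: AccessLevel, action: str) -> bool:
    """True if an engineer at `level` may perform `action` (levels are cumulative)."""
    return level >= REQUIRED_LEVEL[action]


print(can(AccessLevel.CONTRIBUTOR, "deploy"))  # False
```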
Secrets Management
Tool: sops-nix with age encryption
Current State:
- VPS SSH host key as age key: age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q
- Admin workstation can decrypt (dan's age key)
Pattern:
# secrets/secrets.yaml (encrypted)
matrix-registration-token: "..."
acme-email: "..."
slack-app-token: "..." # Future
slack-bot-token: "..." # Future
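At runtime, sops-nix decrypts each declared secret during system activation and, by default, writes it to /run/secrets/<name> (owned by root unless the secret's owner option is set, so the service user needs read access). A project service can then read its token from a file instead of embedding it. A minimal sketch; the helper names are hypothetical, and the secret names follow the keys in the pattern above.

```python
from pathlib import Path

# sops-nix default location for decrypted secrets.
DEFAULT_SECRETS_DIR = Path("/run/secrets")


def secret_path(name: str, secrets_dir: Path = DEFAULT_SECRETS_DIR) -> Path:
    """Path where sops-nix places the secret called `name` (nested names contain '/')."""
    return secrets_dir / name


def read_secret(name: str, secrets_dir: Path = DEFAULT_SECRETS_DIR) -> str:
    """Read a decrypted secret, dropping surrounding whitespace such as a trailing newline."""
    return secret_path(name, secrets_dir).read_text(encoding="utf-8").strip()


print(secret_path("slack-bot-token"))  # /run/secrets/slack-bot-token
```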
Future Considerations:
- Add engineer age keys for collaboration
- Per-project secrets (if needed)
- Secret rotation workflow
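Adding engineer age keys comes down to extending the recipient list in the repository's .sops.yaml creation rule. A minimal sketch that renders such a file from a name-to-key mapping; it emits JSON, which YAML parsers accept, and relies on sops taking age recipients as a comma-separated string. The function name, the path regex and the example owners are hypothetical.

```python
import json
from typing import Dict


def sops_config(recipients: Dict[str, str], path_regex: str = r"secrets/[^/]+\.yaml$") -> str:
    """Render a .sops.yaml that encrypts matching files to every listed age key."""
    for owner, key in recipients.items():
        if not key.startswith("age1"):
            raise ValueError(f"{owner}: not an age public key")
    rule = {
        "path_regex": path_regex,
        # Sorted by owner name so the output is stable across runs.
        "age": ",".join(recipients[owner] for owner in sorted(recipients)),
    }
    return json.dumps({"creation_rules": [rule]}, indent=2)


# Hypothetical: the VPS host key plus placeholder keys for dan and a new engineer.
print(sops_config({"vps": "age1vuxc...", "dan": "age1<dan>", "engineer": "age1<engineer>"}))
```

After changing recipients, sops updatekeys secrets/secrets.yaml re-encrypts the file's data key for the new list. Note that every recipient of a rule can decrypt every file it matches, so per-project secrets would need their own rules and files.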
Testing Strategy
Current: ops-jrz1-vm (VM testing before production)
Workflow:
- Develop locally
- Test in VM (nixos-rebuild build-vm)
- Deploy to production (nixos-rebuild switch)
- Roll back if issues arise (nixos-rebuild switch --rollback)
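The three steps can be chained so that each one only runs if the previous succeeded. A minimal sketch with the command runner and the post-deploy health check passed in as functions, so the control flow can be tested without touching a machine; the function and outcome names are hypothetical. It deliberately rolls back only when the health check fails after a successful switch: if the switch itself fails during the build, no new generation exists and --rollback would revert to an older, previously working one.

```python
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # runs an argv, returns the exit code
HealthCheck = Callable[[], bool]


def promote(run: Runner, healthy: HealthCheck) -> str:
    """Walk the test, deploy, rollback workflow and report the outcome."""
    if run(["nixos-rebuild", "build-vm"]) != 0:
        return "vm-build-failed"  # production untouched
    # In practice, boot ./result/bin/run-<hostname>-vm and check it by hand here.
    if run(["nixos-rebuild", "switch"]) != 0:
        return "switch-failed"  # an admin decides whether a rollback is needed
    if not healthy():
        run(["nixos-rebuild", "switch", "--rollback"])
        return "rolled-back"
    return "deployed"
```

With a real runner such as lambda argv: subprocess.run(argv).returncode, add the --flake reference the ops-jrz1 flake uses.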
Future:
- Automated testing (unit, integration)
- Staging environment (if needed)
- Pre-deployment health checks
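A first health check can be a set of HTTP probes against public endpoints. A minimal sketch, assuming the Matrix client API answers at https://clarun.xyz/_matrix/client/versions (a standard Matrix endpoint) and that Forgejo exposes its Gitea-compatible /api/healthz; verify both URLs against the live setup. The probe is injectable so the aggregation logic can be tested offline; all names are hypothetical.

```python
import urllib.request
from typing import Callable, Dict, List

# Assumed endpoints; adjust to the deployed hostnames and paths.
CHECKS = {
    "matrix": "https://clarun.xyz/_matrix/client/versions",
    "forgejo": "https://git.clarun.xyz/api/healthz",
}


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError (4xx/5xx) and timeouts all derive from OSError
        return False


def failing_services(checks: Dict[str, str],
                     probe: Callable[[str], bool] = http_ok) -> List[str]:
    """Names of services whose probe fails, sorted for stable output."""
    return sorted(name for name, url in checks.items() if not probe(url))
```

failing_services(CHECKS) returning an empty list could serve as the health check in a deploy step, both before switching and right after it.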
Technical Stack
Infrastructure
- OS: NixOS 24.05
- Config Management: Nix flakes
- Secrets: sops-nix with age encryption
- Firewall: iptables (nixos-fw)
- Web Server: nginx with ACME/Let's Encrypt
Communication
- Matrix Homeserver: conduwuit 0.5.0-rc.8
- Bridge Framework: mautrix (the bridges used here, mautrix-slack, mautrix-whatsapp and mautrix-gmessages, are written in Go)
- Target Bridge: mautrix-slack (Socket Mode)
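Socket Mode uses two different Slack tokens: an app-level token (prefix xapp-) to open the WebSocket connection and a bot token (prefix xoxb-) for Web API calls. Swapping them is an easy mistake, so a prefix check before encrypting them into sops-nix is cheap insurance. A minimal sketch; the key names match the secrets pattern in this document, the function name is hypothetical, and it cannot tell whether a token is valid or revoked.

```python
from typing import Dict, List

EXPECTED_PREFIXES = {
    "slack-app-token": "xapp-",  # app-level token, used for Socket Mode
    "slack-bot-token": "xoxb-",  # bot token, used for Web API calls
}


def token_problems(tokens: Dict[str, str]) -> List[str]:
    """Describe missing or mislabeled Slack tokens; an empty list means they look right."""
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = tokens.get(name, "").strip()
        if not value:
            problems.append(f"{name}: missing")
        elif not value.startswith(prefix):
            problems.append(f"{name}: expected prefix {prefix!r}")
    return problems


print(token_problems({"slack-app-token": "xoxb-123"}))
# ["slack-app-token: expected prefix 'xapp-'", 'slack-bot-token: missing']
```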
Development Platform
- Git Server: Forgejo 7.0.12
- Database: PostgreSQL 15.10
- CI/CD: Forgejo Actions (future consideration)
Expected Project Stack (Flexible)
- Python bots (primary expectation)
- Node.js services (if needed)
- Go binaries (if needed)
- Any language with Nix packaging support
Open Questions
Communication Bridge
- Which Slack channels to bridge? (All? Specific list? On-demand?)
- User identity mapping: Slack display names or Matrix usernames?
- Bot integration needs: GitHub notifications? CI/CD status?
Project Deployment
- Automated deployment on merge? Or manual trigger?
- Pull request workflow required? Or direct push to main?
- Health checks before deployment?
- Monitoring and alerting strategy?
Team Collaboration
- How many engineers will actually join? (impacts scaling decisions)
- Shared development environments needed?
- Per-project Matrix rooms or one big room?
- Weekly syncs or async-only collaboration?
Repository Organization
- Monorepo (ops-jrz1 + projects) or separate repos?
- Public vs private repositories?
- Who owns which repositories?
Success Metrics
Technical Success
- ✅ All services healthy and monitored
- ✅ Zero unplanned downtime
- ✅ Fast rollback capability (< 5 minutes)
- ✅ Clear audit trail (git history + NixOS generations)
Team Success
- ✅ Engineers can deploy projects independently
- ✅ Onboarding time < 1 hour
- ✅ Documentation answers common questions
- ✅ Platform feels stable and trustworthy
Project Success (Presentable State)
- ✅ Slack bridge works reliably
- ✅ Example project demonstrates the pattern
- ✅ Documentation is complete and clear
- ✅ At least one other engineer has successfully deployed
Timeline
Phase 1: Working Slack Bridge (1-2 focused sessions)
- Update workspace configuration
- Slack app setup and credential management
- Debug and validate bidirectional messaging
Phase 2: Project Pattern (1-2 sessions after Phase 1)
- Create example bot
- Document deployment pattern
- Establish template repository
Phase 3: Documentation (1 session)
- Architecture documentation
- Onboarding guide
- Deployment runbook
Phase 4: Team Onboarding (1 session per engineer)
- Invite engineers
- Supervised first deployment
- Gather feedback and iterate
Target: Presentable state within 4-8 focused work sessions
Constraint: Not pressing, quality over speed
References
Internal Documentation
- Security Test Report - Generation 31 validation
- Deployment Log - Initial deployment
- Forgejo Setup - Git server configuration
Revision History
- 2025-10-22: Initial vision document created after brainstorming session
  - Defined presentable MVP criteria
  - Established three-milestone roadmap
  - Documented architectural principles
  - Identified open questions for iteration