# ops-jrz1 Platform Vision

**Status:** North Star Document
**Last Updated:** 2025-10-22
**Maintainers:** dan (primary), team (shared responsibility)

## Executive Summary

ops-jrz1 is a self-hosted collaborative development platform for small engineering teams (2-5 engineers). It provides communication bridging (Matrix ↔ Slack), code hosting (Forgejo), and declarative deployment infrastructure (NixOS) with a focus on **sustainability over speed** and **quality over quick wins**.

## Core Philosophy

**Build It Right Over Time**
- Avoid technical debt
- Declarative and reproducible (NixOS)
- Self-documenting
- Sustainable for small team
- Clear patterns for contributions

**Presentable State First**
- Working demo-able features
- Clear documentation
- Inviting for new engineers
- Professional appearance

## Current State (Generation 31+)

### Operational Services
- ✅ Matrix homeserver (conduwuit 0.5.0-rc.8) on clarun.xyz
- ✅ Forgejo (7.0.12) at git.clarun.xyz
- ✅ nginx reverse proxy with TLS (Let's Encrypt)
- ✅ PostgreSQL 15.10 (Forgejo database)
- ✅ sops-nix secrets management
- ✅ Self-hosted infrastructure configuration (ops-jrz1 repo on Forgejo)

### Security Posture
- ✅ SSH key-only authentication
- ✅ Secrets encrypted with age/sops-nix
- ✅ Services isolated on localhost (Matrix, PostgreSQL)
- ✅ Firewall (only SSH, HTTP, HTTPS exposed)
- ✅ Comprehensive security validation completed

### Incomplete/Blocked
- ⚠️ mautrix-slack bridge (exit code 11, needs configuration)
- ⚠️ mautrix-whatsapp (configured but not tested)
- ⚠️ mautrix-gmessages (configured but not tested)
- ⚠️ No deployment pattern for team projects yet

## Target "Presentable MVP"

### Definition of Presentable

When we can say: "Here's a working platform you can use and contribute to"

**Criteria:**
1. Slack bridge works bidirectionally
2. One example project successfully deployed
3. Clear onboarding documentation
4. Stable and tested (not constantly broken)
5. Professional presentation (docs, architecture clarity)

### Milestone 1: Working Slack Bridge

**Goal:** Engineers in Slack can see it's alive and useful
**Success Metric:** Send "Hello from Matrix!" message that appears in Slack via bridge

**Tasks:**
- Update workspace config (delpadtech → chochacho)
- Create Slack app in chochacho workspace
- Configure Slack credentials (app token, bot token) in sops-nix
- Debug exit code 11 issue
- Test bidirectional messaging (Slack ↔ Matrix)
- Document setup in worklog

**Impact:** Highly visible proof of concept, validates core architecture
**Priority:** **HIGH** - Unblocks team communication and collaboration

### Milestone 2: Example Project Pattern

**Goal:** Clear template for "how to add a project"
**Success Metric:** Engineer can clone template repo, modify, and deploy a simple bot

**Deliverables:**
- Example project: "chochacho-hello-bot" (responds to !hello in Matrix)
- Project structure: Nix flake + NixOS module pattern
- Documentation: docs/project-template.md
- Template repository on Forgejo

**Impact:** Makes platform "joinable" - clear contribution path
**Priority:** **MEDIUM** - Required before onboarding engineers

### Milestone 3: Platform Documentation

**Goal:** New engineer can understand and use the platform

**Deliverables:**
- docs/architecture.md - How the platform works
- docs/onboarding.md - How to join as an engineer
- docs/deployment.md - How to deploy projects
- README.md - Overview and navigation

**Impact:** Presentability factor, shows maturity and thoughtfulness
**Priority:** **MEDIUM** - Can iterate as engineers join

## Architecture Principles

### Communication Layer

**Primary:** Slack (chochacho workspace)
**Hub:** Matrix homeserver bridges to Slack
**Direction:** Bidirectional (Slack ↔ Matrix)
**Current Focus:** Slack bridge only (not WhatsApp, Google Messages, etc.)
**User Experience:** Engineers stay in Slack, Matrix runs behind the scenes to unify communication

### Code Hosting

**Primary:** Self-hosted Forgejo at git.clarun.xyz
**Flexibility:** Projects can also reference external repos (GitHub, etc.)

**Model:**
- `ops-jrz1` repository: Platform infrastructure (NixOS config)
- Project repositories: Individual team projects
- Clear separation: Infrastructure vs applications

### Deployment Philosophy

**Chosen Approach:** NixOS-Native (Strict Declarative)

**Pattern: Project as NixOS Module**

```text
# Example project structure
project-name/
├── flake.nix          # Nix flake (how to build)
├── default.nix        # Derivation (package definition)
├── module.nix         # NixOS service module
├── src/               # Project code
└── README.md          # Deployment instructions
```

**Deployment Workflow:**
1. Engineer develops project locally (with Nix)
2. Project added to ops-jrz1 as import or flake reference
3. Push to Forgejo (project repo or ops-jrz1 update)
4. Admin reviews change (pull request optional)
5. `nixos-rebuild switch` deploys to production
6. Rollback available via NixOS generations

**Benefits:**
- ✅ Declarative and reproducible
- ✅ Built-in rollback (generation management)
- ✅ Consistent with existing ops-jrz1 pattern
- ✅ Forces proper packaging (quality gate)
- ✅ No additional deployment systems to maintain

**Trade-offs:**
- ❌ Requires NixOS knowledge (acceptable: team can learn)
- ❌ Less "instant" than webhook deployment (acceptable: "no deployment urgency")
- ❌ Admin approval step (beneficial: quality control)

**Alternative Considered:** Hybrid model (platform in NixOS, projects flexible)
- Deferred: Can relax strictness later if needed
- Starting strict enforces quality and consistency

### Multi-Engineer Access Model

**Level 1: Communication Only**
- Slack workspace access (chochacho)
- Can participate in bridged conversations
- No infrastructure access needed

**Level 2: Code Contributor**
- Forgejo account (pattern established)
- SSH key uploaded to Forgejo
- Can push to project repositories
- Can submit pull requests

**Level 3: Deployer**
- Can trigger deployments (merge to main?)
- May have SSH access for debugging
- Permissions to restart services

**Level 4: Admin**
- SSH root access to VPS
- Can modify ops-jrz1 NixOS config
- Secrets management access (sops-nix keys)
- Infrastructure decision authority

**Target Distribution (2-5 engineers):**
- Level 1: All engineers
- Level 2: All engineers (default)
- Level 3: 2-3 trusted engineers
- Level 4: 1-2 admins (primary: dan)

### Secrets Management

**Tool:** sops-nix with age encryption

**Current State:**
- VPS SSH host key as age key: `age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q`
- Admin workstation can decrypt (dan's age key)

**Pattern:**

```yaml
# secrets/secrets.yaml (encrypted)
matrix-registration-token: "..."
acme-email: "..."
slack-app-token: "..."   # Future
slack-bot-token: "..."   # Future
```

**Future Considerations:**
- Add engineer age keys for collaboration
- Per-project secrets (if needed)
- Secret rotation workflow

### Testing Strategy

**Current:** ops-jrz1-vm (VM testing before production)

**Workflow:**
1. Develop locally
2. Test in VM (`nixos-rebuild build-vm`)
3. Deploy to production (`nixos-rebuild switch`)
4. Rollback if issues (`nixos-rebuild switch --rollback`)

**Future:**
- Automated testing (unit, integration)
- Staging environment (if needed)
- Pre-deployment health checks

## Technical Stack

### Infrastructure
- **OS:** NixOS 24.05
- **Config Management:** Nix flakes
- **Secrets:** sops-nix with age encryption
- **Firewall:** iptables (nixos-fw)
- **Web Server:** nginx with ACME/Let's Encrypt

### Communication
- **Matrix Homeserver:** conduwuit 0.5.0-rc.8
- **Bridge Framework:** mautrix (the Slack, WhatsApp, and Google Messages bridges are written in Go)
- **Target Bridge:** mautrix-slack (Socket Mode)

### Development Platform
- **Git Server:** Forgejo 7.0.12
- **Database:** PostgreSQL 15.10
- **CI/CD:** Forgejo Actions (future consideration)

### Expected Project Stack (Flexible)
- Python bots (primary expectation)
- Node.js services (if needed)
- Go binaries (if needed)
- Any language with Nix packaging support

## Open Questions

### Communication Bridge
- Which Slack channels to bridge? (All? Specific list? On-demand?)
- User identity mapping: Slack display names or Matrix usernames?
- Bot integration needs: GitHub notifications? CI/CD status?

### Project Deployment
- Automated deployment on merge? Or manual trigger?
- Pull request workflow required? Or direct push to main?
- Health checks before deployment?
- Monitoring and alerting strategy?

### Team Collaboration
- How many engineers will actually join? (impacts scaling decisions)
- Shared development environments needed?
- Per-project Matrix rooms or one big room?
- Weekly syncs or async-only collaboration?

### Repository Organization
- Monorepo (ops-jrz1 + projects) or separate repos?
- Public vs private repositories?
- Who owns which repositories?

## Success Metrics

### Technical Success
- ✅ All services healthy and monitored
- ✅ Zero unplanned downtime
- ✅ Fast rollback capability (< 5 minutes)
- ✅ Clear audit trail (git history + NixOS generations)

### Team Success
- ✅ Engineers can deploy projects independently
- ✅ Onboarding time < 1 hour
- ✅ Documentation answers common questions
- ✅ Platform feels stable and trustworthy

### Project Success (Presentable State)
- ✅ Slack bridge works reliably
- ✅ Example project demonstrates the pattern
- ✅ Documentation is complete and clear
- ✅ At least one other engineer has successfully deployed

## Timeline

**Phase 1: Working Slack Bridge** (1-2 focused sessions)
- Update workspace configuration
- Slack app setup and credential management
- Debug and validate bidirectional messaging

**Phase 2: Project Pattern** (1-2 sessions after Phase 1)
- Create example bot
- Document deployment pattern
- Establish template repository

**Phase 3: Documentation** (1 session)
- Architecture documentation
- Onboarding guide
- Deployment runbook

**Phase 4: Team Onboarding** (1 session per engineer)
- Invite engineers
- Supervised first deployment
- Gather feedback and iterate

**Target:** Presentable state within 4-8 focused work sessions
**Constraint:** Not pressing, quality over speed

## References

### Internal Documentation
- [Security Test Report](worklogs/2025-10-22-security-validation-test-report.md) - Generation 31 validation
- [Deployment Log](worklogs/2025-10-22-deployment-generation-31.md) - Initial deployment
- [Forgejo Setup](worklogs/2025-10-22-forgejo-repository-setup.org) - Git server configuration

### External Resources
- [Mautrix Bridges Documentation](https://docs.mau.fi/)
- [NixOS Manual](https://nixos.org/manual/nixos/stable/)
- [Forgejo Documentation](https://forgejo.org/docs/)
- [Matrix Specification](https://spec.matrix.org/)

## Revision History

- **2025-10-22:** Initial vision document created after brainstorming session
  - Defined presentable MVP criteria
  - Established three-milestone roadmap
  - Documented architectural principles
  - Identified open questions for iteration
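
**Sketch: Project as NixOS Module**

The "Project as NixOS Module" pattern from the Deployment Philosophy section could look roughly like the `module.nix` below for the planned chochacho-hello-bot example. This is a minimal sketch, not the project's actual module: the option namespace `services.chochacho-hello-bot`, the binary name `hello-bot`, and the `environmentFile` option are all assumptions made for illustration.

```nix
# module.nix (hypothetical sketch; option and binary names are assumptions)
{ config, lib, pkgs, ... }:

let
  cfg = config.services.chochacho-hello-bot;
in
{
  options.services.chochacho-hello-bot = {
    enable = lib.mkEnableOption "the chochacho-hello-bot example Matrix bot";

    # The package is expected to come from the project's default.nix or flake.
    package = lib.mkOption {
      type = lib.types.package;
      description = "Package that provides the bot executable.";
    };

    # Optional file with KEY=value lines, e.g. a secret decrypted by sops-nix.
    environmentFile = lib.mkOption {
      type = lib.types.nullOr lib.types.path;
      default = null;
      description = "Environment file holding the bot's credentials.";
    };
  };

  config = lib.mkIf cfg.enable {
    systemd.services.chochacho-hello-bot = {
      description = "chochacho-hello-bot (responds to !hello in Matrix)";
      wantedBy = [ "multi-user.target" ];
      after = [ "network-online.target" ];
      wants = [ "network-online.target" ];

      serviceConfig = {
        # Assumes the package installs its executable as bin/hello-bot.
        ExecStart = "${cfg.package}/bin/hello-bot";
        DynamicUser = true;
        Restart = "on-failure";
      } // lib.optionalAttrs (cfg.environmentFile != null) {
        EnvironmentFile = cfg.environmentFile;
      };
    };
  };
}
```

Under these assumptions, ops-jrz1 would import the module, set `services.chochacho-hello-bot.enable = true` and `package`, and the bot would then be deployed and rolled back with the rest of the system through `nixos-rebuild switch` and NixOS generations.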