
ops-jrz1 Platform Vision

Status: North Star Document
Last Updated: 2025-10-22
Maintainers: dan (primary), team (shared responsibility)

Executive Summary

ops-jrz1 is a self-hosted collaborative development platform for small engineering teams (2-5 engineers). It provides communication bridging (Matrix ↔ Slack), code hosting (Forgejo), and declarative deployment infrastructure (NixOS) with a focus on sustainability over speed and quality over quick wins.

Core Philosophy

Build It Right Over Time

  • Avoid technical debt
  • Declarative and reproducible (NixOS)
  • Self-documenting
  • Sustainable for a small team
  • Clear patterns for contributions

Presentable State First

  • Working demo-able features
  • Clear documentation
  • Inviting for new engineers
  • Professional appearance

Current State (Generation 31+)

Operational Services

  • Matrix homeserver (conduwuit 0.5.0-rc.8) on clarun.xyz
  • Forgejo (7.0.12) at git.clarun.xyz
  • nginx reverse proxy with TLS (Let's Encrypt)
  • PostgreSQL 15.10 (Forgejo database)
  • sops-nix secrets management
  • Self-hosted infrastructure configuration (ops-jrz1 repo on Forgejo)
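For orientation, an nginx reverse proxy with Let's Encrypt in front of Forgejo usually comes down to a few NixOS options. The sketch below is hypothetical rather than the actual ops-jrz1 configuration. It assumes Forgejo listens on 127.0.0.1:3000, which is Forgejo's default HTTP port, and it uses a placeholder ACME email.

```nix
# Hypothetical excerpt: nginx terminates TLS and proxies to Forgejo.
{ ... }:
{
  security.acme = {
    acceptTerms = true;
    defaults.email = "admin@example.org"; # placeholder address
  };

  services.nginx = {
    enable = true;
    recommendedProxySettings = true; # forward Host, X-Forwarded-* headers
    recommendedTlsSettings = true;
    virtualHosts."git.clarun.xyz" = {
      enableACME = true; # obtain and renew a Let's Encrypt certificate
      forceSSL = true;   # redirect plain HTTP to HTTPS
      # Assumption: Forgejo's HTTP listener is on the default port 3000.
      locations."/".proxyPass = "http://127.0.0.1:3000";
    };
  };
}
```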

Security Posture

  • SSH key-only authentication
  • Secrets encrypted with age/sops-nix
  • Services isolated on localhost (Matrix, PostgreSQL)
  • Firewall (only SSH, HTTP, HTTPS exposed)
  • Comprehensive security validation completed
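The SSH and firewall properties above correspond to a handful of standard NixOS options. This is a minimal sketch of how they are typically expressed, not an excerpt from ops-jrz1:

```nix
# Hypothetical excerpt: key-only SSH and a minimal open-port list.
{ ... }:
{
  services.openssh = {
    enable = true;
    # openFirewall defaults to true, so this service opens port 22 itself.
    settings = {
      PasswordAuthentication = false;
      KbdInteractiveAuthentication = false;
      PermitRootLogin = "prohibit-password"; # root only with an SSH key
    };
  };

  networking.firewall = {
    enable = true;
    allowedTCPPorts = [ 80 443 ]; # HTTP (ACME, redirects) and HTTPS
  };
}
```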

Incomplete/Blocked

  • ⚠️ mautrix-slack bridge (exit code 11, needs configuration)
  • ⚠️ mautrix-whatsapp (configured but not tested)
  • ⚠️ mautrix-gmessages (configured but not tested)
  • ⚠️ No deployment pattern for team projects yet

Target "Presentable MVP"

Definition of Presentable

When we can say: "Here's a working platform you can use and contribute to"

Criteria:

  1. Slack bridge works bidirectionally
  2. One example project successfully deployed
  3. Clear onboarding documentation
  4. Stable and tested (not constantly broken)
  5. Professional presentation (docs, architecture clarity)

Milestone 1: Working Slack Bridge

Goal: Engineers in Slack can see it's alive and useful

Success Metric: A "Hello from Matrix!" message sent in Matrix appears in Slack via the bridge

Tasks:

  • Update workspace config (delpadtech → chochacho)
  • Create Slack app in chochacho workspace
  • Configure Slack credentials (app token, bot token) in sops-nix
  • Debug exit code 11 issue
  • Test bidirectional messaging (Slack ↔ Matrix)
  • Document setup in worklog

Impact: Highly visible proof of concept, validates core architecture

Priority: HIGH - Unblocks team communication and collaboration

Milestone 2: Example Project Pattern

Goal: Clear template for "how to add a project"

Success Metric: An engineer can clone the template repo, modify it, and deploy a simple bot

Deliverables:

  • Example project: "chochacho-hello-bot" (responds to !hello in Matrix)
  • Project structure: Nix flake + NixOS module pattern
  • Documentation: docs/project-template.md
  • Template repository on Forgejo
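Here is a minimal sketch of what the example project's flake could look like under the "Nix flake + NixOS module" pattern. It assumes an x86_64-linux VPS and a callPackage-style default.nix. It also assumes a module.nix that defines a hypothetical services.hello-bot option set with a package option. The flake exports both the build and the module, so ops-jrz1 only needs to import one thing.

```nix
# flake.nix (hypothetical example) for the chochacho-hello-bot project.
{
  description = "Example bot that answers !hello in Matrix";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux"; # assumption: matches the VPS
      pkgs = nixpkgs.legacyPackages.${system};
    in
    {
      # How to build: default.nix is assumed to be a callPackage-style function.
      packages.${system}.default = pkgs.callPackage ./default.nix { };

      # How to run: import module.nix and default it to this flake's package.
      nixosModules.default = { lib, pkgs, ... }: {
        imports = [ ./module.nix ];
        services.hello-bot.package = lib.mkDefault
          self.packages.${pkgs.stdenv.hostPlatform.system}.default;
      };
    };
}
```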

Impact: Makes the platform "joinable" with a clear contribution path

Priority: MEDIUM - Required before onboarding engineers

Milestone 3: Platform Documentation

Goal: New engineer can understand and use the platform

Deliverables:

  • docs/architecture.md - How the platform works
  • docs/onboarding.md - How to join as an engineer
  • docs/deployment.md - How to deploy projects
  • README.md - Overview and navigation

Impact: Presentability factor, shows maturity and thoughtfulness

Priority: MEDIUM - Can iterate as engineers join

Architecture Principles

Communication Layer

Primary: Slack (chochacho workspace)
Hub: Matrix homeserver bridges to Slack
Direction: Bidirectional (Slack ↔ Matrix)

Current Focus: Slack bridge only (not WhatsApp, Google Messages, etc.)

User Experience: Engineers stay in Slack; Matrix runs behind the scenes to unify communication

Code Hosting

Primary: Self-hosted Forgejo at git.clarun.xyz
Flexibility: Projects can also reference external repos (GitHub, etc.)

Model:

  • ops-jrz1 repository: Platform infrastructure (NixOS config)
  • Project repositories: Individual team projects
  • Clear separation: Infrastructure vs applications

Deployment Philosophy

Chosen Approach: NixOS-Native (Strict Declarative)

Pattern: Project as NixOS Module

```text
# Example project structure
project-name/
├── flake.nix              # Nix flake (how to build)
├── default.nix            # Derivation (package definition)
├── module.nix             # NixOS service module
├── src/                   # Project code
└── README.md              # Deployment instructions
```
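A module.nix in this layout could look roughly like the following sketch. The service name hello-bot, the package option, and the bin/hello-bot executable are all hypothetical. Taking the package as an option lets the project flake, or ops-jrz1 itself, decide which build runs.

```nix
# module.nix (hypothetical example): runs the bot as a systemd service.
{ config, lib, ... }:

let
  cfg = config.services.hello-bot;
in
{
  options.services.hello-bot = {
    enable = lib.mkEnableOption "the example hello bot";

    package = lib.mkOption {
      type = lib.types.package;
      description = "Package that provides bin/hello-bot.";
    };
  };

  config = lib.mkIf cfg.enable {
    systemd.services.hello-bot = {
      description = "Example Matrix hello bot";
      wantedBy = [ "multi-user.target" ];
      wants = [ "network-online.target" ];
      after = [ "network-online.target" ];
      serviceConfig = {
        # Assumption: the package installs an executable at bin/hello-bot.
        ExecStart = "${cfg.package}/bin/hello-bot";
        DynamicUser = true;     # run as an ephemeral unprivileged user
        Restart = "on-failure"; # restart on crash, not on clean exit
      };
    };
  };
}
```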

Deployment Workflow:

  1. Engineer develops project locally (with Nix)
  2. Project added to ops-jrz1 as import or flake reference
  3. Push to Forgejo (project repo or ops-jrz1 update)
  4. Admin reviews change (pull request optional)
  5. nixos-rebuild switch deploys to production
  6. Rollback available via NixOS generations
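Step 2 could look like the following excerpt of the ops-jrz1 flake. Several names here are placeholders: the input name hello-bot, the repository owner in the URL, the configuration name ops-jrz1, and ./configuration.nix. The only real requirement is that the project flake exports a NixOS module.

```nix
# Hypothetical excerpt of ops-jrz1's flake.nix: a project added as an input.
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
    hello-bot = {
      url = "git+https://git.clarun.xyz/example/chochacho-hello-bot"; # placeholder owner
      inputs.nixpkgs.follows = "nixpkgs"; # build against the platform's nixpkgs
    };
  };

  outputs = { self, nixpkgs, hello-bot, ... }: {
    nixosConfigurations.ops-jrz1 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./configuration.nix
        hello-bot.nixosModules.default
        { services.hello-bot.enable = true; }
      ];
    };
  };
}
```

With the input in place, nixos-rebuild switch --flake .#ops-jrz1 builds and activates a new generation (step 5). nixos-rebuild switch --rollback then returns to the previous generation (step 6). Because the project revision is pinned in flake.lock, the admin review in step 4 covers exactly what gets deployed.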

Benefits:

  • Declarative and reproducible
  • Built-in rollback (generation management)
  • Consistent with existing ops-jrz1 pattern
  • Forces proper packaging (quality gate)
  • No additional deployment systems to maintain

Trade-offs:

  • Requires NixOS knowledge (acceptable: team can learn)
  • Less "instant" than webhook deployment (acceptable: "no deployment urgency")
  • Admin approval step (beneficial: quality control)

Alternative Considered: Hybrid model (platform in NixOS, projects flexible)

  • Deferred: Can relax strictness later if needed
  • Starting strict enforces quality and consistency

Multi-Engineer Access Model

Level 1: Communication Only

  • Slack workspace access (chochacho)
  • Can participate in bridged conversations
  • No infrastructure access needed

Level 2: Code Contributor

  • Forgejo account (pattern established)
  • SSH key uploaded to Forgejo
  • Can push to project repositories
  • Can submit pull requests

Level 3: Deployer

  • Can trigger deployments (merge to main?)
  • May have SSH access for debugging
  • Permissions to restart services

Level 4: Admin

  • SSH root access to VPS
  • Can modify ops-jrz1 NixOS config
  • Secrets management access (sops-nix keys)
  • Infrastructure decision authority

Target Distribution (2-5 engineers):

  • Level 1: All engineers
  • Level 2: All engineers (default)
  • Level 3: 2-3 trusted engineers
  • Level 4: 1-2 admins (primary: dan)
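Levels 1 and 2 live entirely in Slack and Forgejo, so only Levels 3 and 4 need anything on the VPS. The sketch below shows one hypothetical way to express those two levels in NixOS. The user alice, the SSH keys, and the hello-bot.service unit are placeholders.

```nix
# Hypothetical excerpt: server-side accounts for access Levels 3 and 4.
{ ... }:
{
  # Level 3 deployer: normal account with SSH access for debugging.
  users.users.alice = {
    isNormalUser = true;
    openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA...placeholder alice" ];
  };

  # Level 3: allow restarting one specific service without full root.
  security.sudo.extraRules = [
    {
      users = [ "alice" ];
      commands = [
        {
          command = "/run/current-system/sw/bin/systemctl restart hello-bot.service";
          options = [ "NOPASSWD" ];
        }
      ];
    }
  ];

  # Level 4 admins: key-based root login.
  users.users.root.openssh.authorizedKeys.keys = [
    "ssh-ed25519 AAAA...placeholder dan"
  ];
}
```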

Secrets Management

Tool: sops-nix with age encryption

Current State:

  • VPS SSH host key as age key: age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q
  • Admin workstation can decrypt (dan's age key)

Pattern:

```yaml
# secrets/secrets.yaml (encrypted)
matrix-registration-token: "..."
acme-email: "..."
slack-app-token: "..."        # Future
slack-bot-token: "..."        # Future
```
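On the NixOS side, sops-nix needs two things: the location of the encrypted file and the age identity to decrypt it with. Each secret is then decrypted at activation to a file under /run/secrets instead of being copied into the world-readable Nix store. The sketch below assumes the sops-nix module is already imported and that the host's ed25519 SSH key serves as its age identity, which is consistent with the VPS host key above. The owner mautrix-slack is also an assumption about the bridge's system user.

```nix
# Hypothetical NixOS excerpt wiring secrets/secrets.yaml into sops-nix.
{ ... }:
{
  sops = {
    defaultSopsFile = ./secrets/secrets.yaml;

    # Derive the host's age identity from its ed25519 SSH host key.
    age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

    # Each entry names a top-level key in secrets.yaml. With no explicit
    # path, the plaintext lands at /run/secrets/<name>, owned by root.
    secrets."matrix-registration-token" = { };

    secrets."slack-bot-token" = {
      owner = "mautrix-slack"; # assumption: the bridge's system user
      mode = "0400";           # readable by the owner only
    };
  };
}
```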

Future Considerations:

  • Add engineer age keys for collaboration
  • Per-project secrets (if needed)
  • Secret rotation workflow

Testing Strategy

Current: ops-jrz1-vm (VM testing before production)

Workflow:

  1. Develop locally
  2. Test in VM (nixos-rebuild build-vm)
  3. Deploy to production (nixos-rebuild switch)
  4. Rollback if issues (nixos-rebuild switch --rollback)

Future:

  • Automated testing (unit, integration)
  • Staging environment (if needed)
  • Pre-deployment health checks
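For automated testing, nixpkgs ships a NixOS VM test driver that boots the same modules used in production. The sketch below targets the hypothetical hello-bot service. It assumes module.nix and default.nix sit next to this file, as in the example project layout. A real bot that needs Matrix credentials would also need those stubbed so the service stays up inside the VM.

```nix
# tests.nix (hypothetical): boot a VM with the module and check the unit starts.
{ pkgs ? import <nixpkgs> { } }:

pkgs.testers.runNixOSTest {
  name = "hello-bot-starts";

  nodes.machine = { ... }: {
    imports = [ ./module.nix ];
    services.hello-bot = {
      enable = true;
      package = pkgs.callPackage ./default.nix { };
    };
  };

  # The test script is Python, executed by the NixOS test driver.
  testScript = ''
    machine.wait_for_unit("hello-bot.service")
  '';
}
```

If a test like this were exposed as checks.x86_64-linux.default in the project flake, nix flake check would run it. That would give a pre-deployment gate without any extra CI system.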

Technical Stack

Infrastructure

  • OS: NixOS 24.05
  • Config Management: Nix flakes
  • Secrets: sops-nix with age encryption
  • Firewall: iptables (nixos-fw)
  • Web Server: nginx with ACME/Let's Encrypt

Communication

  • Matrix Homeserver: conduwuit 0.5.0-rc.8
  • Bridge Framework: mautrix (the Slack, WhatsApp, and Google Messages bridges are Go-based)
  • Target Bridge: mautrix-slack (Socket Mode)

Development Platform

  • Git Server: Forgejo 7.0.12
  • Database: PostgreSQL 15.10
  • CI/CD: Forgejo Actions (future consideration)

Expected Project Stack (Flexible)

  • Python bots (primary expectation)
  • Node.js services (if needed)
  • Go binaries (if needed)
  • Any language with Nix packaging support

Open Questions

Communication Bridge

  • Which Slack channels to bridge? (All? Specific list? On-demand?)
  • User identity mapping: Slack display names or Matrix usernames?
  • Bot integration needs: GitHub notifications? CI/CD status?

Project Deployment

  • Automated deployment on merge? Or manual trigger?
  • Pull request workflow required? Or direct push to main?
  • Health checks before deployment?
  • Monitoring and alerting strategy?

Team Collaboration

  • How many engineers will actually join? (impacts scaling decisions)
  • Shared development environments needed?
  • Per-project Matrix rooms or one big room?
  • Weekly syncs or async-only collaboration?

Repository Organization

  • Monorepo (ops-jrz1 + projects) or separate repos?
  • Public vs private repositories?
  • Who owns which repositories?

Success Metrics

Technical Success

  • All services healthy and monitored
  • Zero unplanned downtime
  • Fast rollback capability (< 5 minutes)
  • Clear audit trail (git history + NixOS generations)

Team Success

  • Engineers can deploy projects independently
  • Onboarding time < 1 hour
  • Documentation answers common questions
  • Platform feels stable and trustworthy

Project Success (Presentable State)

  • Slack bridge works reliably
  • Example project demonstrates the pattern
  • Documentation is complete and clear
  • At least one other engineer has successfully deployed

Timeline

Phase 1: Working Slack Bridge (1-2 focused sessions)

  • Update workspace configuration
  • Slack app setup and credential management
  • Debug and validate bidirectional messaging

Phase 2: Project Pattern (1-2 sessions after Phase 1)

  • Create example bot
  • Document deployment pattern
  • Establish template repository

Phase 3: Documentation (1 session)

  • Architecture documentation
  • Onboarding guide
  • Deployment runbook

Phase 4: Team Onboarding (1 session per engineer)

  • Invite engineers
  • Supervised first deployment
  • Gather feedback and iterate

Target: Presentable state within 4-8 focused work sessions

Constraint: No hard deadline; quality over speed

Revision History

  • 2025-10-22: Initial vision document created after brainstorming session
    • Defined presentable MVP criteria
    • Established three-milestone roadmap
    • Documented architectural principles
    • Identified open questions for iteration