# ops-jrz1 Platform Vision

**Status:** North Star Document
**Last Updated:** 2025-10-22
**Maintainers:** dan (primary), team (shared responsibility)

## Executive Summary

ops-jrz1 is a self-hosted collaborative development platform for small engineering teams (2-5 engineers). It provides communication bridging (Matrix ↔ Slack), code hosting (Forgejo), and declarative deployment infrastructure (NixOS) with a focus on **sustainability over speed** and **quality over quick wins**.

## Core Philosophy

**Build It Right Over Time**
- Avoid technical debt
- Declarative and reproducible (NixOS)
- Self-documenting
- Sustainable for a small team
- Clear patterns for contributions

**Presentable State First**
- Working, demo-able features
- Clear documentation
- Inviting for new engineers
- Professional appearance

## Current State (Generation 31+)

### Operational Services
- ✅ Matrix homeserver (conduwuit 0.5.0-rc.8) on clarun.xyz
- ✅ Forgejo (7.0.12) at git.clarun.xyz
- ✅ nginx reverse proxy with TLS (Let's Encrypt)
- ✅ PostgreSQL 15.10 (Forgejo database)
- ✅ sops-nix secrets management
- ✅ Self-hosted infrastructure configuration (ops-jrz1 repo on Forgejo)

### Security Posture
- ✅ SSH key-only authentication
- ✅ Secrets encrypted with age/sops-nix
- ✅ Services isolated on localhost (Matrix, PostgreSQL)
- ✅ Firewall (only SSH, HTTP, HTTPS exposed)
- ✅ Comprehensive security validation completed

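For reference, the SSH and firewall items above correspond to a handful of standard NixOS options. The fragment below is a minimal sketch that uses the stock `services.openssh` and `networking.firewall` modules. It is not copied from ops-jrz1, and the real settings there may differ.

```nix
# Sketch: SSH key-only login and a minimal open-port list (not the actual ops-jrz1 file)
{ ... }:
{
  services.openssh = {
    enable = true;
    settings = {
      PasswordAuthentication = false;       # keys only
      KbdInteractiveAuthentication = false; # no keyboard-interactive fallback
      PermitRootLogin = "prohibit-password"; # root may log in, but only with a key
    };
  };

  # Only SSH, HTTP and HTTPS are reachable from outside
  networking.firewall = {
    enable = true;
    allowedTCPPorts = [ 22 80 443 ];
  };
}
```

Matrix and PostgreSQL need no firewall entries because they bind to localhost. Presumably Matrix clients reach conduwuit only through the nginx reverse proxy.
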
### Incomplete/Blocked
- ⚠️ mautrix-slack bridge (exits with code 11; needs configuration)
- ⚠️ mautrix-whatsapp (configured but not tested)
- ⚠️ mautrix-gmessages (configured but not tested)
- ⚠️ No deployment pattern for team projects yet

## Target "Presentable MVP"

### Definition of Presentable
When we can say: "Here's a working platform you can use and contribute to."

**Criteria:**
1. Slack bridge works bidirectionally
2. One example project successfully deployed
3. Clear onboarding documentation
4. Stable and tested (not constantly broken)
5. Professional presentation (docs, architecture clarity)

### Milestone 1: Working Slack Bridge
**Goal:** Engineers in Slack can see it's alive and useful

**Success Metric:** A "Hello from Matrix!" message sent from Matrix appears in Slack via the bridge

**Tasks:**
- Update workspace config (delpadtech → chochacho)
- Create Slack app in chochacho workspace
- Configure Slack credentials (app token, bot token) in sops-nix
- Debug exit code 11 issue
- Test bidirectional messaging (Slack ↔ Matrix)
- Document setup in worklog

**Impact:** Highly visible proof of concept; validates core architecture

**Priority:** **HIGH** - Unblocks team communication and collaboration

### Milestone 2: Example Project Pattern
**Goal:** Clear template for "how to add a project"

**Success Metric:** An engineer can clone the template repo, modify it, and deploy a simple bot

**Deliverables:**
- Example project: "chochacho-hello-bot" (responds to !hello in Matrix)
- Project structure: Nix flake + NixOS module pattern
- Documentation: docs/project-template.md
- Template repository on Forgejo

**Impact:** Makes platform "joinable" - clear contribution path

**Priority:** **MEDIUM** - Required before onboarding engineers

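For chochacho-hello-bot, the "Nix flake + NixOS module pattern" deliverable could take roughly the following shape. This is a sketch built on several assumptions: the `x86_64-linux` system, the `nixos-24.05` pin, and a `callPackage`-style `default.nix` are illustrative choices, not settled decisions.

```nix
# flake.nix: hypothetical top level of the chochacho-hello-bot repository
{
  description = "chochacho-hello-bot: replies to !hello in Matrix (example project)";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux"; # assumed VPS architecture
      pkgs = nixpkgs.legacyPackages.${system};
    in
    {
      # How to build: `nix build` picks up packages.<system>.default
      packages.${system}.default = pkgs.callPackage ./default.nix { };

      # How to run: the service module that ops-jrz1 would import
      nixosModules.default = import ./module.nix;
    };
}
```

Engineers can build and try the bot locally with `nix build`. ops-jrz1 would consume only `nixosModules.default`, which keeps "how to build" and "how to run" inside the project repository.
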
### Milestone 3: Platform Documentation
**Goal:** New engineer can understand and use the platform

**Deliverables:**
- docs/architecture.md - How the platform works
- docs/onboarding.md - How to join as an engineer
- docs/deployment.md - How to deploy projects
- README.md - Overview and navigation

**Impact:** Presentability factor, shows maturity and thoughtfulness

**Priority:** **MEDIUM** - Can iterate as engineers join

## Architecture Principles

### Communication Layer
**Primary:** Slack (chochacho workspace)
**Hub:** Matrix homeserver bridges to Slack
**Direction:** Bidirectional (Slack ↔ Matrix)

**Current Focus:** Slack bridge only (not WhatsApp, Google Messages, etc.)

**User Experience:** Engineers stay in Slack; Matrix runs behind the scenes to unify communication

### Code Hosting
**Primary:** Self-hosted Forgejo at git.clarun.xyz
**Flexibility:** Projects can also reference external repos (GitHub, etc.)

**Model:**
- `ops-jrz1` repository: Platform infrastructure (NixOS config)
- Project repositories: Individual team projects
- Clear separation: Infrastructure vs applications

### Deployment Philosophy
**Chosen Approach:** NixOS-Native (Strict Declarative)

**Pattern: Project as NixOS Module**
```text
# Example project structure
project-name/
├── flake.nix      # Nix flake (how to build)
├── default.nix    # Derivation (package definition)
├── module.nix     # NixOS service module
├── src/           # Project code
└── README.md      # Deployment instructions
```

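A `module.nix` that follows this pattern might look like the sketch below. Several names are hypothetical: the option namespace `services.chochacho-hello-bot`, the `homeserverUrl` option, the `MATRIX_HOMESERVER` environment variable, and the binary name. The point of the sketch is that the project ships its own service definition, so ops-jrz1 only has to import the module and set `enable`.

```nix
# module.nix: hypothetical NixOS service module for chochacho-hello-bot
{ config, lib, pkgs, ... }:

let
  cfg = config.services.chochacho-hello-bot;
in
{
  options.services.chochacho-hello-bot = {
    enable = lib.mkEnableOption "the chochacho-hello-bot Matrix bot";

    package = lib.mkOption {
      type = lib.types.package;
      # default.nix sits next to this file in the project repository
      default = pkgs.callPackage ./default.nix { };
      defaultText = lib.literalExpression "pkgs.callPackage ./default.nix { }";
      description = "Package providing the bot executable.";
    };

    homeserverUrl = lib.mkOption {
      type = lib.types.str;
      default = "https://clarun.xyz"; # hypothetical option and default
      description = "Matrix homeserver the bot connects to.";
    };
  };

  config = lib.mkIf cfg.enable {
    systemd.services.chochacho-hello-bot = {
      description = "chochacho-hello-bot (replies to !hello)";
      wantedBy = [ "multi-user.target" ];
      wants = [ "network-online.target" ];
      after = [ "network-online.target" ];
      # Hypothetical variable name that the bot would read at startup
      environment.MATRIX_HOMESERVER = cfg.homeserverUrl;
      serviceConfig = {
        # Assumes the package installs bin/chochacho-hello-bot
        ExecStart = "${cfg.package}/bin/chochacho-hello-bot";
        DynamicUser = true; # transient unprivileged user, nothing to manage by hand
        Restart = "on-failure";
      };
    };
  };
}
```
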
**Deployment Workflow:**
1. Engineer develops project locally (with Nix)
2. Project added to ops-jrz1 as import or flake reference
3. Push to Forgejo (project repo or ops-jrz1 update)
4. Admin reviews change (pull request optional)
5. `nixos-rebuild switch` deploys to production
6. Rollback available via NixOS generations

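Step 2 ("flake reference") could look like the ops-jrz1 excerpt below. The input name `hello-bot`, the Forgejo path `chochacho/chochacho-hello-bot`, the host name `ops-jrz1`, and `./configuration.nix` are all placeholders. The real flake already has its own inputs and module list. The excerpt also assumes that the project flake exposes `nixosModules.default` and has a `nixpkgs` input.

```nix
# ops-jrz1 flake.nix (excerpt): consuming a project as a flake input
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
    sops-nix.url = "github:Mic92/sops-nix";
    # Hypothetical project repository on the self-hosted Forgejo
    hello-bot.url = "git+https://git.clarun.xyz/chochacho/chochacho-hello-bot";
    # Build the project against the same nixpkgs as the rest of the system
    hello-bot.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { nixpkgs, sops-nix, hello-bot, ... }: {
    # "ops-jrz1" host name and ./configuration.nix are placeholders
    nixosConfigurations.ops-jrz1 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./configuration.nix
        sops-nix.nixosModules.sops
        hello-bot.nixosModules.default
        { services.chochacho-hello-bot.enable = true; } # assumed option name
      ];
    };
  };
}
```

Because the input is pinned in `flake.lock`, shipping a new bot version is itself a reviewable ops-jrz1 commit. First update the lock entry, for example with `nix flake lock --update-input hello-bot`; newer Nix releases also accept `nix flake update hello-bot`. Then run `nixos-rebuild switch --flake .#ops-jrz1`. Each switch creates a new generation, and that generation is what step 6 rolls back to.
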
**Benefits:**
- ✅ Declarative and reproducible
- ✅ Built-in rollback (generation management)
- ✅ Consistent with existing ops-jrz1 pattern
- ✅ Forces proper packaging (quality gate)
- ✅ No additional deployment systems to maintain

**Trade-offs:**
- ❌ Requires NixOS knowledge (acceptable: team can learn)
- ❌ Less "instant" than webhook deployment (acceptable: "no deployment urgency")
- ❌ Admin approval step (beneficial: quality control)

**Alternative Considered:** Hybrid model (platform in NixOS, projects flexible)
- Deferred: Can relax strictness later if needed
- Starting strict enforces quality and consistency

### Multi-Engineer Access Model

**Level 1: Communication Only**
- Slack workspace access (chochacho)
- Can participate in bridged conversations
- No infrastructure access needed

**Level 2: Code Contributor**
- Forgejo account (pattern established)
- SSH key uploaded to Forgejo
- Can push to project repositories
- Can submit pull requests

**Level 3: Deployer**
- Can trigger deployments (merge to main?)
- May have SSH access for debugging
- Permissions to restart services

**Level 4: Admin**
- SSH root access to VPS
- Can modify ops-jrz1 NixOS config
- Secrets management access (sops-nix keys)
- Infrastructure decision authority

**Target Distribution (2-5 engineers):**
- Level 1: All engineers
- Level 2: All engineers (default)
- Level 3: 2-3 trusted engineers
- Level 4: 1-2 admins (primary: dan)

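On the server, only Levels 3 and 4 need anything in the NixOS config, because Levels 1 and 2 live entirely in Slack and Forgejo. The sketch below shows one possible setup. The `engineer-a` account, the `deployers` group, the sudo rule, and the key strings are hypothetical placeholders. A rule this broad (restart any unit) is also a policy choice the team would still need to make.

```nix
# Hypothetical mapping of access Levels 3 and 4 onto NixOS accounts
{ ... }:
{
  # Level 3 (Deployer): personal shell account for debugging
  users.groups.deployers = { };
  users.users.engineer-a = {
    isNormalUser = true;
    # systemd-journal lets the account read service logs
    extraGroups = [ "deployers" "systemd-journal" ];
    openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA... engineer-a" ]; # placeholder key
  };

  # Level 3: restart services via sudo without a password, but no general root shell
  security.sudo.extraRules = [
    {
      groups = [ "deployers" ];
      commands = [
        {
          command = "/run/current-system/sw/bin/systemctl restart *";
          options = [ "NOPASSWD" ];
        }
      ];
    }
  ];

  # Level 4 (Admin): root login over SSH with a key only
  users.users.root.openssh.authorizedKeys.keys = [ "ssh-ed25519 AAAA... dan" ]; # placeholder key
}
```
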
### Secrets Management
**Tool:** sops-nix with age encryption

**Current State:**
- VPS age recipient (derived from the VPS SSH host key): `age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q`
- Admin workstation can decrypt (dan's age key)

**Pattern:**
```yaml
# secrets/secrets.yaml (stored encrypted; plaintext structure shown)
matrix-registration-token: "..."
acme-email: "..."
slack-app-token: "..."   # Future
slack-bot-token: "..."   # Future
```

**Future Considerations:**
- Add engineer age keys for collaboration
- Per-project secrets (if needed)
- Secret rotation workflow

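The secrets file above is consumed through sops-nix options along the following lines. This is a sketch with two assumptions. First, the `mautrix-slack` owner assumes the bridge runs as a system user of that name, and that user must exist at activation time. Second, the relative path mirrors the pattern shown above rather than a confirmed repository layout.

```nix
# Hypothetical sops-nix wiring for secrets/secrets.yaml
{ ... }:
{
  # Path is relative to this .nix file
  sops.defaultSopsFile = ./secrets/secrets.yaml;
  # Decrypt on the VPS with the age key derived from its SSH host key
  sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

  # Each declared secret is decrypted at activation and exposed as /run/secrets/<name>
  sops.secrets."matrix-registration-token" = { };

  # Future Slack credentials, readable only by the bridge's user (assumed name)
  sops.secrets."slack-app-token".owner = "mautrix-slack";
  sops.secrets."slack-bot-token".owner = "mautrix-slack";
}
```

A service then reads the token from `config.sops.secrets."slack-bot-token".path` at runtime, so the plaintext never lands in the Nix store. To add engineer age keys, list their public keys as extra recipients in the sops creation rules and re-encrypt the file with `sops updatekeys`.
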
### Testing Strategy
**Current:** ops-jrz1-vm (VM testing before production)

**Workflow:**
1. Develop locally
2. Test in VM (`nixos-rebuild build-vm`)
3. Deploy to production (`nixos-rebuild switch`)
4. Roll back if issues arise (`nixos-rebuild switch --rollback`)

**Future:**
- Automated testing (unit, integration)
- Staging environment (if needed)
- Pre-deployment health checks

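The "pre-deployment health checks" item could start as a NixOS VM test that runs under `nix flake check`. The sketch below uses `pkgs.testers.runNixOSTest` from nixpkgs. It assumes a project `module.nix` that defines a hypothetical `services.chochacho-hello-bot` option, passed in as `helloBotModule`. A real bot would also need a stub or local homeserver inside the test VM, because the VM has no outside network.

```nix
# tests/hello-bot.nix: hypothetical VM smoke test for a project module
{ pkgs, helloBotModule }:

pkgs.testers.runNixOSTest {
  name = "chochacho-hello-bot-starts";

  nodes.machine = { ... }: {
    imports = [ helloBotModule ];
    services.chochacho-hello-bot.enable = true; # assumed option name
  };

  # The test script is Python, executed by the NixOS test driver
  testScript = ''
    machine.wait_for_unit("multi-user.target")
    machine.wait_for_unit("chochacho-hello-bot.service")
  '';
}
```

In a flake, the test could be exposed as `checks.x86_64-linux.hello-bot = import ./tests/hello-bot.nix { inherit pkgs; helloBotModule = ./module.nix; };`. Running `nix flake check` before `nixos-rebuild switch` then catches a service that fails to start.
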
## Technical Stack

### Infrastructure
- **OS:** NixOS 24.05
- **Config Management:** Nix flakes
- **Secrets:** sops-nix with age encryption
- **Firewall:** iptables (nixos-fw)
- **Web Server:** nginx with ACME/Let's Encrypt

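To illustrate the nginx + ACME item, a reverse-proxy virtual host for Forgejo could be declared as below. The contact email is a placeholder. Port 3000 is Forgejo's default HTTP port, assumed here rather than taken from the ops-jrz1 config.

```nix
# Hypothetical excerpt: nginx terminating TLS in front of Forgejo
{ ... }:
{
  security.acme = {
    acceptTerms = true;
    defaults.email = "admin@example.org"; # placeholder, not the real contact address
  };

  services.nginx = {
    enable = true;
    recommendedProxySettings = true; # forward Host, X-Forwarded-* headers
    recommendedTlsSettings = true;

    virtualHosts."git.clarun.xyz" = {
      enableACME = true; # obtain and renew a Let's Encrypt certificate
      forceSSL = true;   # redirect plain HTTP to HTTPS
      locations."/".proxyPass = "http://127.0.0.1:3000"; # assumed Forgejo port
    };
  };
}
```

With `enableACME` and `forceSSL`, NixOS handles certificate issuance, renewal, and the HTTP-to-HTTPS redirect. That is why ports 80 and 443 are the only web ports the firewall needs to open.
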
### Communication
- **Matrix Homeserver:** conduwuit 0.5.0-rc.8
- **Bridge Framework:** mautrix (the Slack, WhatsApp, and Google Messages bridges are written in Go on mautrix-go)
- **Target Bridge:** mautrix-slack (Socket Mode)

### Development Platform
- **Git Server:** Forgejo 7.0.12
- **Database:** PostgreSQL 15.10
- **CI/CD:** Forgejo Actions (future consideration)

### Expected Project Stack (Flexible)
- Python bots (primary expectation)
- Node.js services (if needed)
- Go binaries (if needed)
- Any language with Nix packaging support

## Open Questions

### Communication Bridge
- Which Slack channels to bridge? (All? Specific list? On-demand?)
- User identity mapping: Slack display names or Matrix usernames?
- Bot integration needs: GitHub notifications? CI/CD status?

### Project Deployment
- Automated deployment on merge? Or manual trigger?
- Pull request workflow required? Or direct push to main?
- Health checks before deployment?
- Monitoring and alerting strategy?

### Team Collaboration
- How many engineers will actually join? (impacts scaling decisions)
- Shared development environments needed?
- Per-project Matrix rooms or one big room?
- Weekly syncs or async-only collaboration?

### Repository Organization
- Monorepo (ops-jrz1 + projects) or separate repos?
- Public vs private repositories?
- Who owns which repositories?

## Success Metrics

### Technical Success
- ✅ All services healthy and monitored
- ✅ Zero unplanned downtime
- ✅ Fast rollback capability (< 5 minutes)
- ✅ Clear audit trail (git history + NixOS generations)

### Team Success
- ✅ Engineers can deploy projects independently
- ✅ Onboarding time < 1 hour
- ✅ Documentation answers common questions
- ✅ Platform feels stable and trustworthy

### Project Success (Presentable State)
- ✅ Slack bridge works reliably
- ✅ Example project demonstrates the pattern
- ✅ Documentation is complete and clear
- ✅ At least one other engineer has successfully deployed

## Timeline

**Phase 1: Working Slack Bridge** (1-2 focused sessions)
- Update workspace configuration
- Slack app setup and credential management
- Debug and validate bidirectional messaging

**Phase 2: Project Pattern** (1-2 sessions after Phase 1)
- Create example bot
- Document deployment pattern
- Establish template repository

**Phase 3: Documentation** (1 session)
- Architecture documentation
- Onboarding guide
- Deployment runbook

**Phase 4: Team Onboarding** (1 session per engineer)
- Invite engineers
- Supervised first deployment
- Gather feedback and iterate

**Target:** Presentable state within 4-8 focused work sessions

**Constraint:** Not pressing, quality over speed

## References

### Internal Documentation
- [Security Test Report](worklogs/2025-10-22-security-validation-test-report.md) - Generation 31 validation
- [Deployment Log](worklogs/2025-10-22-deployment-generation-31.md) - Initial deployment
- [Forgejo Setup](worklogs/2025-10-22-forgejo-repository-setup.org) - Git server configuration

### External Resources
- [Mautrix Bridges Documentation](https://docs.mau.fi/)
- [NixOS Manual](https://nixos.org/manual/nixos/stable/)
- [Forgejo Documentation](https://forgejo.org/docs/)
- [Matrix Specification](https://spec.matrix.org/)

## Revision History

- **2025-10-22:** Initial vision document created after brainstorming session
  - Defined presentable MVP criteria
  - Established three-milestone roadmap
  - Documented architectural principles
  - Identified open questions for iteration