From bce31933ed0508efe581106d74bfbcf194e431fa Mon Sep 17 00:00:00 2001
From: Dan
Date: Sun, 26 Oct 2025 14:36:52 -0700
Subject: [PATCH] Add platform vision and spec-kit integration docs

---
 docs/platform-vision.md      | 343 +++++++++++++++++++++++++++++++++++
 docs/spec-kit-integration.md | 237 ++++++++++++++++++++++++
 2 files changed, 580 insertions(+)
 create mode 100644 docs/platform-vision.md
 create mode 100644 docs/spec-kit-integration.md

diff --git a/docs/platform-vision.md b/docs/platform-vision.md
new file mode 100644
index 0000000..bc70208
--- /dev/null
+++ b/docs/platform-vision.md
@@ -0,0 +1,343 @@
+# ops-jrz1 Platform Vision
+
+**Status:** North Star Document
+**Last Updated:** 2025-10-22
+**Maintainers:** dan (primary), team (shared responsibility)
+
+## Executive Summary
+
+ops-jrz1 is a self-hosted collaborative development platform for small engineering teams (2-5 engineers). It provides communication bridging (Matrix ↔ Slack), code hosting (Forgejo), and declarative deployment infrastructure (NixOS) with a focus on **sustainability over speed** and **quality over quick wins**.
+
+## Core Philosophy
+
+**Build It Right Over Time**
+- Avoid technical debt
+- Declarative and reproducible (NixOS)
+- Self-documenting
+- Sustainable for a small team
+- Clear patterns for contributions
+
+**Presentable State First**
+- Working demo-able features
+- Clear documentation
+- Inviting for new engineers
+- Professional appearance
+
+## Current State (Generation 31+)
+
+### Operational Services
+- ✅ Matrix homeserver (conduwuit 0.5.0-rc.8) on clarun.xyz
+- ✅ Forgejo (7.0.12) at git.clarun.xyz
+- ✅ nginx reverse proxy with TLS (Let's Encrypt)
+- ✅ PostgreSQL 15.10 (Forgejo database)
+- ✅ sops-nix secrets management
+- ✅ Self-hosted infrastructure configuration (ops-jrz1 repo on Forgejo)
+
+### Security Posture
+- ✅ SSH key-only authentication
+- ✅ Secrets encrypted with age/sops-nix
+- ✅ Services isolated on localhost (Matrix, PostgreSQL)
+- ✅ Firewall (only SSH, HTTP, HTTPS exposed)
+- ✅ Comprehensive security validation completed
+
+### Incomplete/Blocked
+- ⚠️ mautrix-slack bridge (exits with code 11, needs configuration)
+- ⚠️ mautrix-whatsapp (configured but not tested)
+- ⚠️ mautrix-gmessages (configured but not tested)
+- ⚠️ No deployment pattern for team projects yet
+
+## Target "Presentable MVP"
+
+### Definition of Presentable
+When we can say: "Here's a working platform you can use and contribute to"
+
+**Criteria:**
+1. Slack bridge works bidirectionally
+2. One example project successfully deployed
+3. Clear onboarding documentation
+4. Stable and tested (not constantly broken)
+5. Professional presentation (docs, architecture clarity)
+
+### Milestone 1: Working Slack Bridge
+**Goal:** Engineers in Slack can see it's alive and useful
+
+**Success Metric:** Send a "Hello from Matrix!" message that appears in Slack via the bridge
+
+**Tasks:**
+- Update workspace config (delpadtech → chochacho)
+- Create Slack app in chochacho workspace
+- Configure Slack credentials (app token, bot token) in sops-nix
+- Debug the exit code 11 failure
+- Test bidirectional messaging (Slack ↔ Matrix)
+- Document setup in worklog
+
+**Impact:** Highly visible proof of concept; validates the core architecture
+
+**Priority:** **HIGH** - Unblocks team communication and collaboration
+
+### Milestone 2: Example Project Pattern
+**Goal:** Clear template for "how to add a project"
+
+**Success Metric:** Engineer can clone the template repo, modify it, and deploy a simple bot
+
+**Deliverables:**
+- Example project: "chochacho-hello-bot" (responds to !hello in Matrix)
+- Project structure: Nix flake + NixOS module pattern
+- Documentation: docs/project-template.md
+- Template repository on Forgejo
+
+**Impact:** Makes platform "joinable" - clear contribution path
+
+**Priority:** **MEDIUM** - Required before onboarding engineers
+
+### Milestone 3: Platform Documentation
+**Goal:** New engineer can understand and use the platform
+
+**Deliverables:**
+- docs/architecture.md - How the platform works
+- docs/onboarding.md - How to join as an engineer
+- docs/deployment.md - How to deploy projects
+- README.md - Overview and navigation
+
+**Impact:** Presentability factor; shows maturity and thoughtfulness
+
+**Priority:** **MEDIUM** - Can iterate as engineers join
+
+## Architecture Principles
+
+### Communication Layer
+**Primary:** Slack (chochacho workspace)
+**Hub:** Matrix homeserver bridges to Slack
+**Direction:** Bidirectional (Slack ↔ Matrix)
+
+**Current Focus:** Slack bridge only (not WhatsApp, Google Messages, etc.)
+
+**User Experience:** Engineers stay in Slack; Matrix runs behind the scenes to unify communication
+
+### Code Hosting
+**Primary:** Self-hosted Forgejo at git.clarun.xyz
+**Flexibility:** Projects can also reference external repos (GitHub, etc.)
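+
+A minimal sketch of how ops-jrz1 could consume both Forgejo-hosted and external projects as flake inputs. It assumes each project exports a NixOS module as `nixosModules.default`; the repository paths (`chochacho/chochacho-hello-bot`, `example-org/other-project`), the configuration name `ops-jrz1`, the `./configuration.nix` file, and the `x86_64-linux` system are illustrative assumptions, not the current repo layout:
+
+```nix
+# flake.nix (hypothetical excerpt): ops-jrz1 importing team projects
+{
+  inputs = {
+    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
+
+    # Assumed project hosted on the self-hosted Forgejo instance
+    hello-bot.url = "git+https://git.clarun.xyz/chochacho/chochacho-hello-bot";
+    hello-bot.inputs.nixpkgs.follows = "nixpkgs";
+
+    # Assumed project hosted externally on GitHub
+    other-project.url = "github:example-org/other-project";
+    other-project.inputs.nixpkgs.follows = "nixpkgs";
+  };
+
+  outputs = { nixpkgs, hello-bot, other-project, ... }: {
+    # Configuration name is an assumption; reuse whatever ops-jrz1 already defines
+    nixosConfigurations.ops-jrz1 = nixpkgs.lib.nixosSystem {
+      system = "x86_64-linux";
+      modules = [
+        ./configuration.nix
+        # Each project is assumed to export its service module here
+        hello-bot.nixosModules.default
+        other-project.nixosModules.default
+      ];
+    };
+  };
+}
+```
+
+Pinning both sources through `flake.lock` keeps Forgejo-hosted and GitHub-hosted projects on equal footing: updating a project becomes a lock-file change that is reviewed like any other ops-jrz1 commit.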
+
+**Model:**
+- `ops-jrz1` repository: Platform infrastructure (NixOS config)
+- Project repositories: Individual team projects
+- Clear separation: Infrastructure vs applications
+
+### Deployment Philosophy
+**Chosen Approach:** NixOS-Native (Strict Declarative)
+
+**Pattern: Project as NixOS Module**
+```text
+# Example project structure
+project-name/
+├── flake.nix    # Nix flake (how to build)
+├── default.nix  # Derivation (package definition)
+├── module.nix   # NixOS service module
+├── src/         # Project code
+└── README.md    # Deployment instructions
+```
+
+**Deployment Workflow:**
+1. Engineer develops project locally (with Nix)
+2. Project added to ops-jrz1 as an import or flake reference
+3. Push to Forgejo (project repo or ops-jrz1 update)
+4. Admin reviews change (pull request optional)
+5. `nixos-rebuild switch` deploys to production
+6. Rollback available via NixOS generations
+
+**Benefits:**
+- ✅ Declarative and reproducible
+- ✅ Built-in rollback (generation management)
+- ✅ Consistent with existing ops-jrz1 pattern
+- ✅ Forces proper packaging (quality gate)
+- ✅ No additional deployment systems to maintain
+
+**Trade-offs:**
+- ❌ Requires NixOS knowledge (acceptable: team can learn)
+- ❌ Less "instant" than webhook deployment (acceptable: "no deployment urgency")
+- ❌ Admin approval step (beneficial: quality control)
+
+**Alternative Considered:** Hybrid model (platform in NixOS, projects flexible)
+- Deferred: Can relax strictness later if needed
+- Starting strict enforces quality and consistency
+
+### Multi-Engineer Access Model
+
+**Level 1: Communication Only**
+- Slack workspace access (chochacho)
+- Can participate in bridged conversations
+- No infrastructure access needed
+
+**Level 2: Code Contributor**
+- Forgejo account (pattern established)
+- SSH key uploaded to Forgejo
+- Can push to project repositories
+- Can submit pull requests
+
+**Level 3: Deployer**
+- Can trigger deployments (merge to main?)
+- May have SSH access for debugging
+- Permissions to restart services
+
+**Level 4: Admin**
+- SSH root access to VPS
+- Can modify ops-jrz1 NixOS config
+- Secrets management access (sops-nix keys)
+- Infrastructure decision authority
+
+**Target Distribution (2-5 engineers):**
+- Level 1: All engineers
+- Level 2: All engineers (default)
+- Level 3: 2-3 trusted engineers
+- Level 4: 1-2 admins (primary: dan)
+
+### Secrets Management
+**Tool:** sops-nix with age encryption
+
+**Current State:**
+- Age recipient derived from the VPS SSH host key: `age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q`
+- Admin workstation can decrypt (dan's age key)
+
+**Pattern:**
+```yaml
+# secrets/secrets.yaml (encrypted)
+matrix-registration-token: "..."
+acme-email: "..."
+slack-app-token: "..."  # Future
+slack-bot-token: "..."  # Future
+```
+
+**Future Considerations:**
+- Add engineer age keys for collaboration
+- Per-project secrets (if needed)
+- Secret rotation workflow
+
+### Testing Strategy
+**Current:** ops-jrz1-vm (VM testing before production)
+
+**Workflow:**
+1. Develop locally
+2. Test in a VM (`nixos-rebuild build-vm`)
+3. Deploy to production (`nixos-rebuild switch`)
+4. Roll back on issues (`nixos-rebuild switch --rollback`)
+
+**Future:**
+- Automated testing (unit, integration)
+- Staging environment (if needed)
+- Pre-deployment health checks
+
+## Technical Stack
+
+### Infrastructure
+- **OS:** NixOS 24.05
+- **Config Management:** Nix flakes
+- **Secrets:** sops-nix with age encryption
+- **Firewall:** iptables (nixos-fw)
+- **Web Server:** nginx with ACME/Let's Encrypt
+
+### Communication
+- **Matrix Homeserver:** conduwuit 0.5.0-rc.8
+- **Bridge Framework:** mautrix (the mautrix-slack, mautrix-whatsapp, and mautrix-gmessages bridges are written in Go)
+- **Target Bridge:** mautrix-slack (Socket Mode)
+
+### Development Platform
+- **Git Server:** Forgejo 7.0.12
+- **Database:** PostgreSQL 15.10
+- **CI/CD:** Forgejo Actions (future consideration)
+
+### Expected Project Stack (Flexible)
+- Python bots (primary expectation)
+- Node.js services (if needed)
+- Go binaries (if needed)
+- Any language with Nix packaging support
+
+## Open Questions
+
+### Communication Bridge
+- Which Slack channels to bridge? (All? Specific list? On-demand?)
+- User identity mapping: Slack display names or Matrix usernames?
+- Bot integration needs: GitHub notifications? CI/CD status?
+
+### Project Deployment
+- Automated deployment on merge? Or manual trigger?
+- Pull request workflow required? Or direct push to main?
+- Health checks before deployment?
+- Monitoring and alerting strategy?
+
+### Team Collaboration
+- How many engineers will actually join? (impacts scaling decisions)
+- Shared development environments needed?
+- Per-project Matrix rooms or one big room?
+- Weekly syncs or async-only collaboration?
+
+### Repository Organization
+- Monorepo (ops-jrz1 + projects) or separate repos?
+- Public vs private repositories?
+- Who owns which repositories?
+
+## Success Metrics
+
+### Technical Success
+- ✅ All services healthy and monitored
+- ✅ Zero unplanned downtime
+- ✅ Fast rollback capability (< 5 minutes)
+- ✅ Clear audit trail (git history + NixOS generations)
+
+### Team Success
+- ✅ Engineers can deploy projects independently
+- ✅ Onboarding time < 1 hour
+- ✅ Documentation answers common questions
+- ✅ Platform feels stable and trustworthy
+
+### Project Success (Presentable State)
+- ✅ Slack bridge works reliably
+- ✅ Example project demonstrates the pattern
+- ✅ Documentation is complete and clear
+- ✅ At least one other engineer has successfully deployed
+
+## Timeline
+
+**Phase 1: Working Slack Bridge** (1-2 focused sessions)
+- Update workspace configuration
+- Slack app setup and credential management
+- Debug and validate bidirectional messaging
+
+**Phase 2: Project Pattern** (1-2 sessions after Phase 1)
+- Create example bot
+- Document deployment pattern
+- Establish template repository
+
+**Phase 3: Documentation** (1 session)
+- Architecture documentation
+- Onboarding guide
+- Deployment runbook
+
+**Phase 4: Team Onboarding** (1 session per engineer)
+- Invite engineers
+- Supervised first deployment
+- Gather feedback and iterate
+
+**Target:** Presentable state within 4-8 focused work sessions
+
+**Constraint:** No time pressure; quality over speed
+
+## References
+
+### Internal Documentation
+- [Security Test Report](worklogs/2025-10-22-security-validation-test-report.md) - Generation 31 validation
+- [Deployment Log](worklogs/2025-10-22-deployment-generation-31.md) - Initial deployment
+- [Forgejo Setup](worklogs/2025-10-22-forgejo-repository-setup.org) - Git server configuration
+
+### External Resources
+- [Mautrix Bridges Documentation](https://docs.mau.fi/)
+- [NixOS Manual](https://nixos.org/manual/nixos/stable/)
+- [Forgejo Documentation](https://forgejo.org/docs/)
+- [Matrix Specification](https://spec.matrix.org/)
+
+## Revision History
+
+- **2025-10-22:** Initial vision document created after brainstorming session
+  - Defined presentable MVP criteria
+  - Established three-milestone roadmap
+  - Documented architectural principles
+  - Identified open questions for iteration
diff --git a/docs/spec-kit-integration.md b/docs/spec-kit-integration.md
new file mode 100644
index 0000000..31ae269
--- /dev/null
+++ b/docs/spec-kit-integration.md
@@ -0,0 +1,237 @@
+# Spec-Kit Integration for ops-jrz1
+
+**Purpose:** How to use the spec-kit framework for structured feature development
+
+## What is Spec-Kit?
+
+Spec-kit is a feature development framework that provides structured planning and execution. It's designed to:
+- Force clear thinking before coding
+- Document decisions and rationale
+- Create actionable task lists
+- Provide quality checklists
+- Track progress systematically
+
+## Current Spec-Kit Usage
+
+**Existing Feature:**
+- `specs/001-extract-matrix-platform/` - The feature that established this platform
+
+**Structure:**
+```
+specs/001-extract-matrix-platform/
+├── spec.md        # What we're building and why
+├── plan.md        # How we'll build it
+├── tasks.md       # Specific actionable tasks
+├── analysis.md    # Technical analysis and decisions
+├── research.md    # Background research
+├── data-model.md  # Data structures and schemas
+├── quickstart.md  # Quick reference guide
+├── checklists/    # Validation checklists
+└── contracts/     # Interface contracts
+```
+
+## Using Spec-Kit for New Features
+
+### When to Create a Spec
+
+**✅ Good candidates for spec-kit:**
+- Complex features with multiple components (e.g., Slack bridge integration)
+- Features that affect architecture or patterns
+- Features requiring team coordination
+- Features with unclear requirements
+
+**❌ Not worth spec-kit overhead:**
+- Bug fixes
+- Minor config changes
+- Documentation updates
+- Quick experiments
+
+### Workflow
+
+1. **Create spec:** `/speckit.specify` - Describe what you want to build
+2. **Plan implementation:** `/speckit.plan` - Break down into design steps
+3. **Generate tasks:** `/speckit.tasks` - Create actionable task list
+4. **Implement:** `/speckit.implement` - Execute tasks systematically
+5. **Validate:** Use checklists to verify completion
+
+## Recommended: Slack Bridge Feature Spec
+
+Given our north star goal (Milestone 1: Working Slack Bridge), this is a **perfect candidate** for spec-kit.
+
+### Why Use Spec-Kit for Slack Bridge?
+
+**Complexity factors:**
+- Requires external service integration (Slack API)
+- Needs secrets management coordination
+- Involves debugging the unexplained exit code 11
+- Has architectural implications (bridge pattern for future features)
+- Requires documentation for team onboarding
+
+**Benefits:**
+- Clear specification prevents scope creep
+- Plan documents decision rationale
+- Tasks provide clear progress tracking
+- Checklists ensure quality (security, testing, docs)
+- Future engineers can understand the design
+
+### Proposed Feature: `002-slack-bridge-integration`
+
+**Spec outline:**
+```markdown
+# Feature: Matrix-Slack Bridge Integration
+
+## Goal
+Enable bidirectional communication between Slack (chochacho workspace)
+and Matrix homeserver to unify team communication.
+
+## Background
+- Team currently uses Slack as primary communication
+- Matrix homeserver operational on clarun.xyz
+- mautrix-slack module exists but exits with code 11
+- Existing Slack bot needs reauthorization with updated scopes
+- Need Socket Mode for reliable connection
+
+## Success Criteria
+- Send message in Slack → appears in Matrix
+- Send message in Matrix → appears in Slack
+- Bridge survives server restart
+- Clear documentation for adding/removing channels
+- Secrets properly managed via sops-nix
+
+## Non-Goals
+- WhatsApp bridge (future)
+- Google Messages bridge (future)
+- Multi-workspace support (future)
+```
+
+## Next Steps
+
+1. **Start the spec process:**
+   ```bash
+   # Create new feature spec
+   # Run /speckit.specify in your AI agent (not a shell command), or create the directory manually:
+   mkdir -p specs/002-slack-bridge-integration
+   ```
+
+2. **Write initial spec.md:**
+   - What: Slack bridge integration
+   - Why: Team communication unification
+   - Success criteria: Bidirectional messaging
+   - Constraints: Socket Mode, sops-nix secrets
+
+3. **Run planning:**
+   ```bash
+   # Generate implementation plan
+   # Run /speckit.plan in your AI agent (not a shell command)
+   ```
+
+4. **Generate tasks:**
+   ```bash
+   # Create actionable task list
+   # Run /speckit.tasks in your AI agent (not a shell command)
+   ```
+
+5. **Execute systematically:**
+   - Work through tasks in order
+   - Document blockers and decisions
+   - Update worklog as you go
+
+## Important Context: Existing Slack Bot
+
+**Key Information from User:**
+> "We have access to a bot, we need to have a manager reauthorize
+> the bot because we need to change something or other, redo the
+> scopes maybe, change it to socket based"
+
+**Action Items:**
+1. Identify existing Slack bot name/app
+2. Document current scopes/permissions
+3. Determine required scopes for mautrix-slack bridge
+4. Ask a workspace manager to reauthorize the app with the new scopes
+5. Enable Socket Mode in Slack app settings
+6. Extract bot token and app token for sops-nix
+
+**Socket Mode Benefits:**
+- No public webhook URL needed
+- Events are pushed over an outbound WebSocket, so no polling is needed
+- Better for localhost-based bridges
+- Recommended by mautrix-slack documentation
+
+## Coordination with Platform Vision
+
+The Slack bridge spec should reference and align with:
+- `docs/platform-vision.md` - Overall platform goals
+- Milestone 1: Working Slack Bridge
+- Architecture Principles → Communication Layer
+
+## Checklist Template for Bridge Features
+
+```markdown
+# Bridge Integration Checklist
+
+## Configuration
+- [ ] External service credentials obtained
+- [ ] Secrets added to sops-nix
+- [ ] NixOS module configuration updated
+- [ ] Database schema initialized
+
+## Testing
+- [ ] Service starts without errors
+- [ ] Bidirectional messaging works
+- [ ] Bridge survives restart
+- [ ] Error handling tested
+
+## Documentation
+- [ ] Setup documented in worklog
+- [ ] Credentials management documented
+- [ ] Troubleshooting guide created
+- [ ] Architecture diagram updated
+
+## Security
+- [ ] Secrets encrypted and committed
+- [ ] Permissions scoped appropriately
+- [ ] Network isolation verified
+- [ ] Audit logging enabled
+```
+
+## Questions to Answer in Spec
+
+When creating `002-slack-bridge-integration`, make sure to address:
+
+1. **Scope:** Which channels to bridge? (all, specific list, on-demand)
+2. **Identity:** How to map Slack users to Matrix users?
+3. **Permissions:** Who can manage the bridge? (admin-only vs engineer-accessible)
+4. **Failure Modes:** What happens if Slack is down, or if the bridge crashes?
+5. **Monitoring:** How do we know the bridge is healthy?
+6. **Scaling:** Future multi-workspace support?
+
+## Integration with Worklogs
+
+**Relationship:**
+- Spec-kit specs are **planning documents** (what and how)
+- Worklogs are **execution records** (what happened and why)
+
+**Workflow:**
+1. Create spec for the feature (e.g., `002-slack-bridge-integration`)
+2. Work on implementation
+3. Write worklogs documenting sessions (e.g., `2025-10-23-slack-bridge-setup.org`)
+4. Update spec with learnings (if design changes)
+
+**Cross-references:**
+- Specs should link to worklogs for implementation details
+- Worklogs should reference specs for context
+- Platform vision should reference both
+
+## Conclusion
+
+For the Slack bridge work (Milestone 1 of our platform vision), I recommend:
+
+**✅ Use spec-kit:** Create `specs/002-slack-bridge-integration/`
+- Complex enough to warrant structured planning
+- Clearly bounded feature
+- Important architectural precedent
+- Needs team coordination
+
+**Next Command:** `/speckit.specify` to start the spec process
+
+This will ensure we build it right, document decisions, and make it easy for future engineers to understand and extend.