Add platform vision and spec-kit integration docs

Dan 2025-10-26 14:36:52 -07:00
parent ca379311b8
commit bce31933ed
2 changed files with 580 additions and 0 deletions

docs/platform-vision.md Normal file

@@ -0,0 +1,343 @@
# ops-jrz1 Platform Vision
**Status:** North Star Document
**Last Updated:** 2025-10-22
**Maintainers:** dan (primary), team (shared responsibility)
## Executive Summary
ops-jrz1 is a self-hosted collaborative development platform for small engineering teams (2-5 engineers). It provides communication bridging (Matrix ↔ Slack), code hosting (Forgejo), and declarative deployment infrastructure (NixOS) with a focus on **sustainability over speed** and **quality over quick wins**.
## Core Philosophy
**Build It Right Over Time**
- Avoid technical debt
- Declarative and reproducible (NixOS)
- Self-documenting
- Sustainable for a small team
- Clear patterns for contributions
**Presentable State First**
- Working, demoable features
- Clear documentation
- Inviting for new engineers
- Professional appearance
## Current State (Generation 31+)
### Operational Services
- ✅ Matrix homeserver (conduwuit 0.5.0-rc.8) on clarun.xyz
- ✅ Forgejo (7.0.12) at git.clarun.xyz
- ✅ nginx reverse proxy with TLS (Let's Encrypt)
- ✅ PostgreSQL 15.10 (Forgejo database)
- ✅ sops-nix secrets management
- ✅ Self-hosted infrastructure configuration (ops-jrz1 repo on Forgejo)
### Security Posture
- ✅ SSH key-only authentication
- ✅ Secrets encrypted with age/sops-nix
- ✅ Services isolated on localhost (Matrix, PostgreSQL)
- ✅ Firewall (only SSH, HTTP, HTTPS exposed)
- ✅ Comprehensive security validation completed
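As an illustration only, the SSH and firewall items above could be expressed in NixOS roughly as follows. This is a minimal sketch, not the actual ops-jrz1 configuration; it uses standard NixOS options, and the exact port list is an assumption based on "only SSH, HTTP, HTTPS exposed".

```nix
# Hypothetical excerpt; the real ops-jrz1 settings may differ.
{
  # SSH: key-based logins only, no password or keyboard-interactive auth.
  services.openssh = {
    enable = true;
    settings = {
      PasswordAuthentication = false;
      KbdInteractiveAuthentication = false;
    };
  };

  # Firewall: expose only SSH (22), HTTP (80), and HTTPS (443).
  networking.firewall = {
    enable = true;
    allowedTCPPorts = [ 22 80 443 ];
  };
}
```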
### Incomplete/Blocked
- ⚠️ mautrix-slack bridge (exit code 11, needs configuration)
- ⚠️ mautrix-whatsapp (configured but not tested)
- ⚠️ mautrix-gmessages (configured but not tested)
- ⚠️ No deployment pattern for team projects yet
## Target "Presentable MVP"
### Definition of Presentable
When we can say: "Here's a working platform you can use and contribute to"
**Criteria:**
1. Slack bridge works bidirectionally
2. One example project successfully deployed
3. Clear onboarding documentation
4. Stable and tested (not constantly broken)
5. Professional presentation (docs, architecture clarity)
### Milestone 1: Working Slack Bridge
**Goal:** Engineers in Slack can see it's alive and useful
**Success Metric:** A "Hello from Matrix!" message sent from Matrix appears in Slack via the bridge
**Tasks:**
- Update workspace config (delpadtech → chochacho)
- Create Slack app in chochacho workspace
- Configure Slack credentials (app token, bot token) in sops-nix
- Debug exit code 11 issue
- Test bidirectional messaging (Slack ↔ Matrix)
- Document setup in worklog
**Impact:** Highly visible proof of concept, validates core architecture
**Priority:** **HIGH** - Unblocks team communication and collaboration
### Milestone 2: Example Project Pattern
**Goal:** Clear template for "how to add a project"
**Success Metric:** Engineer can clone template repo, modify, and deploy a simple bot
**Deliverables:**
- Example project: "chochacho-hello-bot" (responds to !hello in Matrix)
- Project structure: Nix flake + NixOS module pattern
- Documentation: docs/project-template.md
- Template repository on Forgejo
**Impact:** Makes platform "joinable" - clear contribution path
**Priority:** **MEDIUM** - Required before onboarding engineers
### Milestone 3: Platform Documentation
**Goal:** New engineer can understand and use the platform
**Deliverables:**
- docs/architecture.md - How the platform works
- docs/onboarding.md - How to join as an engineer
- docs/deployment.md - How to deploy projects
- README.md - Overview and navigation
**Impact:** Presentability factor, shows maturity and thoughtfulness
**Priority:** **MEDIUM** - Can iterate as engineers join
## Architecture Principles
### Communication Layer
**Primary:** Slack (chochacho workspace)
**Hub:** Matrix homeserver bridges to Slack
**Direction:** Bidirectional (Slack ↔ Matrix)
**Current Focus:** Slack bridge only (not WhatsApp, Google Messages, etc.)
**User Experience:** Engineers stay in Slack, Matrix runs behind the scenes to unify communication
### Code Hosting
**Primary:** Self-hosted Forgejo at git.clarun.xyz
**Flexibility:** Projects can also reference external repos (GitHub, etc.)
**Model:**
- `ops-jrz1` repository: Platform infrastructure (NixOS config)
- Project repositories: Individual team projects
- Clear separation: Infrastructure vs applications
### Deployment Philosophy
**Chosen Approach:** NixOS-Native (Strict Declarative)
**Pattern: Project as NixOS Module**
```
# Example project structure
project-name/
├── flake.nix # Nix flake (how to build)
├── default.nix # Derivation (package definition)
├── module.nix # NixOS service module
├── src/ # Project code
└── README.md # Deployment instructions
```
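To make the structure above more concrete, a `module.nix` for a small bot might look roughly like this. It is a sketch under assumptions: the service name `hello-bot`, its option names, and the binary path `bin/hello-bot` are hypothetical and not taken from any existing project.

```nix
# module.nix: hypothetical NixOS service module for an example project.
# "hello-bot" and its options are illustrative names, not part of ops-jrz1.
{ config, lib, ... }:

let
  cfg = config.services.hello-bot;
in
{
  options.services.hello-bot = {
    enable = lib.mkEnableOption "the example hello-bot service";

    package = lib.mkOption {
      type = lib.types.package;
      description = "Package that provides the bot executable (built by default.nix).";
    };
  };

  # Only define the systemd unit when the service is enabled.
  config = lib.mkIf cfg.enable {
    systemd.services.hello-bot = {
      description = "Example Matrix hello bot";
      wantedBy = [ "multi-user.target" ];
      after = [ "network-online.target" ];
      wants = [ "network-online.target" ];
      serviceConfig = {
        # Assumes the package installs its executable as bin/hello-bot.
        ExecStart = "${cfg.package}/bin/hello-bot";
        DynamicUser = true; # run as an unprivileged, transient user
        Restart = "on-failure";
      };
    };
  };
}
```

Keeping the service definition in the project repository means ops-jrz1 only has to import the module and set `enable`, which matches the "project as NixOS module" pattern.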
**Deployment Workflow:**
1. Engineer develops project locally (with Nix)
2. Project added to ops-jrz1 as import or flake reference
3. Push to Forgejo (project repo or ops-jrz1 update)
4. Admin reviews change (pull request optional)
5. `nixos-rebuild switch` deploys to production
6. Rollback available via NixOS generations
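Step 2 of the workflow above ("flake reference") could be sketched as follows. This is an assumption-laden example: the repository URL, the `hello-bot` input, the `x86_64-linux` system, the `./configuration.nix` path, and the `ops-jrz1` host name are placeholders, and it presumes the project flake exports `nixosModules.default` and `packages.x86_64-linux.default`.

```nix
# flake.nix in ops-jrz1 (hypothetical excerpt): reference a project flake.
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";

    # Placeholder project repository on Forgejo.
    hello-bot.url = "git+https://git.clarun.xyz/example/hello-bot";
    # Build the project against the same nixpkgs as the platform.
    hello-bot.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { self, nixpkgs, hello-bot, ... }: {
    nixosConfigurations.ops-jrz1 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./configuration.nix
        # Service module exported by the project flake.
        hello-bot.nixosModules.default
        {
          services.hello-bot = {
            enable = true;
            package = hello-bot.packages.x86_64-linux.default;
          };
        }
      ];
    };
  };
}
```

With this layout, the pinned revision in `flake.lock` records exactly which project commit is deployed, so the admin review in step 4 can be a review of the lock file change.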
**Benefits:**
- ✅ Declarative and reproducible
- ✅ Built-in rollback (generation management)
- ✅ Consistent with existing ops-jrz1 pattern
- ✅ Forces proper packaging (quality gate)
- ✅ No additional deployment systems to maintain
**Trade-offs:**
- ❌ Requires NixOS knowledge (acceptable: team can learn)
- ❌ Less "instant" than webhook deployment (acceptable: "no deployment urgency")
- ❌ Admin approval step (beneficial: quality control)
**Alternative Considered:** Hybrid model (platform in NixOS, projects flexible)
- Deferred: Can relax strictness later if needed
- Starting strict enforces quality and consistency
### Multi-Engineer Access Model
**Level 1: Communication Only**
- Slack workspace access (chochacho)
- Can participate in bridged conversations
- No infrastructure access needed
**Level 2: Code Contributor**
- Forgejo account (pattern established)
- SSH key uploaded to Forgejo
- Can push to project repositories
- Can submit pull requests
**Level 3: Deployer**
- Can trigger deployments (merge to main?)
- May have SSH access for debugging
- Permissions to restart services
**Level 4: Admin**
- SSH root access to VPS
- Can modify ops-jrz1 NixOS config
- Secrets management access (sops-nix keys)
- Infrastructure decision authority
**Target Distribution (2-5 engineers):**
- Level 1: All engineers
- Level 2: All engineers (default)
- Level 3: 2-3 trusted engineers
- Level 4: 1-2 admins (primary: dan)
### Secrets Management
**Tool:** sops-nix with age encryption
**Current State:**
- VPS SSH host key converted to an age recipient (public key): `age1vuxcwvdvzl2u7w6kudqvnnf45czrnhwv9aevjq9hyjjpa409jvkqhkz32q`
- Admin workstation can decrypt (dan's age key)
**Pattern:**
```yaml
# secrets/secrets.yaml (encrypted)
matrix-registration-token: "..."
acme-email: "..."
slack-app-token: "..." # Future
slack-bot-token: "..." # Future
```
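On the NixOS side, secrets like the ones above are typically declared through the sops-nix module. The following is a minimal sketch rather than the actual ops-jrz1 configuration: the file path, the host key path, and the `mautrix-slack` owner are assumptions.

```nix
# Hypothetical sops-nix declaration for the future Slack tokens.
{ config, ... }:
{
  sops = {
    # Encrypted file committed to the repository (path is an assumption).
    defaultSopsFile = ./secrets/secrets.yaml;

    # Decrypt on the VPS using its SSH host key as the age identity.
    age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

    # Each secret becomes a file readable only by its owner.
    secrets."slack-bot-token".owner = "mautrix-slack"; # hypothetical service user
    secrets."slack-app-token".owner = "mautrix-slack";
  };

  # Services read the decrypted value from a file at runtime, e.g.
  # config.sops.secrets."slack-bot-token".path (under /run/secrets by default).
}
```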
**Future Considerations:**
- Add engineer age keys for collaboration
- Per-project secrets (if needed)
- Secret rotation workflow
### Testing Strategy
**Current:** ops-jrz1-vm (VM testing before production)
**Workflow:**
1. Develop locally
2. Test in VM (`nixos-rebuild build-vm`)
3. Deploy to production (`nixos-rebuild switch`)
4. Rollback if issues (`nixos-rebuild switch --rollback`)
**Future:**
- Automated testing (unit, integration)
- Staging environment (if needed)
- Pre-deployment health checks
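One possible shape for the automated testing mentioned above is a NixOS VM test. The sketch below is hypothetical: the file name and the choice of nginx as the service under test are assumptions, and it only checks that the service starts and listens.

```nix
# tests/nginx-smoke.nix: hypothetical smoke test, not part of ops-jrz1.
{ pkgs, ... }:

pkgs.testers.runNixOSTest {
  name = "nginx-smoke";

  # One throwaway VM with only the service under test enabled.
  nodes.machine = { ... }: {
    services.nginx.enable = true;
    services.nginx.virtualHosts.localhost = { };
  };

  # The test script is Python, executed by the NixOS test driver.
  testScript = ''
    machine.wait_for_unit("nginx.service")
    machine.wait_for_open_port(80)
  '';
}
```

A test like this could be imported with `import ./tests/nginx-smoke.nix { inherit pkgs; }` and exposed as a flake check, so it runs before any `nixos-rebuild switch`.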
## Technical Stack
### Infrastructure
- **OS:** NixOS 24.05
- **Config Management:** Nix flakes
- **Secrets:** sops-nix with age encryption
- **Firewall:** iptables (nixos-fw)
- **Web Server:** nginx with ACME/Let's Encrypt
### Communication
- **Matrix Homeserver:** conduwuit 0.5.0-rc.8
- **Bridge Framework:** mautrix (the Slack, WhatsApp, and Google Messages bridges are written in Go)
- **Target Bridge:** mautrix-slack (Socket Mode)
### Development Platform
- **Git Server:** Forgejo 7.0.12
- **Database:** PostgreSQL 15.10
- **CI/CD:** Forgejo Actions (future consideration)
### Expected Project Stack (Flexible)
- Python bots (primary expectation)
- Node.js services (if needed)
- Go binaries (if needed)
- Any language with Nix packaging support
## Open Questions
### Communication Bridge
- Which Slack channels to bridge? (All? Specific list? On-demand?)
- User identity mapping: Slack display names or Matrix usernames?
- Bot integration needs: GitHub notifications? CI/CD status?
### Project Deployment
- Automated deployment on merge? Or manual trigger?
- Pull request workflow required? Or direct push to main?
- Health checks before deployment?
- Monitoring and alerting strategy?
### Team Collaboration
- How many engineers will actually join? (impacts scaling decisions)
- Shared development environments needed?
- Per-project Matrix rooms or one big room?
- Weekly syncs or async-only collaboration?
### Repository Organization
- Monorepo (ops-jrz1 + projects) or separate repos?
- Public vs private repositories?
- Who owns which repositories?
## Success Metrics
### Technical Success
- ✅ All services healthy and monitored
- ✅ Zero unplanned downtime
- ✅ Fast rollback capability (< 5 minutes)
- ✅ Clear audit trail (git history + NixOS generations)
### Team Success
- ✅ Engineers can deploy projects independently
- ✅ Onboarding time < 1 hour
- ✅ Documentation answers common questions
- ✅ Platform feels stable and trustworthy
### Project Success (Presentable State)
- ✅ Slack bridge works reliably
- ✅ Example project demonstrates the pattern
- ✅ Documentation is complete and clear
- ✅ At least one other engineer has successfully deployed
## Timeline
**Phase 1: Working Slack Bridge** (1-2 focused sessions)
- Update workspace configuration
- Slack app setup and credential management
- Debug and validate bidirectional messaging
**Phase 2: Project Pattern** (1-2 sessions after Phase 1)
- Create example bot
- Document deployment pattern
- Establish template repository
**Phase 3: Documentation** (1 session)
- Architecture documentation
- Onboarding guide
- Deployment runbook
**Phase 4: Team Onboarding** (1 session per engineer)
- Invite engineers
- Supervised first deployment
- Gather feedback and iterate
**Target:** Presentable state within 4-8 focused work sessions
**Constraint:** Not pressing, quality over speed
## References
### Internal Documentation
- [Security Test Report](worklogs/2025-10-22-security-validation-test-report.md) - Generation 31 validation
- [Deployment Log](worklogs/2025-10-22-deployment-generation-31.md) - Initial deployment
- [Forgejo Setup](worklogs/2025-10-22-forgejo-repository-setup.org) - Git server configuration
### External Resources
- [Mautrix Bridges Documentation](https://docs.mau.fi/)
- [NixOS Manual](https://nixos.org/manual/nixos/stable/)
- [Forgejo Documentation](https://forgejo.org/docs/)
- [Matrix Specification](https://spec.matrix.org/)
## Revision History
- **2025-10-22:** Initial vision document created after brainstorming session
- Defined presentable MVP criteria
- Established three-milestone roadmap
- Documented architectural principles
- Identified open questions for iteration


@@ -0,0 +1,237 @@
# Spec-Kit Integration for ops-jrz1
**Purpose:** How to use the spec-kit framework for structured feature development
## What is Spec-Kit?
Spec-kit is a feature development framework that provides structured planning and execution. It's designed to:
- Force clear thinking before coding
- Document decisions and rationale
- Create actionable task lists
- Provide quality checklists
- Track progress systematically
## Current Spec-Kit Usage
**Existing Feature:**
- `specs/001-extract-matrix-platform/` - The feature that established this platform
**Structure:**
```
specs/001-extract-matrix-platform/
├── spec.md # What we're building and why
├── plan.md # How we'll build it
├── tasks.md # Specific actionable tasks
├── analysis.md # Technical analysis and decisions
├── research.md # Background research
├── data-model.md # Data structures and schemas
├── quickstart.md # Quick reference guide
├── checklists/ # Validation checklists
└── contracts/ # Interface contracts
```
## Using Spec-Kit for New Features
### When to Create a Spec
**✅ Good candidates for spec-kit:**
- Complex features with multiple components (e.g., Slack bridge integration)
- Features that affect architecture or patterns
- Features requiring team coordination
- Features with unclear requirements
**❌ Not worth spec-kit overhead:**
- Bug fixes
- Minor config changes
- Documentation updates
- Quick experiments
### Workflow
1. **Create spec:** `/speckit.specify` - Describe what you want to build
2. **Plan implementation:** `/speckit.plan` - Break down into design steps
3. **Generate tasks:** `/speckit.tasks` - Create actionable task list
4. **Implement:** `/speckit.implement` - Execute tasks systematically
5. **Validate:** Use checklists to verify completion
## Recommended: Slack Bridge Feature Spec
Given our north star goal (Milestone 1: Working Slack Bridge), this is a **perfect candidate** for spec-kit.
### Why Use Spec-Kit for Slack Bridge?
**Complexity factors:**
- Requires external service integration (Slack API)
- Needs secrets management coordination
- Involves debugging unknown exit code 11
- Has architectural implications (bridge pattern for future features)
- Requires documentation for team onboarding
**Benefits:**
- Clear specification prevents scope creep
- Plan documents decision rationale
- Tasks provide clear progress tracking
- Checklists ensure quality (security, testing, docs)
- Future engineers can understand the design
### Proposed Feature: `002-slack-bridge-integration`
**Spec outline:**
```markdown
# Feature: Matrix-Slack Bridge Integration
## Goal
Enable bidirectional communication between Slack (chochacho workspace)
and Matrix homeserver to unify team communication.
## Background
- Team currently uses Slack as primary communication
- Matrix homeserver operational on clarun.xyz
- mautrix-slack module exists but exits with code 11
- Existing Slack bot needs reauthorization with updated scopes
- Need Socket Mode for reliable connection
## Success Criteria
- Send message in Slack → appears in Matrix
- Send message in Matrix → appears in Slack
- Bridge survives server restart
- Clear documentation for adding/removing channels
- Secrets properly managed via sops-nix
## Non-Goals
- WhatsApp bridge (future)
- Google Messages bridge (future)
- Multi-workspace support (future)
```
## Next Steps
1. **Start the spec process:**
```bash
# Create new feature spec
# Use /speckit.specify command or create directory manually
mkdir -p specs/002-slack-bridge-integration
```
2. **Write initial spec.md:**
- What: Slack bridge integration
- Why: Team communication unification
- Success criteria: Bidirectional messaging
- Constraints: Socket Mode, sops-nix secrets
3. **Run planning:**
```bash
# Generate implementation plan
# Use /speckit.plan command
```
4. **Generate tasks:**
```bash
# Create actionable task list
# Use /speckit.tasks command
```
5. **Execute systematically:**
- Work through tasks in order
- Document blockers and decisions
- Update worklog as you go
## Important Context: Existing Slack Bot
**Key Information from User:**
> "We have access to a bot, we need to have a manager reauthorize
> the bot because we need to change something or other, redo the
> scopes maybe, change it to socket based"
**Action Items:**
1. Identify existing Slack bot name/app
2. Document current scopes/permissions
3. Determine required scopes for mautrix-slack bridge
4. Request manager to reauthorize with new scopes
5. Enable Socket Mode in Slack app settings
6. Extract bot token and app token for sops-nix
**Socket Mode Benefits:**
- No public webhook URL needed
- Events arrive over an outbound WebSocket connection opened by the bridge, so nothing has to poll Slack
- Better for localhost-based bridges
- Recommended by mautrix-slack documentation
## Coordination with Platform Vision
The Slack bridge spec should reference and align with:
- `docs/platform-vision.md` - Overall platform goals
- Milestone 1: Working Slack Bridge
- Architecture Principles → Communication Layer
## Checklist Template for Bridge Features
```markdown
# Bridge Integration Checklist
## Configuration
- [ ] External service credentials obtained
- [ ] Secrets added to sops-nix
- [ ] NixOS module configuration updated
- [ ] Database schema initialized
## Testing
- [ ] Service starts without errors
- [ ] Bidirectional messaging works
- [ ] Bridge survives restart
- [ ] Error handling tested
## Documentation
- [ ] Setup documented in worklog
- [ ] Credentials management documented
- [ ] Troubleshooting guide created
- [ ] Architecture diagram updated
## Security
- [ ] Secrets encrypted with sops before being committed (no plaintext in the repo)
- [ ] Permissions scoped appropriately
- [ ] Network isolation verified
- [ ] Audit logging enabled
```
## Questions to Answer in Spec
When creating `002-slack-bridge-integration`, make sure to address:
1. **Scope:** Which channels to bridge? (all, specific list, on-demand)
2. **Identity:** How to map Slack users to Matrix users?
3. **Permissions:** Who can manage bridge? (admin-only vs engineer-accessible)
4. **Failure Modes:** What happens if Slack is down? Bridge crashes?
5. **Monitoring:** How to know if bridge is healthy?
6. **Scaling:** Future multi-workspace support?
## Integration with Worklogs
**Relationship:**
- Spec-kit specs are **planning documents** (what and how)
- Worklogs are **execution records** (what happened and why)
**Workflow:**
1. Create spec for the feature (e.g., `002-slack-bridge-integration`)
2. Work on implementation
3. Write worklogs documenting sessions (e.g., `2025-10-23-slack-bridge-setup.org`)
4. Update spec with learnings (if design changes)
**Cross-references:**
- Specs should link to worklogs for implementation details
- Worklogs should reference specs for context
- Platform vision should reference both
## Conclusion
For the Slack bridge work (Milestone 1 of our platform vision), I recommend:
**✅ Use spec-kit:** Create `specs/002-slack-bridge-integration/`
- Complex enough to warrant structured planning
- Clear bounded feature
- Important architectural precedent
- Needs team coordination
**Next Command:** `/speckit.specify` to start the spec process
This will ensure we build it right, document decisions, and make it easy for future engineers to understand and extend.