# Data Model: Extract Matrix Platform Modules
**Date**: 2025-10-11
**Feature**: Extract Matrix Platform Modules as Public Template
## Overview
This document defines the key entities, their attributes, relationships, and lifecycle for the Matrix platform template extraction project. These entities represent both the artifacts being created and the processes managing them.
---
## Entity Diagram
```
┌─────────────────┐
│    ops-base     │
│   Repository    │
│  (source repo)  │
└────────┬────────┘
         │ contains
         ├────────────────┐
         │                │
         ▼                ▼
     ┌────────┐      ┌──────────┐
     │ Module │      │ Worklog  │
     └───┬────┘      └────┬─────┘
         │                │
         │ sanitized      │ extracted
         │ via            │ into
         │                │
         ▼                ▼
 ┌───────────────┐ ┌──────────────┐
 │ Sanitization  │ │   Pattern    │
 │     Rule      │ │   Document   │
 └───────┬───────┘ └──────┬───────┘
         │                │
         │ applied to     │ included in
         │                │
         ▼                ▼
 ┌─────────────────────────────────┐
 │ nixos-matrix-platform-template  │
 │          (output repo)          │
 └──────────────┬──────────────────┘
                │ contains
                ├─────────────┬────────────┬───────────┐
                │             │            │           │
                ▼             ▼            ▼           ▼
            ┌────────┐ ┌─────────────┐ ┌────────┐ ┌─────────┐
            │ Module │ │Configuration│ │ Secret │ │ Pattern │
            │(clean) │ │  (example)  │ │Template│ │   Doc   │
            └────────┘ └──────┬──────┘ └────────┘ └─────────┘
                              │ imports
                          ┌────────┐
                          │ Module │
                          └────────┘
```
---
## Core Entities
### 1. Module
**Description**: A reusable NixOS module file providing service configuration and systemd unit definitions.
**Attributes**:
- `name`: string (e.g., "matrix-continuwuity", "mautrix-slack")
- `filepath`: repo-relative path (e.g., "modules/matrix-continuwuity.nix")
- `line_count`: integer (size metric)
- `service_name`: string (systemd service identifier)
- `dependencies`: array of module names (other modules it depends on)
- `options`: array of configurable options exposed
- `state`: enum [ops-base, staging, sanitized, published]
- `last_modified`: timestamp
- `source_commit`: string (ops-base git commit hash)
**Lifecycle States**:
1. **ops-base**: Original module in private repository
2. **staging**: Copied to sanitization workspace
3. **sanitized**: Processed by sanitization rules, validated
4. **published**: Committed to template repository
**Validation Rules**:
- Must build successfully with `nix flake check`
- Must contain no secrets (gitleaks scan passes)
- All personal domains/IPs replaced
- All paths sanitized
- Options documented with descriptions
**Example**:
```nix
# Entity record for modules/matrix-continuwuity.nix (sanitized state)
{
  name = "matrix-continuwuity";
  filepath = "modules/matrix-continuwuity.nix";
  line_count = 319;
  service_name = "continuwuity";
  dependencies = [ "sops-nix" ];
  options = [ "domain" "port" "enableRegistration" "enableFederation" ];
  state = "sanitized";
  source_commit = "abc123def456";
}
```
**Relationships**:
- **Contains**: Options (1-to-many)
- **Depends on**: Other Modules (many-to-many)
- **Imported by**: Configurations (many-to-many)
- **Created from**: ops-base Module (1-to-1)
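
Tooling that tracks extraction progress could hold each Module record in a small typed structure. The sketch below is one possible mapping of the attributes above to Python. The names `ModuleRecord`, `ModuleState`, and `undocumented_options` are hypothetical and do not exist in the repository. The helper illustrates the "options documented with descriptions" rule under the assumption that descriptions are available as a plain dictionary.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class ModuleState(Enum):
    """The four `state` values listed for a Module."""
    OPS_BASE = "ops-base"
    STAGING = "staging"
    SANITIZED = "sanitized"
    PUBLISHED = "published"


@dataclass
class ModuleRecord:
    """Hypothetical in-memory form of a Module entity."""
    name: str
    filepath: str
    line_count: int
    service_name: str
    state: ModuleState
    source_commit: str
    dependencies: list[str] = field(default_factory=list)
    options: list[str] = field(default_factory=list)
    last_modified: Optional[datetime] = None

    def undocumented_options(self, descriptions: dict[str, str]) -> list[str]:
        """Return the options that have no non-empty description."""
        return [o for o in self.options if not descriptions.get(o, "").strip()]


# The example record from this section, expressed with the sketch above.
continuwuity = ModuleRecord(
    name="matrix-continuwuity",
    filepath="modules/matrix-continuwuity.nix",
    line_count=319,
    service_name="continuwuity",
    state=ModuleState.SANITIZED,
    source_commit="abc123def456",
    dependencies=["sops-nix"],
    options=["domain", "port", "enableRegistration", "enableFederation"],
)
```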
---
### 2. Configuration
**Description**: An example deployment configuration file that imports modules and sets options for specific use cases.
**Attributes**:
- `name`: string (e.g., "example-vps", "example-dev")
- `filepath`: repo-relative path (e.g., "configurations/example-vps.nix")
- `use_case`: string (description of deployment scenario)
- `modules_imported`: array of module names
- `secrets_required`: array of secret names
- `network_config`: enum [public-vps, private-lan, hybrid]
- `state`: enum [draft, validated, documented]
**Lifecycle States**:
1. **draft**: Initial configuration created
2. **validated**: Passes `nix flake check` and builds successfully
3. **documented**: Referenced in getting-started.md or deployment.md
**Validation Rules**:
- Must import at least one module
- Must build successfully as NixOS configuration
- All required secrets must be documented
- Must include comments explaining key options
- Must use example domains/IPs only
**Example**:
```nix
# Entity record for configurations/example-vps.nix
{
  name = "example-vps";
  filepath = "configurations/example-vps.nix";
  use_case = "Production VPS deployment with Matrix + Forgejo + Slack bridge";
  modules_imported = [ "matrix-continuwuity" "dev-services" "fail2ban" "ssh-hardening" ];
  secrets_required = [ "matrix-registration-token" "acme-email" "slack-oauth-token" ];
  network_config = "public-vps";
  state = "validated";
}
```
**Relationships**:
- **Imports**: Modules (many-to-many)
- **Requires**: Secrets (many-to-many)
- **Documented in**: Pattern Documents (many-to-many)
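
Three of the Configuration validation rules (at least one module, documented secrets, example domains only) are mechanical enough to check in a script. The following is a minimal sketch, not the project's validator. The function name, its parameters, and the choice to accept only `example.com`, `example.net`, and `example.org` (the domains reserved by RFC 2606) are assumptions.

```python
import re

# Assumption: only the RFC 2606 reserved example domains (and their
# subdomains) count as "example domains".
EXAMPLE_DOMAIN = re.compile(r"(?:^|\.)example\.(?:com|net|org)$")


def validate_configuration(modules_imported, secrets_required,
                           documented_secrets, domains):
    """Return a list of rule violations; an empty list means all checks pass."""
    errors = []
    if not modules_imported:
        errors.append("must import at least one module")
    missing = sorted(set(secrets_required) - set(documented_secrets))
    if missing:
        errors.append("undocumented secrets: " + ", ".join(missing))
    bad = [d for d in domains if not EXAMPLE_DOMAIN.search(d)]
    if bad:
        errors.append("non-example domains: " + ", ".join(bad))
    return errors
```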
---
### 3. Secret
**Description**: Sensitive data (tokens, passwords, keys) managed via sops-nix encryption.
**Attributes**:
- `name`: string (e.g., "matrix-registration-token", "acme-email")
- `type`: enum [token, password, key, email, certificate]
- `storage_path`: string (e.g., "secrets/secrets.yaml")
- `sops_key`: string (YAML path in secrets file)
- `required_by`: array of module/configuration names
- `example_value`: string (placeholder for templates)
- `generation_method`: string (how to generate the secret)
- `rotation_frequency`: enum [never, yearly, quarterly, monthly, as-needed]
**Lifecycle States**:
1. **templated**: Placeholder in secrets.yaml.example
2. **generated**: User creates actual secret value
3. **encrypted**: `sops` encrypts the value with an age key (sops-nix decrypts it at activation time)
4. **deployed**: Secret accessible to systemd service
**Validation Rules**:
- Must never appear in plaintext in git history
- Must have corresponding entry in secrets.yaml.example
- Must have generation instructions in docs/secrets-management.md
- Must specify which services require access
**Example**:
```yaml
# secrets/secrets.yaml (encrypted state)
matrix:
  registration_token: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]

# secrets/secrets.yaml.example (templated state)
matrix:
  registration_token: "GENERATE_WITH_openssl_rand_hex_32"
```
**Relationships**:
- **Required by**: Modules (many-to-many)
- **Required by**: Configurations (many-to-many)
- **Encrypted in**: secrets.yaml file (1-to-1)
- **Documented in**: secrets-management.md (1-to-1)
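
The rule that every secret needs a matching entry in secrets.yaml.example can be checked by walking the parsed template. The sketch below assumes the template has already been loaded into nested dictionaries and that `sops_key` paths are slash-separated (for example `matrix/registration_token`). Both helper names are hypothetical.

```python
def lookup(tree, sops_key):
    """Follow a slash-separated key path through nested dicts.

    Returns the value at that path, or None if any segment is missing.
    """
    node = tree
    for part in sops_key.split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_templates(required_keys, example_tree):
    """Return the required key paths that have no entry in the template."""
    return [k for k in required_keys if lookup(example_tree, k) is None]


# Parsed form of the templated example shown in this section.
example_tree = {
    "matrix": {"registration_token": "GENERATE_WITH_openssl_rand_hex_32"},
}
```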
---
### 4. Sanitization Rule
**Description**: A find/replace or validation rule that ensures personal information is removed from files.
**Attributes**:
- `id`: integer (unique identifier)
- `pattern_type`: enum [domain, ip_address, path, username, hostname, secret_pattern]
- `pattern`: string or regex (what to search for)
- `replacement`: string (what to replace with)
- `applies_to`: enum [code, docs, comments, all]
- `validation_method`: enum [grep, gitleaks, regex, manual]
- `priority`: enum [critical, high, medium, low]
**Lifecycle States**:
1. **defined**: Rule created in contracts/sanitization-rules.yaml
2. **automated**: Implemented in scripts/sanitize-files.sh
3. **applied**: Executed on files during sanitization
4. **verified**: Validation confirms no matches remain
**Validation Rules**:
- Pattern must be specific enough to avoid false positives
- Replacement must be valid for context (e.g., valid domain format)
- Must specify which file types/patterns to apply to
- Must have corresponding validation step
**Example**:
```yaml
# contracts/sanitization-rules.yaml
- id: 1
  pattern_type: domain
  pattern: "clarun\\.xyz"
  replacement: "example.com"
  applies_to: all
  validation_method: grep
  priority: critical

- id: 2
  pattern_type: ip_address
  pattern: "192\\.168\\.1\\.(\\d+)"
  replacement: "10.0.0.\\1"
  applies_to: code
  validation_method: regex
  priority: critical

- id: 3
  pattern_type: secret_pattern
  pattern: "syt_[a-zA-Z0-9]{24}"  # Matrix access token
  replacement: null  # Should not exist, validation only
  applies_to: all
  validation_method: gitleaks
  priority: critical
```
**Relationships**:
- **Applied to**: Modules (many-to-many)
- **Applied to**: Configurations (many-to-many)
- **Applied to**: Pattern Documents (many-to-many)
- **Validated by**: CI/CD pipeline (many-to-1)
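
The three example rules above behave in two different ways: rules with a `replacement` rewrite the text, while rules with `replacement: null` only report a match. The sketch below illustrates that split using Python's `re` module. It is not the logic of scripts/sanitize-files.sh, and `RULES` and `sanitize` are hypothetical names that carry only the fields needed here.

```python
import re

# The three example rules, reduced to the fields this sketch uses.
RULES = [
    {"id": 1, "pattern": r"clarun\.xyz", "replacement": "example.com"},
    {"id": 2, "pattern": r"192\.168\.1\.(\d+)", "replacement": r"10.0.0.\1"},
    {"id": 3, "pattern": r"syt_[a-zA-Z0-9]{24}", "replacement": None},
]


def sanitize(text, rules=RULES):
    """Apply replacement rules in order.

    Returns the rewritten text and the ids of validation-only rules
    (replacement is None) whose pattern still matches.
    """
    violations = []
    for rule in rules:
        if rule["replacement"] is None:
            if re.search(rule["pattern"], text):
                violations.append(rule["id"])
        else:
            text = re.sub(rule["pattern"], rule["replacement"], text)
    return text, violations


clean, found = sanitize("server = matrix.clarun.xyz; bind = 192.168.1.42")
```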
---
### 5. Pattern Document
**Description**: Architectural knowledge extracted from worklogs that explains proven implementation approaches.
**Attributes**:
- `title`: string (e.g., "Socket Mode Authentication Pattern")
- `filepath`: repo-relative path (e.g., "docs/patterns/config-generation.md")
- `category`: enum [pattern, bridge-setup, architecture, deployment, secrets]
- `source_worklogs`: array of worklog filenames
- `modules_referenced`: array of module names
- `prerequisites`: array of strings (what user needs before reading)
- `difficulty`: enum [beginner, intermediate, advanced]
- `word_count`: integer (size metric)
- `last_updated`: timestamp
**Lifecycle States**:
1. **extracted**: Content pulled from worklogs
2. **sanitized**: Personal context removed
3. **structured**: Organized into documentation template
4. **reviewed**: Technical accuracy verified
5. **published**: Committed to template repository
**Validation Rules**:
- Must reference specific modules or code examples
- Must contain actionable steps or explanations
- Must be sanitized (no personal infrastructure references)
- Must link to related documents
- Must include code examples if applicable
**Example**:
```nix
# Entity record for docs/patterns/config-generation.md
{
  title = "Runtime Configuration Generation Pattern";
  filepath = "docs/patterns/config-generation.md";
  category = "pattern";
  source_worklogs = [ "mautrix-slack-bridge-implementation-gmessages-pattern.org" ];
  modules_referenced = [ "mautrix-slack" "mautrix-gmessages" ];
  prerequisites = [ "Understanding of systemd ExecStartPre" "Basic Python knowledge" ];
  difficulty = "intermediate";
  word_count = 800;
}
```
**Relationships**:
- **Extracted from**: Worklogs (many-to-many; `source_worklogs` is an array, and one worklog can yield several documents)
- **References**: Modules (many-to-many)
- **Links to**: Other Pattern Documents (many-to-many)
- **Included in**: Template repository (many-to-1)
---
### 6. Sync Checkpoint
**Description**: A record of which changes were synced from ops-base to the template at a specific point in time.
**Attributes**:
- `sync_date`: date (e.g., "2025-10-11")
- `ops_base_commit`: string (git commit hash)
- `template_commit`: string (git commit hash)
- `changes_synced`: array of strings (descriptions)
- `changes_skipped`: array of strings (descriptions with reasons)
- `sync_tag`: string (e.g., "sync-20251011-abc123")
- `synced_by`: string (maintainer name)
- `validation_passed`: boolean (all checks passed)
**Lifecycle States**:
1. **identified**: Changes in ops-base identified for sync
2. **sanitized**: Changes sanitized for template
3. **validated**: CI checks passed
4. **recorded**: Entry added to sync-log.md
5. **tagged**: Git tag created
**Validation Rules**:
- Must reference specific ops-base commit
- Must list all changes (synced and skipped)
- Must pass CI validation before recording
- Must create corresponding git tag
- Must update sync-log.md
**Example**:
```nix
# Entity record for the sync-log.md entry "Sync 2025-10-11 (ops-base: abc123)"
{
  sync_date = "2025-10-11";
  ops_base_commit = "abc123def456";
  template_commit = "def789abc012";
  changes_synced = [
    "[BUGFIX] Matrix registration token validation"
    "[FEATURE] WhatsApp bridge reconnection logic"
    "[SECURITY] sops-nix v0.16.0 upgrade"
  ];
  changes_skipped = [
    "Personal config changes in comm-talu-uno.nix (not applicable to template)"
  ];
  sync_tag = "sync-20251011-abc123";
  synced_by = "maintainer";
  validation_passed = true;
}
```
**Relationships**:
- **References**: ops-base commit (1-to-1)
- **Creates**: Template commit (1-to-1)
- **Documents**: Module changes (1-to-many)
- **Recorded in**: sync-log.md (many-to-1)
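
The `sync_tag` example follows a visible pattern: the sync date as `YYYYMMDD` plus the first six characters of the ops-base commit hash. A minimal sketch of deriving the tag from the other two attributes is shown below. The function name and the `short_len` parameter are hypothetical, and the six-character prefix is inferred from the single example rather than from a stated rule.

```python
from datetime import date


def make_sync_tag(sync_date: date, ops_base_commit: str, short_len: int = 6) -> str:
    """Build a tag such as 'sync-20251011-abc123' from date and commit hash."""
    return f"sync-{sync_date:%Y%m%d}-{ops_base_commit[:short_len]}"


tag = make_sync_tag(date(2025, 10, 11), "abc123def456")
```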
---
## Supporting Entities
### 7. Bridge Setup Guide
**Description**: Step-by-step documentation for configuring specific Matrix bridges with authentication and registration.
**Attributes**:
- `bridge_name`: string (e.g., "slack", "whatsapp", "gmessages")
- `filepath`: repo-relative path (e.g., "docs/bridges/slack-setup.md")
- `auth_method`: string (e.g., "Socket Mode OAuth", "QR code pairing")
- `prerequisites`: array of strings
- `estimated_time`: integer (minutes)
- `difficulty`: enum [easy, medium, hard]
- `common_issues`: array of strings (troubleshooting)
**Lifecycle**: Same as Pattern Document
**Example**:
```nix
# Entity record for docs/bridges/slack-setup.md
{
  bridge_name = "slack";
  filepath = "docs/bridges/slack-setup.md";
  auth_method = "Socket Mode with App-Level Token";
  prerequisites = [ "Slack workspace admin access" "Matrix homeserver running" ];
  estimated_time = 15;
  difficulty = "medium";
  common_issues = [
    "Missing OAuth scopes"
    "Socket Mode not enabled"
    "App-level token vs bot token confusion"
  ];
}
```
**Relationships**:
- **Documents**: Module (1-to-1)
- **References**: Pattern Documents (many-to-many)
- **Included in**: Template repository (many-to-1)
---
### 8. CI/CD Pipeline
**Description**: Automated validation workflow that runs on every commit/PR.
**Attributes**:
- `name`: string (e.g., "GitHub Actions CI")
- `config_file`: repo-relative path (e.g., ".github/workflows/ci.yml")
- `triggers`: array of events (e.g., ["push", "pull_request"])
- `jobs`: array of job names (e.g., ["validate", "security"])
- `validation_steps`: array of strings (what gets checked)
- `failure_action`: enum [block-merge, warn, notify]
- `average_duration`: integer (seconds)
**Validation Steps**:
```yaml
validate:
  - nix flake check --all-systems
  - nix build .#nixosConfigurations.example-vps.config.system.build.toplevel
  - nix build .#nixosConfigurations.example-dev.config.system.build.toplevel
security:
  - gitleaks detect (full repo scan)
  - Check for personal domain/IP patterns
```
**Relationships**:
- **Validates**: Modules (1-to-many)
- **Validates**: Configurations (1-to-many)
- **Enforces**: Sanitization Rules (1-to-many)
- **Reports to**: GitHub PR status (1-to-1)
---
## State Transitions
### Module Lifecycle
```
ops-base (source)
↓ [copy to staging]
staging
↓ [apply sanitization rules]
sanitized
↓ [run validations: nix flake check, gitleaks]
validated
↓ [commit to template repo]
published
↓ [user deploys]
deployed
```
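
The module lifecycle above is strictly linear, so it can be modelled as a lookup from each state to its successor. The sketch below is an illustration only. `TRANSITIONS` and `advance` are hypothetical names, and the state strings are copied from the diagram, which includes `validated` and `deployed` in addition to the four `state` enum values listed for the Module entity.

```python
# Each state maps to the single state that may follow it.
TRANSITIONS = {
    "ops-base": "staging",      # copy to staging
    "staging": "sanitized",     # apply sanitization rules
    "sanitized": "validated",   # nix flake check, gitleaks
    "validated": "published",   # commit to template repo
    "published": "deployed",    # user deploys
}


def advance(state: str) -> str:
    """Return the next lifecycle state, or raise if there is none."""
    if state not in TRANSITIONS:
        raise ValueError(f"no transition from {state!r}")
    return TRANSITIONS[state]


# Walk the whole lifecycle starting from the source repository.
path = ["ops-base"]
while path[-1] in TRANSITIONS:
    path.append(advance(path[-1]))
```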
### Sync Checkpoint Lifecycle
```
identified (changes in ops-base)
↓ [review changes]
prioritized (classify: sync vs skip)
↓ [apply sanitization]
sanitized
↓ [run CI validation]
validated
↓ [update sync-log.md, create tag]
recorded
↓ [push to template repo]
published
```
### Secret Lifecycle (user perspective)
```
templated (secrets.yaml.example)
↓ [user copies and edits]
generated (plain yaml)
↓ [sops -e -i secrets.yaml]
encrypted (sops-managed)
↓ [nixos-rebuild switch]
deployed (/run/secrets/*)
↓ [systemd LoadCredential]
accessible (service can read)
```
---
## Validation Matrix
| Entity Type | Validation Method | Tool | Frequency |
|-------------|------------------|------|-----------|
| Module | Syntax check | `nix flake check` | Every commit |
| Module | Secret scan | `gitleaks` | Every commit |
| Module | Build test | `nix build` | Every commit |
| Module | Manual review | Human checklist | Pre-publication |
| Configuration | Syntax check | `nix flake check` | Every commit |
| Configuration | Build test | `nix build` | Every commit |
| Configuration | Deploy test | VPS integration | Pre-v1.0 |
| Secret | Leakage check | `gitleaks` | Every commit |
| Secret | Template check | Manual review | Pre-publication |
| Sanitization Rule | Pattern match | `grep` / `rg` | After sanitization |
| Sanitization Rule | No false positives | Manual review | Pre-publication |
| Pattern Document | Accuracy | Manual review vs code | Pre-publication |
| Pattern Document | Sanitization | `grep` for personal refs | Pre-publication |
| Sync Checkpoint | Build pass | `nix flake check` | Every sync |
| Sync Checkpoint | No secrets | `gitleaks` | Every sync |
---
## Cardinality Summary
```
ops-base Repository (1)
├── contains Modules (8-10)
└── contains Worklogs (20+)

nixos-matrix-platform-template Repository (1)
├── contains Modules (8-10, sanitized copies)
├── contains Configurations (2)
├── contains Secret Templates (5-10)
├── contains Pattern Documents (10-15)
├── contains Bridge Setup Guides (3)
└── tracked by Sync Checkpoints (4+ per year)

Sanitization Rules (20-30)
└── applied to all template artifacts

CI/CD Pipeline (1)
└── validates all changes (∞ runs)
```
---
## Key Relationships
1. **Module ← depends on → Module**: Dependency graph (e.g., dev-services depends on matrix-continuwuity)
2. **Configuration → imports → Module**: Many-to-many (configs can import multiple modules)
3. **Secret ← required by → Module**: Many-to-many (modules can require multiple secrets)
4. **Sanitization Rule → applied to → Module**: Many-to-many (rules apply to multiple files)
5. **Pattern Document ← extracted from → Worklog**: Many-to-many (one worklog can yield several docs, and a doc can draw on several worklogs via `source_worklogs`)
6. **Sync Checkpoint → references → Module**: One-to-many (sync updates multiple modules)
---
## Data Storage
### Git Repository (primary storage)
- **Location**: GitHub
- **Access**: Public (template), private (ops-base)
- **Versioning**: Git commits, tags
- **Backup**: GitHub infrastructure + local clones
### Filesystem (workspace)
- **staging/**: Temporary sanitization workspace
- **ops-base/**: Source repository (permanent)
- **nixos-matrix-platform-template/**: Output repository (permanent)
### YAML Files (structured data)
- **contracts/sanitization-rules.yaml**: Sanitization rules
- **contracts/ci-validation.yaml**: CI validation requirements
- **secrets/secrets.yaml.example**: Secret templates
- **sync-log.md**: Sync checkpoint records
---
## Next Steps
This data model will be used to:
1. Generate contracts (sanitization rules, CI validation)
2. Structure task breakdown (what artifacts to create)
3. Define acceptance criteria (validation for each entity)
4. Guide implementation (what to build in what order)