bd sync: 2026-01-02 20:25:43
parent 40b5bf43a9
commit 66f609f10d
@@ -1,8 +1,10 @@
{"id":"ops-jrz1-00e","title":"Upgrade NixOS from 24.05 to 24.11","description":"Running NixOS 24.05.20241230 (Uakari). Current stable is 24.11. May be missing security patches. Low priority as no known critical CVEs, but should plan upgrade.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-04T21:03:22.760228514-08:00","updated_at":"2025-12-04T21:04:35.805980055-08:00","comments":[{"id":1,"issue_id":"ops-jrz1-00e","author":"dan","text":"Analysis Findings:\n1. Version Mismatch: Local flake.nix is pinned to 'nixos-24.05', but the dev environment reports '25.11' (Unstable), indicating state divergence.\n2. Upstream Bugs: Blocking issues in mautrix-slack (ops-jrz1-blh) and maubot (sync failure) are present in the current unstable revision (2025-12-02).\n3. Recommendation: Upgrade platform to NixOS 24.11 (Stable) to align environment, ensure stability, and pull fresh upstream fixes.","created_at":"2025-12-08T23:54:57Z"}]}
{"id":"ops-jrz1-03o","title":"Upgrade mautrix-slack to v25.11","description":"Upgrade is just flake update + deploy. Current deployed: v0.2.3+dev.unknown (Oct 13). Flake lock: v25.10 (Oct 22). Latest nixpkgs-unstable: v25.11. Run: nix flake update nixpkgs-unstable \u0026\u0026 deploy. May fix edit panic (ops-jrz1-qxr).","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T18:24:18.332067067-08:00","updated_at":"2025-12-05T19:07:09.156981447-08:00","closed_at":"2025-12-05T19:07:09.156981447-08:00"}
{"id":"ops-jrz1-1bk","title":"Add CPU watchdog timer","description":"Systemd timer that detects sustained CPU abuse and kills offending user.\n\n## Script: /usr/local/bin/cpu-watchdog\n```bash\n#!/usr/bin/env bash\n# Detect sustained CPU abuse, kill after 5 consecutive violations\nTHRESHOLD=180 # 180% CPU (almost 2 cores)\nCOUNTFILE=\"/var/lib/cpu-watchdog\"\nmkdir -p \"$COUNTFILE\"\n\nfor user in $(ls /home); do\n id \"$user\" \u0026\u003e/dev/null || continue\n pct=$(ps -u \"$user\" -o %cpu= 2\u003e/dev/null | awk '{s+=$1}END{print int(s)}')\n pct=${pct:-0}\n \n if [ \"$pct\" -gt \"$THRESHOLD\" ]; then\n count=$(cat \"$COUNTFILE/$user\" 2\u003e/dev/null || echo 0)\n count=$((count + 1))\n echo \"$count\" \u003e \"$COUNTFILE/$user\"\n logger -t cpu-watchdog \"User $user at ${pct}% CPU (strike $count/5)\"\n \n if [ \"$count\" -ge 5 ]; then\n /usr/local/bin/killswitch \"$user\" \"sustained CPU abuse (${pct}%)\"\n rm -f \"$COUNTFILE/$user\"\n fi\n else\n rm -f \"$COUNTFILE/$user\"\n fi\ndone\n```\n\n## Systemd timer\n```nix\nsystemd.services.cpu-watchdog = {\n script = ''/usr/local/bin/cpu-watchdog'';\n serviceConfig.Type = \"oneshot\";\n};\nsystemd.timers.cpu-watchdog = {\n wantedBy = [ \"timers.target\" ];\n timerConfig = {\n OnBootSec = \"1min\";\n OnUnitActiveSec = \"1min\";\n };\n};\n```\n\n## Behavior\n- Runs every minute\n- 5 consecutive minutes at \u003e180% CPU = kill\n- Resets counter if CPU drops below threshold","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-02T20:20:53.246401154-08:00","created_by":"dan","updated_at":"2026-01-02T20:20:53.246401154-08:00","dependencies":[{"issue_id":"ops-jrz1-1bk","depends_on_id":"ops-jrz1-396","type":"blocks","created_at":"2026-01-02T20:21:14.270063028-08:00","created_by":"dan"}]}
{"id":"ops-jrz1-2bu","title":"Direct Slack bot path for learners","description":"Alternative path: learners write Python bots using slack-bolt, connect directly to Slack via Socket Mode. No Matrix, no bridge.\n\n## Architecture\n```\nlearner code → slack-bolt → Socket Mode WebSocket → Slack API\n```\n\n## Status\n\n**Done:**\n- [x] /etc/slack-learner.env with shared tokens (xoxb-, xapp-)\n- [x] learners group for access control (dantest is member)\n- [x] learner-add.sh adds users to group, sources env in .bashrc\n- [x] Design doc: docs/learner-slack-direct.md\n\n**Not Done:**\n- [ ] Starter template (~/slack-bot-template/)\n- [ ] Process management (systemd user services or supervisor)\n- [ ] #learner-sandbox channel in Slack\n- [ ] End-to-end test with real learner\n\n## Tradeoffs vs Maubot/Matrix (ops-jrz1-2pm)\n- Faster feedback (direct to Slack)\n- Excellent slack-bolt docs\n- But: shared bot identity, manual process management\n\n## Ready to Use NOW\nWorks today with terminal editors (vim/nano):\n```bash\nssh alice@ops-jrz1\npip install slack-bolt\npython bot.py # responds in Slack\n```\n\nVS Code Remote-SSH needs nix-ld deployed first.","status":"open","priority":2,"issue_type":"epic","created_at":"2025-12-29T18:56:10.239324326-05:00","created_by":"dan","updated_at":"2026-01-02T10:04:58.786306917-08:00"}
{"id":"ops-jrz1-2pm","title":"Remote dev environment for learners","description":"Set up dev environments for learners to build maubot plugins (Matrix bots that can bridge to Slack).\n\n## Approach\nVS Code Remote-SSH + shared maubot + per-user Unix accounts\n\n## Architecture\n```\nlearner code → maubot → Matrix → mautrix-slack bridge → Slack\n```\n\n## Status\n\n**Done:**\n- [x] learner-add.sh / learner-remove.sh scripts\n- [x] Hello-world plugin template (templates/plugin-skeleton/)\n- [x] Test user `dantest` created with ~/plugins/hello-dantest/\n- [x] Maubot running and healthy\n\n**Not Done:**\n- [ ] nix-ld for VS Code Remote-SSH (config added, not deployed)\n- [ ] Test full VS Code Remote-SSH flow\n- [ ] Test Claude Code extension over Remote-SSH\n- [ ] #learners-sandbox Matrix room\n- [ ] Onboarding doc polish\n\n## Tradeoffs vs Direct Slack (ops-jrz1-2bu)\n- Slower feedback (bridge hop)\n- Sparse maubot docs\n- But: managed process lifecycle, per-bot identity\n\n## Docs\n- docs/learner-onboarding.md\n- docs/learner-admin.md","status":"open","priority":2,"issue_type":"epic","created_at":"2025-12-28T10:13:21.90764918-05:00","created_by":"dan","updated_at":"2026-01-02T10:04:58.472361796-08:00"}
{"id":"ops-jrz1-30e","title":"Add gastown (gt) to system packages via flake input","description":"Add gastown CLI (gt) as a system-wide package.\n\n## What is gastown?\nGastown (gt) is the orchestration layer to beads' memory layer. Together they form a workflow for AI-supervised coding.\n\n## Status\n**BLOCKED**: No releases available yet. Repo uses GoReleaser but no tags/releases published.\n- Requires Go 1.24 to build from source\n- Once releases exist, add like beads: flake input + systemPackages\n\n## Source\ngithub:steveyegge/gastown\n\n## Workaround\nUsers can build locally: `go install github.com/steveyegge/gastown@latest`","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-02T16:37:47.093900357-08:00","created_by":"dan","updated_at":"2026-01-02T19:08:28.742017652-08:00","closed_at":"2026-01-02T19:08:28.742017652-08:00","close_reason":"Won't fix - not adding gastown to this server"}
{"id":"ops-jrz1-396","title":"Add killswitch script","description":"Script to immediately terminate all processes for a user.\n\n## Script: /usr/local/bin/killswitch\n```bash\n#!/usr/bin/env bash\n# Usage: killswitch \u003cusername\u003e [reason]\nset -euo pipefail\nUSER=\"$1\"\nREASON=\"${2:-manual kill}\"\n\nif ! id \"$USER\" \u0026\u003e/dev/null; then\n echo \"User not found: $USER\" \u003e\u00262\n exit 1\nfi\n\nlogger -t killswitch \"Killing all processes for $USER: $REASON\"\npkill -u \"$USER\" || true\nloginctl terminate-user \"$USER\" 2\u003e/dev/null || true\necho \"Killed $USER: $REASON\"\n```\n\n## Usage\n```bash\n# Manual kill\nkillswitch dan \"investigating suspicious activity\"\n\n# From watchdog\nkillswitch dan \"sustained CPU abuse (250%)\"\n```\n\n## Notes\n- Logs to syslog with 'killswitch' tag\n- Terminates user session and all processes\n- Safe to run if user has no processes","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-02T20:19:35.306146948-08:00","created_by":"dan","updated_at":"2026-01-02T20:25:38.950670564-08:00","closed_at":"2026-01-02T20:25:38.950670564-08:00","close_reason":"Closed"}
{"id":"ops-jrz1-3au","title":"Research: Learner deployment pipeline","description":"How does learner code get to \"prod\" (running services)?\n\n## Current context\n- Maubot: Upload .mbp via web UI\n- Slack bots: Manual `python bot.py` or systemd user service\n\n## Questions\n1. Can learners run persistent services? (`systemctl --user`)\n2. Should they have access to maubot admin UI?\n3. Git-based deploy? Push to trigger reload?\n4. Who can restart what?\n\n## Options\n- **Manual only** - Learner runs in foreground/tmux\n- **User systemd** - `systemctl --user enable mybot`\n- **Supervised** - Central supervisor manages learner procs\n- **GitOps** - Push to deploy (complex)\n\n## Security considerations\n- What if learner bot crashes in loop?\n- Resource limits on user services?\n- Can learner affect other learners' services?","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-02T12:27:34.107447487-08:00","created_by":"dan","updated_at":"2026-01-02T12:27:34.107447487-08:00"}
{"id":"ops-jrz1-3b1","title":"Research: Agentic coder sandboxing","description":"When Claude Code runs on server, what can it do? Should we limit it?\n\n## Current state\n- No sandbox\n- Claude has full user privileges\n- Can run any command user can run\n\n## Risks\n- `rm -rf ~` (accidental or hallucinated)\n- Network exfiltration\n- Resource exhaustion (fork bomb, disk fill)\n- Credential theft from env/files\n\n## Options\n1. **Trust the agent** - User's problem if Claude breaks things\n2. **Command allowlist** - Only approved commands\n3. **Container sandbox** - Run agent in container\n4. **Snapshot/rollback** - Easy recovery if things break\n5. **Audit logging** - At least know what happened\n\n## Questions\n- What do other agentic coding setups do?\n- Is this overkill for a learning environment?\n- Does Claude Code have built-in safety?","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-02T12:27:33.705283658-08:00","created_by":"dan","updated_at":"2026-01-02T12:27:33.705283658-08:00"}
{"id":"ops-jrz1-3ca","title":"Persist opencode state/cache across restarts","description":"opencode may store index/cache in ~/.cache or other dirs not covered by current bind mounts. AI context could be lost on container restart. Verify and add mounts.","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:30.90315778-08:00","updated_at":"2025-12-28T00:05:44.753074955-05:00","closed_at":"2025-12-28T00:05:44.753074955-05:00","close_reason":"Parent epic cancelled - browser-based dev approach abandoned","dependencies":[{"issue_id":"ops-jrz1-3ca","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.247361009-08:00","created_by":"daemon","metadata":"{}"}]}
@@ -19,6 +21,7 @@
{"id":"ops-jrz1-6of","title":"AI cost/rate limiting per user","description":"One user could drain API credits with runaway script. Need rate limiting per user, either via proxy middleware or opencode config. Track usage.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:30.772304538-08:00","updated_at":"2025-12-05T17:42:42.773613559-08:00","closed_at":"2025-12-05T17:42:42.773613559-08:00","dependencies":[{"issue_id":"ops-jrz1-6of","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.206816868-08:00","created_by":"daemon","metadata":"{}"},{"issue_id":"ops-jrz1-6of","depends_on_id":"ops-jrz1-wj2","type":"blocks","created_at":"2025-12-05T17:17:38.658742196-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"ops-jrz1-7j4","title":"Git credential strategy for non-programmers","description":"Non-programmers can't manage SSH keys. Pre-configure git-credential-store or provide simple PAT workflow with docs. Store in persistent home with 600 perms.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T15:32:19.673999683-08:00","updated_at":"2025-12-05T17:38:54.788694408-08:00","closed_at":"2025-12-05T17:38:54.788694408-08:00","dependencies":[{"issue_id":"ops-jrz1-7j4","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.139749437-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"ops-jrz1-88o","title":"Implement backup strategy for VPS","description":"No backups configured. Critical data: Matrix DB (622M), PostgreSQL (161M), Forgejo (2.5M), maubot (320K). No recovery path if disk fails. Need automated backups with off-site storage.","status":"closed","priority":1,"issue_type":"task","created_at":"2025-12-04T22:55:25.546850172-08:00","updated_at":"2025-12-05T00:56:27.720623612-08:00","closed_at":"2025-12-05T00:56:27.720623612-08:00"}
{"id":"ops-jrz1-8m7","title":"Add cgroups limits for user slices","description":"Add soft resource limits to prevent one user/agent from crashing server.\n\n## Config\n```nix\nsystemd.slices.\"user\".sliceConfig = {\n MemoryMax = \"80%\";\n TasksMax = 500;\n CPUWeight = 100; # Fair sharing, no hard quota\n};\n```\n\n## Behavior\n- Memory: Users collectively can't exceed 80% RAM\n- Tasks: Max 500 processes per user (prevents fork bombs)\n- CPU: Fair sharing when contended, bursts allowed\n\n## Testing\n- Verify with `systemctl show user-1001.slice`\n- Test fork bomb doesn't crash server","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-02T20:16:22.600133044-08:00","created_by":"dan","updated_at":"2026-01-02T20:16:22.600133044-08:00"}
{"id":"ops-jrz1-9gd","title":"Upgrade VPS RAM for dev environments","description":"Current: 2GB. Need 4-8GB for multiple code-server containers. Coordinate with Vultr, plan maintenance window.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.267689439-08:00","updated_at":"2025-12-28T00:08:06.748175273-05:00","closed_at":"2025-12-28T00:08:06.748175273-05:00","close_reason":"Browser-based dev environment cancelled","dependencies":[{"issue_id":"ops-jrz1-9gd","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.331146543-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"ops-jrz1-9pe","title":"Research: System packages for learner accounts","description":"How do dev users get access to toolchains (Go, Node, Rust, etc.)?\n\n## Findings\n\n**Users CAN self-install packages:**\n```bash\nnix profile install nixpkgs#go\nnix profile install nixpkgs#nodejs\nnix profile install nixpkgs#rustc\n```\n\nPackages go to `~/.nix-profile/bin`, already in PATH. Works today.\n\n**Devshells work too:**\n```bash\n# In project with flake.nix\nnix develop\n```\n\n## Options\n\n| Option | Pros | Cons |\n|--------|------|------|\n| **Self-service only** | Minimal config, user learns nix | Cold start friction |\n| **Global defaults** | Zero friction for common tools | Bloats system, version conflicts |\n| **Starter script** | One command setup, customizable | Another thing to maintain |\n| **direnv + devshells** | Per-project envs, reproducible | Needs direnv installed globally |\n\n## Current State\n- `nix profile install` works for users ✅\n- `nix develop` works ✅\n- direnv NOT installed globally\n- Only python3, uv in system packages\n\n## Recommendation\n1. Add `direnv` to global packages (enables per-project devshells)\n2. Document `nix profile install` for quick one-offs\n3. Provide example flake.nix templates for Go, Node, Rust projects\n4. Keep system packages minimal (python3, uv, direnv, git, vim)\n\n## Test Results\n```\n$ nix profile install nixpkgs#go\n$ go version\ngo version go1.22.8 linux/amd64\n```","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-02T12:27:32.894163417-08:00","created_by":"dan","updated_at":"2026-01-02T12:32:32.502649201-08:00","closed_at":"2026-01-02T12:32:32.502649201-08:00","close_reason":"Users can self-install via nix profile. Added direnv globally for devshells."}
{"id":"ops-jrz1-9x8","title":"Claude CLI update mechanism","description":"Claude Code CLI is manually installed to /usr/local/bin/claude.\n\n## Current state\n- Installed via: curl -fsSL https://claude.ai/install.sh | bash\n- Copied to /usr/local/bin/claude\n- No automatic updates\n\n## Options\n1. Periodic manual update (run install script again)\n2. Systemd timer to check for updates\n3. Package via nix (would need custom derivation)\n\n## Acceptance criteria\nDocument the update process at minimum.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-02T16:46:03.908575951-08:00","created_by":"dan","updated_at":"2026-01-02T16:46:03.908575951-08:00"}
@@ -29,6 +32,7 @@
{"id":"ops-jrz1-bbn","title":"Research: Resource limits and quotas","description":"Should we limit CPU/memory/disk per learner?\n\n## Current state\n- No limits configured\n- Single VPS shared by all users\n- 2GB RAM, 1 vCPU (Vultr $6 tier?)\n\n## Options\n1. **No limits** - Trust learners, monitor manually\n2. **Systemd slices** - cgroups for user sessions\n3. **Disk quotas** - Limit ~/\n4. **ulimits** - Process limits\n\n## Questions\n- What resources does a typical dev session use?\n- What about `go build` or `npm install`?\n- Is this premature optimization?","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-02T12:27:34.884865507-08:00","created_by":"dan","updated_at":"2026-01-02T12:27:34.884865507-08:00"}
{"id":"ops-jrz1-bhk","title":"Add disk quotas for user workspaces","description":"User could fill host disk via /var/lib/vscode/\u003cuser\u003e/. Add per-directory quotas or monitoring/alerting on disk usage.","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-05T15:32:41.199417226-08:00","updated_at":"2025-12-28T00:05:44.7635372-05:00","closed_at":"2025-12-28T00:05:44.7635372-05:00","close_reason":"Parent epic cancelled - browser-based dev approach abandoned","dependencies":[{"issue_id":"ops-jrz1-bhk","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:05:47.309592029-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"ops-jrz1-blh","title":"mautrix-slack edit panic persists in v25.11","description":"mautrix-slack panic on rapid message edits (race condition)\n\n**Root cause**: Edit event arrives before original message is stored in DB. ConvertEdit accesses nil metadata.\n\n**Location**: handleslack.go:575 - has TODO comment: 'this can panic?'\n\n**Reproduction**: Edit a Slack message within ~1 second of sending\n\n**Upstream status**: \n- v25.11 is latest (we're on it)\n- Known to devs (TODO in code)\n- No open issue filed yet\n\n**Stack trace**:\ngo.mau.fi/mautrix-slack/pkg/connector.(*SlackMessage).ConvertEdit\n handleslack.go:575\nmaunium.net/go/mautrix/bridgev2.(*Portal).handleRemoteEdit\n portal.go:2838","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-12-05T19:40:33.255395189-08:00","updated_at":"2025-12-28T00:06:14.637057055-05:00","closed_at":"2025-12-28T00:06:14.637057055-05:00","close_reason":"Duplicate of ops-jrz1-f15 which has fix ready","comments":[{"id":2,"issue_id":"ops-jrz1-blh","author":"dan","text":"Confirmed panic exists in nixpkgs-unstable from 2025-12-02. Fix will be addressed via platform upgrade (see ops-jrz1-00e).","created_at":"2025-12-08T23:54:57Z"}]}
{"id":"ops-jrz1-cmv","title":"Add egress rate limiting (iptables)","description":"Hard limit outbound connections per user to prevent mass exfil/scanning.\n\n## Config\n```nix\nnetworking.firewall.extraCommands = ''\n # Rate limit new outbound connections for regular users (uid 1000+)\n iptables -A OUTPUT -m state --state NEW -m owner --uid-owner 1000:65534 \\\n -m limit --limit 30/min --limit-burst 60 -j ACCEPT\n iptables -A OUTPUT -m state --state NEW -m owner --uid-owner 1000:65534 \\\n -j LOG --log-uid --log-prefix \"EGRESS-LIMIT: \"\n iptables -A OUTPUT -m state --state NEW -m owner --uid-owner 1000:65534 \\\n -j REJECT\n'';\n```\n\n## Behavior\n- 30 new connections/min sustained, burst of 60\n- Over limit: logged and rejected\n- Doesn't affect established connections\n\n## Testing\n- `for i in {1..100}; do curl -s ifconfig.me \u0026 done`\n- Should see EGRESS-LIMIT in journal after ~60","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-02T20:16:32.276607792-08:00","created_by":"dan","updated_at":"2026-01-02T20:16:32.276607792-08:00"}
{"id":"ops-jrz1-d38","title":"Add tmux to system packages","description":"Add tmux for session persistence. Users can run bots in tmux, disconnect, reconnect.","status":"closed","priority":2,"issue_type":"task","created_at":"2026-01-02T15:13:58.514256583-08:00","created_by":"dan","updated_at":"2026-01-02T17:25:59.102158299-08:00","closed_at":"2026-01-02T17:25:59.102158299-08:00","close_reason":"Closed"}
{"id":"ops-jrz1-d58","title":"Build custom code-server container image","description":"Dockerfile with: code-server, opencode CLI, opencode VS Code extension (Open VSX), Python, Node, Git. Push to registry or build locally.","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-05T17:16:54.507577308-08:00","updated_at":"2025-12-28T00:05:44.736614157-05:00","closed_at":"2025-12-28T00:05:44.736614157-05:00","close_reason":"Parent epic cancelled - browser-based dev approach abandoned","dependencies":[{"issue_id":"ops-jrz1-d58","depends_on_id":"ops-jrz1-3so","type":"parent-child","created_at":"2025-12-05T17:17:36.369590207-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"ops-jrz1-dg9","title":"Document pattern for adding dev tools to system","description":"Create documentation for the standard pattern of adding dev tools.\n\n## Pattern\n1. Add flake input (if not in nixpkgs)\n2. Add to environment.systemPackages\n3. Run nixos-rebuild switch\n4. Config stays per-user/per-repo\n\n## Document should cover\n- How to add a tool from nixpkgs\n- How to add a tool from external flake\n- How to package a tool not yet packaged\n- How to update a tool (flake lock update)\n\n## Location\ndocs/adding-dev-tools.md or similar","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-02T16:36:04.613581812-08:00","created_by":"dan","updated_at":"2026-01-02T16:36:04.613581812-08:00"}
@@ -53,7 +57,9 @@
{"id":"ops-jrz1-nir","title":"RFC: SSH log noise reduction strategy","description":"Research showed 99.8% of SSH logs are scanner noise (9000 failed attempts/day). Options: (1) Change SSH port - simple, ~99% reduction (2) journald filter - surgical but complex (3) LogLevel ERROR - loses successful login audit trail (4) fail2ban - bans IPs, partial reduction. Orch consensus: Gemini opposed LogLevel ERROR due to losing audit trail, GPT supported. Need RFC to decide approach. See posture review from Dec 2025 session.","status":"open","priority":3,"issue_type":"task","created_at":"2025-12-04T22:55:13.990334935-08:00","updated_at":"2025-12-04T22:55:13.990334935-08:00"}
{"id":"ops-jrz1-nvx","title":"Slack bot architecture: Matrix-first approach","description":"**Decision**: Use Matrix as primary platform for Slack bot development.\n\n**Architecture**: Bots run as maubot plugins (or Matrix bots), communicate to Slack via mautrix-slack bridge.\n\n**Rationale**:\n- Existing infrastructure (maubot deployed, bridge working)\n- Single platform to manage\n- Bots work with Matrix users too\n- Avoid Socket Mode contention (only one xapp- connection allowed)\n\n**Trade-offs accepted**:\n- Bridge dependency (edit panic bug exists)\n- Extra latency through bridge hop\n- Limited to bridged channels\n\n**Alternative considered (Option B - direct Slack API)**:\n- Could use xoxb- token for outbound-only (REST)\n- Would need new Slack app for full Socket Mode independence\n- Deferred for now\n\n**Credentials available**:\n- slack-oauth-token (xoxb-) - shareable for REST calls if needed\n- slack-app-token (xapp-) - reserved for bridge Socket Mode\n\n**Status**: DECIDED - staying with Matrix-first","status":"closed","priority":3,"issue_type":"task","created_at":"2025-12-05T23:12:22.011872713-08:00","updated_at":"2025-12-05T23:12:28.329467732-08:00","closed_at":"2025-12-05T23:12:28.329467732-08:00"}
{"id":"ops-jrz1-nwv","title":"Package graphite CLI (gt) for NixOS","description":"Graphite CLI (gt) is not in nixpkgs. Need to package it.\n\n## Research needed\n- How is gt distributed? (npm, binary, go?)\n- Is there an existing nix package or flake?\n- If not, create minimal derivation\n\n## Options\n1. Find existing flake/overlay\n2. Use buildNpmPackage if it's npm-based\n3. Fetch pre-built binary\n\n## Once packaged\nAdd to system packages via flake input pattern (same as beads).","status":"closed","priority":3,"issue_type":"task","created_at":"2026-01-02T16:36:04.374192123-08:00","created_by":"dan","updated_at":"2026-01-02T16:37:46.981193033-08:00","closed_at":"2026-01-02T16:37:46.981193033-08:00","close_reason":"Wrong tool - gt is gastown, not graphite"}
{"id":"ops-jrz1-p2d","title":"Add egress connection logging","description":"Log all new outbound connections for forensics.\n\n## Config\n```nix\nnetworking.firewall.extraCommands = ''\n # Log all new outbound from regular users\n iptables -A OUTPUT -m state --state NEW -m owner --uid-owner 1000:65534 \\\n -j LOG --log-uid --log-prefix \"EGRESS: \" --log-level info\n'';\n```\n\n## Usage\n```bash\n# View egress logs\njournalctl -k | grep EGRESS\n\n# Watch live\njournalctl -kf | grep EGRESS\n```\n\n## Notes\n- Logs before rate limit rules (if both implemented)\n- Includes source UID, dest IP, dest port","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-02T20:17:39.566590459-08:00","created_by":"dan","updated_at":"2026-01-02T20:17:39.566590459-08:00"}
{"id":"ops-jrz1-qxr","title":"mautrix-slack message edit panic (upstream bug)","description":"Bridge upgraded to v25.11. Need to verify if edit panic is fixed by testing a Slack message edit. Watch logs: journalctl -u mautrix-slack -f | grep -E 'ERR|panic|edit'","status":"closed","priority":2,"issue_type":"bug","created_at":"2025-12-05T18:22:38.18203834-08:00","updated_at":"2025-12-05T19:36:00.556011621-08:00","closed_at":"2025-12-05T19:36:00.556011621-08:00","dependencies":[{"issue_id":"ops-jrz1-qxr","depends_on_id":"ops-jrz1-03o","type":"blocks","created_at":"2025-12-05T18:24:23.259399275-08:00","created_by":"daemon","metadata":"{}"}]}
{"id":"ops-jrz1-rkp","title":"Add egress abuse watchdog","description":"Monitor for users hitting egress rate limits, kill if sustained.\n\n## Script: /usr/local/bin/egress-watchdog\n```bash\n#!/usr/bin/env bash\n# Kill users who keep hitting egress limits\nTHRESHOLD=10 # EGRESS-LIMIT hits per minute\nCOUNTFILE=\"/var/lib/egress-watchdog\"\nmkdir -p \"$COUNTFILE\"\n\n# Count recent limit hits per UID\njournalctl -k --since \"1 minute ago\" 2\u003e/dev/null | grep \"EGRESS-LIMIT\" | \\\n grep -oP 'UID=\\K[0-9]+' | sort | uniq -c | while read count uid; do\n \n user=$(getent passwd \"$uid\" | cut -d: -f1)\n [ -z \"$user\" ] \u0026\u0026 continue\n \n if [ \"$count\" -gt \"$THRESHOLD\" ]; then\n strikes=$(cat \"$COUNTFILE/$user\" 2\u003e/dev/null || echo 0)\n strikes=$((strikes + 1))\n echo \"$strikes\" \u003e \"$COUNTFILE/$user\"\n logger -t egress-watchdog \"User $user hit egress limit $count times (strike $strikes/3)\"\n \n if [ \"$strikes\" -ge 3 ]; then\n /usr/local/bin/killswitch \"$user\" \"egress abuse ($count hits)\"\n rm -f \"$COUNTFILE/$user\"\n fi\n else\n rm -f \"$COUNTFILE/$user\"\n fi\ndone\n```\n\n## Behavior\n- Runs every minute (same timer as CPU watchdog, or separate)\n- 3 consecutive minutes of \u003e10 blocked connections = kill\n- Works with egress rate limiting (ops-jrz1-cmv)\n\n## Dependencies\n- Requires ops-jrz1-cmv (egress rate limiting)\n- Requires ops-jrz1-396 (killswitch script)","status":"open","priority":2,"issue_type":"task","created_at":"2026-01-02T20:21:09.516724064-08:00","created_by":"dan","updated_at":"2026-01-02T20:21:09.516724064-08:00","dependencies":[{"issue_id":"ops-jrz1-rkp","depends_on_id":"ops-jrz1-396","type":"blocks","created_at":"2026-01-02T20:21:14.314011866-08:00","created_by":"dan"},{"issue_id":"ops-jrz1-rkp","depends_on_id":"ops-jrz1-cmv","type":"blocks","created_at":"2026-01-02T20:21:14.352411765-08:00","created_by":"dan"}]}
{"id":"ops-jrz1-t73","title":"Rename learner to dev in scripts and docs","description":"Rename terminology from \"learner\" to \"dev\" or \"user\" across:\n\n- scripts/learner-add.sh → dev-add.sh\n- scripts/learner-remove.sh → dev-remove.sh\n- /etc/slack-learner.env → /etc/slack-dev.env\n- learners group → devs group\n- docs/learner-*.md\n- tests/test-learner-env.sh\n\nLow priority cleanup.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-02T12:32:40.340984626-08:00","created_by":"dan","updated_at":"2026-01-02T12:32:40.340984626-08:00"}
{"id":"ops-jrz1-u0w","title":"Security review of running server","status":"closed","priority":2,"issue_type":"task","created_at":"2025-12-04T21:03:22.420507724-08:00","updated_at":"2025-12-04T21:04:31.989886731-08:00","closed_at":"2025-12-04T21:04:31.989886731-08:00"}
{"id":"ops-jrz1-vix","title":"Evaluate home-manager for per-user config","description":"Evaluate whether home-manager adds value for our setup.\n\n## What home-manager could manage\n- Shell config (.bashrc, .zshrc)\n- Git config (.gitconfig)\n- Tool configs (~/.config/*)\n- direnv integration\n\n## Questions\n- Do we need declarative per-user dotfiles?\n- Is the complexity worth it for a small team?\n- Can we start without it and add later?\n\n## Recommendation from consensus\n\"Optional but recommended\" - good for pushing default configs to all devs.\nStart without it, add if pain point emerges.","status":"open","priority":3,"issue_type":"task","created_at":"2026-01-02T16:36:04.849881753-08:00","created_by":"dan","updated_at":"2026-01-02T16:36:04.849881753-08:00"}