ops-jrz1/scripts/cpu-watchdog
Dan 89f2987f1e Add cgroups limits and CPU watchdog
- User slice: MemoryMax 80%, TasksMax 500, CPUWeight 100
- CPU watchdog: detects sustained abuse (>180% for 5 min), kills user
- Fixed scripts for NixOS (shebang, PATH)
- Closes ops-jrz1-8m7, ops-jrz1-1bk

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-02 21:02:18 -08:00
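The slice limits in the first bullet are set in systemd's configuration, not in this script. One way to check that they took effect is to read the cgroup v2 files that systemd writes them to: `MemoryMax=` maps to `memory.max`, `TasksMax=` to `pids.max`, and `CPUWeight=` to `cpu.weight`. The helper name `show_slice_limits` and its default directory `/sys/fs/cgroup/user.slice` are assumptions. The sketch presumes a unified (v2) hierarchy and that "user slice" means `user.slice`. If the limits were applied per user, pass a directory such as `user-1000.slice` instead.

```shell
# Hypothetical helper: print the cgroup v2 files that systemd writes
# MemoryMax= (memory.max), TasksMax= (pids.max) and CPUWeight= (cpu.weight) to.
# ASSUMPTION: unified cgroup v2 hierarchy, and "user slice" means user.slice;
# pass another slice directory as $1 to inspect that one instead.
show_slice_limits() {
    local dir=${1:-/sys/fs/cgroup/user.slice}
    local file value
    for file in memory.max pids.max cpu.weight; do
        if [[ -r "$dir/$file" ]]; then
            value=$(<"$dir/$file")
        else
            value="n/a"   # missing file: controller not enabled or wrong path
        fi
        printf '%s=%s\n' "$file" "$value"
    done
}
```

Expect `memory.max` to show a byte count rather than `80%`. systemd resolves a percentage against the machine's physical memory when it applies the setting.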


#!/run/current-system/sw/bin/bash
# cpu-watchdog - Detect sustained CPU abuse, kill after 5 consecutive violations
# Runs every minute via systemd timer
set -euo pipefail
# NixOS paths
PATH="/run/current-system/sw/bin:$PATH"
THRESHOLD=180 # 180% CPU (almost 2 cores)
MAX_STRIKES=5
COUNTDIR="/var/lib/cpu-watchdog"
mkdir -p "$COUNTDIR"
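for homedir in /home/*; do
    user=$(basename "$homedir")

    # Skip directories that don't belong to an existing account
    id "$user" &>/dev/null || continue

    # Sum %CPU over all of the user's processes (100 = one full core).
    # Note: ps reports each process's CPU time divided by its elapsed run
    # time (a lifetime average), not its usage over the last minute.
    # ps can exit non-zero (e.g. no processes); awk still prints 0 in that case.
    pct=$(ps -u "$user" -o %cpu= 2>/dev/null | awk '{s+=$1}END{print int(s)}' || true)
    pct=${pct:-0}
    [[ "$pct" =~ ^[0-9]+$ ]] || pct=0

    if [ "$pct" -gt "$THRESHOLD" ]; then
        # Increment the strike counter (a missing or corrupt file counts as 0)
        count=$(cat "$COUNTDIR/$user" 2>/dev/null || echo 0)
        [[ "$count" =~ ^[0-9]+$ ]] || count=0
        count=$((count + 1))
        echo "$count" > "$COUNTDIR/$user"
        logger -t cpu-watchdog "User $user at ${pct}% CPU (strike $count/$MAX_STRIKES)"

        if [ "$count" -ge "$MAX_STRIKES" ]; then
            # A failing killswitch must not abort the loop (set -e) for the
            # remaining users; keep the counter so the kill is retried next run.
            if /usr/local/bin/killswitch "$user" "sustained CPU abuse (${pct}%)"; then
                rm -f "$COUNTDIR/$user"
            else
                logger -t cpu-watchdog "killswitch failed for $user"
            fi
        fi
    else
        # At or below threshold: reset so that only consecutive strikes count
        rm -f "$COUNTDIR/$user"
    fi
done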
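`ps` reports each process's lifetime-average %CPU. As a result, the script can under-count a long-running process that has only just started spinning, and it never sees processes that exit between runs. One alternative is to sample a user slice's cumulative `usage_usec` counter from `cpu.stat` twice and divide the difference by the interval. The sketch below does this and assumes a cgroup v2 hierarchy at `/sys/fs/cgroup` with systemd's standard `user.slice/user-UID.slice` layout. Because the counter is cumulative, it also includes CPU time from processes that have already exited. The result uses the script's scale (100 = one full core), so `THRESHOLD=180` would carry over unchanged. All function names here are hypothetical.

```shell
# ASSUMPTION: cgroup v2 mounted at /sys/fs/cgroup and systemd's standard
# layout, where a user's processes live in user.slice/user-<UID>.slice.

# Print the cumulative usage_usec value from a cgroup v2 cpu.stat file;
# fail if the key is absent.
read_usage_usec() {
    awk '$1 == "usage_usec" { print $2; found = 1 } END { if (!found) exit 1 }' "$1"
}

# Integer CPU percentage (100 = one full core) from two usage_usec samples
# taken interval_s seconds apart.
cpu_pct() {
    local before=$1 after=$2 interval_s=$3
    echo $(( (after - before) * 100 / (interval_s * 1000000) ))
}

# Hypothetical helper: sample one user's slice over interval_s seconds.
# Fails if the user has no user-<UID>.slice (e.g. no active session).
sample_user_cpu_pct() {
    local user=$1 interval_s=${2:-5}
    local uid stat before after
    uid=$(id -u "$user") || return 1
    stat="/sys/fs/cgroup/user.slice/user-${uid}.slice/cpu.stat"
    before=$(read_usage_usec "$stat") || return 1
    sleep "$interval_s"
    after=$(read_usage_usec "$stat") || return 1
    cpu_pct "$before" "$after" "$interval_s"
}
```

A call such as `sample_user_cpu_pct alice 5` blocks for five seconds. With many users, it would be cheaper to read every user's starting value, sleep once, and then read all the ending values.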
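The strike rule can be exercised without touching `/var/lib/cpu-watchdog` or calling `killswitch`. The hypothetical `next_strike` function below restates the script's rule as a pure function. A sample must be strictly above the threshold to count as a strike, any other sample resets the streak, and the counter is cleared after a kill. `simulate_watchdog`, also hypothetical, replays one sample per timer run.

```shell
# Hypothetical pure restatement of the watchdog's strike rule.
# Prints "<new_count> <action>", where action is "kill" or "keep".
next_strike() {
    local prev=$1 pct=$2 threshold=$3 max=$4
    if (( pct > threshold )); then
        local count=$(( prev + 1 ))
        if (( count >= max )); then
            echo "0 kill"      # the script removes the counter file after a kill
        else
            echo "$count keep"
        fi
    else
        echo "0 keep"          # at or below threshold: streak resets
    fi
}

# Hypothetical harness: one CPU sample per (minutely) run; reports kills.
simulate_watchdog() {
    local threshold=$1 max=$2
    shift 2
    local count=0 minute=0 pct action
    for pct in "$@"; do
        minute=$(( minute + 1 ))
        read -r count action <<< "$(next_strike "$count" "$pct" "$threshold" "$max")"
        if [[ $action == kill ]]; then
            echo "kill at minute $minute"
        fi
    done
}
```

With `THRESHOLD=180` and `MAX_STRIKES=5`, five consecutive samples above 180 trigger the kill on the fifth run. A single dip to 180 or below restarts the count, so a workload that alternates above and below the threshold is never killed.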