skills/tests/scenarios/medium/add-caching-to-api.yaml
dan 4da1890fc3 docs: add scenario schema and example test fixtures
Scaffold for agent capability benchmark harness (skills-qng9):
- docs/specs/scenario-schema.md: YAML schema for test scenarios
- tests/scenarios/: Easy, medium, hard example scenarios
- tests/fixtures/: Python fixtures for testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 21:24:28 -08:00

155 lines
4.7 KiB
YAML

# Medium scenario: Add caching to existing API endpoint
id: add-caching-to-api
title: Add caching to user lookup endpoint
difficulty: medium
tags: [python, flask, caching, refactor, multi-file]
fixture:
source: flask-user-api
task:
description: |
The `/api/users/<id>` endpoint is slow because it queries the database
on every request. Add caching to improve performance.
Requirements:
- Cache user lookups for 5 minutes
- Use the existing `cache` module (see src/cache.py)
- Cache key should include user ID
- Cache should be invalidated when user is updated (PUT /api/users/<id>)
- Add a header `X-Cache: HIT` or `X-Cache: MISS` to responses
context:
- path: src/routes/users.py
hint: The endpoint to modify
- path: src/cache.py
hint: Existing cache utilities to use
- path: src/models/user.py
hint: User model for reference
execution:
mode: both
timeout: 10m
scripted:
actions:
- type: worker
command: start
- type: edit
path: src/routes/users.py
old: |
@bp.route('/api/users/<int:user_id>')
def get_user(user_id):
user = User.query.get_or_404(user_id)
return jsonify(user.to_dict())
new: |
@bp.route('/api/users/<int:user_id>')
def get_user(user_id):
cache_key = f"user:{user_id}"
cached = cache.get(cache_key)
if cached:
response = jsonify(cached)
response.headers['X-Cache'] = 'HIT'
return response
user = User.query.get_or_404(user_id)
user_dict = user.to_dict()
cache.set(cache_key, user_dict, ttl=300)
response = jsonify(user_dict)
response.headers['X-Cache'] = 'MISS'
return response
- type: edit
path: src/routes/users.py
old: |
@bp.route('/api/users/<int:user_id>', methods=['PUT'])
def update_user(user_id):
user = User.query.get_or_404(user_id)
# ... update logic ...
db.session.commit()
return jsonify(user.to_dict())
new: |
@bp.route('/api/users/<int:user_id>', methods=['PUT'])
def update_user(user_id):
user = User.query.get_or_404(user_id)
# ... update logic ...
db.session.commit()
cache.delete(f"user:{user_id}") # Invalidate cache
return jsonify(user.to_dict())
- type: edit
path: src/routes/users.py
old: |
from flask import Blueprint, jsonify
new: |
from flask import Blueprint, jsonify
from src import cache
- type: shell
run: git add -A && git commit -m "Add caching to user lookup endpoint"
- type: worker
command: done
verify:
properties:
- type: file_contains
path: src/routes/users.py
pattern: "cache\\.get"
- type: file_contains
path: src/routes/users.py
pattern: "cache\\.set"
- type: file_contains
path: src/routes/users.py
pattern: "X-Cache"
- type: file_contains
path: src/routes/users.py
pattern: "cache\\.delete"
- type: tests_pass
command: pytest tests/ -v
- type: custom
command: |
# Functional test: verify caching behavior
python -c "
from src import create_app
app = create_app('testing')
with app.test_client() as client:
# First request should be MISS
r1 = client.get('/api/users/1')
assert r1.headers.get('X-Cache') == 'MISS', 'First request should be MISS'
# Second request should be HIT
r2 = client.get('/api/users/1')
assert r2.headers.get('X-Cache') == 'HIT', 'Second request should be HIT'
"
llm_judge:
enabled: true
model: sonnet # Use better model for nuanced evaluation
rubric:
- criterion: Caching is implemented correctly with appropriate TTL
weight: 1.0
- criterion: Cache invalidation on update is implemented
weight: 1.0
- criterion: X-Cache header correctly indicates HIT/MISS
weight: 0.8
- criterion: Existing cache module is used (not reinvented)
weight: 0.5
- criterion: Code follows existing patterns in the codebase
weight: 0.5
- criterion: No obvious bugs or edge cases missed
weight: 0.7
threshold: 0.75
human:
required: false
rubric:
- Cache logic is correct and efficient
- Invalidation covers all update paths
- No security issues with cached data
benchmark:
enabled: true
runs: 3