Scaffold for agent capability benchmark harness (skills-qng9):
- docs/specs/scenario-schema.md: YAML schema for test scenarios
- tests/scenarios/: Easy, medium, hard example scenarios
- tests/fixtures/: Python fixtures for testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
155 lines
4.7 KiB
YAML
# Medium scenario: Add caching to existing API endpoint
id: add-caching-to-api
title: Add caching to user lookup endpoint
difficulty: medium
tags: [python, flask, caching, refactor, multi-file]

fixture:
  source: flask-user-api

task:
  description: |
    The `/api/users/<id>` endpoint is slow because it queries the database
    on every request. Add caching to improve performance.

    Requirements:
    - Cache user lookups for 5 minutes
    - Use the existing `cache` module (see src/cache.py)
    - Cache key should include user ID
    - Cache should be invalidated when user is updated (PUT /api/users/<id>)
    - Add a header `X-Cache: HIT` or `X-Cache: MISS` to responses

  context:
    - path: src/routes/users.py
      hint: The endpoint to modify
    - path: src/cache.py
      hint: Existing cache utilities to use
    - path: src/models/user.py
      hint: User model for reference

execution:
  mode: both
  timeout: 10m

scripted:
  actions:
    - type: worker
      command: start
    - type: edit
      path: src/routes/users.py
      old: |
        @bp.route('/api/users/<int:user_id>')
        def get_user(user_id):
            user = User.query.get_or_404(user_id)
            return jsonify(user.to_dict())
      new: |
        @bp.route('/api/users/<int:user_id>')
        def get_user(user_id):
            cache_key = f"user:{user_id}"
            cached = cache.get(cache_key)
            if cached:
                response = jsonify(cached)
                response.headers['X-Cache'] = 'HIT'
                return response

            user = User.query.get_or_404(user_id)
            user_dict = user.to_dict()
            cache.set(cache_key, user_dict, ttl=300)
            response = jsonify(user_dict)
            response.headers['X-Cache'] = 'MISS'
            return response
    - type: edit
      path: src/routes/users.py
      old: |
        @bp.route('/api/users/<int:user_id>', methods=['PUT'])
        def update_user(user_id):
            user = User.query.get_or_404(user_id)
            # ... update logic ...
            db.session.commit()
            return jsonify(user.to_dict())
      new: |
        @bp.route('/api/users/<int:user_id>', methods=['PUT'])
        def update_user(user_id):
            user = User.query.get_or_404(user_id)
            # ... update logic ...
            db.session.commit()
            cache.delete(f"user:{user_id}")  # Invalidate cache
            return jsonify(user.to_dict())
    - type: edit
      path: src/routes/users.py
      old: |
        from flask import Blueprint, jsonify
      new: |
        from flask import Blueprint, jsonify
        from src import cache
    - type: shell
      run: git add -A && git commit -m "Add caching to user lookup endpoint"
    - type: worker
      command: done

verify:
  properties:
    - type: file_contains
      path: src/routes/users.py
      pattern: "cache\\.get"

    - type: file_contains
      path: src/routes/users.py
      pattern: "cache\\.set"

    - type: file_contains
      path: src/routes/users.py
      pattern: "X-Cache"

    - type: file_contains
      path: src/routes/users.py
      pattern: "cache\\.delete"

    - type: tests_pass
      command: pytest tests/ -v

    - type: custom
      command: |
        # Functional test: verify caching behavior
        python -c "
        from src import create_app
        app = create_app('testing')
        with app.test_client() as client:
            # First request should be MISS
            r1 = client.get('/api/users/1')
            assert r1.headers.get('X-Cache') == 'MISS', 'First request should be MISS'

            # Second request should be HIT
            r2 = client.get('/api/users/1')
            assert r2.headers.get('X-Cache') == 'HIT', 'Second request should be HIT'
        "

  llm_judge:
    enabled: true
    model: sonnet  # Use better model for nuanced evaluation
    rubric:
      - criterion: Caching is implemented correctly with appropriate TTL
        weight: 1.0
      - criterion: Cache invalidation on update is implemented
        weight: 1.0
      - criterion: X-Cache header correctly indicates HIT/MISS
        weight: 0.8
      - criterion: Existing cache module is used (not reinvented)
        weight: 0.5
      - criterion: Code follows existing patterns in the codebase
        weight: 0.5
      - criterion: No obvious bugs or edge cases missed
        weight: 0.7
    threshold: 0.75

  human:
    required: false
    rubric:
      - Cache logic is correct and efficient
      - Invalidation covers all update paths
      - No security issues with cached data

benchmark:
  enabled: true
  runs: 3