Exercise 5: Error Handling + Retry Wrapper¶
Corresponds to Stage 1 — LLM Basics Exercise 5.
Why this matters¶
Production agents in Stages 3-7 will absolutely hit API errors:
- Rate limit (429) — different subscription tiers, you can hit it anytime
- Network jitter (connection reset) — cross-DC / VPN happens daily
- Expired API key (401) — rotation out of sync
- Context overflow (400) — you appended too much history
Some errors should be retried (rate limit / network), others should not (key, context full). Not distinguishing them is one of the most common production-agent mistakes.
How to run — two paths¶
Path A (default, free, local)¶
pip install -r requirements.txt
ollama pull gemma4:e4b
ollama serve
python starter.py
Budget: $0. The local demo shows connection / context-window behavior.
Path B (Anthropic, real cloud errors)¶
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-ant-...
python starter_anthropic.py
Budget: ~$0.0005 per run (only "scenario 2 normal call" actually hits the API).
Expected output (Path A, local):
[Scenario 1] Pointing at a dead Ollama port
✅ Caught APIConnectionError
💡 Production handling: retry (network errors are typically transient)
[Scenario 2] Normal call wrapped in with_retry (needs Ollama running)
✅ Succeeded on first try: 👋
[Scenario 3] Prompt over context window (Ollama usually truncates or raises)
⚠ Ollama didn't raise (it likely truncated). Cloud APIs would 400.
✅ Exercise 5 passed — you understand which errors raise, when to retry, when to stop. $0/run
Validate the logic without network failures (mock-based)¶
python test.py # validates Path A (Ollama) retry wrapper
python test_anthropic.py # validates Path B (Anthropic) retry wrapper
6 tests use unittest.mock to fabricate errors + fake sleep (0 real seconds), validating the retry logic:
✅ test_no_retry_when_success_first_time
✅ test_retry_on_connection_error_then_success
✅ test_retry_on_rate_limit
✅ test_raise_after_max_attempts
✅ test_no_retry_on_auth_error
✅ test_exponential_backoff_delays
🎉 All passed — retry wrapper logic correct
Local advantage: Ollama can't hit a real
RateLimitError(no quota), so the "rate limit demo" is invisible. But the mock-based tests cover the retry logic completely and reproduce in 0 seconds — which is precisely what makes the Ollama path ideal for learning retry patterns: fast, free, deterministic.
Program structure walkthrough¶
| Section | What it does |
|---|---|
RETRIABLE = (APIConnectionError, RateLimitError) |
Whitelist: only these two retry; everything else raises |
with_retry(fn, ...) |
Exponential backoff wrapper: 1s, 2s, 4s, 8s + jitter |
demo_bad_key() |
Triggers a network / 401 error to inspect the raised exception |
demo_with_retry() |
Normal call wrapped in with_retry, expected to succeed on first try |
demo_too_long_prompt() |
Oversize prompt — see how the context-window limit surfaces |
Exception-class mapping between SDKs¶
| Anthropic SDK | OpenAI SDK (Ollama) | Meaning | RETRIABLE? |
|---|---|---|---|
anthropic.APIConnectionError |
openai.APIConnectionError |
Network down | ✅ |
anthropic.RateLimitError |
openai.RateLimitError |
429 throttle | ✅ |
anthropic.AuthenticationError |
openai.AuthenticationError |
401 bad key | ❌ |
anthropic.APIStatusError |
openai.APIStatusError |
Generic HTTP error | Depends on status |
The two SDKs use nearly identical class names; the retry logic is fully backend-agnostic. Swap SDKs = change the import line.
Common pitfalls¶
- Blindly retry every exception — you'll retry
AuthenticationError4 times for nothing. The RETRIABLE whitelist is the heart of this pattern - No jitter — 1000 workers rate-limited together, all retry after 1s, hit the limit again → deadlock. Add
random.uniform(0, 0.3)to spread them max_attemptstoo high — 8 retries = 1+2+4+8+16+32+64+128 = 255s, user gave up long ago.max_attempts=4usually suffices- No attempt-count metric — production must emit retry-count metrics; alert above threshold
- Ignoring
Retry-After— rate-limit responses include this header; SDKs handle it automatically, but a custom wrapper shouldn't ignore the hint
Extensions¶
- Better jitter — try decorrelated jitter for stability
- Circuit breaker — after N consecutive failures, stop calling for a while
- Use tenacity — production code shouldn't roll its own retry; this starter is just to show what's inside
- Finer error classification — different backoffs for 429 / 503 / 502 / 500
- Stage 3 tool-level errors — see
../../stage-3/05-error-handling/for structured tool errors that let the LLM decide within a ReAct loop