Exercise 5: Error Handling + Retry Wrapper¶

Corresponds to Stage 1 — LLM Basics Exercise 5.

Why this matters¶

Production agents in Stages 3-8 will absolutely hit API errors:

Rate limit (429) — different subscription tiers, you can hit it anytime
Network jitter (connection reset) — cross-DC / VPN happens daily
Expired API key (401) — rotation out of sync
Context overflow (400) — you appended too much history

Some errors should be retried (rate limit / network), others should not (key, context full). Not distinguishing them is one of the most common production-agent mistakes.

How to run — two paths¶

Path A (default, free, local)¶

pip install -r requirements.txt
ollama pull gemma4:e4b
ollama serve
python starter.py

Budget: $0. The local demo shows connection / context-window behavior.

Path B (Anthropic, real cloud errors)¶

pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-ant-...
python starter_anthropic.py

Budget: ~$0.0005 per run (only "scenario 2 normal call" actually hits the API).

Expected output (Path A, local):

[Scenario 1] Pointing at a dead Ollama port
  ✅ Caught APIConnectionError
  💡 Production handling: retry (network errors are typically transient)

[Scenario 2] Normal call wrapped in with_retry (needs Ollama running)
  ✅ Succeeded on first try: 👋

[Scenario 3] Prompt over context window (Ollama usually truncates or raises)
  ⚠ Ollama didn't raise (it likely truncated). Cloud APIs would 400.

✅ Exercise 5 passed — you understand which errors raise, when to retry, when to stop. $0/run

Validate the logic without network failures (mock-based)¶

python test.py            # validates Path A (Ollama) retry wrapper
python test_anthropic.py  # validates Path B (Anthropic) retry wrapper

6 tests use unittest.mock to fabricate errors + fake sleep (0 real seconds), validating the retry logic:

✅ test_no_retry_when_success_first_time
✅ test_retry_on_connection_error_then_success
✅ test_retry_on_rate_limit
✅ test_raise_after_max_attempts
✅ test_no_retry_on_auth_error
✅ test_exponential_backoff_delays

🎉 All passed — retry wrapper logic correct

Local advantage: Ollama can't hit a real RateLimitError (no quota), so the "rate limit demo" is invisible. But the mock-based tests cover the retry logic completely and reproduce in 0 seconds — which is precisely what makes the Ollama path ideal for learning retry patterns: fast, free, deterministic.

Program structure walkthrough¶

Section	What it does
`RETRIABLE = (APIConnectionError, RateLimitError)`	Whitelist: only these two retry; everything else raises
`with_retry(fn, ...)`	Exponential backoff wrapper: 1s, 2s, 4s, 8s + jitter
`demo_bad_key()`	Triggers a network / 401 error to inspect the raised exception
`demo_with_retry()`	Normal call wrapped in `with_retry`, expected to succeed on first try
`demo_too_long_prompt()`	Oversize prompt — see how the context-window limit surfaces

Exception-class mapping between SDKs¶

Anthropic SDK	OpenAI SDK (Ollama)	Meaning	RETRIABLE?
`anthropic.APIConnectionError`	`openai.APIConnectionError`	Network down	✅
`anthropic.RateLimitError`	`openai.RateLimitError`	429 throttle	✅
`anthropic.AuthenticationError`	`openai.AuthenticationError`	401 bad key	❌
`anthropic.APIStatusError`	`openai.APIStatusError`	Generic HTTP error	Depends on status

The two SDKs use nearly identical class names; the retry logic is fully backend-agnostic. Swap SDKs = change the import line.

Common pitfalls¶

Blindly retry every exception — you'll retry AuthenticationError 4 times for nothing. The RETRIABLE whitelist is the heart of this pattern
No jitter — 1000 workers rate-limited together, all retry after 1s, hit the limit again → deadlock. Add random.uniform(0, 0.3) to spread them
max_attempts too high — 8 retries = 1+2+4+8+16+32+64+128 = 255s, user gave up long ago. max_attempts=4 usually suffices
No attempt-count metric — production must emit retry-count metrics; alert above threshold
Ignoring Retry-After — rate-limit responses include this header; SDKs handle it automatically, but a custom wrapper shouldn't ignore the hint

Extensions¶

Better jitter — try decorrelated jitter for stability
Circuit breaker — after N consecutive failures, stop calling for a while
Use tenacity — production code shouldn't roll its own retry; this starter is just to show what's inside
Finer error classification — different backoffs for 429 / 503 / 502 / 500
Stage 3 tool-level errors — see ../../stage-3/05-error-handling/ for structured tool errors that let the LLM decide within a ReAct loop