Exercise 5: Tool Error Handling¶
Corresponds to Stage 3 — Tool Use & Agent Intro Exercise 5.
Why this matters¶
Real agents rarely walk the happy path only: APIs time out, third parties go down, users send bad inputs. This exercise deliberately makes fetch_weather(city) return a structured error on the first call ({"error": "network timeout", "retry_hint": "try again in 1s"}) and succeed on the second; you observe how the ReAct loop hands the error observation back to the LLM and lets the model decide whether to retry, change the query, or give up.
Core idea: tool errors are data, not exceptions. Return structured dicts, don't raise.
How to run — two paths¶
Path A (default, free, local)¶
pip install -r requirements.txt
ollama pull qwen2.5:3b
ollama serve
python starter.py
Budget: $0. A 3-round loop takes ~10-60s.
Path B (Anthropic, cloud-quality comparison)¶
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-ant-...
python starter_anthropic.py
Budget: ~$0.003 per run (claude-haiku-4-5, 3 rounds of accumulating messages).
Expected output (Path A, local, ideal retry-then-succeed path):
❓ Question: Will it rain in Taipei today? (using Ollama qwen2.5:3b)
------------------------------------------------------------
[step 0] tool: fetch_weather({'city': 'Taipei'}) → {'error': 'network timeout', 'retry_hint': 'try again in 1s'}
[step 1] tool: fetch_weather({'city': 'Taipei'}) → {'city': 'Taipei', 'forecast': 'rain', 'temperature_c': 24}
------------------------------------------------------------
✅ Final answer: It will rain in Taipei today (24°C).
✅ Exercise 5 passed — tool errors are data, not exceptions, $0/run
Validate the logic without API credits (mock-based)¶
python test.py # validates Path A (Ollama) starter.py logic
python test_anthropic.py # validates Path B (Anthropic) starter_anthropic.py logic
Both test suites use unittest.mock, no real API call, $0/run.
Design reminders¶
Errors should be structured data, so the LLM has context to make decisions:
| Bad | Good |
|---|---|
raise Exception("failed") |
return {"error": "network timeout", "retry_hint": "try again in 1s"} |
return "failed" |
return {"error": "...", "category": "transient", "retry_hint": "..."} |
| Unbounded retry | max_iter safety + business-layer retry quota |
Returning just "failed" leaves the model with nothing to act on. Adding retry_hint, error category, and recovery suggestions gives the model enough context to choose. And cap your retries — otherwise the agent loops forever on a broken tool.
What to watch on each path¶
Side observation: small models (qwen2.5:3b) follow retry_hint less reliably than Claude — they might give up immediately or ignore the hint and repeat the same call. That's exactly the teaching point: in production, the same retry pattern produces different behaviors depending on how well a model reads structured errors — a real consideration when picking a model (we'll revisit in Stage 7).
| Observation | Anthropic Claude haiku | Ollama qwen2.5:3b |
|---|---|---|
Retries on retry_hint |
High | Medium (may give up) |
| Graceful end after repeated failure | Stable | May retry a third time |
| Distinguishing transient vs permanent | Finer | Coarser |
Want smarter answers?¶
Default is claude-haiku-4-5 (cheapest). Try Sonnet:
MODEL=claude-sonnet-4-6 python starter_anthropic.py
Or on the Ollama path, swap to a larger model:
MODEL=qwen2.5:7b python starter.py
Extensions¶
- Add a retry quota — track
error_countand give up after N - Add a circuit breaker — after consecutive failures, stop calling for a while (avoids wave-after-wave on a broken downstream)
- Classify errors — transient (429 / connection) vs permanent (401 / 400) get different handling
- Production tier — see
../../stage-1/05-error-handling/for an API-level retry wrapper with exponential backoff + jitter