Skip to content

examples/ — Runnable hands-on exercises

← Back to main path README

Every stage in the learning roadmap has a "Hands-on Exercises" section that tells you what to do. This folder adds the actual runnable starter code — copy → install deps → python starter.py → see expected output.

Directory layout

examples/
├── stage-3/                     # Tool Use & Agent intro
│   ├── 03-react-from-scratch/   # Exercise 3: ReAct from scratch
│   │   ├── starter.py           # Main program (~70 LOC runnable)
│   │   ├── test.py              # Self-check (pure assert, no pytest)
│   │   ├── README.md            # 200-400-word walkthrough (+.zh-Hans.md +.en.md)
│   │   └── requirements.txt     # Pinned deps
│   └── ...
├── stage-1/
└── ...

Short exercises (≤30 LOC) stay inline as <details> blocks in the stage doc — no folder. Longer ones (>30 LOC) get their own folder so stage docs don't get bloated by code blocks.

How to run any example

cd examples/stage-3/03-react-from-scratch
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-ant-...   # Each example header lists the key it needs
python starter.py                     # Hits the real API to see output (~$0.001 in credits)
python test.py                        # Runs validation (mock-based, free)

Design rules

Dimension Rule
Program length starter ≤80 LOC, split if longer
Dependencies stdlib + ≤2 pip packages, pinned versions
Tests Plain assert, no pytest; reader runs python test.py to see ✅
Comments Chinese (zh-TW primary), English variable / function names
Self-check Every starter.py ends with a # === Self-check === block
Environment vars Header comment must list required keys
Free-tier friendly Use the cheapest model (claude-haiku / Ollama); note how to switch to Sonnet
Windows encoding Every .py must reconfigure stdout to UTF-8 (see below)

Windows cp950 encoding fix (mandatory in every starter.py / test.py)

Windows consoles default to cp950 (Big5) and can't print emoji or non-Big5 Chinese. Add this right after imports in every .py:

import sys
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

Without it, Windows readers running in PowerShell / cmd hit UnicodeEncodeError: 'cp950' codec can't encode character '✅'.

Three paths — default is Ollama (cost-driven)

💰 Why default to Ollama? Running 1000 practice iterations on Sonnet costs ~$4; on haiku ~$0.25; on local Ollama $0. API cost should not block learning. Reserve cloud LLMs for "want to see high-quality answers / production deployment".

Every exercise ships with all three paths:

  • Default starter.py / first inline <details> block uses a local model
  • Requires Ollama; pull a model based on the stage:
  • Stage 1 + 2 (plain chat / prompt eng): ollama pull gemma4:e4b (~7.5 GB; multimodal (text + image + audio); CPU-friendly)
  • Stage 3+ (tool use / agent): ollama pull qwen2.5:3b (1.9 GB; reliable tool-use support)
  • $0, offline, fine for privacy-sensitive data
  • SDK uses the openai package (OpenAI-compatible API) with base_url="http://localhost:11434/v1"
  • Best for: all readers (this is the default recommendation)

Path B (optional) — Anthropic API (when you want cloud quality)

  • Companion starter_anthropic.py (folder) or the second inline <details> block
  • Requires ANTHROPIC_API_KEY; ~$0.001 per run (haiku) / ~$0.004 (sonnet)
  • Higher answer quality and lower latency than local 3-4B Ollama models
  • Best for: production-quality demands, long-context work, the Stage 7 production tier

Path C (verify logic, no API call)

  • Every test.py uses unittest.mock; python test.py validates code logic without spending
  • Complements A / B — mock first, then real call

Trade-offs

Dimension A Ollama (default) B Anthropic C Mock
Cost per call $0 ~$0.001-0.004 $0
Requires Ollama install API key nothing
Answer quality medium (3-4B model) high canned, unrepresentative
Speed 5-30 s/call (no GPU) ~1-3 s/call <0.1 s
Offline
Privacy-sensitive data
Stage 3+ tool use ✅ (qwen2.5 / llama3.2)
Best for default, no budget pressure production upgrade logic verification

Recommended flow: C first (validate logic, no cost), then A (see real model behaviour locally), then B at the Stage 7 production stage if cloud quality is needed.

Local + cloud, user-perspective.
💡 You don't need to install every model — this table shows "which to use for practice" and "which to upgrade to for production". Claude is the canonical / production reference; Ollama is the practice default.

Local LLMs (practice default, via Ollama)

Model Download Recommended RAM Stage Tool-use Speed (CPU/GPU) Primary use
gemma4:e4b 7.5 GB 8 GB 1+2 basic slow / med Stage 1-2 plain chat / prompt eng (default)
qwen2.5:3b 1.9 GB 4 GB 3+ reliable med / fast Stage 3+ tool use / agent (default)
llama3.2:3b 2.0 GB 4 GB 3+ reliable med / fast qwen2.5:3b alternative
mistral-nemo:12b 7.1 GB 16 GB 3+ strong slow / med When you want closer-to-cloud quality
qwen2.5:14b 9.0 GB 16 GB advanced strong slow / med Larger-model comparison (GPU preferred)
gemma4:e2b 4.0 GB 4 GB 1+2 basic med / fast 4 GB-RAM-machine alternative

Install: ollama pull <model> + ollama serve. Hardware tuning details: resources/cli-agents-guide.en.md.

Cloud LLMs (canonical / production stack, via Anthropic)

Model $/1M input $/1M output Context Primary use
claude-haiku-4-5 $1 $5 200k Cheapest; fine for Stage 1-7 cloud-quality comparisons
claude-sonnet-4-6 $3 $15 1M Production default; Stage 5+ agent development
claude-opus-4-7 $5 $25 1M Highest quality; complex reasoning / long-context refactors

Subscription alternative: Claude Pro $20/month (includes Sonnet usage); Claude Max $100/month (includes Opus). Details: resources/cli-agents-guide.en.md.

Cloud LLM Chinese / open-source alternatives (region limits / budget / Chinese-language scenarios)

Can't or don't want to use Anthropic? These APIs are all OpenAI-compatible — change base_url and model name to run the same exercises.

Provider Main model $/1M input $/1M output OpenAI-compat? Key selling point
DeepSeek deepseek-chat (V3) $0.27 $1.10 Cheapest cloud (4× cheaper than haiku $1/$5); strong CN & EN; free web at chat.deepseek.com
DeepSeek R1 deepseek-reasoner $0.55 $2.19 Reasoning model (o1-class), still 1/30 the price of OpenAI o1
Moonshot Kimi kimi-k2-turbo-preview $5-10 $15-30 1M-token context (key selling point); good for large files / long conversations. Free web at kimi.com
Qwen (Alibaba) qwen-max / qwen-turbo $0.50-1.50 $1.50-6 ✅ (DashScope) Native Chinese; same models also run locally via Ollama (cloud + local both work)
GLM (ZhipuAI) glm-4.5 / glm-4-plus $0.30-2 $1.50-9 China-native, has free tier. Free web chatglm.cn
NVIDIA NIM Llama / Mistral / DeepSeek / Qwen etc. hosted free tier 1000 credits (same) Hosts 10+ open models; new accounts get credits; no local GPU needed. build.nvidia.com

API endpoints (OpenAI SDK usage):

# DeepSeek
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com/v1")
r = client.chat.completions.create(model="deepseek-chat", messages=[...])

# Moonshot Kimi (China endpoint; international uses .ai)
client = OpenAI(api_key=os.environ["MOONSHOT_API_KEY"], base_url="https://api.moonshot.cn/v1")
r = client.chat.completions.create(model="kimi-k2-turbo-preview", messages=[...])

# Qwen (Alibaba DashScope)
client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"],
                base_url="https://dashscope.aliyuncs.com/compatible-mode/v1")
r = client.chat.completions.create(model="qwen-turbo", messages=[...])

# GLM (ZhipuAI)
client = OpenAI(api_key=os.environ["ZHIPUAI_API_KEY"], base_url="https://open.bigmodel.cn/api/paas/v4")
r = client.chat.completions.create(model="glm-4.5-flash", messages=[...])

# NVIDIA NIM (hosted open-source)
client = OpenAI(api_key=os.environ["NVIDIA_API_KEY"], base_url="https://integrate.api.nvidia.com/v1")
r = client.chat.completions.create(model="meta/llama-3.3-70b-instruct", messages=[...])

How to pick:

Scenario Pick Why
Mainland China, no cloud access Ollama local / DeepSeek API Local is free; DeepSeek has an in-China endpoint
Tight budget (< $1/month) DeepSeek API 4× cheaper than haiku; quality close
Large files / long-doc RAG Moonshot Kimi 1M-token context
Chinese-native task (classical Chinese, CN search) Qwen / GLM Higher Chinese training corpus ratio
Want to try 10+ open models without GPU NVIDIA NIM One key, play with Llama / Mixtral / Qwen / DeepSeek
Production agent (tool use) Anthropic Claude (canonical) This repo's Path B default; tool calling most reliable

Budget estimate (completing all 54 exercises across Stage 1-7)

Learning path Total time Total cost Best for
All local Ollama ~30 hr (CPU) / ~10 hr (GPU) $0 Budget-conscious, privacy needs, China-mainland no-cloud-access
Mixed: local practice + haiku final review ~30 hr $2-5 Recommended default — practice locally, run final 1-2 iterations on haiku to see cloud quality
All haiku ~10 hr $5-15 Want speed, budget allows, want full cloud experience
All sonnet ~8 hr $20-50 Production-grade practice, want high-quality answers
Mixed: sonnet + opus on hard problems ~8 hr $30-80 Already a production agent developer

🎯 Beginner default: run everything locally first; cap budget at $5. Only consider upgrading to sonnet at the Stage 7 production tier.

Index by stage

Stage Exercises Example location
1 LLM basics 6 inline 4 + folder 2 (examples/stage-1/)
2 Prompt engineering 4 all inline
3 Tool use 6 inline 1 + folder 5 (examples/stage-3/)
4 Frameworks 5 all folder (examples/stage-4/)
5 Claude Code ecosystem 11 inline 6 + folder 5 (examples/stage-5/)
6 Memory/RAG 5 all folder (examples/stage-6/)
7 Multi-agent 5 inline 1 + folder 4 (examples/stage-7/)
Track A1-A3 12 all inline + 2 small folders (CLI-9 / CLI-10)

→ T1 scope: Stage 3 全 6 exercises only (remaining stages roll out per plan tiers).

Contributing / reporting issues

If something doesn't run, output doesn't match expectations, or you want to add a new example: - File an issue tagged examples - Or open a PR following the "Design rules" table above

Why this split (instead of stuffing everything into stage docs)

  1. Stage docs stay readable — roadmap readers don't always want code, they want concepts; long code blocks break that
  2. Examples evolve independently — SDK bumps, model rename, example needs its own commit without polluting the roadmap's git log
  3. Readers can clone one examplesvn export or git clone --filter=tree:0 grabs a single folder
  4. Future CI — example failures shouldn't block mdbook deploy; this split lets CI run examples conditionally