Stage 1 — LLM Fundamentals¶

繁體中文 | 简体中文 | English

⏱ Time estimate: 1 week (~5-8 hours)

👋 Coming from Stage 0? Nice — your toolchain is set. The next 5-8 hours: your first working call to Claude / GPT / Gemini, how token / context window / temperature shape the output, and per-token cost estimation. Jumped straight here? Make sure you can run a Python script and have an API key from one provider — if not, head back to Stage 0.

💡 Don't recognize a term? (LLM / token / context window / temperature / RAG / agent / …) → check resources/glossary.en.md for 30-second definitions.

3 Core Terms (memorize these—all later stages use them)¶

Term	Chinese	One-liner
token	詞元	the unit LLMs use to count text length and price (1 Chinese char ≈ 1.5-2 tokens; 1 English word ≈ 1.3 tokens)
context window	上下文視窗	How many tokens the model sees at once (Claude 1M / GPT ~400k / Gemini 2M)
temperature	隨機程度參數	Controls how stable or creative the output is (0 = deterministic, 1 = creative; use 0.0-0.3 for classification, 0.7-1.0 for creative writing)

→ These 3 terms run through every later stage. The goal of Stage 1 is to call the API yourself and feel firsthand how they shape the output.

🧠 Why you can tune temperature: first, next-token: an LLM's core action is predicting the next token. It computes a probability distribution over the next token and samples one from it. temperature and top_p reshape that distribution: low temperature makes it sharper, almost always picking the most likely token (stable, reproducible); high temperature flattens it, so it more readily picks rare tokens (creative but prone to wandering). max_tokens just caps how many samples before stopping. So these are not magic knobs; they control how the model picks tokens from a probability distribution.

📌 Learning Goals¶

After this stage you will be able to: - Explain what an LLM is, what tokens are, and what context window means - Make your first API call to Claude / GPT / Gemini and parse the response - Compare the four major LLM families (Claude / GPT / Gemini / Llama) on strengths - Estimate cost per task using per-token pricing

🌐 Major LLM Family Comparison (2026-05 snapshot)¶

"How is Claude different from GPT?" "Can I use Chinese models?" "Which OSS model should I run with Ollama?" This section gives you an objective side-by-side view. It does not declare a single "best" model: it compares strengths / good-fit tasks / weaknesses and includes official docs URLs so you can verify the claims yourself.

💡 First, a few terms: - Context window = the amount of conversation an LLM can remember in one pass; it is capped (for example, 200k tokens ~= 150k Chinese characters) - Apache 2.0 / MIT = open-source terms that permit commercial use, modification, and closed-source redistribution; Llama Community License = open-source but with conditions (for example, orgs with >= 700M MAU need a license) - Frontier model = each provider's strongest flagship; OSS = open-source, with weights downloadable for self-hosting

🇺🇸 US Commercial Frontier (3 providers)¶

These 3 are SaaS APIs: you pay per token and cannot self-host them.

Model family	Flagship (2026-06)	Context	Strengths	Best for	Official docs
Claude (Anthropic)	Opus 4.8 / Sonnet 5 / Haiku 4.5	1M	long-form / coding / agent / safety alignment	writing papers / code review / agent runtime	platform.claude.com/docs
GPT (OpenAI)	GPT-5.6 (preview) / GPT-5.5	~400k	general-purpose / function calling / broadest ecosystem	broad queries / function-call frameworks / GPTs ecosystem	platform.openai.com/docs/models
Gemini (Google)	3.5 Flash / 3.5 Pro (in dev)	2M	long context / native multimodal / Google integration	PDF / video and audio / large document sets / Google Workspace	ai.google.dev

Note: (preview) = still in limited preview, not open to everyone yet; (in dev) = not released yet. ⚠️ Claude Fable 5 (originally the top tier, positioned above Opus) launched 2026-06-09 but was suspended on 2026-06-12 and can't be used right now → use Opus 4.8 (the best tier you can actually use today). Context is the flagship's ceiling: Gemini Pro series 2M, Flash 1M; Claude 1M (Haiku 4.5 is 200k). Also, Sonnet 5 (launched 2026-06-30) is the current Sonnet: 1M context, fast, and cheaper than Opus ($3/$15 vs Opus $5/$25).

🇨🇳 Chinese Commercial + Open-Source Frontier (7 providers)¶

These are the main choices for Chinese-language work, in two groups: API-only (cloud, paid, can't self-host) and open weights (can run on your own machine).

① API-only (cloud, mostly paid)

Model family	Flagship (2026-05)	Context	Strengths	Best for	Official
DeepSeek	V3 (`deepseek-chat`) / R1 (`deepseek-reasoner`)	128k	reasoning / coding / lowest cost	high-token workloads / code generation / math	api-docs.deepseek.com
Kimi (Moonshot)	K2.6 multimodal + Agent	very long 1M+	long context / Chinese long-form writing	whole-book reading / literature triage	platform.moonshot.cn
Hunyuan (Tencent)	T1 (deep-thinking) + TurboS	128k	DeepSeek R1-comparable reasoning, Chinese	Chinese reasoning / Tencent ecosystem	hunyuan.tencent.com
MiniMax	abab6.5 + M2.7	200k	multimodal / Chinese long prose	Chinese writing / video and audio multimodal	platform.minimax.io

Note: This group is mostly cloud-API and proprietary. DeepSeek also has some open weights (on HF), but its V4 consumer API isn't fully public yet, so the API is still the main way to use it.

② Open weights (self-hostable)

Model family	Flagship (2026-05)	Context	Strengths	Best for	Official
Qwen (Alibaba)	Qwen3	128k+	strongest Chinese OSS / multimodal / agent	Chinese long-form writing / agent / self-host	qwen.ai · DashScope
GLM (Zhipu)	GLM-5 / GLM-5.1	128k	Chinese / tool use / agent	Chinese agents / multi-turn chat	open.bigmodel.cn · chatglm.cn
Yi (01.AI / Kai-Fu Lee)	Yi-Lightning / Yi-34B-Chat	200k	Chinese OSS alternative to Llama	Chinese self-host / Chinese API	01.ai · GitHub

Note: All three offer both an Apache 2.0 open version and a paid cloud API (GLM's open version is 5.1). The open versions run on your own machine via Ollama.

⚠️ Xiaomi MiMo is listed in resources/cli-agents-guide.md for Hermes Agent routing, but as of 2026-05 there is no authoritative official source to verify it, so it is not included in this table. To try it, connect through Hermes Agent 200+ provider routing.

🌍 Western Open-Source (4 providers, self-host defaults)¶

These are the main choices for running on your own hardware, avoiding API fees, or handling privacy-sensitive work. You can install them in one command through Ollama.

Model family	Active size	License	Strengths	Best for	Official
Llama (Meta)	3.3 70B	Llama Community License	general-purpose / broadest ecosystem / Ollama default	self-hosting intro / fine-tune base	llama.com · HF Meta
Gemma (Google)	Gemma 4 26B MoE + 31B dense	Apache 2.0	small and efficient / strong Apple MLX integration / multimodal	edge / mobile / 4-8 GB RAM machines	ai.google.dev/gemma
Mistral (Mistral AI)	7B / Mixtral 8x7B / Codestral	Apache 2.0 (OSS parts)	strongest open-source 7B class	commercial self-host / EU sovereignty	mistral.ai · HF Mistral
Phi (Microsoft)	Phi-4 14B + multimodal	MIT	small but strong / reasoning / edge-friendly	4 GB+ RAM / mobile / reasoning intro	HF microsoft

Note: Llama 4 hadn't shipped as of 2026-05 (the table shows 3.3); Gemma 4 was released 2026-04, ranked #3 on LMArena's open-weights board; Phi-4 also has a multimodal version.

🎯 Which One Should I Pick? (by scenario)¶

Your scenario	Pick + why
First time learning an LLM API, prioritize complete tutorials	Claude — Anthropic Cookbook + Courses are widely considered the most complete
Long-form writing / papers / code review	Claude Sonnet — long-form prose is a core strength
Multimodal (PDF / video and audio / images)	Gemini or Kimi — native multimodal
Broad queries + function calling frameworks	GPT — broadest ecosystem and deepest SDK integration
Chinese scenarios + commercial API	Kimi (strong long context; can fit whole books), DeepSeek (lowest cost), or GLM (agent-friendly)
Chinese scenarios + open-source self-host	Qwen 3 (Apache 2.0; currently the strongest Chinese OSS)
Reasoning / math (reasoning model)	DeepSeek R1 / Hunyuan T1 / OpenAI o-series
Privacy / offline / no API fees	Llama 3.3 / Gemma 4 / Qwen 3 OSS via Ollama
Edge / 4 GB RAM machine	Gemma 4 / Phi-4 / Qwen 3 (`qwen3-3B` or smaller variants)
100k+ token large documents	Gemini 3.1 (2M context) or Kimi K2.6 (1M+)
Want the lowest cost (API-bill sensitive)	DeepSeek V4-Flash — lowest token price among same-tier English models

📊 Neutral Benchmark Resources (verify for yourself; do not rely on one source)¶

Resource	Use	URL	2026-05 status
Artificial Analysis	Third-party benchmarks plus price/latency aggregation, including Chinese models	https://artificialanalysis.ai/	✓ Active
Arena AI (formerly LMSYS Chatbot Arena)	Human blind-test ELO leaderboard	https://arena.ai/leaderboard/text	✓ Active
Vellum LLM leaderboard	Aggregates multiple benchmarks	https://www.vellum.ai/llm-leaderboard	✓ Active
HuggingFace OpenLLM Leaderboard	Open-source model rankings	https://huggingface.co/spaces/open-llm-leaderboard	⚠️ Occasional runtime errors as of 2026-05; use the Arena AI open-source tab as fallback
SuperCLUE	Authoritative benchmark for Chinese-language scenarios	https://www.superclueai.com/	✓ Active

⚠️ Important Caveats¶

⚠️ Benchmark != production performance: run a small eval on your specific task (for example, paste 10 real prompts and see which model answers closest to what you need); do not pick only from rankings
⚠️ Frontier changes every 6 months: all numbers above are a 2026-05 snapshot; afterward, rely on official docs / Artificial Analysis
⚠️ "Strength" is relative, not absolute: every frontier model can handle basic tasks; differences matter at the margin
⚠️ For Chinese scenarios, check SuperCLUE: general international benchmarks such as MMLU are English-heavy, and Chinese-language performance may diverge

🚪 Entry Conditions¶

You should already: - Be able to run a Python script - Know what HTTP / REST is conceptually - Have an API key from at least one provider (Anthropic / OpenAI / Google)

If not — go back to Stage 0 first.

📚 Required Reading¶

Anthropic — Claude Model Overview — official model family overview, including 2026's Claude Fable 5 (claude-fable-5, Mythos-class, GA 2026-06-09) plus Opus 4.8 / Sonnet 5 / Haiku 4.5. ⚠️ Both Fable 5 and its sibling Mythos 5 (claude-mythos-5) had access suspended on 2026-06-12 by a US export-control directive (status · statement) and are currently unavailable with no restoration timeline; Opus 4.8 is the current top usable Claude tier.
anthropics/courses — Anthropic API Fundamentals ⭐⭐⭐⭐⭐ ★ 21k+ — Anthropic's official 5-course umbrella; module 1 "Anthropic API Fundamentals" maps to this stage. Jupyter notebooks, runs on Claude 3 Haiku (cheapest), hands-on walkthrough of API essentials.
OpenAI Quickstart — first API call walkthrough
A Visual Guide to LLM Tokenizers — Hugging Face's intro
Anthropic API Pricing — read the pricing table, calculate cost for 1k input + 1k output

🛠 Hands-on Exercises (foundational, illustrative)¶

🦙 This stage defaults to Ollama (cost-driven; gemma4:e4b runs locally for $0/run). Every exercise has Path A (Ollama, default) + Path B (Anthropic, optional — use it when you want to see cloud-quality answers). Full three-path trade-off in examples/README.en.md.

💰 Stage 1 budget estimate (all 6 exercises, 3-5 runs each): all local = $0, all haiku ≈ $0.30, all sonnet ≈ $0.90. Full model list + Stage 1-7 total budget: examples/README.en.md#recommended-llm-list.

💡 No Ollama yet? Each exercise also ships a Path B Anthropic version — pick one. To enable Path A in one step: pip install openai && ollama pull gemma4:e4b.

Exercise 1: LLM API (hello world)¶

Five-line Python script that calls an LLM and prints the response. Defaults to local Ollama (free, offline); switch to Path B Anthropic when you want cloud-quality answers. Details in examples/README.en.md.

📋 Starter code — Path A (local Ollama gemma4:e4b, default) (copy to practice_1.py and run python practice_1.py)

# Requires: pip install openai      (OpenAI-compatible SDK talks to Ollama)
# Pre-req: ollama pull gemma4:e4b && ollama serve
import sys
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama doesn't check this — anything works
)

r = client.chat.completions.create(
    model="gemma4:e4b",   # swap to qwen2.5:3b / llama3.2:3b if preferred
    max_tokens=100,
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
)

# === Self-check ===
text = r.choices[0].message.content
print("Response:", text)
print("usage:", r.usage)

assert r.choices[0].finish_reason in ("stop", "length"), f"unexpected finish_reason: {r.choices[0].finish_reason}"
assert len(text) > 0, "response should not be empty"
assert r.usage.completion_tokens > 0, "output token count should be > 0"
print("✅ Exercise 1 passed — local Ollama gemma4:e4b answered for $0")

**How slow?** Gemma 4B on CPU: ~5-30 s/answer; on GPU (RTX 3060+): <2 s. For speed use `gemma3:1b`; for quality use `qwen2.5:14b` / `llama3.3:8b` (needs 8 GB+ VRAM).

📋 Starter code — Path B (Anthropic API, optional, when you want cloud quality) (copy to practice_1_anthropic.py)

# Requires: pip install anthropic
# Env: export ANTHROPIC_API_KEY=sk-ant-...
import sys
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

import anthropic

client = anthropic.Anthropic()
msg = client.messages.create(
    model="claude-haiku-4-5",  # haiku = cheapest; switch to sonnet by changing this line
    max_tokens=100,
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
)

# === Self-check ===
text = msg.content[0].text
print("Response:", text)
print("usage:", msg.usage)

assert msg.stop_reason in ("end_turn", "max_tokens"), f"unexpected stop_reason: {msg.stop_reason}"
assert len(text) > 0, "response should not be empty"
assert msg.usage.input_tokens > 0 and msg.usage.output_tokens > 0, "token counts should be > 0"
print("✅ Exercise 1 passed — Anthropic API is reachable from your machine")

**Cost**: ~$0.001/run (haiku) or ~$0.004/run (sonnet); this hello-world is also 5-15× faster than Ollama.

Exercise 2: Tokens¶

Run the same prompt 100 times and watch token counts vary. - Notice: temperature ≠ 0 produces variation - Notice: token count for the SAME English vs Chinese sentence

📋 Starter code — Path A (local Ollama gemma4:e4b, default) (copy to practice_2.py)

# Requires: pip install openai
# Pre-req: ollama pull gemma4:e4b && ollama serve
import sys, statistics
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

PROMPTS = {
    "Chinese": "用一句話描述一隻貓在做什麼。",
    "English": "Describe in one sentence what a cat is doing.",
}

N = 10  # local is slower; start small
for label, prompt in PROMPTS.items():
    output_tokens = []
    for _ in range(N):
        r = client.chat.completions.create(
            model="gemma4:e4b",
            max_tokens=80,
            temperature=1.0,  # high temp to amplify variance
            messages=[{"role": "user", "content": prompt}],
        )
        output_tokens.append(r.usage.completion_tokens)
    print(f"\n[{label}] prompt: {prompt}")
    print(f"  input tokens: {r.usage.prompt_tokens}")
    print(f"  output tokens — min={min(output_tokens)} max={max(output_tokens)} mean={statistics.mean(output_tokens):.1f} stdev={statistics.stdev(output_tokens):.1f}")

# === Self-check ===
assert max(output_tokens) > min(output_tokens), "with temperature=1.0, output length should vary"
print("\n✅ Exercise 2 passed — observed temperature → token variance, $0/run")
print("💡 Chinese prompts typically use MORE input tokens (one Chinese character ≈ 2 tokens)")

📋 Starter code — Path B (Anthropic API, optional) (copy to practice_2_anthropic.py)

# Requires: pip install anthropic
import sys, statistics
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

import anthropic
client = anthropic.Anthropic()
PROMPTS = {"Chinese": "用一句話描述一隻貓在做什麼。", "English": "Describe in one sentence what a cat is doing."}

for label, prompt in PROMPTS.items():
    output_tokens = []
    for _ in range(20):
        msg = client.messages.create(model="claude-haiku-4-5", max_tokens=80, temperature=1.0,
                                     messages=[{"role": "user", "content": prompt}])
        output_tokens.append(msg.usage.output_tokens)
    print(f"[{label}] input={msg.usage.input_tokens} output min/max/mean={min(output_tokens)}/{max(output_tokens)}/{sum(output_tokens)/len(output_tokens):.1f}")

**Key SDK diffs**: `messages.create` → `chat.completions.create`; `usage.output_tokens` → `usage.completion_tokens`; `usage.input_tokens` → `usage.prompt_tokens`. **Cost**: 40 runs ≈ $0.01.

Exercise 3: Pricing / Latency¶

Cost-sensitive work required: compute how long and how much it takes to run 1000 hello-world inferences. Local Ollama is $0 but has latency cost; cloud LLMs cost money but are faster. Knowing this trade-off is how you pick the right model.

📋 Starter code — Path A (local Ollama gemma4:e4b, measure latency) (copy to practice_3.py)

# Requires: pip install openai
# Pre-req: ollama pull gemma4:e4b && ollama serve
import sys, time
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

latencies = []
for _ in range(5):
    t0 = time.time()
    r = client.chat.completions.create(
        model="gemma4:e4b",
        max_tokens=200,
        messages=[{"role": "user", "content": "Hi! Please introduce yourself."}],
    )
    latencies.append(time.time() - t0)

avg_latency = sum(latencies) / len(latencies)
out_tok_avg = r.usage.completion_tokens
tps = out_tok_avg / avg_latency if avg_latency > 0 else 0

print(f"model: gemma4:e4b (local)")
print(f"5 latencies (sec): min={min(latencies):.2f} max={max(latencies):.2f} mean={avg_latency:.2f}")
print(f"avg output: {out_tok_avg} tokens, ~{tps:.1f} tokens/sec")
print(f"\n1000-run cost: $0 (local); projected duration: {avg_latency * 1000 / 60:.1f} minutes")

# === Self-check ===
assert avg_latency > 0, "latency should be > 0"
assert out_tok_avg > 0, "output token count should be > 0"
print(f"\n✅ Exercise 3 passed — local model is $0 but takes ~{avg_latency * 1000 / 60:.0f} min for 1000 runs")
print("💡 Compare Path B Anthropic: 1000 runs is ~10-20 min at $0.25 (haiku)")

📋 Starter code — Path B (Anthropic API, compute $ cost) (copy to practice_3_anthropic.py)

# Requires: pip install anthropic
import sys
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

import anthropic

# Anthropic public pricing 2026 Q2 (per 1M tokens, USD) — verify at https://www.anthropic.com/pricing
PRICING = {
    "claude-haiku-4-5":   {"input": 1.00, "output":  5.00},
    "claude-sonnet-5":    {"input": 3.00, "output": 15.00},
    "claude-opus-4-8":    {"input": 5.00, "output": 25.00},  # Opus 4.8 (May 2026, Dynamic Workflows) — same 5/25 pricing
    "claude-fable-5":     {"input": 10.00, "output": 50.00},  # Fable 5 (Mythos-class, GA 2026-06-09; suspended 2026-06-12, unavailable) ~2x Opus 4.8
}

client = anthropic.Anthropic()
MODEL = "claude-haiku-4-5"
msg = client.messages.create(model=MODEL, max_tokens=200,
                             messages=[{"role": "user", "content": "Hi! Please introduce yourself."}])
in_tok, out_tok = msg.usage.input_tokens, msg.usage.output_tokens
rates = PRICING[MODEL]
cost_one = (in_tok * rates["input"] + out_tok * rates["output"]) / 1_000_000

print(f"model: {MODEL}")
print(f"single: input={in_tok} output={out_tok} → ${cost_one:.6f}")
print(f"1000 calls cost across model tiers:")
for name, r in PRICING.items():
    c = (in_tok * r["input"] + out_tok * r["output"]) / 1_000_000 * 1000
    print(f"  {name:<22} ${c:.4f}")

assert cost_one > 0, "Cloud LLM always has a cost"
print(f"\n✅ Exercise 3 passed (Anthropic) — 1000 runs: haiku ≈ $0.25, sonnet 5 ≈ $0.76, opus 4.8 ≈ $1.27")

**Expected output**:

model: claude-haiku-4-5
single: input=14 output=48 → $0.000254
1000 calls cost across model tiers:
  claude-haiku-4-5       $0.2540
  claude-sonnet-5        $0.7620
  claude-opus-4-8        $1.2700

**Trade-off**: local Ollama is $0 for 1000 runs but takes ~2 hr; Anthropic haiku is ~10 min for $0.25; sonnet ~10 min for $0.76. **Use cloud only for production; learning / experiments / debug stay local.**

Exercise 4: Cross-Provider Comparison¶

Send the same prompt to Claude, GPT, and Gemini simultaneously, compare their responses. Notice "why does the same input produce different answers" — answer style, length, and judgment all differ. Use the OpenAI, Anthropic, and Google SDKs side-by-side.

→ Starter template → examples/stage-1/04-cross-provider/ (parallel calls to all three SDKs + comparison table; missing keys are skipped gracefully; illustrative, not a chapter-length tutorial)

Exercise 5: Error Handling¶

Trigger error conditions deliberately and write retry logic: - Wrong API key → see how it raises - Over-long prompt → what happens when the context window is full - Network drop → write a retry wrapper with exponential backoff

This is foundational for Stage 3-8's production agent code.

→ Starter template → examples/stage-1/05-error-handling/ (mock-based tests so you can verify the retry logic without unplugging your ethernet cable; illustrative, not a chapter-length tutorial)

Exercise 6: Local LLM¶

No API fees, runs on your machine: use Ollama to pull a small model (recommend llama3.2:3b or qwen2.5:3b), call it via OpenAI-compatible API.

# 1. Install Ollama: https://ollama.com
ollama pull qwen2.5:3b
ollama serve  # default port 11434

📋 Starter code (copy to practice_6.py)

# Requires: pip install openai
# Pre-req: Ollama is running, qwen2.5:3b is pulled
import sys
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama doesn't check this — anything works
)

r = client.chat.completions.create(
    model="qwen2.5:3b",
    messages=[{"role": "user", "content": "Explain ReAct in 3 sentences."}],
)

text = r.choices[0].message.content
print("Response:", text)

# === Self-check ===
assert len(text) > 10, "response is too short — Ollama may not be running"
print(f"✅ Exercise 6 passed — local Ollama reachable through the OpenAI-compatible API")
print(f"💡 This run cost you $0 (except for electricity)")

**Why do this**: once you can run local LLMs, Stage 3-6 experiments aren't bottlenecked on API costs; privacy-sensitive work also stays offline.

🎯 Curated Projects¶

5 categories, 17 projects in one table. Pick by "Best for"; click through for depth on the repo / course site.

Category	Project	⭐	Best for	Why / Notes
Official cookbook / starting point	Anthropic Cookbook	⭐⭐⭐⭐⭐	Starting with Claude API; reference lookup	Full-feature Claude API notebooks (tool use / batch / prompt cache), ★ 42k+, MIT
	Anthropic Courses	⭐⭐⭐⭐⭐	Systematic Claude learning from zero	Anthropic's own 5-course set (API fundamentals / prompt eval / real-world prompting / tool use), ★ 21k+. Start with `anthropic_api_fundamentals`
	OpenAI Cookbook	⭐⭐⭐⭐⭐	OpenAI API + structured output / function calling	Pair with Anthropic Cookbook, ★ 73k+, MIT. Much bigger than Anthropic's — use search
	Anthropic Claude API Quickstart	⭐⭐⭐⭐	5-minute start	Official docs, bookmark it
Chinese textbook (chapter-style)	datawhalechina/happy-llm	⭐⭐⭐⭐⭐	Chinese readers wanting LLM internals	Karpathy "Zero to Hero" Chinese counterpart, ★ 29k+. Equivalent to HF LLM Course in Chinese
	datawhalechina/llm-universe	⭐⭐⭐⭐⭐	Chinese newcomers building with LLM	API basics / knowledge base / RAG / advanced tricks, ★ 13k+
	datawhalechina/llm-cookbook	⭐⭐⭐⭐	Full Chinese LLM learning path	Adapted Chinese translation of Andrew Ng's courses (⚠️ updates slowed after 2025-06, CC BY-NC-SA)
	jingyaogong/minimind	⭐⭐⭐⭐	Post-Karpathy, want a real training run	2hr to train a 64M LLM from scratch — Pretrain + SFT + LoRA + DPO + RLHF, ★ 48k+, Apache-2.0
English course (systematic)	HuggingFace — LLM Course	⭐⭐⭐⭐⭐	Transformer internals + HF ecosystem	Transformer theory + applications, Apache 2.0
	LangChain Academy	⭐⭐⭐⭐	Visual learners who like video courses	LangChain's official free course, includes RAG / agent. Skip the LangChain marketing segments
Local execution (no API costs)	ollama/ollama	⭐⭐⭐⭐⭐	First-time local LLM	This repo's Path A default, OpenAI-compat API, ★ 170k+
	ggml-org/llama.cpp	⭐⭐⭐⭐⭐	Understanding quantization / how 7B fits in 8GB RAM	Ollama's underlying inference engine, ★ 108k+, MIT
	mudler/LocalAI	⭐⭐⭐⭐	Team compliance, self-host full OpenAI replacement	Drop-in OpenAI API replacement (chat / embedding / image / TTS / STT), ★ 46k+
	ml-explore/mlx	⭐⭐⭐⭐	Mac dev, squeeze Apple Silicon	Apple's ML framework for M1+, ★ 25k+. Pair with `mlx-lm` for ease
Build from scratch (understand internals)	karpathy — Let's build GPT from scratch	⭐⭐⭐⭐⭐	Understand LLM internals, not just API calls	2hr high-density video, build GPT in PyTorch from scratch. Pause and code along, don't passive-watch
	rasbt/LLMs-from-scratch	⭐⭐⭐⭐⭐	Book-pace read of the same material	Book version of Karpathy's video: tokenizer → attention → pretraining → finetuning, ★ 91k+, Apache-2.0
	karpathy/LLM101n	⭐⭐	Historical reference	⚠️ Archived (2024-08), outline only, course never finished. Watch "Build GPT from scratch" above instead

💡 Suggested reading order: API-first → Anthropic / OpenAI Cookbook · Chinese systematic path → happy-llm + llm-universe · deep internals → Karpathy video + rasbt book with code · local-only → start with Ollama, then llama.cpp.

✅ Self-Check Before Stage 2¶

Can you:

Make a Claude API call from Python in 5 lines
Explain why "你好" might use 2 tokens but "Hello" uses 1
Quote roughly the per-token price for Claude Sonnet vs Opus
Name one strength of Claude vs GPT vs Gemini vs Llama

If yes → proceed to Stage 2 — Prompt Engineering.

If no → re-read the Anthropic Quickstart + run all 3 hello-X projects above.

✅ Done with Stage 1? Next, Stage 2 — Prompt Engineering takes 5-12 hours to walk you through writing reusable structured prompts, using few-shot and chain-of-thought for reasoning tasks, and learning to quantify prompt improvement with evals. Keep going →