Skip to content

A3 — Integration & Production

繁體中文 | 简体中文 | English

← A2 — CLI Workflow Patterns · Track A: CLI Power User — Stop 3 (final)

Time estimate: 1-2 weeks (~8-15 hours)

📋 Chapter structure: Learning goals → Entry conditions → Required reading → Hands-on exercises → Curated Projects → Self-check 🔑 Key terms (used in this chapter): - Required here: MCP (connect CLI to external data / tools), CI (run checks automatically on every push) - Further-reading terms: observability (trace CLI behavior), eval (measure CLI quality), prompt caching (reduce repeated-context cost), cost tracking (record token spend)

Full definitions: resources/glossary.en.md 5 + 6

After your CLI runs smoothly, the next step is to wire the CLI into your real team workflow. This stop does 3 things:

  1. Tool connection — MCP servers connect the CLI to Slack / Gmail / your internal API
  2. Automated checks — CI (GitHub Actions) runs CLI review on every PR
  3. Cost and logs — observability tools track cost / latency for each task

After this stop, the CLI is no longer just your personal tool — it's part of your team's workflow.

📌 Learning Goals

  • Connect 1-3 MCP servers to your CLI (Slack / Gmail / internal API / DB)
  • Set up GitHub Actions to auto-run Claude Code (PR review, release notes, etc.)
  • Add observability (trace, cost, latency) to CLI workflows
  • Plan a cost budget — know roughly what a big task costs in tokens

📚 Required Reading

  1. Stage 5.2 — MCP (Model Context Protocol) — MCP concept and basics
  2. Anthropic — Prompt Caching — can significantly reduce repeated context cost under cache-eligible conditions (unchanged context, ≤5-minute reuse window, etc.); actual savings depend on the workflow, so use the official article's conditions as the reference
  3. Stage 7 — Observability section — langfuse / Helicone / weave
  4. resources/cli-agents-guide.en.md "Common pitfalls" — most common production issues with CLIs

🛠 Hands-on Exercises

Exercise CLI-9: MCP server connected to CLI

Following Stage 5.2 Exercise: MCP client, connect at least one useful MCP server to your CLI: - filesystem server → let the CLI read files outside its default scope - github server → let it read PRs / issues directly - Custom server → connect your internal API / DB

Success: in a CLI conversation, ask "does my PR have conflicts?" and have the CLI answer via MCP (without you opening a browser).

Exercise CLI-10: GitHub Actions + CLI

Write .github/workflows/cli-review.yml: - Trigger: PR opened / synchronize - Run: in the GH Actions runner, execute Claude Code (or Codex), feed it git diff + your .claude/commands/review.md - Output: PR comment

Success: open a new PR, see a review comment within 1-2 minutes.

Starting points: Anthropic's official claude-code-action; Codex has GitHub App and CLI modes.

Exercise CLI-11: Cost tracking

Run a daily task. Predict the token usage first, then actually run it and check the usage. The gap is usually big (you typically underestimate). - Math: input tokens + output tokens × model price each - Connect langfuse or Helicone (Stage 7 Observability) for tracing - Observe: which sub-task consumes the most tokens? Are you sending unnecessary long context?

Exercise CLI-12: Skill / plugin team sharing

Package your .claude/commands/ and CLAUDE.md into a plugin, publish to internal marketplace or GitHub. Teammates claude plugin install and get the same workflow. - Skill / plugin details in Stage 5.3 + 5.4 - Template: anthropics/claude-plugins-official

🧭 Advanced Concepts in Daily CLI Work (7 Playbooks) 🆕

Track A users are already using Stage 7.5 advanced concepts — they just have not named them yet. Pick the 2-3 playbooks you use most often and treat the rest as further reading — each in ≤ 6 lines. Want the deeper theory → go to Stage 7.5.

📌 Rule: after each playbook, ask yourself "will I do something differently in the next PR?" Yes → applied; No → skip to the next one.

📋 Playbook 1: Scope unclear, agent overreaches

  • When: You send Codex/Gemini on a sweep and are not sure whether it will silently touch unrelated files (the F11/F12 kind of failure)
  • Do: At the top of the brief, state "change X / do not cross Y" explicitly; add a path filter to the acceptance preset
  • Concepts: Work Boundary + Hierarchical Task Decomposition · 📊 See concept-cluster, Service × orchestration cluster
  • Read more:
Source Link
HumanLayer Writing a good CLAUDE.md
Anthropic How Anthropic teams use Claude Code (PDF)
Internal Stage 7.5 🧭 work boundary stack

📋 Playbook 2: Multi-agent parallel runs, results conflict

  • When: Claude planner + 2-3 Codex agents run in parallel and the merge ends up with conflicts / drift
  • Do: Give each agent its own commit; use a reviewer pattern to catch drift (not one giant merge); standardize the brief format + result.json schema
  • Concepts: Contract Hand-offs + Speculative Parallel · 📊 See concept-cluster, Service × orchestration + Types × orchestration
  • Read more:
Source Link
Addy Osmani Code Agent Orchestra
Daniel Vaughan Running Multiple Codex Agents Parallel
Internal agent-collab-skills (agent-task-splitter + agent-output-reconciler)

📋 Playbook 3: Reviewing agent output

  • When: An agent finished the PR, you do not want to merge it blindly, and human review cannot keep up with the throughput
  • Do: Add an LLM-as-judge subagent for automatic evaluation (binary pass/fail); humans only spot-check edge cases; run the acceptance-gate preset before commit
  • Concepts: Agent-as-Judge + Plan-Act-Reflect · 📊 See reading-decision-tree, blue eval branch
  • Read more:
Source Link
Hamel Husain LLM-as-a-Judge: Complete Guide
Hamel Husain Your AI Product Needs Evals
Simon Willison Sub-agents in Claude Code

📋 Playbook 4: Dispatching subagents for independent tasks

💡 First time hearing about subagents? In one sentence: a subagent is a “child Claude” spawned from the main Claude session. It has its own isolated context and reports back when done. Dispatch means asking the subagent to do work, like assigning a task to a teammate. Full concept → Stage 5.5.

  • When: before committing a large change / entering an unfamiliar repo / running an LLM-as-judge auto-eval / applying the same review to 4 targets
  • Do: invoke Claude Code built-in subagents (no custom file required):
  • code-reviewer — review staged diff, find bugs + security issues
  • Explore — read-only codebase search, find entry points / symbols
  • Plan — design a step-by-step implementation plan
  • general-purpose — fallback when you are unsure which one to use, or for multi-step research
  • Concepts: Hierarchical Task Decomposition + Context Isolation · 📊 See concept-cluster, Service × orchestration cluster
  • Read more:
  • Stage 5.5 Subagents (full theory + decision table)
  • resources/subagent-cookbook.en.md (15 recipes with copy-paste prompt templates)

📋 Playbook 5: Running CLI agent in CI

  • When: You wire codex exec / claude --print into GitHub Actions, cannot require a human to hit yes every time, and bandwidth constraints mean you cannot always use Opus
  • Do: Use layered autonomy (preset auto-runs / commit requires review / push requires human sign-off); set a fallback cheaper model (if Opus is down, fall back to Haiku)
  • Concepts: Autonomy Gradients + Graceful Degradation · 📊 See concept-cluster, Config × governance cluster
  • Read more:
Source Link
Anthropic How Anthropic teams use Claude Code (PDF)
Anthropic Engineering Equipping Agents with Skills
Internal Stage 5.5 Subagents + Exercise CLI-10

📋 Playbook 6: Controlling cost

  • When: You use Codex for a large batch of work, the monthly API bill is getting out of control, and you want to stay inside budget
  • Do: Set max_cost_usd in plan.yml; use a cheap model (Haiku) for exploration and an expensive model (Opus) only for polish; turn on prompt caching (can significantly reduce repeated context cost under cache-eligible conditions); automate QA instead of spending human time
  • Concepts: Cost-aware Budget Gates + Throughput-Merge Philosophy · 📊 See concept-cluster, Config × resilience cluster
  • Read more:
Source Link
Simon Willison Sub-agents
Anthropic Prompt Caching
Internal This stage's Exercise CLI-11 (token tracking + langfuse integration)

📋 Playbook 7: Hardening workflow, preventing drift

  • When: You wrote rules in CLAUDE.md / SKILL.md but nobody enforces them, or you added a preset YAML and do not know whether it actually works
  • Do: Intentionally break one rule and run the acceptance gate to see whether it catches it (chaos test); treat docs/ as the single source of truth and keep CLAUDE.md as an entry map only
  • Concepts: Failure Injection + System of Record · 📊 See failure-lifecycle (the F11-F14 evolution loop)
  • Read more:
Source Link
HumanLayer Writing a good CLAUDE.md
agent-collab-skills observed-failure-modes.md
Internal Stage 7.5 🔁 failure-mode lifecycle

7 playbooks = a bridge from 7 triggers to 12 concepts and the corresponding reading sources. Want the underlying theory / the full set of 12 concepts / all 8 cross-vendor principles → Stage 7.5.

🎯 Curated Projects

MCP server collection (CLI-friendly)

💡 Looking for MCPs that connect to daily tools (Notion / Obsidian / Excel / Postgres / Playwright / Slack / Linear / Figma…): see resources/mcp-skills-catalog.en.md — 62 entries grouped by category, each with stars / license / audience. The list below is for "writing your own MCP server / finding reference implementations".

modelcontextprotocol/servers ⭐⭐⭐⭐⭐

★ 85k+ — Official reference servers. filesystem, github, sqlite, git, time, fetch, memory, sequential-thinking.

See Stage 5.2.

wong2/awesome-mcp-servers

Community MCP server catalog. 150+ servers categorized.


CI Integration Patterns

anthropics/claude-code-action

Official GitHub Action template. PR review, issue triage, auto-fix.

continuedev/continue ⭐⭐⭐⭐

★ 33k+ — Wire AI checks into CI; enforce in PR pipeline.

Full intro in branches/for-developer.en.md.


Observability + Cost

langfuse/langfuse ⭐⭐⭐⭐⭐

★ 26k+ — Open-source LLM observability. Trace, cost, sessions in one place.

See Stage 7 Observability.

Helicone ⭐⭐⭐⭐

★ 5k+ — Proxy-based monitoring. Just change base_url and you get logging + caching.

promptfoo/promptfoo ⭐⭐⭐⭐⭐

★ 20k+ — Eval framework. Run regression tests before promoting CLI workflows to production.

See Stage 7 Eval.


Production CLI Workflow Templates

obra/superpowers ⭐⭐⭐⭐

★ 178k+ — Production-ready skill collection. See how someone else does a complete CLI workflow.

obra/superpowers-marketplace

★ 900+ — Minimal marketplace template. Reference when packaging your team's CLI workflow.

✅ Track A Full Self-Check

Can you: - [ ] Have at least 1 MCP server connected to your daily CLI - [ ] Have at least 1 CI workflow auto-running a CLI agent - [ ] State the rough token / cost / latency for some specific task you run - [ ] Packaged your CLAUDE.md / commands at least once (even just for yourself) - [ ] Know which tasks deserve observability and which don't

If yes → Track A complete. We recommend continuing to Stage 8 — Agent Interfaces (a shared hub for both tracks: Computer Use / Browser Use / Code Sandbox, ~1-2 weeks from the Track A angle), or pick a specialized branch and continue (researcher / developer / teacher / knowledge-worker / everyday-users).

If you want to go deeper into "how to write your own CLI agent" (not use existing) → jump to Track B Stage 3. Track A and Track B are complementary.

💡 What's Next

After Track A you're a CLI power user. Next phase choices:

  1. Deepen CLI workflow (keep refining your setup)
  2. Subscribe to Anthropic / OpenAI changelogs
  3. Quarterly review of resources/cli-agents-guide.en.md for new tools
  4. Share CLAUDE.md / skills with your team

  5. Cross to Track B (learn to write your own agent)

  6. Stage 3-4: tool use + frameworks
  7. Stage 5: deep dive into Claude Code internals
  8. Stage 7: write your own multi-agent system

  9. Walk a specialized branch (apply CLI to a specific domain)

  10. Researcher / developer / knowledge-worker / teacher / everyday-users
  11. Each branch uses what you learned in Track A