CLI Agents Comparison Guide¶
📌 This is a reference doc (depth comparison, selection logic, pitfalls, recommended setups). First time touching CLI agents, want step-by-step onboarding → see
tracks/cli/A1-cli-intro.en.md(Track A first stop). First want to understand "why does one agent live in a terminal, another in Telegram, another on a Jetson board?" mental model → seeresources/agent-paradigms.en.md(5 agent paradigms). Already using one, want to decide / compare / upgrade → stay here.
A cross-branch reference shared by Track A (A1-A3) + all 5 specialized branches: how to choose between Claude Code / Codex / OpenCode / Gemini CLI / goose / Aider / Hermes Agent? Every branch references CLI agents but no single branch "owns" this comparison, so it lives in resources/.
📋 7 Major Terminal CLI Agents¶
Only terminal-based CLI agents are included. IDE-based agents (Cursor / Cline / Continue) live in for-developer. The first 6 numbers verified via gh api on 2026-05-06; Hermes Agent verified on 2026-05-10.
| Tool | Provider | License | Primary LLM | Auth / Pricing | Stars |
|---|---|---|---|---|---|
| Claude Code | Anthropic (official) | NOASSERTION | Claude | Claude subscription OR Anthropic Console API key | ★ 120k+ |
| Codex | OpenAI (official) | Apache-2.0 | GPT family | ChatGPT account sign-in OR OpenAI API key | ★ 80k+ |
| OpenCode | community (repo now at anomalyco/opencode) |
MIT | Any (multi-provider) | BYO API key, or built-in OpenCode Zen hosted | ★ 155k+ |
| Gemini CLI | Google (official) | Apache-2.0 | Gemini | Generous free tier, paid above quota | ★ 103k+ |
| goose | Agentic AI Foundation (repo now at aaif-goose/goose) |
Apache-2.0 | 15+ providers (incl. Ollama) | BYO API key, or existing Claude / ChatGPT / Gemini subscription via ACP | ★ 43k+ |
| Aider | Aider-AI (community) | Apache-2.0 | Any | BYO API key | ★ 44k+ |
| Hermes Agent | Nous Research | MIT | 200+ via OpenRouter / NVIDIA NIM / Zhipu GLM / Kimi / Xiaomi MiMo / MiniMax / HF / OpenAI | BYO API key (multi-provider) | ★ 156k+ |
🎯 Which to Pick? Decide by Use Case¶
Writing papers / literature / research¶
Top pick: Claude Code (long context, strong reasoning, good hallucination resistance). Gemini CLI is the alternative — its million-token context fits whole-PDF / whole-dataset workflows.
Writing code / refactoring a codebase¶
Top pick: Aider (git-native — auto-commits each change, easy to revert) or Claude Code. OpenCode fits when you need to switch between LLMs.
Privacy / offline / no cloud¶
Top pick: goose or OpenCode + local Ollama. Both support BYO LLM and connect to http://localhost:11434/v1 (Ollama default).
Already subscribed to ChatGPT Plus / Pro¶
Top pick: Codex — same account, no separate billing.
Want 1M-token long context + Google ecosystem¶
Top pick: Gemini CLI. Generous free tier and long context are the differentiators. Note: Google service integration (Gmail / Drive / Docs) goes through MCP extensions, not built-in connectors — same setup pattern as other CLIs.
Want to avoid vendor lock-in¶
Top pick: OpenCode > goose > Aider. None tie you to a specific provider; models are swappable.
First time installing a CLI agent — wanting easiest start¶
Top pick: Claude Code. Broad ecosystem, CLAUDE.md mechanism for version-controlled prompts, plenty of community resources when you hit issues.
Want it running on a cloud VM, talking to it via Telegram / Slack / Discord, with mainland China LLMs as primary¶
Top pick: Hermes Agent. Three differentiators: - Decoupled from your laptop — agent runs on a $5 VPS / Modal serverless / Vercel Sandbox; you message it from Telegram / Discord / Slack / WhatsApp / Signal - Model-neutral — supports GLM / Kimi / Xiaomi MiMo / MiniMax, matching the 11 Chinese-ecosystem catalog entries - Built-in self-improving skill loop + cron scheduler — agent autonomously generates skills from interaction, refines them across sessions, runs scheduled jobs unattended - ⚠️ Self-evolving skills is a frontier feature with no independent audit yet; for production tasks, start with low-stakes experiments
📝 Portable Prompts Across CLIs¶
If you want prompts that work across CLI tools (or want to switch without rewriting), follow these principles:
- Specify file paths explicitly — "modify
src/auth.py" beats "modify that auth file" - Ask for multi-step breakdowns — "first list a plan, then act after I confirm" works in every CLI
- Avoid CLI-specific magic —
/init/compactare Claude-Code-specific; OpenCode doesn't have them - Use
.cursorrules/CLAUDE.md/AGENTS.mdfor persistent preferences — Claude Code readsCLAUDE.md, Codex readsAGENTS.md, OpenCode readsOPENCODE.md, content can be the same - State review scope clearly — "review only my diff" vs "review the whole repo"
Cross-CLI prompts are usually 5-10% more verbose than CLI-specific ones, but the upside is you can switch tools without rewriting.
⚠️ Common Pitfalls¶
File path handling¶
- Windows uses backslashes (
C:\Users\...); most CLIs translate internally but sometimes get confused - Recommendation: in git-bash / WSL use forward slashes, avoid weird quoting
Git integration differences¶
- Aider auto-commits every change (by design, not a bug)
- Claude Code / Codex / OpenCode / goose don't auto-commit by default — manual or via prompt
Default sandbox (each CLI varies; verify against official docs before use)¶
- Claude Code: bash writes default to cwd; reads broader (except deny-rule paths)
- Codex: in version-controlled folders,
Auto(workspace-write + on-request escalation) is recommended; in non-git folders,read-only - goose / OpenCode: relatively permissive — add explicit sandbox / approval rules; don't rely on defaults
Token cost accumulation¶
- Running a
grepon a large codebase can consume 100k+ tokens - Summarizing a long PDF can hit 500k tokens (Gemini handles this; other tools need to be cost-aware)
- Recommendation: estimate cost before each operation; set a monthly cap
Multi-CLI session interference¶
- Two CLIs in the same repo (e.g. Claude Code + Aider) can race-condition file edits
- Recommendation: one repo, one CLI (unless you genuinely need parallelism)
🔧 Real-World Setups¶
Three common combinations; pick one that fits:
Setup A: Claude Code primary + OpenCode backup¶
- Claude Code handles 90% of daily work (code, docs, debug)
- OpenCode + Ollama for privacy-sensitive data (medical, financial)
- One prompt, runs in either
Setup B: Codex (GPT) + Aider (Claude) mix¶
- Codex handles small tasks within ChatGPT Plus quota
- Aider with Claude API key handles big refactors (git-native commit convenient)
- Separate billing, no interference
Setup C: Gemini CLI primary (long-context scenarios)¶
- Whole PDF / whole codebase fed at once
- Add Aider for scenarios needing precise git diff
- Fits scholars, knowledge workers
Setup D: Hermes Agent + Local Ollama (multi-platform + mainland China LLMs + offline)¶
- Hermes Agent runs on a low-cost VPS or your own machine as a multi-platform agent gateway
- LLM endpoint can be Ollama (
http://localhost:11434/v1), or swapped to providers such as z.ai GLM / Kimi - Chat entrypoint can be Telegram / Slack / Discord; Hermes routes platform messages into the agent workflow
- When you want zero Anthropic / OpenAI dependency, this setup fits offline, privacy-sensitive, and low-cost repeat experiments
- Step-by-step walkthrough:
resources/cookbook.mdRecipe 6
Linking Back to Branches¶
Different audiences have different CLI needs:
- for-developer: also see IDE-based agents (Cursor, Cline, Continue)
- for-everyday-users Tier 2: CLI is the advanced option; try Tier 0 / 1 (Web / Desktop App) first
- for-researcher: also see paper-specific tools (paper-qa, gpt-researcher, ChatPaper)
- for-knowledge-worker: also see workflow automation (n8n, Make)
- for-teacher: CLI is advanced for teachers; start with prompt libraries
Maintenance Notes¶
- 7 CLI tools' stars / license / pushed_at refreshed quarterly via
bash scripts/refresh-stars.py - The CLI market moves fast — new tools require evaluation before inclusion (bar: 30k+ stars, actively maintained, true CLI not IDE)
- The comparison table deliberately leaves out "strengths / weaknesses" columns — avoiding subjective bias and letting the use-case section + readers' own judgment do that work