JuliusBrussee/caveman: a skill that makes your AI agent talk in fewer tokens

caveman is a skill with a joke for a tagline (“why use many token when few token do trick”) and a real engineering point underneath. It rewrites how your AI agent talks: drop articles, filler, and pleasantries, keep the technical content, and you spend far fewer output tokens per reply. The brain stays big, the mouth gets small. It installs into Claude Code and 30+ other agents, and despite the meme branding it ships benchmarks, versioned releases, and CI. This page separates the measured reality from the meme.

What it actually does

caveman is a system-prompt rule set with levels, not a model change:

/caveman lite removes filler but keeps grammar.
/caveman full (the default) drops articles, uses short synonyms and fragments.
/caveman ultra abbreviates domain terms (DB, auth, config) and uses arrows for cause and effect.
/caveman wenyan goes to an extreme classical-Chinese-style terseness.

Beyond reply style, it ships companion commands: /caveman-commit for terse commit messages, /caveman-review for one-line PR comments, /caveman-stats for running token-savings receipts, and /caveman-compress <file> to rewrite a CLAUDE.md or notes file into terse form. There is also caveman-shrink, an MCP middleware that compresses tool descriptions.

The numbers, honestly

The README publishes a benchmark (10 tasks) showing roughly 65% fewer output tokens on average (about 1,214 down to 294), with a per-task range from the low 20s to high 80s percent. caveman-compress reports around 46% fewer input tokens on memory files. Two honest qualifiers the project itself states: code blocks are not compressed (they stay byte-accurate), and reasoning or thinking tokens are unaffected. So the saving is on the visible answer, not the model’s internal work. It cited a 2026 paper arguing brevity constraints can even improve accuracy, which is an interesting angle but should be read as a claim, not settled fact.

Install

# One-liner, auto-detects agents (macOS / Linux / WSL / Git Bash)
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash

# Windows PowerShell
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex

# Claude Code plugin
claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman

It needs Node >= 18. On Claude Code, Gemini CLI, opencode, and OpenClaw it auto-activates; on Cursor, Windsurf, and Copilot you toggle it per session with /caveman (or --with-init to keep it always on).

When it fits, and when it does not

It fits high-volume agent use where you pay per token and the answers can afford to be terse: code review comments, commit messages, quick technical Q&A. It fits less well when you want the agent to teach or explain at length, where terseness costs clarity. And it does not make the agent smarter or cheaper to think; it only shortens what it says. Read it as a cost lever for output, applied where verbosity was waste.

How it compares

Project	What it compresses	Stars (2026-06)
JuliusBrussee/caveman	Agent reply style (output tokens)	~71k
chopratejas/headroom	Tool outputs, logs, RAG chunks (input)	~22k
RyanCodrai/turbovec	Vector index efficiency	~11k

These are complementary, not rivals. caveman trims what the agent says; headroom trims what goes into the agent (tool outputs and context); turbovec works at the vector layer. A token-conscious setup could use more than one.

Gotchas from the issue tracker

caveman moved fast (71k stars within about two months of its 2026-04 creation) and the open issues reflect cross-agent integration pain more than core flaws:

On VS Code and Copilot, users reported no actual token or credit reduction (#506), so the savings are agent-dependent.
OpenCode installs broke on a missing command file and a schema mismatch (#494, #491).
Gemini and Antigravity compatibility issues surfaced (#492, #497).
A plugin SessionStart hook could silently fall back to a minimal rule set due to a wrong SKILL.md path (#507).

The pattern: the savings are real where the skill actually loads and activates, and activation across 30+ agents is the fragile part. Verify with /caveman-stats that it is doing anything on your specific agent.

FAQ

Is caveman free? Yes. caveman is MIT-licensed and installs as a skill or plugin, with no paid tier.

Does caveman actually save tokens? Yes on output, about 65% fewer on average in its 10-task benchmark, but it is agent-dependent: users reported no real reduction on VS Code and Copilot (#506), while Claude Code, Gemini CLI, opencode, and OpenClaw auto-activate it. Confirm with /caveman-stats.

Does caveman make the agent dumber? No. caveman shortens what the agent says, not how it thinks; code blocks stay byte-accurate and reasoning tokens are untouched. The trade is prose clarity, not correctness, so use lite when explanation matters.

How do I install caveman in Claude Code? Run claude plugin marketplace add JuliusBrussee/caveman && claude plugin install caveman@caveman, or the cross-agent curl ... install.sh | bash one-liner.

For input-side savings see chopratejas/headroom; for the vector layer see RyanCodrai/turbovec. All three live in the same token-efficiency space that grew alongside expensive frontier models.

JuliusBrussee/caveman: a skill that makes your AI agent talk in fewer tokens

Star growth

What it actually does

The numbers, honestly

Install

When it fits, and when it does not

How it compares

Gotchas from the issue tracker

FAQ

Repository data

JuliusBrussee/caveman: a skill that makes your AI agent talk in fewer tokens

Star growth

What it actually does

The numbers, honestly

Install

When it fits, and when it does not

How it compares

Gotchas from the issue tracker

FAQ

Related reading

Repository data