improve is an agent skill from shadcn that audits a codebase and writes implementation plans, and never writes the code itself. The plan is the product. The premise is an economic one: spend your most capable model on the part where intelligence compounds, which is understanding the repo, judging what is worth doing, and specifying it, then hand the mechanical execution to a cheaper model or another agent.
The idea, and why it is interesting
Most coding-agent workflows use one expensive model for everything, including the rote edits where a smaller model would do fine. improve splits that. The advisor model does recon, audit, and planning. The executor, which can be a much smaller model, follows a spec. The flow is three roles:
you → /improve (expensive model, advises)
plans/ → 001-fix-n-plus-one.md (self-contained specs)
other agent → implements, tests, ships (cheap model, executes)
Whether this saves money depends on your pricing. The win is real when the gap between your advisor and executor model tiers is large, and smaller when you would have used one mid-tier model anyway. The more durable benefit is structural: separating “decide what to do” from “do it” produces a reviewable artifact in between, which is something a single end-to-end agent run never gives you.
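To make the dependence on pricing concrete, here is a minimal sketch of the arithmetic. The function names, token counts, and per-million-token prices are all hypothetical placeholders, not real model prices or measurements from improve. It also ignores the extra tokens a verbose plan costs, so treat the result as an upper bound on savings.

```python
def run_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of `tokens` at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def split_savings(plan_tokens: int, exec_tokens: int,
                  advisor_price: float, executor_price: float):
    """Compare one advisor-tier model doing everything against the split.

    Returns (single_model_cost, split_cost, fraction_saved).
    """
    single = run_cost(plan_tokens + exec_tokens, advisor_price)
    split = (run_cost(plan_tokens, advisor_price)
             + run_cost(exec_tokens, executor_price))
    return single, split, (single - split) / single


# Hypothetical workload: 200k tokens of planning, 800k tokens of execution.
wide_gap = split_savings(200_000, 800_000, advisor_price=15.0, executor_price=1.0)
narrow_gap = split_savings(200_000, 800_000, advisor_price=3.0, executor_price=2.0)

print(f"wide tier gap:   {wide_gap[2]:.0%} saved")
print(f"narrow tier gap: {narrow_gap[2]:.0%} saved")
```

With these made-up numbers the wide tier gap saves far more than the narrow one, which is the point of the paragraph above: the split pays off in proportion to the price difference between tiers and to the share of tokens spent on execution.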
How a run works
The pipeline has five stages, and the design choices in each are what distinguish it from a generic “review my code” prompt.
Recon. It maps the stack and conventions, and extracts the repo’s exact build, test, and lint commands, which become verification gates in every plan. It also ingests intent docs when present, such as ADRs under docs/adr/, PRDs, CONTEXT.md, DESIGN.md, and PRODUCT.md, so decided tradeoffs are not re-flagged as findings and suggestions stay grounded in stated product direction.
Audit. Parallel subagents fan out across nine categories: correctness, security, performance, test coverage, tech debt, dependencies and migrations, developer experience, docs, and direction. Every finding has to carry file:line evidence, impact, effort, and confidence. Direction suggestions must cite evidence from the repo, not generic ideas.
Vet. This is the stage most review tools skip. Subagents over-report, so the advisor re-reads every cited location itself before showing you anything. False positives get dropped, wrong attributions corrected, and rejections recorded with a reason so they do not resurface next run.
Prioritize. Findings land in a table ranked by payoff (impact divided by effort, weighted by confidence). You pick which become plans.
Plan. Each selected finding becomes one file in plans/, with an index, a priority order, and a dependency graph.
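The payoff ranking can be sketched in a few lines. This is one plausible reading of "impact divided by effort, weighted by confidence", not the skill's actual formula: the Finding class, the 1 to 5 scales, the 0 to 1 confidence range, and the sample findings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    title: str
    location: str      # file:line evidence, e.g. "app/orders.py:88"
    impact: int        # assumed scale: 1 (minor) to 5 (severe)
    effort: int        # assumed scale: 1 (trivial) to 5 (large)
    confidence: float  # assumed scale: 0.0 to 1.0

    @property
    def payoff(self) -> float:
        # Impact per unit of effort, discounted by how sure the audit is.
        return self.impact / self.effort * self.confidence


def rank(findings):
    """Return findings sorted by payoff, highest first."""
    return sorted(findings, key=lambda f: f.payoff, reverse=True)


# Hypothetical findings, invented for this example.
findings = [
    Finding("N+1 query in order list", "app/orders.py:88", 4, 2, 0.9),
    Finding("Missing auth check", "api/admin.py:17", 5, 1, 0.6),
    Finding("Outdated README", "README.md:1", 1, 1, 1.0),
]

for f in rank(findings):
    print(f"{f.payoff:.2f}  {f.location}  {f.title}")
```

Multiplying by confidence means a high-impact but uncertain finding can still rank below a modest one the audit is sure about, which matches the intent of vetting before prioritizing.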
What makes the plans executable
The plans are written for the weakest plausible executor, a model that never saw the advisor session and may be much smaller. Three properties carry that load. Plans are self-contained, with file paths, current-state code excerpts, repo conventions, and verified commands all inlined, so there is no “as discussed above”. Every step ends with a command and its expected output, so done criteria are machine-checkable and the executor never has to judge its own success. And plans set hard boundaries: explicit out-of-scope lists and STOP conditions for when reality does not match the spec, instead of letting a small model improvise. Each plan also stamps the git commit it was written against, so an executor runs a drift check before touching anything.
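The "command plus expected output" idea is easy to illustrate. The sketch below is not improve's plan format or executor; Step, check_step, and run_plan are hypothetical names, and matching the expected output as a substring of stdout is an assumption about how a done criterion might be checked.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    command: list[str]    # argv form, so no shell parsing is involved
    expected_output: str  # substring that must appear in stdout


def check_step(step: Step, timeout: float = 60.0) -> bool:
    """Run the step's command; pass only on exit code 0 and matching output."""
    try:
        result = subprocess.run(
            step.command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command missing or hung: reality does not match the spec.
        return False
    return result.returncode == 0 and step.expected_output in result.stdout


def run_plan(steps):
    """Check steps in order; return the index of the first failure, or None."""
    for index, step in enumerate(steps):
        if not check_step(step):
            return index  # stop here rather than improvise
    return None


# Demo with a trivially verifiable command (the current Python interpreter).
demo = [Step("interpreter runs", [sys.executable, "-c", "print('ok')"], "ok")]
print("first failing step:", run_plan(demo))
```

Because the check is an exit code and a string match, the executor never has to decide whether its own work is good enough; it either meets the criterion or stops.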
Closing the loop
Plans are not fire-and-forget. /improve execute <plan> spawns a cheaper executor in an isolated git worktree, hands it the plan, then reviews the result the way a tech lead would: it re-runs every done criterion, checks scope compliance, and reads the diff against intent. The verdict is approve, send back for revision (capped at two rounds), or block and refine the plan. Merging is always your call. /improve reconcile processes what changed since last time, verifying landed plans, rewriting blocked ones around the obstacle, and retiring findings that got fixed independently. --issues publishes plans as GitHub issues so any agent or human can pick them up.
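The review verdicts form a small state machine, sketched below under stated assumptions. The names Verdict, review_loop, execute, and review are hypothetical, and the source does not say what happens once both revision rounds are used up; this sketch assumes the plan is then treated as blocked.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    APPROVE = "approve"
    REVISE = "revise"
    BLOCK = "block"


MAX_REVISIONS = 2  # from the text: revision is capped at two rounds


def review_loop(execute: Callable[[int], str],
                review: Callable[[str], Verdict]) -> tuple[Verdict, int]:
    """Alternate executor and reviewer until approve, block, or the cap.

    `execute` receives the revision round (0 for the first attempt) and
    returns some result, e.g. a diff. Returns (final verdict, rounds used).
    """
    revisions = 0
    while True:
        verdict = review(execute(revisions))
        if verdict is not Verdict.REVISE:
            return verdict, revisions
        if revisions == MAX_REVISIONS:
            # Assumption: exhausting the cap escalates to "block".
            return Verdict.BLOCK, revisions
        revisions += 1


# Demo: a stand-in reviewer that approves the second revision.
final, rounds = review_loop(
    execute=lambda n: f"attempt {n}",
    review=lambda diff: Verdict.APPROVE if diff == "attempt 2" else Verdict.REVISE,
)
print(final.value, rounds)
```

Neither outcome merges anything. In the workflow described above, an approved result still waits for you to merge it.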
The safety model
The hard rules are the reason you can point this at a real repo. It never modifies source itself; the only writes go to plans/. Executors edit only in disposable worktrees, and merging is always yours. It never runs commands that mutate your working tree, limiting itself to reads, searches, and read-only analysis. It never reproduces secret values: it reports locations and credential types only, and always recommends rotation. Asked to implement directly, it declines and points you at the plan or execute. For a tool that runs autonomous audits, those constraints are the difference between useful and dangerous.
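The secrets rule is the easiest of these to show in code. The sketch below is not how improve detects secrets; the two patterns, the SecretFinding class, and scan_text are assumptions chosen for illustration. What it demonstrates is the reporting shape: location and credential type go into the finding, the matched value never does.

```python
import re
from dataclasses import dataclass

# Assumed patterns for illustration; a real scanner would cover many more.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


@dataclass(frozen=True)
class SecretFinding:
    path: str
    line: int
    credential_type: str
    recommendation: str = "rotate this credential"


def scan_text(path: str, text: str) -> list[SecretFinding]:
    """Report where secrets appear and what kind, never the value itself."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for credential_type, pattern in PATTERNS.items():
            if pattern.search(line):
                # Only the location and type are stored; the match is discarded.
                findings.append(SecretFinding(path, line_number, credential_type))
    return findings
```

Calling scan_text("config.py", source_text) on a file with a hard-coded key yields entries like config.py, line 2, "AWS access key ID", with the recommendation to rotate, so the report can be shared or committed to plans/ without leaking the credential.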
Install
npx skills add shadcn/improve
It works in any agent that supports the Agent Skills format. Because the plans it writes are plain Markdown, any agent or human can pick one up, even one that has never run improve.
Commands worth knowing
- /improve runs the full audit; /improve quick is a cheap hotspot pass; /improve deep is exhaustive.
- /improve security (also perf, tests, bugs) scopes the audit to one category.
- /improve branch audits only what the current branch changes, which is the natural pre-PR check.
- /improve next suggests where to take the project, with evidence required.
- /improve plan <description> skips the audit and specs one thing; /improve review-plan <file> critiques an existing plan.
Where it fits and where it does not
Use improve when you have a real codebase with accumulated debt and you want a prioritized, reviewable backlog rather than an agent that starts editing immediately. It is strong before a release, on a repo you inherited, or when you want to drive down compute cost by reserving the expensive model for judgment.
It is a weaker fit for greenfield work, where there is little to audit yet, and for tiny scripts, where the planning overhead outweighs the task. The plan style is deliberately verbose because it targets a weak executor, so for trivial fixes the spec can be longer than the change. And the executor still caps quality: a plan handed to a model too small to follow it will fail the verification gates, which is the intended outcome but still a failure you have to handle.
How it compares
Star counts are as of 2026-06.
| Repo | Stars | Angle |
|---|---|---|
| improve | ~4.7k | Audits existing code, vets findings, plans for a cheaper executor |
| github/spec-kit | ~112k | Spec-driven development, spec first then generate, greenfield-leaning |
| DanMcInerney/architect-loop | ~430 | Architect-and-executor planning loop for agents |
| obra/superpowers | ~227k | Broad agentic methodology and skills framework |
The clearest contrast is with spec-kit. spec-kit starts from a spec you write and scaffolds forward, which suits new work. improve starts from code that already exists, finds what is worth changing, and writes the spec for you. architect-loop is closer in spirit, separating an architect from an executor, but improve adds the audit front-end and the read-only safety guarantees.
Star history
The curve shows a sharp launch in June 2026, driven by the author’s existing audience, which explains the fast start. As with any days-old skill, the shape so far measures attention; judging durability needs a longer window.
Related repositories
- github/spec-kit for spec-driven development from the other direction.
- DanMcInerney/architect-loop, a closely related architect-and-executor pattern.
- obra/superpowers and addyosmani/agent-skills for broader skill frameworks.
- anthropics/skills for the official Agent Skills examples.
FAQ
Does improve modify my code?
No. It only writes to plans/. The execute command runs a cheaper model in a disposable git worktree, and merging is always your decision. The advisor itself runs read-only analysis and never mutates your working tree.
Which agents support it?
Any host that implements the Agent Skills format, listed at agentskills.io. Install with npx skills add shadcn/improve. Because plans are plain Markdown, even an agent that cannot run the skill can execute a plan it produced.
How is improve different from spec-kit?
spec-kit is spec-driven development: you write a specification and it generates from there, which leans toward new projects. improve works backward from an existing codebase, audits it, vets the findings, and writes the specs itself. They meet in the middle but start from opposite ends.
What do quick, deep, and security do?
quick is a cheap pass over hotspots and top findings, deep is an exhaustive sweep across every package and category, and security (like perf or tests) scopes the whole audit to one category. Start with quick to keep cost down, and escalate when a finding warrants it.
Is it free?
Yes, MIT licensed. The skill is free; your cost is the model tokens it spends on auditing and planning, which the cheap-executor split is designed to reduce.