autoship Documentation
An open-source CLI that builds complete software from plain-English user stories, powered by Claude Code.
What is autoship?
autoship is an autonomous software builder. It is an orchestration and evaluation layer on top of Claude Code that takes plain-English requirements, builds the software, and iterates until every requirement passes.
You describe what you want in natural language — user stories like "User can sign up with email" or "App has a clean dark mode." autoship hands those stories to Claude Code, evaluates the output against every requirement, feeds failures back, and loops until the scorecard is green. The entire cycle runs unattended.
Think of it as CI/CD for the AI-native era: instead of running tests someone else wrote, autoship writes the software and judges it against your intent. You go grab coffee. When you come back, there's a working app, a scorecard, and a deploy command.
Quick Start
Prerequisites
- Node.js 18+
- Claude Code CLI — installed and authenticated (npm install -g @anthropic-ai/claude-code)
Install
npm install -g autoship
Your first app in 3 commands
# 1. Describe what you want
autoship new my-app
# 2. Go grab coffee
autoship build
# 3. Ship to production
autoship ship
Commands
autoship provides a small set of composable commands that cover the full lifecycle: create, build, inspect, run, review, and ship.
autoship new <name>
Create a new project. Three modes:
Interactive (recommended):
autoship new my-app
Starts a conversation — autoship asks what you want to build, suggests features, and generates user stories automatically.
With story flags:
autoship new my-app \
-d "A recipe manager" \
-s "User can add recipes" \
"User can search by ingredient" \
"User can filter by cuisine"
From a spec file:
autoship new my-app --spec requirements.md
Reads a markdown file containing your full product spec and extracts user stories from it.
autoship build
Start the autonomous build loop. autoship alternates between building and evaluating until every story passes or the loop limit is reached.
autoship build # build in current directory
autoship build --dir ./my-app # build a specific project
autoship build --max-loops 5 # limit iterations
autoship build --no-emergent # skip emergent feature generation
What happens during build:
- Claude Code writes production code in src/
- autoship evaluates every user story in one batch call
- Produces a scorecard (which stories pass/fail)
- Feeds the scorecard back to Claude Code
- Claude Code fixes failures
- Repeats until all stories pass or max loops reached
- Generates variant implementations for marked stories
- Suggests and auto-builds emergent features
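The core of the steps above (build, evaluate, feed failures back, repeat) can be sketched as a small loop. This is only an illustration of the control flow, not autoship's actual implementation: `StoryResult`, `BuildStep`, `EvalStep`, `LoopOutcome`, and `runBuildLoop` are hypothetical names, and the build and evaluate steps are passed in as callbacks so the sketch stays independent of Claude Code.

```typescript
// Hypothetical shapes: none of these names come from autoship itself.
interface StoryResult {
  id: string;
  pass: boolean;
  reason: string;
}

type BuildStep = (feedback: StoryResult[]) => Promise<void>;
type EvalStep = () => Promise<StoryResult[]>;

interface LoopOutcome {
  loops: number;
  allPassed: boolean;
  failing: StoryResult[];
}

// Build, evaluate, and feed failures back until every story passes
// or the loop limit is reached.
async function runBuildLoop(
  build: BuildStep,
  evaluate: EvalStep,
  maxLoops: number,
): Promise<LoopOutcome> {
  let failing: StoryResult[] = [];
  for (let loop = 1; loop <= maxLoops; loop++) {
    await build(failing); // the first pass receives no feedback
    const results = await evaluate(); // one batch evaluation per loop
    failing = results.filter((r) => !r.pass);
    if (failing.length === 0) {
      return { loops: loop, allPassed: true, failing };
    }
  }
  return { loops: maxLoops, allPassed: false, failing };
}
```

In this sketch the loop limit plays the role of --max-loops. Variant generation and emergent features are left out because they happen after the stories pass.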
autoship status
Check build progress at any time, even while a build is running in another terminal.
autoship status --dir ./my-app
Example output:
autoship status
Project: my-app
Loop: 3 / 10
Stories: 7 pass, 1 fail, 2 pending
Phase: build
Elapsed: 4m 32s
autoship run
Start the built app locally with the feedback widget active.
autoship run --dir ./my-app
The feedback widget is a small overlay injected into the running app that lets you submit inline feedback ("this button should be bigger", "add a loading spinner here"). Feedback is saved to feedback/ and automatically incorporated in the next autoship build loop.
autoship review
Review emergent features — bonus features that autoship suggested and built after all your original stories passed.
autoship review --dir ./my-app
For each emergent feature, you can keep it (merge into your codebase) or remove it (auto-reverted cleanly). This gives you creative suggestions without any risk to your working build.
autoship ship
Final build for production. Runs a complete pipeline to get your app deploy-ready.
autoship ship --dir ./my-app
What happens:
- Security scan — secrets detection, SAST, dependency audit
- Strip feedback widget — removes the dev-only overlay from production bundles
- Git tag — tags the release as v1.0.0
- Deploy — runs your configured deploy command or outputs the production bundle
autoship telegram
Configure a Telegram bot for async build updates so you can monitor progress from your phone.
autoship telegram --token BOT_TOKEN --chat-id CHAT_ID --dir ./my-app
Setup: Create a bot via @BotFather on Telegram, get your bot token and chat ID, then run the command above.
What you receive: Loop completion notifications with pass/fail counts, emergent feature proposals, security scan results, and a final "build complete" or "build failed" summary.
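As a rough illustration of what a loop-completion notification involves, the sketch below builds a request for the Telegram Bot API's sendMessage method, which takes a chat_id and a text. The message wording, the `LoopSummary` shape, and the `buildSendMessageRequest` helper are assumptions made for this example. How autoship formats and sends its own messages is not documented here.

```typescript
// Hypothetical summary shape; autoship's real notification payload may differ.
interface LoopSummary {
  project: string;
  loop: number;
  pass: number;
  fail: number;
}

interface SendMessageRequest {
  url: string;
  body: { chat_id: string; text: string };
}

// Build (but do not send) a Telegram Bot API sendMessage request.
function buildSendMessageRequest(
  token: string,
  chatId: string,
  summary: LoopSummary,
): SendMessageRequest {
  const text =
    `${summary.project}: loop ${summary.loop} finished, ` +
    `${summary.pass} pass, ${summary.fail} fail`;
  return {
    url: `https://api.telegram.org/bot${token}/sendMessage`,
    body: { chat_id: chatId, text },
  };
}
```

The returned object could then be posted as JSON with any HTTP client. Keeping the formatting separate from the network call makes the message easy to check without a real bot token.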
autoship web
Start the web dashboard for a visual overview of your project's build status, scorecards, and emergent features.
autoship web --port 3000
User Stories
User stories are the atomic unit of work in autoship. Each story describes one thing the software should do, and autoship evaluates each one independently.
Story types
| Type | Example | How it's evaluated |
|---|---|---|
| feature | "User can sign up with email" | Claude tests the flow end-to-end |
| visual | "App has clean modern design with dark mode" | Scored 0.0–1.0 on 8 design dimensions |
| outcome | "Page loads in under 2 seconds" | Measured against threshold |
Auto-detection: autoship detects the story type from keywords. Stories mentioning "design", "animation", "layout", or "theme" are classified as visual. Stories mentioning "performance", "seconds", "latency", or "size" are classified as outcome. Everything else is feature. You can also set the type explicitly in stories.yaml.
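The keyword rules above could be approximated as follows. This is a minimal sketch, not autoship's actual classifier: the keyword lists are taken from the description, while the case-insensitive substring matching, the precedence of visual over outcome, and the `detectStoryType` name are assumptions.

```typescript
type StoryType = "feature" | "visual" | "outcome";

// Keyword lists copied from the description above; matching rules are assumed.
const VISUAL_KEYWORDS = ["design", "animation", "layout", "theme"];
const OUTCOME_KEYWORDS = ["performance", "seconds", "latency", "size"];

function detectStoryType(story: string, explicit?: StoryType): StoryType {
  if (explicit) return explicit; // an explicitly set type always wins
  const text = story.toLowerCase();
  if (VISUAL_KEYWORDS.some((k) => text.includes(k))) return "visual";
  if (OUTCOME_KEYWORDS.some((k) => text.includes(k))) return "outcome";
  return "feature";
}
```

Plain substring matching is deliberately naive: a story such as "User can resize the sidebar" would match "size" and be classified as outcome. That kind of case is where setting the type explicitly is useful.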
Evaluation
Batch eval — autoship sends all stories to Claude in a single call and receives a structured scorecard back. This is more efficient and produces more consistent scoring than evaluating stories one at a time.
After each loop, a scorecard is saved to evals/scorecards/loop-N.json with the following format:
{
  "loop": 3,
  "stories": [
    { "id": "signup", "pass": true, "score": 1.0, "reason": "..." },
    { "id": "dark-mode", "pass": false, "score": 0.6, "reason": "..." }
  ],
  "summary": { "total": 7, "pass": 6, "fail": 1 }
}
The scorecard is what drives the build loop: autoship feeds failing stories and their reasons back to Claude Code, which uses this structured feedback to make targeted fixes.
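To make the feedback step concrete, here is a sketch that types the scorecard format shown above and turns the failing stories into a feedback message. The field names follow the JSON example. The `summarize` and `buildFeedback` helpers and the wording of the message are hypothetical, and the prompt autoship really sends to Claude Code may look quite different.

```typescript
interface StoryScore {
  id: string;
  pass: boolean;
  score: number;
  reason: string;
}

interface Scorecard {
  loop: number;
  stories: StoryScore[];
  summary: { total: number; pass: number; fail: number };
}

// Recompute the summary counts from the per-story results.
function summarize(stories: StoryScore[]): Scorecard["summary"] {
  const pass = stories.filter((s) => s.pass).length;
  return { total: stories.length, pass, fail: stories.length - pass };
}

// Turn failing stories and their reasons into a plain-text feedback message.
function buildFeedback(card: Scorecard): string {
  const failing = card.stories.filter((s) => !s.pass);
  if (failing.length === 0) {
    return `Loop ${card.loop}: all stories pass.`;
  }
  const lines = failing.map(
    (s) => `- ${s.id} (score ${s.score.toFixed(1)}): ${s.reason}`,
  );
  return [`Loop ${card.loop}: ${failing.length} failing`, ...lines].join("\n");
}
```

Passing only the failing stories, each with its reason, keeps the feedback short and points the next loop at specific problems.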
Emergent Features
Once all user stories pass, autoship enters the emergent phase. It analyzes the built application and proposes 2–3 bonus features that complement what's already been built — things like "add keyboard shortcuts", "add export to CSV", or "add a toast notification system."
How they work: Each emergent feature is built on a separate branch. After building, autoship re-evaluates all original stories to make sure nothing regressed. If any original story breaks, the emergent feature is auto-reverted — no human intervention needed.
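The regression check described above amounts to comparing story results before and after an emergent feature is built. The sketch below shows one way to express that decision. The `findRegressions` and `decideEmergent` helpers and the `Decision` type are hypothetical, and treating a story that is missing from the re-evaluation as a regression is an assumption.

```typescript
interface StoryResult {
  id: string;
  pass: boolean;
}

type Decision =
  | { action: "keep-for-review" }
  | { action: "revert"; regressed: string[] };

// Ids of stories that passed before the feature but no longer pass after it.
function findRegressions(
  before: StoryResult[],
  after: StoryResult[],
): string[] {
  const afterById = new Map<string, boolean>(
    after.map((s): [string, boolean] => [s.id, s.pass]),
  );
  return before
    .filter((s) => s.pass && afterById.get(s.id) !== true)
    .map((s) => s.id);
}

// Revert the emergent feature if any original story regressed.
function decideEmergent(
  before: StoryResult[],
  after: StoryResult[],
): Decision {
  const regressed = findRegressions(before, after);
  return regressed.length === 0
    ? { action: "keep-for-review" }
    : { action: "revert", regressed };
}
```

A feature that survives this check is not merged automatically in this sketch. It is only marked as a candidate for the interactive review step.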
Use autoship review to interactively keep or remove each emergent feature.
Security
Every autoship ship run, and optionally every build loop, runs three security scanners:
| Scanner | What it checks |
|---|---|
| Secrets scanner | Detects API keys, tokens, passwords, and private keys committed to source. Uses pattern matching and entropy analysis. |
| SAST scanner | Static analysis for common vulnerabilities: SQL injection, XSS, path traversal, eval usage, and insecure crypto. |
| Dependency audit | Checks package.json dependencies against known vulnerability databases (npm audit). |
If critical issues are found during autoship ship, the deploy is blocked and findings are reported.
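The blocking rule can be pictured as a simple gate over the combined scanner findings. This is a sketch under stated assumptions: the `Finding` shape, the four severity levels, and the `gateDeploy` name are illustrative and are not taken from autoship's code.

```typescript
// Hypothetical severity scale and finding shape.
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  scanner: "secrets" | "sast" | "dependencies";
  severity: Severity;
  message: string;
}

interface GateResult {
  blocked: boolean;
  critical: Finding[];
}

// Block the deploy when any scanner reports a critical finding.
function gateDeploy(findings: Finding[]): GateResult {
  const critical = findings.filter((f) => f.severity === "critical");
  return { blocked: critical.length > 0, critical };
}
```

Returning the critical findings alongside the decision lets the caller report exactly what blocked the deploy.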
Plugin Architecture
autoship is built on a plugin system. Every phase of the pipeline is a plugin, and you can add your own.
interface Plugin {
  name: string;
  phase: "spec" | "build" | "eval" | "review" | "ship";
  run(context: PipelineContext): Promise<PluginResult>;
  onEvent?(event: PipelineEvent): void;
}
Built-in plugins
| Plugin | Phase | Description |
|---|---|---|
| spec-chat | spec | Interactive conversation to generate user stories |
| planner | spec | Creates a build plan from stories |
| builder | build | Orchestrates Claude Code to write application code |
| evaluator | eval | Batch-evaluates all stories and produces scorecards |
| variant-gen | build | Generates alternative implementations for marked stories |
| emergent | review | Proposes and builds bonus features |
| security | ship | Runs secrets, SAST, and dependency scanners |
| deployer | ship | Handles git tagging and deployment |
| telegram | build | Sends async build notifications via Telegram |
Writing a custom plugin
Create a file in plugins/ that exports an object conforming to the Plugin interface:
// plugins/my-plugin.ts
export default {
  name: 'my-plugin',
  phase: 'build',
  async run(context) {
    // context exposes context.project, context.stories, and context.loop
    return { success: true, data: {} };
  },
  onEvent(event) {
    // React to pipeline events (optional)
  }
};
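For a slightly fuller picture, here is a typed sketch of a plugin that does real work: it reports duplicate story ids during the spec phase. The Plugin interface is the one shown earlier. The `Story`, `PipelineContext`, `PipelineEvent`, and `PluginResult` shapes are assumptions declared locally so the example is self-contained, since their real definitions are not given here, and `duplicateStoryCheck` is a hypothetical plugin.

```typescript
// Assumed shapes: the docs name these types but do not define them.
interface Story {
  id: string;
  text: string;
}
interface PipelineContext {
  project: string;
  stories: Story[];
  loop: number;
}
interface PipelineEvent {
  type: string;
}
interface PluginResult {
  success: boolean;
  data?: Record<string, unknown>;
}

// Plugin interface as documented above.
interface Plugin {
  name: string;
  phase: "spec" | "build" | "eval" | "review" | "ship";
  run(context: PipelineContext): Promise<PluginResult>;
  onEvent?(event: PipelineEvent): void;
}

// Hypothetical plugin: fails when two stories share the same id.
const duplicateStoryCheck: Plugin = {
  name: "duplicate-story-check",
  phase: "spec",
  async run(context) {
    const seen = new Set<string>();
    const duplicates: string[] = [];
    for (const story of context.stories) {
      if (seen.has(story.id)) duplicates.push(story.id);
      seen.add(story.id);
    }
    return { success: duplicates.length === 0, data: { duplicates } };
  },
};
```

In a real plugin file the object would be the default export, as in the example above. Annotating it as Plugin lets the compiler check the phase value and the return shape.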
Project Structure
After autoship new and autoship build, your project directory looks like this:
my-app/
  CLAUDE.md              # Auto-generated context for Claude Code
  progress.md            # Loop-by-loop build history
  spec/
    spec.md              # Product specification
    stories.yaml         # User stories with types and metadata
    config.yaml          # Build configuration
    build-plan.md        # Generated build plan
  src/                   # Built application code
  evals/
    scorecards/          # Evaluation results per loop (loop-1.json, etc.)
  feedback/              # User feedback from widget
  plugins/               # Custom plugins (optional)
  .buildloop             # Pipeline state (JSON)
Contributing
autoship is MIT licensed. The codebase is ~3000 lines of TypeScript with 147 tests.
git clone https://github.com/prateek/autoship
cd autoship && npm install && npm test
Areas to contribute:
- New plugins (linting, accessibility audits, performance budgets)
- Improved evaluation heuristics for visual and outcome stories
- Additional deploy targets (Vercel, Fly.io, Cloudflare Workers)
- Dashboard UI improvements
- Documentation and examples