autoship Documentation

An open-source CLI that builds complete software from plain-English user stories, powered by Claude Code.

What is autoship?

autoship is an autonomous software builder. It's an orchestration + evaluation layer on top of Claude Code that takes plain-English requirements, builds complete software autonomously, and iterates until everything works.

You describe what you want in natural language — user stories like "User can sign up with email" or "App has a clean dark mode." autoship hands those stories to Claude Code, evaluates the output against every requirement, feeds failures back, and loops until the scorecard is green. The entire cycle runs unattended.

Think of it as CI/CD for the AI-native era: instead of running tests someone else wrote, autoship writes the software and judges it against your intent. You go grab coffee. When you come back, there's a working app, a scorecard, and a deploy command.

Quick Start

Prerequisites

  • Node.js 18+
  • Claude Code CLI — installed and authenticated (npm install -g @anthropic-ai/claude-code)

Install

npm install -g autoship

Your first app in 3 commands

# 1. Describe what you want
autoship new my-app

# 2. Go grab coffee
autoship build

# 3. Ship to production
autoship ship

Commands

autoship provides a small set of composable commands that cover the full lifecycle: create, build, inspect, run, review, and ship.

autoship new <name>

Create a new project. Three modes:

Interactive (recommended):

autoship new my-app

Starts a conversation — autoship asks what you want to build, suggests features, and generates user stories automatically.

With story flags:

autoship new my-app \
  -d "A recipe manager" \
  -s "User can add recipes" \
     "User can search by ingredient" \
     "User can filter by cuisine"

From a spec file:

autoship new my-app --spec requirements.md

Reads a markdown file containing your full product spec and extracts user stories from it.

autoship build

Start the autonomous build loop, in which Claude Code writes the code and autoship evaluates it until every story passes or the loop limit is reached.

autoship build                    # build in current directory
autoship build --dir ./my-app     # build a specific project
autoship build --max-loops 5      # limit iterations
autoship build --no-emergent      # skip emergent feature generation

What happens during build:

  1. Claude Code writes production code in src/
  2. autoship evaluates every user story in one batch call
  3. Produces a scorecard (which stories pass/fail)
  4. Feeds the scorecard back to Claude Code
  5. Claude Code fixes failures
  6. Repeats steps 2 to 5 until all stories pass or the maximum number of loops is reached
  7. Generates variant implementations for marked stories
  8. Suggests and auto-builds emergent features
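
Steps 1 to 6 above can be sketched as a simple loop. This is a minimal illustration rather than autoship's actual implementation: `runBuildLoop`, the `build` and `evaluate` callbacks, and the `LoopScorecard` shape are hypothetical names, and the steps are written synchronously for brevity where the real ones would be asynchronous.

```typescript
// Hypothetical shapes, loosely modeled on the scorecard described in these docs.
interface LoopStoryResult { id: string; pass: boolean; reason: string }
interface LoopScorecard { loop: number; stories: LoopStoryResult[] }

// `build` stands in for Claude Code writing or fixing code;
// `evaluate` stands in for autoship's batch evaluation.
function runBuildLoop(
  build: (failures: LoopStoryResult[]) => void,
  evaluate: (loop: number) => LoopScorecard,
  maxLoops: number,
): LoopScorecard | undefined {
  let failures: LoopStoryResult[] = [];
  let last: LoopScorecard | undefined;
  for (let loop = 1; loop <= maxLoops; loop++) {
    build(failures);                                // steps 1 and 5: write or fix code
    last = evaluate(loop);                          // steps 2 and 3: produce a scorecard
    failures = last.stories.filter((s) => !s.pass); // step 4: collect feedback
    if (failures.length === 0) break;               // step 6: stop when everything passes
  }
  return last; // undefined only when maxLoops is less than 1
}
```

Passing the previous loop's failures into `build` is what makes each iteration a targeted fix rather than a rebuild from scratch.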

autoship status

Check build progress at any time, even while a build is running in another terminal.

autoship status --dir ./my-app

Example output:

autoship status
  Project:  my-app
  Loop:     3 / 10
  Stories:  7 pass, 1 fail, 2 pending
  Phase:    build
  Elapsed:  4m 32s

autoship run

Start the built app locally with the feedback widget active.

autoship run --dir ./my-app

The feedback widget is a small overlay injected into the running app that lets you submit inline feedback ("this button should be bigger", "add a loading spinner here"). Feedback is saved to feedback/ and automatically incorporated in the next autoship build loop.

autoship review

Review emergent features — bonus features that autoship suggested and built after all your original stories passed.

autoship review --dir ./my-app

For each emergent feature, you can keep it (merge into your codebase) or remove it (auto-reverted cleanly). This gives you creative suggestions without any risk to your working build.

autoship ship

Final build for production. Runs a complete pipeline to get your app deploy-ready.

autoship ship --dir ./my-app

What happens:

  1. Security scan — secrets detection, SAST, dependency audit
  2. Strip feedback widget — removes the dev-only overlay from production bundles
  3. Git tag — tags the release as v1.0.0
  4. Deploy — runs your configured deploy command or outputs the production bundle

autoship telegram

Configure a Telegram bot for async build updates so you can monitor progress from your phone.

autoship telegram --token BOT_TOKEN --chat-id CHAT_ID --dir ./my-app

Setup: Create a bot via @BotFather on Telegram, get your bot token and chat ID, then run the command above.

What you receive: Loop completion notifications with pass/fail counts, emergent feature proposals, security scan results, and a final "build complete" or "build failed" summary.
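
Telegram's Bot API delivers messages through its `sendMessage` method, which takes a `chat_id` and a `text`. The sketch below shows how a loop notification could be turned into such a request. The `buildLoopNotification` helper and the message wording are hypothetical, not autoship's actual format.

```typescript
interface TelegramRequest {
  url: string;
  body: { chat_id: string; text: string };
}

// Hypothetical helper: builds the request for one loop-completion message.
function buildLoopNotification(
  token: string,
  chatId: string,
  loop: number,
  pass: number,
  fail: number,
): TelegramRequest {
  return {
    // Bot API endpoint format: https://api.telegram.org/bot<token>/<method>
    url: `https://api.telegram.org/bot${token}/sendMessage`,
    body: {
      chat_id: chatId,
      text: `Loop ${loop} complete: ${pass} pass, ${fail} fail`, // assumed wording
    },
  };
}
```

The resulting `body` can be sent as JSON in a POST request to `url`.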

autoship web

Start the web dashboard for a visual overview of your project's build status, scorecards, and emergent features.

autoship web --port 3000

User Stories

User stories are the atomic unit of work in autoship. Each story describes one thing the software should do, and each one receives its own pass/fail result in the scorecard.

Story types

Type      Example                                        How it's evaluated
feature   "User can sign up with email"                  Claude tests the flow end-to-end
visual    "App has clean modern design with dark mode"   Scored 0.0–1.0 on 8 design dimensions
outcome   "Page loads in under 2 seconds"                Measured against threshold

Auto-detection: autoship detects the story type from keywords. Stories mentioning "design", "animation", "layout", or "theme" are classified as visual. Stories mentioning "performance", "seconds", "latency", or "size" are classified as outcome. Everything else is feature. You can also set the type explicitly in stories.yaml.
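
The keyword rule described above can be sketched as follows. The keyword lists come from the paragraph above, but `detectStoryType`, the substring matching, and the order of the checks are assumptions. The docs do not say how autoship resolves a story that matches both lists.

```typescript
type StoryType = "feature" | "visual" | "outcome";

// Keyword lists taken from the auto-detection description.
const VISUAL_KEYWORDS = ["design", "animation", "layout", "theme"];
const OUTCOME_KEYWORDS = ["performance", "seconds", "latency", "size"];

// Assumption: visual keywords are checked first, and matching is a
// case-insensitive substring test.
function detectStoryType(story: string): StoryType {
  const text = story.toLowerCase();
  if (VISUAL_KEYWORDS.some((k) => text.includes(k))) return "visual";
  if (OUTCOME_KEYWORDS.some((k) => text.includes(k))) return "outcome";
  return "feature";
}
```

Substring matching is deliberately naive: a story containing "resize" would match "size". Cases like that are a good reason to set the type explicitly in stories.yaml.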

Evaluation

Batch eval — autoship sends all stories to Claude in a single call and receives a structured scorecard back. This is more efficient and produces more consistent scoring than evaluating stories one at a time.

After each loop, a scorecard is saved to evals/scorecards/loop-N.json with the following format:

{
  "loop": 3,
  "stories": [
    { "id": "signup", "pass": true, "score": 1.0, "reason": "..." },
    { "id": "dark-mode", "pass": false, "score": 0.6, "reason": "..." }
  ],
  "summary": { "total": 7, "pass": 6, "fail": 1 }
}

The scorecard is what drives the build loop: autoship feeds failing stories and their reasons back to Claude Code, which uses this structured feedback to make targeted fixes.
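
As an illustration of how a scorecard could be consumed, the sketch below types the JSON shown above and extracts the failing stories as feedback lines. The interface names and the `failureFeedback` and `summarize` helpers are hypothetical, and the field types are inferred from the example rather than from a published schema.

```typescript
// Shapes inferred from the example scorecard.
interface StoryScore { id: string; pass: boolean; score: number; reason: string }
interface ScorecardSummary { total: number; pass: number; fail: number }
interface ScorecardFile { loop: number; stories: StoryScore[]; summary: ScorecardSummary }

// Hypothetical helper: one feedback line per failing story.
function failureFeedback(card: ScorecardFile): string[] {
  return card.stories
    .filter((s) => !s.pass)
    .map((s) => `${s.id} (score ${s.score}): ${s.reason}`);
}

// Hypothetical helper: recompute the summary counts from the story list.
function summarize(stories: StoryScore[]): ScorecardSummary {
  const pass = stories.filter((s) => s.pass).length;
  return { total: stories.length, pass, fail: stories.length - pass };
}
```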

Emergent Features

Once all user stories pass, autoship enters the emergent phase. It analyzes the built application and proposes 2–3 bonus features that complement what's already been built — things like "add keyboard shortcuts", "add export to CSV", or "add a toast notification system."

How they work: Each emergent feature is built on a separate branch. After building, autoship re-evaluates all original stories to make sure nothing regressed. If any original story breaks, the emergent feature is auto-reverted — no human intervention needed.
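
The regression check described above amounts to comparing story results before and after the emergent feature is built. A minimal sketch follows, with hypothetical names (`findRegressions`, `shouldRevert`). It also assumes that a story missing from the second evaluation counts as a regression.

```typescript
interface StoryOutcome { id: string; pass: boolean }

// Returns the ids of stories that passed before the emergent feature
// was built but no longer pass afterwards.
function findRegressions(before: StoryOutcome[], after: StoryOutcome[]): string[] {
  const passingAfter = new Set(after.filter((s) => s.pass).map((s) => s.id));
  return before
    .filter((s) => s.pass && !passingAfter.has(s.id))
    .map((s) => s.id);
}

// The emergent feature is reverted if any original story regressed.
function shouldRevert(before: StoryOutcome[], after: StoryOutcome[]): boolean {
  return findRegressions(before, after).length > 0;
}
```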

Use autoship review to interactively keep or remove each emergent feature.

Security

Every autoship ship, and optionally every build loop, runs three security scanners:

Scanner            What it checks
Secrets scanner    Detects API keys, tokens, passwords, and private keys committed to source. Uses pattern matching and entropy analysis.
SAST scanner       Static analysis for common vulnerabilities: SQL injection, XSS, path traversal, eval usage, and insecure crypto.
Dependency audit   Checks package.json dependencies against known vulnerability databases (npm audit).

If critical issues are found during autoship ship, the deploy is blocked and findings are reported.
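
To give a sense of what entropy analysis in a secrets scanner means, the sketch below computes Shannon entropy per character and flags long, high-entropy tokens. It illustrates the general technique only: the function names, the 20-character minimum, and the 4.0-bit threshold are assumptions, not autoship's actual values.

```typescript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(text: string): number {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let entropy = 0;
  counts.forEach((n) => {
    const p = n / chars.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Hypothetical heuristic: long tokens with high entropy look like random
// keys rather than ordinary identifiers or prose.
function looksLikeSecret(token: string, minLength = 20, threshold = 4.0): boolean {
  return token.length >= minLength && shannonEntropy(token) > threshold;
}
```

Entropy alone produces false positives, for example on hashes and minified code, which is why scanners of this kind typically combine it with pattern matching.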

Plugin Architecture

autoship is built on a plugin system. Every phase of the pipeline is a plugin, and you can add your own.

interface Plugin {
  name: string;
  phase: "spec" | "build" | "eval" | "review" | "ship";
  run(context: PipelineContext): Promise<PluginResult>;
  onEvent?(event: PipelineEvent): void;
}

Built-in plugins

Plugin        Phase    Description
spec-chat     spec     Interactive conversation to generate user stories
planner       spec     Creates a build plan from stories
builder       build    Orchestrates Claude Code to write application code
evaluator     eval     Batch-evaluates all stories and produces scorecards
variant-gen   build    Generates alternative implementations for marked stories
emergent      review   Proposes and builds bonus features
security      ship     Runs secrets, SAST, and dependency scanners
deployer      ship     Handles git tagging and deployment
telegram      build    Sends async build notifications via Telegram

Writing a custom plugin

Create a file in plugins/ that exports an object conforming to the Plugin interface:

// plugins/my-plugin.ts
export default {
  name: 'my-plugin',
  phase: 'build',
  async run(context) {
    // context.project, context.stories, and context.loop are available here.
    // Do the plugin's work, then report the outcome.
    return { success: true, data: {} };
  },
  onEvent(event) {
    // React to pipeline events (optional)
  }
};
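
For a fully typed picture, the sketch below pairs a small plugin with a runner that executes all plugins registered for one phase. The docs name `PipelineContext`, `PluginResult`, and `PipelineEvent` but do not define them, so the shapes here are hypothetical guesses, as are `storyCounter` and `runPhase`.

```typescript
// Hypothetical minimal shapes; the real definitions are not shown in these docs.
interface PipelineContext { project: string; stories: string[]; loop: number }
interface PluginResult { success: boolean; data?: Record<string, unknown> }
interface PipelineEvent { type: string }
type Phase = "spec" | "build" | "eval" | "review" | "ship";

// Same members as the Plugin interface above.
interface Plugin {
  name: string;
  phase: Phase;
  run(context: PipelineContext): Promise<PluginResult>;
  onEvent?(event: PipelineEvent): void;
}

// Example plugin: reports how many stories the current loop covers.
const storyCounter: Plugin = {
  name: "story-counter",
  phase: "build",
  async run(context) {
    return {
      success: true,
      data: { loop: context.loop, storyCount: context.stories.length },
    };
  },
};

// Hypothetical runner: runs the plugins for one phase in registration
// order and stops at the first failure.
async function runPhase(
  plugins: Plugin[],
  phase: Phase,
  context: PipelineContext,
): Promise<PluginResult[]> {
  const results: PluginResult[] = [];
  for (const plugin of plugins) {
    if (plugin.phase !== phase) continue;
    const result = await plugin.run(context);
    results.push(result);
    if (!result.success) break;
  }
  return results;
}
```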

Project Structure

After autoship new and autoship build, your project directory looks like this:

my-app/
  CLAUDE.md              # Auto-generated context for Claude Code
  progress.md            # Loop-by-loop build history
  spec/
    spec.md              # Product specification
    stories.yaml         # User stories with types and metadata
    config.yaml          # Build configuration
    build-plan.md        # Generated build plan
  src/                   # Built application code
  evals/
    scorecards/          # Evaluation results per loop (loop-1.json, etc.)
  feedback/              # User feedback from widget
  plugins/               # Custom plugins (optional)
  .buildloop             # Pipeline state (JSON)

Contributing

autoship is MIT licensed. The codebase is ~3000 lines of TypeScript with 147 tests.

git clone https://github.com/prateek/autoship
cd autoship && npm install && npm test

Areas to contribute:

  • New plugins (linting, accessibility audits, performance budgets)
  • Improved evaluation heuristics for visual and outcome stories
  • Additional deploy targets (Vercel, Fly.io, Cloudflare Workers)
  • Dashboard UI improvements
  • Documentation and examples