evidence-based coding · the platform

Your agents code on evidence —
three layers deep.

One shared repository of everything your team and its agents know — piped back into every spec, review, and PR so the loop actually closes. Not vibes. Not a wiki nobody reads. Evidence.

bb-specify · CSV export flow · all three layers on

❯/bb-specify "improve the CSV export flow"

✻What did customers actually want here?

⏺Pulling evidence from BuildBetter…

✓7 customers asked for faster CSV export · High severity · Growth

✓#1 request from the “Data-heavy Admin” persona

✓grounded in 3 calls, 2 tickets, 1 Slack thread

→highest-value item in your backlog this quarter

⏺Drafting the spec from those 7 customer requests…

#the roadmap meeting for this took zero minutes.

01 · the evidence stack

Three layers. Stacked.

Layer one ships with the free CLI. Layers two and three come from connecting BuildBetter.ai, and they are why the spec above wrote itself.

layer 01 · sessions · Included with ZeroShot

Your agent sessions

Every past agent session, skill, and PR review is pulled in automatically, so your agents already learn from how your team actually codes.

layer 02 · company · With BuildBetter.ai

Company evidence

Connect BuildBetter and bring in your calls, Slack, docs, and knowledge base — the decisions and conventions behind the code.

layer 03 · customers · With BuildBetter.ai

Customer context

BuildBetter also brings in your customers’ voice: support tickets, CRM, complaints, and feature requests, so agents build what was actually asked for.

└ most agents get layer zero: whatever fits in the context window. Yours get all three.

Explore BuildBetter.ai →

02 · one shared repository

Stop paying twice
for the same context.

Every session — chat, file edits, tool calls — is saved and shared across your whole team. Check out an old branch and pull up the chat that produced it. Resume a teammate’s session on your machine. No agent ever pays tokens to rediscover what someone already figured out.

→Sessions synced across repo, branch, PR, commit, and teammate
→Resume any teammate’s session with one command
→Switch Claude Code → Codex mid-task, same memory
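One way to picture "synced across repo, branch, PR, commit, and teammate" is an index in which each of those keys points at the sessions it touched. The sketch below is purely illustrative: `SessionRecord`, `SessionIndex`, their fields, and the repo name `example/app` are hypothetical, not ZeroShot's actual schema or API. The session values reuse the figures from the resume demo on this page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionRecord:
    """Hypothetical shape of one saved agent session (not ZeroShot's real schema)."""
    session_id: str
    repo: str
    branch: str
    author: str
    message_count: int
    pr: Optional[int] = None
    commit: Optional[str] = None


class SessionIndex:
    """Toy in-memory index: each lookup key maps to the sessions it touched."""

    def __init__(self) -> None:
        self._by_key = defaultdict(list)

    def add(self, record: SessionRecord) -> None:
        # Register the session under every key a teammate might search by.
        keys = [("repo", record.repo), ("branch", record.branch), ("author", record.author)]
        if record.pr is not None:
            keys.append(("pr", str(record.pr)))
        if record.commit is not None:
            keys.append(("commit", record.commit))
        for key in keys:
            self._by_key[key].append(record)

    def find(self, kind: str, value: str) -> list:
        # .get() avoids creating empty entries for keys that were never added.
        return list(self._by_key.get((kind, value), []))


index = SessionIndex()
index.add(SessionRecord("1f2c8d", "example/app", "auth/oauth-rewrite", "spencer", 312))
for session in index.find("branch", "auth/oauth-rewrite"):
    print(session.session_id, session.author, session.message_count)
```

Keying the same record several ways is what would let a branch checkout, a PR view, or a teammate's name all lead back to the same chat.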

resume · 6mo old branch

❯bb agent-sessions resume

?Resume which session?

❯auth/oauth-rewrite · spencer · 6mo ago

payments/stripe-webhook · nikhil · 4mo ago

✓Hydrating session 1f2c8d — 312 messages, 28 file edits

✻This is the OAuth rewrite from June. The blocker was the refresh-token race in the worker queue.

#zero tokens spent re-deriving six months of decisions.

03 · session handoff

Hand off the chat.
Not just the code.

Any session resumes on any machine, by any teammate. The agent inherits the reasoning — the decisions, the dead ends, the why — not just the diff. Work hands off cleanly across CS → Product → Eng, and six months later it still remembers.

sarah’s terminal · fri 4:51pm

❯zeroshot handoff --to mike

⏺session packaged · 214 turns · decisions, dead ends & the why included

─── mike’s terminal · mon 9:04am ───

❯zeroshot resume sarah/rate-limiting

⏺context loaded in 2s · picking up where sarah left off

✻We chose sliding-window over token-bucket — see incident-42. Continuing.

#no monday-morning “so where did sarah leave this” meeting.

04 · same question. different answer.

The same question.
A radically different answer.

Same agent, same PR, same teammate’s work. The difference is whether it can pull the session that produced the code — or has to burn tokens re-deriving it from scratch.

coding-assistant.ai · context: whatever fits in the window

❯What was Spencer trying to accomplish with the LLM data source PR (#3308)?

$gh pr view 3308 --json title,body

$git diff main...HEAD

⏺It adds an exec plan to ingest LLM chats into the signal pipeline. The PR description is sparse — let me read the whole implementation.

❯why is he using the conversation model? was that intentional?

✗I don’t have visibility into Spencer’s reasoning. Want me to guess?

❯forget it — write a spec for the export feature. start with what customers asked for

✗I don’t have access to your customer signals or tickets. I can write something generic — want me to make assumptions?

❯forget it. just review the PR.

⏺Reviewing for syntax and obvious bugs. I can’t compare against your team’s conventions — no prior reviews to reference.

─── six months later ───

❯customer ticket #4127 — agent logs aren’t showing up in reports. why did we structure agent_logs this way?

$git log --all --oneline src/agent_logs

⏺27 commits across 4 contributors. PR descriptions don’t mention reporting trade-offs.

❯who owns this if we change it?

✗I can’t tell from git blame alone. You’d have to track down each contributor and ask.

⚠ Starts from zero every time — and burns tokens re-deriving what someone already knew.

coding-assistant + ZeroShot CLI · ● Context loaded

❯What was Spencer trying to accomplish with the LLM data source PR (#3308)?

✓Found Spencer’s Codex session on this branch — 247 messages. Pulling context (0 tokens spent re-deriving it).

spencer’s session · PR #3308

Goal: ingest LLM chat history as a new signal source

Decision: reused conversation as a placeholder; wants to migrate to agent_logs

Open Q for Nikhil: “should agent_logs be its own model? need your call before I split it.”

❯draft my review with /bb-review

✓Posted to PR #3308. Three suggestions, all citing your team’s prior conventions on the conversation table.

─── six months later ───

❯customer ticket #4127 — agent logs aren’t showing up in reports. pull context.

✓Found 14 sessions across 6 months tied to agent_logs. Pulling project history.

project history · agent_logs

Mar ’26 · spencer scoped initial schema (PR #3308)

Apr ’26 · maya denormalized for streaming reports “to keep queries under 50ms” (PR #408)

Aug ’26 · daniel added session_type column (PR #1547)

✓The structure is intentional. Ticket #4127 is missing an index on session_type from Daniel’s August change. Draft /bb-specify for the migration?

❯yes

✓Spec drafted. Clustered #4127 with 7 other customers on slow exports and pulled their quotes — then /bb-plan and /bb-tasks queued the migration.

❯/trust-but-verify before merge

✓Walked the export flow end-to-end. 4 customer-cited paths pass, 0 regressions on the existing reports.

❯ship it

✓PR #2089 merged. The 8 customers who reported it were notified automatically.

⚑Heads up — your team re-flagged this session_type index convention 3× this quarter. Opened a weekly skill PR so your agents catch it next time.

✓ Six months ago, six minutes ago — same context, zero tokens to rebuild it.

└ Same story, everywhere you work

Every job is cheaper with the context.

Onboarding, debugging, scoping, reviewing — the same question costs a fraction of the tokens when your agent can pull the work that came before. Each card shows the skills it leans on.

Onboard a new engineer

“Why is the payments module built this way?”

✗Reads the code blind and pings four people on Slack.

✓Pulls the original sessions and the decisions behind them.

bb agent-sessions resume · /bb-analyze

Debug a production incident

“Ticket #4127 — exports failing for big orgs.”

✗git blame archaeology across 27 commits, 4 authors.

✓Surfaces 14 related sessions and the exact missing index.

/bb-specify · /trust-but-verify

Scope from customer feedback

“Spec the CSV export fix.”

✗Writes something generic on made-up assumptions.

✓Clusters 8 customer tickets into a spec with real quotes.

/bb-specify · /bb-clarify

Review to your conventions

“Review PR #3308.”

✗Flags syntax and obvious bugs only.

✓Cites your team’s prior calls on the conversation table.

/bb-review

Resume half-finished work

“Pick up the OAuth rewrite from June.”

✗Starts from zero and re-derives the whole plan.

✓Hydrates the 312-message session and the open blocker.

bb agent-sessions resume

Never repeat a mistake

“Add a retry to the worker queue.”

✗Invents a brand-new retry abstraction — again.

✓Reuses RetryablePolicy, the convention set in PR #408.

/bb-review · /payments-review

05 · Close the loop

A ticket in the morning.
A PR by end of day.

A ticket comes in. ZeroShot clusters it with the other customers who hit the same thing, /bb-specify pulls those quotes into a spec, /bb-review checks your conventions, and the PR ships. The whole loop runs inside the CLI — and feeds back on itself, so the library compounds.

close the loop · ticket #4127 → PR

─── morning ───

⏺ticket #4127 arrives · clustered with the other customers who hit the same thing

❯/bb-specify

✓customer quotes pulled straight into the spec

❯/bb-review

✓checked against your team’s conventions

─── end of day ───

⏺PR shipped · grounded in what customers actually asked for

the loop · signals in → skills out → customer notified

Customer signals

Call · Ticket · Survey · Slack · Feedback · CRM · Event

From your agents (via CLI)

Commit · Chat session · Steering events · Project context

BuildBetter AI · Paid platform

Signal Generation — Signals

Severity · Topic · Persona · Impact · Keywords · Segmentation

Triage / Projects

Topics · Clusters · Projects · Knowledge base · Company context

ZeroShot CLI: captures your work and runs the workflow · Free

Runs the spec workflow

/bb-specify→/bb-plan→/bb-tasks→/bb-review→/trust-but-verify

Across your coding agents

Claude Code · Codex · Copilot CLI · Gemini CLI · Windsurf · Amazon Q

Then, on the platform · Metered

Extracts signals from your sessions

repeated patterns · wasted tool calls · conventions · severity · topic

Suggests & ships skills

find patterns→write / sharpen skill→weekly skill PR

Metered by tool call. You only pay for the signal extraction, skill suggestions, and skill runs you actually use — the CLI and open-source skills stay free.

← Close the loop — customer notified, feedback restarts the cycle →

└ Ship at the speed of insight

A customer ticket, end-to-end. Same day.

9:14am

Ticket #4127 · Export CSV failing for orgs > 10k rows

9:32am

Signal clustered · + 7 prior tickets · Enterprise · High severity

10:48am

/bb-specify · Spec drafted with 8 customer quotes

2:15pm

/bb-review passes · PR #2089 merged

4:42pm

8 customers notified · Loop closed.

└ The full pipeline, end to end

Every stage of the loop —
and the skills that run it.

From capturing the work to shipping the PR to learning from it — the whole thing runs inside the CLI, and feeds back on itself so the library compounds.

1
Capture
Every session, commit, steering event, file edit, and tool call — synced into one shared repo across your whole team.
2
Scope · /bb-specify · /bb-clarify · /bb-constitution
Turn customer signals into a spec, resolve the unknowns before you build, and encode your team’s principles.
3
Plan · /bb-plan · /bb-tasks · /bb-analyze
Break the spec into a build plan and a dependency-ordered task list — checked for cross-artifact consistency.
4
Build · /bb-implement
Execute the plan task by task — in whichever agent you’re already using, with the full context loaded.
5
Verify · /bb-review · /trust-but-verify · /generate-tests · /bb-checklist
Review against your conventions, walk the UI like a real user, generate tests, and run the pre-merge checklist.
6
Ship
The PR merges. The customers who reported it hear back — same day, not next quarter.
7
Learn · weekly skill PR
Signals run over the sessions, identify the patterns your team keeps repeating, and open a weekly PR with new and sharper skills.

↻ Learn feeds back into Scope — the library compounds, and every loop costs fewer tokens than the last.
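The Plan stage mentions a dependency-ordered task list. As a minimal sketch of what ordering tasks by dependency means, the example below uses Python's standard-library `graphlib`. The task names are hypothetical (loosely modeled on the session_type index migration described earlier), and this is an illustration of the general technique, not a description of how /bb-tasks works internally.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each task maps to the set of tasks it depends on.
tasks = {
    "write migration for session_type index": set(),
    "apply migration in staging": {"write migration for session_type index"},
    "update export query": {"write migration for session_type index"},
    "run export regression tests": {"apply migration in staging", "update export query"},
}


def dependency_order(graph):
    """Return tasks so that every task appears after all of its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


ordered = dependency_order(tasks)
for step, task in enumerate(ordered, start=1):
    print(f"{step}. {task}")
```

The two middle tasks have no dependency on each other, so either may come first; only the constraint that dependencies precede dependents is guaranteed.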

06 · A new data type

Agent sessions are
the new call recording.

Every chat your team has with an agent is company knowledge — prompts, decisions, conventions — that’s lost the moment the window closes. ZeroShot captures it as a first-class data type. Nobody would let sales calls evaporate after they end. Your team’s agent sessions have been doing exactly that.

SESSION ARTIFACT

prompts · kept

decisions · kept

dead ends & the why · kept

conventions · kept

file edits · tool calls · kept

FILED AS · COMPANY KNOWLEDGE

note ─ before ZeroShot, all of this evaporated when the window closed. Every single time.

Prompt library

Brand voice · 42 uses

Marketing copy · 28 uses

Image generation · 19 uses

PR description · 63 uses

The prompts your team already wrote — pulled out of chat sessions, shared, reused.

Extractions & insights

Decision · Reuse RetryablePolicy

Blocker · Refresh-token race in worker

Convention · agent_logs, not conversation

Open Q · Split agent_logs model?

Signals run over every session — decisions, blockers, and conventions, structured.

Artifacts created for you

OAuth rewrite: design doc · updated 2h ago

Signal pipeline: runbook · updated 1d ago

agent_logs: schema notes · updated 3d ago

Knowledge docs your agents wrote — yours to view, edit, and share. Delete ZeroShot, lose it all.

07 · with buildbetter.ai

One platform behind the evidence.

Out of the box, ZeroShot grounds every agent in your own sessions, skills, and reviews. Connect BuildBetter.ai — a customer intelligence platform built for agents — and that evidence goes all the way to your customers: the whole stack.

prototyping

Snapshot your web app and spin up 10 variants for the price of one build — then test them against synthetic personas.

projects

Every call, signal, doc, and decision for a project organized in one place.

knowledge-base

Your company's docs, specs, and product knowledge — structured for agents.

product-taxonomy

An auto-labeled, rule-based taxonomy that organizes every signal the way your product is built.

signal-pipeline

The customer & company signal pipeline — 35+ signal types extracted from every conversation.

call-recording

Auto-record and transcribe calls from Zoom, Google Meet, Microsoft Teams, and Gong.

custom-dashboards

Build report dashboards over all your customer and product data.

integrations

Slack, Zendesk, Intercom, your CRM, and Linear & Jira — connected.

customer-health

A success platform with customer health scores and risk signals.

close-the-loop

Follow up with the exact customers behind every feature you ship.

changelogs-releases

Generate changelogs and release notes from what actually shipped.

└ Learn more about BuildBetter.ai →

“Whether extreme spend pays off comes down to the business value of shipped code — which most companies still can’t measure.”
— Nicholas Arcolano, Head of Research, Jellyfish · via TechCrunch →

“The only way I let a non-engineer ship to production is if they're using ZeroShot. It works that well.”
— Adam Stanford, CPO

❯ ok. give it to me.

⏺ two ways in:

Download for macOS · v0.10.0 · 13.4 MB · free, no account

Book a demo for your team →
SOC 2 · can run locally
