What I Learned Building 20 AI Prototypes in a Year

Fast feedback loops, when to use which model, and why most prototypes should stay prototypes—until one shouldn't.

I built a lot of AI prototypes in the last year. Most never shipped. A few became real products. Here's what actually translated.

Speed beats polish

The prototypes that taught me the most were the ones I got in front of users (or at least to a working flow) in days, not weeks. Each had one clear question: "Does this model do X well enough?" or "Do people want to use this flow?" If you spend a month building the perfect wrapper, you've lost the learning. Ship the ugly version first.

Picking the right model matters more than you think

Not everything needs GPT-4. Summarization, simple extraction, and low-stakes chat often work fine with smaller or cheaper models. When a prototype's cost or latency blew up, it was usually because I had reached for the biggest model by default. Start small; upgrade when you have evidence.
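One way to make "upgrade when you have evidence" concrete is to run a handful of labeled cases through each candidate model, cheapest first, and keep the first one that clears a pass-rate bar. This is a minimal, stdlib-only sketch, not a benchmark. The toy `small_model` and `large_model` functions, their relative costs, and the substring check in `pass_rate` are all hypothetical stand-ins for real API calls and a real scoring rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in: a "model" is any function from prompt to text.
# In a real prototype this would wrap a provider's API client.
ModelFn = Callable[[str], str]


@dataclass
class Candidate:
    name: str
    cost_per_call: float  # illustrative relative cost, not real pricing
    call: ModelFn


def pass_rate(model: ModelFn, cases: list[tuple[str, str]]) -> float:
    """Fraction of (prompt, expected) cases whose output contains the expected text."""
    if not cases:
        raise ValueError("need at least one labeled case")
    hits = sum(1 for prompt, expected in cases if expected in model(prompt))
    return hits / len(cases)


def cheapest_good_enough(
    candidates: list[Candidate], cases: list[tuple[str, str]], threshold: float
) -> Optional[Candidate]:
    """Return the cheapest candidate whose pass rate meets the threshold, else None."""
    for cand in sorted(candidates, key=lambda c: c.cost_per_call):
        if pass_rate(cand.call, cases) >= threshold:
            return cand
    return None


# Toy models for the demo (hypothetical).
def small_model(prompt: str) -> str:
    return "positive" if "great" in prompt else "unsure"


def large_model(prompt: str) -> str:
    return "negative" if "awful" in prompt else "positive"


cases = [
    ("this is great", "positive"),
    ("great value, would buy again", "positive"),
    ("awful support", "negative"),
    ("slow but great", "positive"),
]
candidates = [Candidate("large", 1.0, large_model), Candidate("small", 0.1, small_model)]

print(pass_rate(small_model, cases))  # → 0.75
print(cheapest_good_enough(candidates, cases, 0.9).name)  # → large
```

With a 0.9 bar, the small model's 3/4 pass rate isn't enough, so the large one wins; drop the bar to 0.7 and the small one is kept. Substring matching is a deliberately crude scorer; swap in whatever check fits the task.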

Structured output from day one

The moment I started defining output schemas (BAML, Zod, whatever) and validating model responses, the rest of the stack got easier. Prototypes that stayed "raw string in, raw string out" became impossible to extend. Typed outputs and evals turned prototypes into something I could hand off or scale.
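Here is roughly what "define a schema and validate the response" means, sketched in Python with only the standard library rather than BAML or Zod. The `Invoice` fields and the `parse_invoice` helper are hypothetical; a dedicated schema library would handle the same checks with far less code.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:  # hypothetical schema for an extraction prototype
    vendor: str
    total: float
    currency: str


class SchemaError(ValueError):
    """Raised when a model response doesn't match the expected schema."""


def parse_invoice(raw: str) -> Invoice:
    """Parse a model's raw text response into an Invoice, or raise SchemaError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("expected a JSON object")

    expected = {f.name: f.type for f in fields(Invoice)}
    missing = expected.keys() - data.keys()
    if missing:
        raise SchemaError(f"missing fields: {sorted(missing)}")

    values = {}
    for name, typ in expected.items():  # extra keys in `data` are ignored
        value = data[name]
        # JSON has a single number type, so accept ints where a float is expected
        # (but not booleans, which are ints in Python).
        if typ is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, typ):
            raise SchemaError(f"field {name!r} should be {typ.__name__}")
        values[name] = value
    return Invoice(**values)


print(parse_invoice('{"vendor": "Acme", "total": 120, "currency": "USD"}'))
# → Invoice(vendor='Acme', total=120.0, currency='USD')
```

The payoff is that downstream code only ever sees an `Invoice` or a `SchemaError`, never an arbitrary string, so a failed parse can be logged, retried, or counted in an eval.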

Observability isn't optional after prototype #3

For the first few prototypes, I could get away with print statements. After that, I was drowning in "which prompt was that?" and "why did it cost $200?" Tracing and cost visibility became part of my default setup. The same goes for evals: once you have more than a handful of flows, you need a harness or you'll break things without knowing.
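A minimal sketch of default tracing and cost visibility: a decorator that records which prompt version ran, on which model, how long it took, and roughly what it cost. Everything here is an assumption for illustration. The `PRICES` table is made up (not any provider's pricing), and the wrapped function is assumed to return `(text, input_tokens, output_tokens)`. A hosted tracing tool would replace most of this, but even this much answers "which prompt was that?" and "why did it cost $200?" for a prototype.

```python
import functools
import time
from dataclasses import dataclass, field

# Illustrative (input, output) prices in USD per 1,000 tokens; NOT real pricing.
PRICES = {"small-model": (0.0005, 0.0015), "large-model": (0.01, 0.03)}


@dataclass
class Span:
    prompt_id: str  # which prompt template/version produced this call
    model: str
    latency_s: float
    input_tokens: int
    output_tokens: int
    cost_usd: float


@dataclass
class Tracer:
    spans: list[Span] = field(default_factory=list)

    def trace(self, prompt_id: str, model: str):
        """Decorator that records latency, token counts, and estimated cost per call.

        Assumes the wrapped function returns (text, input_tokens, output_tokens)
        and that `model` is a key in PRICES.
        """
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                text, tokens_in, tokens_out = fn(*args, **kwargs)
                price_in, price_out = PRICES[model]
                cost = tokens_in / 1000 * price_in + tokens_out / 1000 * price_out
                self.spans.append(
                    Span(prompt_id, model, time.perf_counter() - start,
                         tokens_in, tokens_out, cost)
                )
                return text  # callers only see the text, as before
            return wrapper
        return decorator

    def cost_by_prompt(self) -> dict[str, float]:
        """Total estimated cost grouped by prompt_id."""
        totals: dict[str, float] = {}
        for s in self.spans:
            totals[s.prompt_id] = totals.get(s.prompt_id, 0.0) + s.cost_usd
        return totals


tracer = Tracer()


@tracer.trace(prompt_id="summarize-v2", model="large-model")
def summarize(text: str):
    # Hypothetical stand-in for a real API call that reports token usage.
    return text[:20], 1000, 200


summarize("some long document ...")
summarize("another document ...")
print(tracer.cost_by_prompt())
```

Grouping spans by `prompt_id` is the design choice that matters: when costs jump, you can see which prompt version is responsible instead of guessing.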

Most ideas should stay prototypes

The biggest lesson: most ideas are worth a week, not a company. Build fast, learn, kill fast. The ones that keep pulling you back—where you keep fixing one more thing—those are the candidates. The rest are tuition. Building 20 prototypes is how you find the 1 or 2 worth turning into products.

What I'd do again

  • One clear hypothesis per prototype.
  • Structured outputs and a minimal eval from the start.
  • Observability and cost tracking so you don't get surprised.
  • No shame in killing a prototype. The goal is learning, not a portfolio of half-finished apps.

Twenty prototypes in a year sounds like a lot. It's really just a habit: small bets, fast feedback, and the discipline to stop when the answer is "no."