The New Startup Stack for AI Products

A practical stack for shipping AI in 2025: Next.js, OpenRouter, Langfuse, pgvector, and where to run workers—with tradeoffs.

If you're starting an AI product today, here's a stack that works without overbuilding.

Frontend: Next.js / React

Still the default for most startups. Good DX, SSR when you need it, easy to add streaming for LLM responses. Alternatives (Remix, etc.) are fine; the important part is that your frontend can consume streaming and structured data from your API.
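Consuming a streamed response mostly means parsing server-sent events as they arrive. A minimal sketch, assuming the common OpenAI-style SSE wire format (`data: {json}` lines with a `[DONE]` sentinel) — adjust for your provider:

```typescript
// Parse one OpenAI-style SSE line from a streamed chat completion into a
// text delta. Returns null for comments, keep-alives, partial frames, and
// the end-of-stream sentinel.
type Delta = { choices?: { delta?: { content?: string } }[] };

function parseSseLine(line: string): string | null {
  if (!line.startsWith("data:")) return null; // comment or blank keep-alive
  const payload = line.slice(5).trim();
  if (payload === "[DONE]") return null;      // end-of-stream sentinel
  try {
    const chunk = JSON.parse(payload) as Delta;
    return chunk.choices?.[0]?.delta?.content ?? null;
  } catch {
    return null;                              // incomplete frame; skip it
  }
}
```

In a Next.js route handler you'd read the provider's response body as a stream, split on newlines, and append each non-null delta to the UI.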

Backend: Node or Python

Node if your team lives in TypeScript and you want one language end-to-end. Python if you're heavy on ML libs, data pipelines, or LangChain-style orchestration. Both work. Pick one and stick with it for the core API.

Models: OpenRouter / OpenAI / Anthropic

Don't hardcode a single provider. Use a router (OpenRouter, or your own thin layer) so you can switch models and providers without rewriting. You'll want to A/B models, handle fallbacks, and control cost. One API surface, multiple backends.
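The "thin layer" can be very thin. A sketch of ordered fallback across providers — the function names and shapes here are assumptions, not any library's real API; provider calls are injected so the routing logic stays testable:

```typescript
// Try each provider in order; return the first success, or throw with a
// summary of every failure. In production each `call` would hit
// OpenRouter/OpenAI/Anthropic over HTTP.
type ModelCall = (prompt: string) => Promise<string>;

async function completeWithFallback(
  prompt: string,
  providers: { name: string; call: ModelCall }[],
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, text: await p.call(prompt) };
    } catch (err) {
      errors.push(`${p.name}: ${String(err)}`); // record, try the next one
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```

The same shape gives you A/B testing for free: reorder or sample the provider list per request.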

Observability: Langfuse (or similar)

You need traces: prompt, completion, tokens, latency, cost. Langfuse, Helicone, or a self-hosted OTel pipeline. Don't ship without this. You'll need it the first time something breaks in prod or the bill spikes.
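Whatever tool you pick, the record per LLM call is the same handful of fields. A sketch of that minimal trace shape — the prices below are placeholders, not real rates; actual pricing varies by model and changes often:

```typescript
// Minimal per-call trace: model, token counts, latency, derived cost.
type Trace = {
  model: string;
  promptTokens: number;
  completionTokens: number;
  latencyMs: number;
  costUsd: number;
};

// Hypothetical example prices, USD per 1M tokens (in = prompt, out = completion).
const PRICE_PER_1M = {
  "example-small": { in: 0.5, out: 1.5 },
  "example-large": { in: 5.0, out: 15.0 },
} as const;

function makeTrace(
  model: keyof typeof PRICE_PER_1M,
  promptTokens: number,
  completionTokens: number,
  latencyMs: number,
): Trace {
  const p = PRICE_PER_1M[model];
  const costUsd = (promptTokens * p.in + completionTokens * p.out) / 1_000_000;
  return { model, promptTokens, completionTokens, latencyMs, costUsd };
}
```

When the bill spikes, this is the record you group by model and sum.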

Vector DB: pgvector (to start)

For RAG and semantic search, pgvector is enough for a lot of products. Simple, runs in Postgres, no extra infra. When you outgrow it (scale or features), consider Pinecone, Weaviate, or a managed vector DB. Don't overbuild early.
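A nearest-neighbor query in pgvector is plain SQL. A sketch, assuming a hypothetical `documents(id, content, embedding)` table; `<=>` is pgvector's cosine-distance operator, and the embedding is passed as a bracketed vector literal:

```typescript
// Format an embedding in pgvector's input syntax ('[1,2,3]') and build a
// parameterized nearest-neighbor query. You'd pass the literal as $1 to
// your Postgres client (e.g. `pg`).
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

function nearestNeighborsSql(limit: number): string {
  const n = Math.max(1, Math.floor(limit)); // limit is interpolated, so sanitize
  return `SELECT id, content, embedding <=> $1 AS distance
FROM documents
ORDER BY distance
LIMIT ${n}`;
}
```

That's the whole RAG retrieval layer for many products: embed the query, run this, stuff the top rows into the prompt.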

Workers: Cloudflare / AWS

Background jobs, eval runs, batch processing. Cloudflare Workers when you want something simple, cheap, and globally distributed. AWS (Lambda, Step Functions) when you need more control or integration with the rest of your infra. Match the worker to the task.
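A tiny job-routing worker can look like this. The fetch-handler shape matches Cloudflare Workers' module syntax; the job names and paths are made up for illustration:

```typescript
// Route incoming requests to job kinds. Routing is a pure function so it
// can be tested without a Workers runtime.
type JobKind = "eval-run" | "batch-embed" | "unknown";

function routeJob(path: string): JobKind {
  if (path === "/jobs/eval") return "eval-run";
  if (path === "/jobs/embed") return "batch-embed";
  return "unknown";
}

export default {
  async fetch(req: Request): Promise<Response> {
    const kind = routeJob(new URL(req.url).pathname);
    if (kind === "unknown") return new Response("no such job", { status: 404 });
    // In a real worker you'd enqueue to a Queue or hand off to a Durable
    // Object here instead of answering inline.
    return new Response(JSON.stringify({ queued: kind }), { status: 202 });
  },
};
```

The Lambda version is the same idea with a different handler signature.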

Tradeoffs

  • Single provider vs router: Router adds a hop and a dependency but saves you from lock-in and makes fallbacks trivial.
  • pgvector vs dedicated vector DB: Start with pgvector; move when you hit limits.
  • Langfuse vs custom: Custom OTel is flexible but more work. Langfuse gets you 80% there fast.

This stack gets you to "working AI product" without building a platform. Optimize later when you have real usage and real constraints.