Why Prompt Engineering Is the Wrong Abstraction
Raw prompt tweaking doesn't scale. Structured prompts, DSLs like BAML, typed outputs, and eval pipelines together form the right abstraction.
"Prompt engineering" as a discipline is mostly: edit a string, run it, see what happens, repeat. That's a terrible abstraction for anything beyond a prototype.
The problem with raw prompts
Prompts are untyped, unversioned, and hard to test. Change one word and behavior shifts. You have no compile-time guarantee that the model's output will match what your code expects. You're debugging prose. At scale you get prompt drift, duplication, and a pile of "v2_final_v3" strings that nobody wants to touch.
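A minimal sketch of the anti-pattern, with a hypothetical `call_model` stub standing in for a real LLM call: the prompt is assembled by inline concatenation, and the reply is parsed with no contract, so any prose the model wraps around the JSON breaks the caller.

```python
import json

def call_model(prompt):
    # Stub standing in for a real LLM call; real replies often wrap JSON in prose.
    return 'Sure! Here is the JSON:\n{"summary": "printer jam"}'

def summarize(ticket_text):
    prompt = "Summarize this support ticket as JSON: " + ticket_text  # ad-hoc concatenation
    reply = call_model(prompt)
    return json.loads(reply)  # crashes: the reply is not bare JSON

try:
    summarize("My printer is jammed.")
    parsed = True
except json.JSONDecodeError:
    parsed = False
```

There is no type, no version, and no test boundary here: the failure only surfaces at runtime, one call at a time.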
Structured prompts
Prompts should be structured. Templates with clear inputs and outputs. Variables that are validated before they're injected. No ad-hoc string concatenation in application code. The prompt is data, not prose—versioned, reviewable, and testable.
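One way to sketch this, using only the standard library (the template text and `render_prompt` helper are illustrative, not from any particular framework): the prompt lives in one named template, and inputs are validated before injection rather than concatenated in application code.

```python
from string import Template

# The prompt is data: one named, versionable template, not a string in app code.
PROMPT_TEMPLATE = Template(
    "Classify the sentiment of this review as positive, negative, or neutral.\n"
    "Review: $review_text"
)

def render_prompt(review_text: str) -> str:
    # Validate inputs before they are injected into the template.
    if not review_text.strip():
        raise ValueError("review_text must be non-empty")
    return PROMPT_TEMPLATE.substitute(review_text=review_text.strip())
```

Because the template is a single reviewable object, a change to it shows up in diffs like any other code change.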
DSLs and typed outputs
A DSL like BAML (or similar) lets you define the shape of what you want: "this call returns an object with these fields and this enum." The compiler and runtime handle prompt assembly and output parsing. You get type safety in your app and a single place to define the contract. The model can still be wrong; you'll know immediately because the output won't parse or won't pass validation.
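The same idea can be sketched in plain Python without reproducing BAML's actual syntax (the `ReviewAnalysis` shape and `parse_output` helper below are assumptions for illustration): the contract is an enum plus a typed record, and parsing fails loudly in one place when the model's output doesn't match.

```python
import json
from dataclasses import dataclass
from enum import Enum

class Sentiment(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

@dataclass
class ReviewAnalysis:
    sentiment: Sentiment
    confidence: float

def parse_output(raw: str) -> ReviewAnalysis:
    # Single place where the contract is enforced: bad output fails here, not downstream.
    data = json.loads(raw)
    result = ReviewAnalysis(
        sentiment=Sentiment(data["sentiment"]),  # raises ValueError on an unknown enum value
        confidence=float(data["confidence"]),
    )
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError("confidence out of range")
    return result
```

A DSL moves this boilerplate into the compiler and runtime, but the shape of the guarantee is the same: the model can still be wrong, and you find out at the parse boundary.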
Evaluation pipelines
If you're not running evals on every change, you're not engineering—you're hoping. Structured prompts and typed outputs make evals tractable: you have a clear pass/fail or score. Automate it. Run evals in CI; block merges when something regresses. Prompt "engineering" without evals is just prompt guesswork.
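A minimal eval runner along these lines (the `CASES` data and `toy_classifier` are placeholders; in practice `classify` would call the model through the typed contract): each case has a clear pass/fail, and a non-empty failure list is what CI would use to block the merge.

```python
def run_evals(cases, classify):
    # Collect every case where the classifier's output diverges from the expectation.
    failures = []
    for text, expected in cases:
        got = classify(text)
        if got != expected:
            failures.append((text, expected, got))
    return failures

CASES = [
    ("I love this phone", "positive"),
    ("Worst purchase ever", "negative"),
]

def toy_classifier(text):
    # Placeholder for a real model call behind the typed contract.
    return "positive" if "love" in text else "negative"

failures = run_evals(CASES, toy_classifier)
# In CI: exit non-zero when failures is non-empty, blocking the merge.
```

Typed outputs are what make the `==` comparison meaningful here; without a contract, scoring free-form prose is itself a research problem.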
The right abstraction
Treat prompts as a contract: structured definition, typed outputs, and an eval pipeline that guards the contract. Optimize the wording inside that frame, but don't make "edit the prompt string" the primary lever. The right abstraction is prompt + schema + evals, not "prompt engineering."