How to Stream Structured JSON from an LLM

You want the model to return JSON your app can use. You also want to stream so the user sees progress. Those two goals fight each other: JSON is valid only when it's complete. Here's how to do both.

Why it's hard

Streaming gives you output token by token, but JSON is valid only when every brace, bracket, and string is closed and the commas and colons sit in the right places. A partial stream is therefore usually invalid JSON. So you either (1) wait for the full response and parse it once, or (2) stream units that are each complete on their own (e.g. discrete Server-Sent Events, each carrying a whole JSON object) and parse each unit as it arrives.
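To see the problem concretely, here is a small Python sketch that feeds every prefix of a complete JSON string to the standard-library parser, as if each prefix were the stream so far. The sample payload is made up for illustration.

```python
import json

# A complete response, as the model might eventually produce it (made-up sample).
full = '{"title": "Report", "tags": ["a", "b"]}'

def parseable_prefixes(text: str) -> list[int]:
    """Return the lengths of the prefixes of `text` that parse as JSON."""
    ok = []
    for n in range(1, len(text) + 1):
        try:
            json.loads(text[:n])      # pretend the stream stopped after n characters
            ok.append(n)
        except json.JSONDecodeError:  # unclosed brace, bracket, or string
            pass
    return ok

print(parseable_prefixes(full))
```

Only the complete string parses; every shorter prefix raises `json.JSONDecodeError`.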

Approach 1: Stream tokens, parse at the end

Stream the raw tokens to the client for display (a "thinking..." indicator or a growing block of text). At the same time, buffer the full completion on the server. When the stream ends, parse the buffer as JSON, validate it against your schema, and send a final event with the structured result. The user sees progress; the app gets one clean object.
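A minimal Python sketch of this pattern, formatting each event as a Server-Sent Event. It assumes `tokens` is whatever iterable of text chunks your provider's streaming API gives you (a hypothetical stand-in here), and it uses a set of required keys as a deliberately tiny placeholder for real schema validation.

```python
import json
from typing import Iterable, Iterator

def sse(event: str, data) -> str:
    """Format one Server-Sent Event; JSON-encoding keeps the payload on one data line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_then_parse(tokens: Iterable[str], required_keys: set[str]) -> Iterator[str]:
    """Relay tokens for display, buffer them, and parse once at the end.

    `tokens` is a hypothetical stand-in for the provider's stream;
    `required_keys` is a placeholder for a real schema check.
    """
    buffer: list[str] = []
    for tok in tokens:
        buffer.append(tok)
        yield sse("token", tok)          # progress for the UI; never parsed
    text = "".join(buffer)
    try:
        obj = json.loads(text)           # parse exactly once, on completion
    except json.JSONDecodeError as exc:
        yield sse("error", f"invalid JSON: {exc}")
        return
    if not isinstance(obj, dict):
        yield sse("error", "expected a JSON object")
        return
    missing = required_keys - set(obj)
    if missing:
        yield sse("error", f"missing keys: {sorted(missing)}")
        return
    yield sse("result", obj)             # the one clean structured object
```

The client appends `token` payloads to a display area and treats only the `result` event as data. Encoding every payload with `json.dumps` also means a token containing a newline cannot break the SSE framing.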

Approach 2: Structured streaming (e.g. JSON mode + chunks)

Some providers offer a "JSON mode" that constrains the model to produce valid JSON. You can still stream with it on, but the guarantee applies to the finished output: mid-stream, you still hold an incomplete, invalid prefix. Your options: (a) accept that you parse only on completion, or (b) use a format whose fragments are valid on their own, such as newline-delimited JSON (NDJSON), where each line is a complete object, or a sequence of small, self-contained JSON objects.
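For option (b), the receiving side has to cope with chunk boundaries that rarely line up with line boundaries. A minimal Python sketch of an incremental NDJSON reader, assuming the stream arrives as an iterable of text chunks:

```python
import json
from typing import Any, Iterable, Iterator

def iter_ndjson(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed value per complete NDJSON line.

    The unfinished tail of the stream stays in `pending` until its
    newline arrives, so a line split across chunks is parsed only once whole.
    """
    pending = ""
    for chunk in chunks:
        pending += chunk
        *complete, pending = pending.split("\n")  # last piece may be partial
        for line in complete:
            if line.strip():              # skip blank lines
                yield json.loads(line)    # each line must be a full JSON value
    if pending.strip():                   # final line may lack a trailing newline
        yield json.loads(pending)
```

Each object can be rendered as soon as its line completes, so a growing list appears item by item while every item is still parsed as valid JSON.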

Approach 3: DSLs and typed outputs (BAML, etc.)

Use a layer that both defines the output shape and makes the model call. The runtime of such a DSL typically handles the "give me this structure" part and can hide whether the underlying call streams. You might stream a progress indicator while the runtime collects the structured result, then emit the typed object. The trade-off: less control over exactly what gets streamed, byte by byte, and more confidence that the final result has the right shape.
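The core idea, declaring the shape once and getting back either a typed object or a specific error, can be sketched in plain Python. This is not BAML or any other DSL's API; the `Summary` shape, `OutputError`, and `parse_typed` are hypothetical names for illustration, and the type check only supports plain classes such as `str` and `list`.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Summary:
    # Hypothetical output shape; a DSL would declare this in its own syntax.
    title: str
    tags: list

class OutputError(ValueError):
    """Raised when the model's output does not match the declared shape."""

def parse_typed(cls, text: str):
    """Parse `text` as JSON and build `cls`, failing fast with a specific message.

    Only plain classes (str, list, int, ...) are checked; extra keys are ignored.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise OutputError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise OutputError(f"expected an object, got {type(data).__name__}")
    for f in fields(cls):
        if f.name not in data:
            raise OutputError(f"missing field {f.name!r}")
        value = data[f.name]
        if not isinstance(value, f.type):
            raise OutputError(
                f"field {f.name!r}: expected {f.type.__name__}, got {type(value).__name__}"
            )
    return cls(**{f.name: data[f.name] for f in fields(cls)})
```

Run this on the buffered completion from Approach 1 and the app receives a `Summary` instance, or an error that names the offending field instead of a generic parse failure.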

Recommendation

For most apps, stream for UX (visible progress, better perceived speed) and buffer and parse for correctness. Emit the final structured result in a separate event or response field. If you need incremental structured data, such as a list that grows, use NDJSON or a sequence of small, self-contained JSON objects, one per chunk. And wherever you can, validate against a schema (Zod, BAML, etc.) so invalid output fails fast with a clear error.