Streaming
Also: LLM streaming
Receiving Claude's response token by token as it's generated, rather than waiting for the entire response to be ready. Without streaming, you wait, then get the whole response at once. With streaming, you start reading almost immediately while Claude is still writing. The total generation time is about the same, but the wait before the first visible text is much shorter, which makes AI applications feel far more responsive, especially for longer responses. Many user-facing Claude integrations stream by default.
In practice
You ask Claude a complex question and instead of waiting 10 seconds for the full response to appear, words start showing up immediately and you can start reading while it's still writing. That's streaming — responses arrive token by token rather than all at once. For real-time user-facing apps, it makes the experience feel dramatically faster and more responsive.
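To make the difference concrete, here is a minimal sketch that simulates both modes locally. It does not call Claude or any real API: fake_token_stream is a hypothetical stand-in for a model that produces one chunk at a time, and the chunks are whole words rather than real tokens. The point is only to show that both modes produce the same text, while the streaming consumer sees its first chunk much sooner.

```python
import time
from typing import Iterator, Tuple


def fake_token_stream(text: str, delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields one chunk at a time.

    Assumption: chunks are space-separated words, not real model tokens.
    """
    for word in text.split(" "):
        time.sleep(delay)  # simulate the time needed to generate one chunk
        yield word + " "


def without_streaming(text: str, delay: float = 0.01) -> Tuple[str, float]:
    """Collect every chunk first; nothing is visible until the end.

    Returns the full response and the seconds waited before any text appeared.
    """
    start = time.perf_counter()
    full = "".join(fake_token_stream(text, delay))
    return full, time.perf_counter() - start


def with_streaming(text: str, delay: float = 0.01) -> Tuple[str, float]:
    """Handle each chunk as it arrives.

    Returns the full response and the seconds waited before the first chunk.
    """
    start = time.perf_counter()
    first_visible = None
    parts = []
    for chunk in fake_token_stream(text, delay):
        if first_visible is None:
            first_visible = time.perf_counter() - start
        parts.append(chunk)  # a real app would render the chunk here
    return "".join(parts), first_visible if first_visible is not None else 0.0


if __name__ == "__main__":
    sample = "streaming shows text as it is generated"
    _, wait_blocking = without_streaming(sample)
    _, wait_streaming = with_streaming(sample)
    print(f"first text without streaming: {wait_blocking:.3f}s")
    print(f"first text with streaming:    {wait_streaming:.3f}s")
```

In a real integration the generator would be replaced by the chunks arriving from the API, but the consuming loop keeps the same shape: handle each chunk on arrival, and keep the accumulated text if the complete response is needed afterwards.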
Where Streaming shows up
4 articles
When to stream, how to implement it properly in Python and TypeScript, error handling mid-stream, and the UX patterns that actually work.
End to end: API route, streaming SSE to the browser, React state, conversation history, and deployment. The complete working pattern.
Streaming makes sense when the user is waiting to read. It makes less sense when you need the complete output before doing anything with it. Here is the decision framework and the patterns for each.
Streaming sends Claude's response token by token as it's generated, instead of waiting until the full response is ready. The difference in perceived speed is significant — and the implementation is simpler than you'd expect.