Streaming Claude responses: implementation patterns and the tradeoffs
In brief
When to stream, how to implement it properly in Python and TypeScript, error handling mid-stream, and the UX patterns that actually work.
Streaming matters for anything user-facing. A five-second blank wait followed by a full response feels broken. A response that appears word-by-word feels alive, even if the total generation time is the same.
Beyond UX, streaming lets you process output incrementally — detect early termination, pipe to downstream systems, or display partial results before generation finishes. Here is how to implement it correctly across different contexts.
When to stream
Stream when:
- A human is waiting and watching
- Response length is unpredictable and could be long
- You want to pipe output to another process as it arrives
- You need to detect certain tokens early and react (e.g., stop generation when a sentinel appears)
Do not stream when:
- You need the complete response before doing anything with it (JSON parsing, database writes)
- You are running batch jobs where throughput matters more than latency
- The response is short (under ~100 tokens) — streaming overhead is not worth it
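The last "stream when" case — reacting to a sentinel token early — is worth spelling out, because the sentinel can arrive split across two chunks. A minimal sketch, independent of any SDK (the function name and shape are mine): it works on any async iterable of text chunks and searches the accumulated text, not individual chunks.

```typescript
// Accumulate text from a stream of chunks, stopping as soon as a sentinel
// string appears. Searching the accumulated text (not each chunk) handles
// sentinels that straddle a chunk boundary.
async function collectUntilSentinel(
  chunks: AsyncIterable<string>,
  sentinel: string,
): Promise<{ text: string; stopped: boolean }> {
  let text = ''
  for await (const chunk of chunks) {
    text += chunk
    const idx = text.indexOf(sentinel)
    if (idx !== -1) {
      // Drop the sentinel and everything after it; stop consuming the stream.
      return { text: text.slice(0, idx), stopped: true }
    }
  }
  return { text, stopped: false }
}
```

In practice you would pass the stream's text iterator and, when `stopped` is true, also abort the underlying request (e.g. via an `AbortController` signal you supplied) so generation does not continue server-side.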
Basic streaming in Python
import anthropic

client = anthropic.Anthropic()

# Using the context manager
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain async/await in Python."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get the final message object after streaming completes
    final_message = stream.get_final_message()

print(f"\n\nInput tokens: {final_message.usage.input_tokens}")
print(f"Output tokens: {final_message.usage.output_tokens}")
The context manager handles connection cleanup and gives you access to the final message (with usage stats) after the stream closes. If you need raw events:
with client.messages.stream(...) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "message_stop":
            print("\n[done]")
Streaming in TypeScript
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

// messages.stream() returns a MessageStream synchronously — no await needed
const stream = client.messages.stream({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Explain async/await in Python.' }],
})

// Stream text as it arrives
for await (const chunk of stream) {
  if (
    chunk.type === 'content_block_delta' &&
    chunk.delta.type === 'text_delta'
  ) {
    process.stdout.write(chunk.delta.text)
  }
}

// Or, instead of iterating yourself, use the event helper
stream.on('text', (text) => process.stdout.write(text))

const finalMessage = await stream.finalMessage()
Streaming to a browser via Server-Sent Events
The common pattern: your backend calls Claude, streams the response, forwards it to the browser via SSE.
// Next.js API route (App Router)
import Anthropic from '@anthropic-ai/sdk'
import { NextRequest } from 'next/server'

const client = new Anthropic()

export async function POST(req: NextRequest) {
  const { message } = await req.json()

  const stream = client.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  })

  const encoder = new TextEncoder()
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        if (
          chunk.type === 'content_block_delta' &&
          chunk.delta.type === 'text_delta'
        ) {
          // SSE events are terminated by a blank line, hence the \n\n
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`)
          )
        }
      }
      controller.enqueue(encoder.encode('data: [DONE]\n\n'))
      controller.close()
    },
  })

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  })
}
// Browser client
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: userInput }),
})

const reader = response.body!.getReader()
const decoder = new TextDecoder()

while (true) {
  const { done, value } = await reader.read()
  if (done) break
  const chunk = decoder.decode(value)
  const lines = chunk.split('\n').filter(line => line.startsWith('data: '))
  for (const line of lines) {
    const data = line.slice(6)
    if (data === '[DONE]') break
    const { text } = JSON.parse(data)
    appendToUI(text) // your function to update the DOM
  }
}
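One caveat: the loop above assumes every `read()` delivers whole lines. A network chunk can end mid-line, splitting an SSE event across two reads, which breaks the `JSON.parse`. A sketch of a carry-buffer parser that handles this (the function name and shape are mine):

```typescript
// Feed raw decoded chunks in; get back the complete `data:` payloads plus
// the unfinished remainder to carry into the next call.
function parseSSEChunk(
  carry: string,
  chunk: string,
): { events: string[]; carry: string } {
  const buffer = carry + chunk
  const lines = buffer.split('\n')
  // The last element is '' if the chunk ended on a newline, else a partial line.
  const rest = lines.pop() ?? ''
  const events = lines
    .filter((line) => line.startsWith('data: '))
    .map((line) => line.slice(6))
  return { events, carry: rest }
}
```

In the read loop, thread the returned `carry` into the next call, and decode with `decoder.decode(value, { stream: true })` so multi-byte characters split across chunks survive as well.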
Error handling mid-stream
Errors can occur after the stream has started — the connection can drop, or the server can return a 529 (overloaded). Handle this at the stream level:
try:
    with client.messages.stream(...) as stream:
        accumulated = ""
        for text in stream.text_stream:
            accumulated += text
            print(text, end="", flush=True)
except anthropic.APIStatusError as e:
    print(f"\nStream error: {e.status_code} — {e.message}")
    # accumulated contains whatever was received before the error
    # decide: retry from scratch, or surface partial output
except anthropic.APIConnectionError:
    print("\nConnection dropped mid-stream")
For TypeScript:
try {
  for await (const chunk of stream) {
    // process chunks
  }
} catch (error) {
  if (error instanceof Anthropic.APIError) {
    console.error(`Stream failed: ${error.status} ${error.message}`)
  }
}
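If you decide to retry rather than surface partial output, the retry has to start from scratch — the API has no way to resume a stream mid-generation. A sketch of that policy, generic over any factory that opens a fresh stream of text chunks (the names and backoff values are mine):

```typescript
// Retry a streaming call from scratch: `startStream` opens a fresh stream on
// each attempt; if iteration fails mid-stream, the partial text is discarded
// and the call is retried with exponential backoff.
async function streamWithRetry(
  startStream: () => AsyncIterable<string>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<string> {
  for (let attempt = 1; ; attempt++) {
    let accumulated = ''
    try {
      for await (const chunk of startStream()) {
        accumulated += chunk
      }
      return accumulated
    } catch (error) {
      if (attempt >= maxAttempts) throw error
      // Discard `accumulated` — there is no resuming from where it stopped.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)))
    }
  }
}
```

Note this trades the streaming UX away on the retry path: the caller only sees the complete text. For user-facing retries you would also want to clear whatever partial output was already rendered.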
Accumulating a full response from a stream
If you need to process the complete output (parse JSON, run validation) but still want the UX of streaming:
import json

buffer = ""
with client.messages.stream(...) as stream:
    for text in stream.text_stream:
        buffer += text
        # optionally show progress without showing partial JSON
        print(".", end="", flush=True)

print()  # newline after dots
data = json.loads(buffer)  # now parse the complete output
UX patterns worth knowing
Show a cursor or typing indicator while the stream opens but before the first token arrives. There is a ~300-500ms delay between the request and the first token. Without a visual signal, the interface looks frozen.
Do not render markdown mid-stream. If you convert markdown to HTML as tokens arrive, you get broken rendering — partial bold tags, half-rendered lists. Buffer until you have a complete "block" (paragraph break or two newlines), then render the completed block.
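The "buffer until a complete block" idea reduces to splitting the growing buffer on paragraph breaks. A minimal sketch (the function name is mine): render markdown only for the complete blocks, and show the tail as plain text until its closing break arrives.

```typescript
// Split a streaming buffer into complete markdown blocks (separated by a
// blank line) and the still-growing tail.
function splitCompleteBlocks(buffer: string): { complete: string[]; tail: string } {
  const parts = buffer.split('\n\n')
  // The last part has no closing paragraph break yet — it is still growing.
  const tail = parts.pop() ?? ''
  return { complete: parts.filter((p) => p.length > 0), tail }
}
```

This is a heuristic, not a full solution: fenced code blocks can legitimately contain blank lines, so a production version also tracks whether the buffer is currently inside a fence before treating a blank line as a block boundary.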
Abort on user action. If the user navigates away or cancels, close the stream:
const controller = new AbortController()

const stream = client.messages.stream(
  { model: 'claude-sonnet-4-6', max_tokens: 1024, messages },
  { signal: controller.signal },
)

// Call this on user cancel / component unmount
controller.abort()
Do not re-render on every token. React state updates on every character means hundreds of re-renders per response. Batch updates with requestAnimationFrame or accumulate into a ref and flush periodically.
Further reading
- Streaming documentation — the API reference for streaming responses
- Fine-grained tool streaming — streaming partial tool call results