Streaming Claude responses: implementation patterns and the tradeoffs
In brief
When to stream, how to implement it properly in Python and TypeScript, error handling mid-stream, and the UX patterns that actually work.
Streaming matters for anything user-facing. A five-second blank wait followed by a full response feels broken. A response that appears word-by-word feels alive, even if the total generation time is the same.
Beyond UX, streaming lets you process output incrementally — detect early termination, pipe to downstream systems, or display partial results before generation finishes. Here is how to implement it correctly across different contexts.
When to stream
Stream when:
- A human is waiting and watching
- Response length is unpredictable and could be long
- You want to pipe output to another process as it arrives
- You need to detect certain tokens early and react (e.g., stop generation when a sentinel appears)
Do not stream when:
- You need the complete response before doing anything with it (JSON parsing, database writes)
- You are running batch jobs where throughput matters more than latency
- The response is short (under ~100 tokens) — streaming overhead is not worth it
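The last "stream when" case — reacting to a sentinel token early — is worth spelling out, because the sentinel can arrive split across two chunks. A minimal sketch, independent of any SDK (the function name and shape are mine): it works on any async iterable of text chunks and searches the accumulated text, not individual chunks.

```typescript
// Accumulate text from a stream of chunks, stopping as soon as a sentinel
// string appears. Searching the accumulated text (not each chunk) handles
// sentinels that straddle a chunk boundary.
async function collectUntilSentinel(
  chunks: AsyncIterable<string>,
  sentinel: string,
): Promise<{ text: string; stopped: boolean }> {
  let text = ''
  for await (const chunk of chunks) {
    text += chunk
    const idx = text.indexOf(sentinel)
    if (idx !== -1) {
      // Drop the sentinel and everything after it; stop consuming the stream.
      return { text: text.slice(0, idx), stopped: true }
    }
  }
  return { text, stopped: false }
}
```

In practice you would pass the stream's text iterator and, when `stopped` is true, also abort the underlying request (e.g. via an `AbortController` signal you supplied) so generation does not continue server-side.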
Basic streaming in Python
import anthropic

client = anthropic.Anthropic()

# Using the context manager
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain async/await in Python."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get the final message object after streaming completes
    final_message = stream.get_final_message()

print(f"\n\nInput tokens: {final_message.usage.input_tokens}")
print(f"Output tokens: {final_message.usage.output_tokens}")
The context manager handles connection cleanup and gives you access to the final message (with usage stats) after the stream closes. If you need raw events:
with client.messages.stream(...) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "message_stop":
            print("\n[done]")
Streaming in TypeScript
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

// messages.stream() returns a MessageStream synchronously — no await needed
const stream = client.messages.stream({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Explain async/await in Python.' }],
})

// Stream text as it arrives
for await (const chunk of stream) {
  if (
    chunk.type === 'content_block_delta' &&
    chunk.delta.type === 'text_delta'
  ) {
    process.stdout.write(chunk.delta.text)
  }
}

// Or, instead of iterating yourself, use the event helper
stream.on('text', (text) => process.stdout.write(text))

const finalMessage = await stream.finalMessage()
Streaming to a browser via Server-Sent Events
The common pattern: your backend calls Claude, streams the response, forwards it to the browser via SSE.
// Next.js API route (App Router)
import Anthropic from '@anthropic-ai/sdk'
import { NextRequest } from 'next/server'

const client = new Anthropic()

export async function POST(req: NextRequest) {
  const { message } = await req.json()

  const stream = client.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  })

  const encoder = new TextEncoder()
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        if (
          chunk.type === 'content_block_delta' &&
          chunk.delta.type === 'text_delta'
        ) {
          // SSE events are terminated by a blank line, hence the \n\n
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`)
          )
        }
      }
      controller.enqueue(encoder.encode('data: [DONE]\n\n'))
      controller.close()
    },
  })

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  })
}
// Browser client
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: userInput }),
})

const reader = response.body!.getReader()
const decoder = new TextDecoder()

while (true) {
  const { done, value } = await reader.read()
  if (done) break
  const chunk = decoder.decode(value)
  const lines = chunk.split('\n').filter(line => line.startsWith('data: '))
  for (const line of lines) {
    const data = line.slice(6)
    if (data === '[DONE]') break
    const { text } = JSON.parse(data)
    appendToUI(text) // your function to update the DOM
  }
}
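One caveat: the loop above assumes every `read()` delivers whole lines. A network chunk can end mid-line, splitting an SSE event across two reads, which breaks the `JSON.parse`. A sketch of a carry-buffer parser that handles this (the function name and shape are mine):

```typescript
// Feed raw decoded chunks in; get back the complete `data:` payloads plus
// the unfinished remainder to carry into the next call.
function parseSSEChunk(
  carry: string,
  chunk: string,
): { events: string[]; carry: string } {
  const buffer = carry + chunk
  const lines = buffer.split('\n')
  // The last element is '' if the chunk ended on a newline, else a partial line.
  const rest = lines.pop() ?? ''
  const events = lines
    .filter((line) => line.startsWith('data: '))
    .map((line) => line.slice(6))
  return { events, carry: rest }
}
```

In the read loop, thread the returned `carry` into the next call, and decode with `decoder.decode(value, { stream: true })` so multi-byte characters split across chunks survive as well.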
Error handling mid-stream
Errors can occur after the stream has started — the connection can drop, or the server can return a 529 (overloaded). Handle this at the stream level:
try:
    with client.messages.stream(...) as stream:
        accumulated = ""
        for text in stream.text_stream:
            accumulated += text
            print(text, end="", flush=True)
except anthropic.APIStatusError as e:
    print(f"\nStream error: {e.status_code} — {e.message}")
    # accumulated contains whatever was received before the error
    # decide: retry from scratch, or surface partial output
except anthropic.APIConnectionError:
    print("\nConnection dropped mid-stream")
For TypeScript:
try {
  for await (const chunk of stream) {
    // process chunks
  }
} catch (error) {
  if (error instanceof Anthropic.APIError) {
    console.error(`Stream failed: ${error.status} ${error.message}`)
  }
}
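If you decide to retry rather than surface partial output, the retry has to start from scratch — the API has no way to resume a stream mid-generation. A sketch of that policy, generic over any factory that opens a fresh stream of text chunks (the names and backoff values are mine):

```typescript
// Retry a streaming call from scratch: `startStream` opens a fresh stream on
// each attempt; if iteration fails mid-stream, the partial text is discarded
// and the call is retried with exponential backoff.
async function streamWithRetry(
  startStream: () => AsyncIterable<string>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<string> {
  for (let attempt = 1; ; attempt++) {
    let accumulated = ''
    try {
      for await (const chunk of startStream()) {
        accumulated += chunk
      }
      return accumulated
    } catch (error) {
      if (attempt >= maxAttempts) throw error
      // Discard `accumulated` — there is no resuming from where it stopped.
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)))
    }
  }
}
```

Note this trades the streaming UX away on the retry path: the caller only sees the complete text. For user-facing retries you would also want to clear whatever partial output was already rendered.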
Accumulating a full response from a stream
If you need to process the complete output (parse JSON, run validation) but still want the UX of streaming:
import json

buffer = ""
with client.messages.stream(...) as stream:
    for text in stream.text_stream:
        buffer += text
        # optionally show progress without showing partial JSON
        print(".", end="", flush=True)

print()  # newline after dots
data = json.loads(buffer)  # now parse the complete output
UX patterns worth knowing
Show a cursor or typing indicator while the stream opens but before the first token arrives. There is a ~300-500ms delay between the request and the first token. Without a visual signal, the interface looks frozen.
Do not render markdown mid-stream. If you convert markdown to HTML as tokens arrive, you get broken rendering — partial bold tags, half-rendered lists. Buffer until you have a complete "block" (paragraph break or two newlines), then render the completed block.
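The "buffer until a complete block" idea reduces to splitting the growing buffer on paragraph breaks. A minimal sketch (the function name is mine): render markdown only for the complete blocks, and show the tail as plain text until its closing break arrives.

```typescript
// Split a streaming buffer into complete markdown blocks (separated by a
// blank line) and the still-growing tail.
function splitCompleteBlocks(buffer: string): { complete: string[]; tail: string } {
  const parts = buffer.split('\n\n')
  // The last part has no closing paragraph break yet — it is still growing.
  const tail = parts.pop() ?? ''
  return { complete: parts.filter((p) => p.length > 0), tail }
}
```

This is a heuristic, not a full solution: fenced code blocks can legitimately contain blank lines, so a production version also tracks whether the buffer is currently inside a fence before treating a blank line as a block boundary.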
Abort on user action. If the user navigates away or cancels, close the stream:
const controller = new AbortController()

const stream = client.messages.stream(
  { model: 'claude-sonnet-4-6', max_tokens: 1024, messages },
  { signal: controller.signal },
)

// Call this on user cancel / component unmount
controller.abort()
Do not re-render on every token. React state updates on every character means hundreds of re-renders per response. Batch updates with requestAnimationFrame or accumulate into a ref and flush periodically.
Further reading
- Streaming documentation — the API reference for streaming responses
- Fine-grained tool streaming — streaming partial tool call results