
Streaming Claude responses: implementation patterns and the tradeoffs

In brief

When to stream, how to implement it properly in Python and TypeScript, error handling mid-stream, and the UX patterns that actually work.

Streaming matters for anything user-facing. A five-second blank wait followed by a full response feels broken. A response that appears word-by-word feels alive, even if the total generation time is the same.

Beyond UX, streaming lets you process output incrementally — detect early termination, pipe to downstream systems, or display partial results before generation finishes. Here is how to implement it correctly across different contexts.

When to stream

Stream when:

  • A human is waiting and watching
  • Response length is unpredictable and could be long
  • You want to pipe output to another process as it arrives
  • You need to detect certain tokens early and react (e.g., stop generation when a sentinel appears)
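
The last case is easy to get subtly wrong, because a sentinel can be split across two chunks. The following is a minimal sketch, not an SDK feature: `stream_until_sentinel` is a hypothetical helper that takes any iterable of text chunks (for example `stream.text_stream`) and holds back just enough of the tail to catch a split sentinel.

```python
from typing import Iterable, Iterator


def stream_until_sentinel(chunks: Iterable[str], sentinel: str) -> Iterator[str]:
    """Yield text from chunks, stopping just before the first sentinel."""
    held = ""  # tail that might be the start of a sentinel split across chunks
    for chunk in chunks:
        held += chunk
        index = held.find(sentinel)
        if index != -1:
            if index:
                yield held[:index]
            return  # the caller should close the underlying stream here
        # Everything except the last len(sentinel) - 1 characters is safe to emit
        safe = len(held) - (len(sentinel) - 1)
        if safe > 0:
            yield held[:safe]
            held = held[safe:]
    if held:
        yield held  # stream ended without a sentinel; flush what is left


if __name__ == "__main__":
    demo = ["Hello ", "wor", "ld<E", "ND> ignored"]
    print("".join(stream_until_sentinel(demo, "<END>")))
```

When this wraps a real stream, returning from the generator does not close the connection by itself; leaving the surrounding `with` block does.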

Do not stream when:

  • You need the complete response before doing anything with it (JSON parsing, database writes)
  • You are running batch jobs where throughput matters more than latency
  • The response is short (under ~100 tokens) — streaming overhead is not worth it

Basic streaming in Python

import anthropic

client = anthropic.Anthropic()

# Using the context manager
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain async/await in Python."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get the final message object once the text stream is exhausted
    final_message = stream.get_final_message()

print(f"\n\nInput tokens: {final_message.usage.input_tokens}")
print(f"Output tokens: {final_message.usage.output_tokens}")

The context manager handles connection cleanup, and get_final_message() returns the complete message (with usage stats) once the stream has been fully consumed. Calling it inside the with block keeps the call within the stream's lifetime. If you need raw events:

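with client.messages.stream(...) as stream:
    for event in stream:
        # Only text deltas carry .text; tool-use blocks emit input_json_delta instead
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
        elif event.type == "message_stop":
            print("\n[done]")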

Streaming in TypeScript

import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

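// stream() returns a MessageStream immediately; no await is needed here
const stream = client.messages.stream({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Explain async/await in Python.' }]
})

// Option 1: iterate over raw events and pick out the text deltas
for await (const chunk of stream) {
  if (
    chunk.type === 'content_block_delta' &&
    chunk.delta.type === 'text_delta'
  ) {
    process.stdout.write(chunk.delta.text)
  }
}

// Option 2 (instead of the loop above): register a text handler
// before the stream is consumed; a handler added afterwards never fires
// stream.on('text', (text) => process.stdout.write(text))

// Available with either option once the stream has finished
const finalMessage = await stream.finalMessage()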

Streaming to a browser via Server-Sent Events

The common pattern: your backend calls Claude, streams the response, and forwards it to the browser via SSE.

// Next.js API route (App Router)
import Anthropic from '@anthropic-ai/sdk'
import { NextRequest } from 'next/server'

const client = new Anthropic()

export async function POST(req: NextRequest) {
  const { message } = await req.json()

  const stream = client.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }]
  })

  const encoder = new TextEncoder()

  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        if (
          chunk.type === 'content_block_delta' &&
          chunk.delta.type === 'text_delta'
        ) {
          controller.enqueue(encoder.encode(`data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`))
        }
      }
      controller.enqueue(encoder.encode('data: [DONE]\n\n'))
      controller.close()
    }
  })

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    }
  })
}

// Browser client
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: userInput })
})

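const reader = response.body!.getReader()
const decoder = new TextDecoder()
let buffer = ''
let finished = false

while (!finished) {
  const { done, value } = await reader.read()
  if (done) break

  // stream: true keeps multi-byte characters intact across reads
  buffer += decoder.decode(value, { stream: true })

  // Events end with a blank line; keep any incomplete event in the buffer
  const events = buffer.split('\n\n')
  buffer = events.pop() ?? ''

  for (const event of events) {
    if (!event.startsWith('data: ')) continue
    const data = event.slice(6)
    if (data === '[DONE]') {
      finished = true  // a bare break would only exit the inner loop
      break
    }
    const { text } = JSON.parse(data)
    appendToUI(text)  // your function to update the DOM
  }
}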
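
The framing is the part that tends to break: a network read can end in the middle of an event, or even in the middle of a multi-byte character. As a rough illustration of the same `data: ...` plus blank-line convention outside the browser, here is a sketch in Python. `format_sse` and `parse_sse` are hypothetical helper names, and the `[DONE]` marker is this article's own convention rather than part of the SSE format.

```python
import codecs
import json
from typing import Iterable, Iterator


def format_sse(payload: dict) -> bytes:
    """Encode one event as a `data:` line followed by a blank line."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode("utf-8")


def parse_sse(byte_chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield decoded payloads, tolerating events split across reads."""
    # The incremental decoder holds back incomplete multi-byte characters
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for raw in byte_chunks:
        buffer += decoder.decode(raw)
        # Only complete events (terminated by a blank line) are processed
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            if not event.startswith("data: "):
                continue
            data = event[len("data: "):]
            if data == "[DONE]":
                return
            yield json.loads(data)


if __name__ == "__main__":
    wire = format_sse({"text": "héllo"}) + format_sse({"text": " world"}) + b"data: [DONE]\n\n"
    # Simulate awkward network reads by slicing the bytes into 3-byte pieces
    pieces = [wire[i:i + 3] for i in range(0, len(wire), 3)]
    print("".join(item["text"] for item in parse_sse(pieces)))
```

The sketch only handles single-line `data:` fields, which is all the route above emits.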

Error handling mid-stream

Errors can occur after the stream has started: the connection can drop, or the API can report an overloaded error (HTTP 529) partway through. Handle this at the stream level:

accumulated = ""  # defined before the try so the handlers can always read it
try:
    with client.messages.stream(...) as stream:
        for text in stream.text_stream:
            accumulated += text
            print(text, end="", flush=True)
except anthropic.APIStatusError as e:
    print(f"\nStream error: {e.status_code}: {e.message}")
    # accumulated contains whatever was received before the error
    # decide: retry from scratch, or surface partial output
except anthropic.APIConnectionError:
    print("\nConnection dropped mid-stream")

For TypeScript:

try {
  for await (const chunk of stream) {
    // process chunks
  }
} catch (error) {
  if (error instanceof Anthropic.APIError) {
    console.error(`Stream failed: ${error.status} ${error.message}`)
  } else {
    throw error  // not an API error; let the caller deal with it
  }
}
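
If you choose to retry from scratch, one possible shape is a small wrapper with exponential backoff. This is a sketch under assumptions, not SDK behavior: `collect_with_retry` and its parameters are hypothetical, `open_stream` stands for any function that starts a fresh stream and returns an iterable of text, and in real code `retry_on` would list the SDK exceptions you consider retryable (for example `anthropic.APIConnectionError`).

```python
import time
from typing import Callable, Iterable, Tuple, Type


def collect_with_retry(
    open_stream: Callable[[], Iterable[str]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the full text, restarting the stream from scratch on failure."""
    for attempt in range(1, max_attempts + 1):
        parts = []  # partial output from a failed attempt is discarded
        try:
            for text in open_stream():
                parts.append(text)
            return "".join(parts)
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts; surface the original error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")
```

Because each retry starts over, any partial text already shown to the user has to be cleared or replaced; if that is unacceptable, surface the partial output instead of retrying.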

Accumulating a full response from a stream

If you need to process the complete output (parse JSON, run validation) but still want the UX of streaming:

import json

buffer = ""
with client.messages.stream(...) as stream:
    for text in stream.text_stream:
        buffer += text
        # optionally show progress without showing partial JSON
        print(".", end="", flush=True)

print()  # newline after dots
data = json.loads(buffer)  # now parse the complete output
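
A stream that stops early (for instance, when the output hits max_tokens) leaves truncated JSON in the buffer, so the final parse is worth guarding. A minimal sketch, with `parse_streamed_json` as a hypothetical helper that returns either the parsed data or a readable error message:

```python
import json
from typing import Any, Optional, Tuple


def parse_streamed_json(buffer: str) -> Tuple[Optional[Any], Optional[str]]:
    """Return (data, None) on success or (None, error message) on failure."""
    text = buffer.strip()
    if not text:
        return None, "empty response"
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where parsing failed
        return None, f"invalid JSON at character {exc.pos}: {exc.msg}"


if __name__ == "__main__":
    print(parse_streamed_json('{"status": "ok"}'))
    print(parse_streamed_json('{"items": [1, 2'))  # truncated mid-array
```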

UX patterns worth knowing

Show a cursor or typing indicator from the moment the request is sent until the first token arrives. Expect a short delay before the first token (roughly 300-500 ms, though it varies with model, prompt length, and load). Without a visual signal, the interface looks frozen.

Do not render markdown mid-stream. If you convert markdown to HTML as each token arrives, incomplete syntax renders incorrectly: unclosed bold markers, half-built lists. Buffer until you have a complete block (terminated by a blank line, i.e. two consecutive newlines), then render that block.
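
The buffering rule can be sketched as a small generator. `complete_blocks` is a hypothetical helper that takes any iterable of text chunks and yields only finished blocks. It assumes a blank line always ends a block, which is not true inside fenced code blocks containing blank lines, so a real renderer would need to track fences as well.

```python
from typing import Iterable, Iterator


def complete_blocks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield markdown blocks only once a blank line has closed them."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # A blank line (two newlines) marks the end of a block
        while "\n\n" in pending:
            block, pending = pending.split("\n\n", 1)
            if block.strip():
                yield block
    if pending.strip():
        yield pending  # the stream ended, so the last block is complete


if __name__ == "__main__":
    tokens = ["# Ti", "tle\n", "\nSome **bo", "ld** text.\n\n- a\n- b"]
    for block in complete_blocks(tokens):
        print(repr(block))
```

Each yielded block can be handed to the markdown renderer as a unit, while the pending tail is shown as plain text or not at all.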

Abort on user action. If the user navigates away or cancels, close the stream:

const controller = new AbortController()

const stream = client.messages.stream(
  { model: 'claude-sonnet-4-6', max_tokens: 1024, messages },
  { signal: controller.signal }
)

// Call this on user cancel / component unmount
controller.abort()

Do not re-render on every token. Updating React state for each token means hundreds of re-renders per response. Batch updates with requestAnimationFrame, or accumulate into a ref and flush periodically.
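
The accumulate-and-flush idea is not specific to React. Here is a language-neutral sketch in Python: `batched_flush` is a hypothetical helper, `render` stands in for whatever updates your UI, and the 50 ms interval is an arbitrary assumption. The clock is injectable so the behavior can be tested without real delays.

```python
import time
from typing import Callable, Iterable, List


def batched_flush(
    chunks: Iterable[str],
    render: Callable[[str], None],
    interval: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Call render with accumulated text at most once per interval."""
    pending: List[str] = []
    last_flush = clock()
    for chunk in chunks:
        pending.append(chunk)
        now = clock()
        if now - last_flush >= interval:
            render("".join(pending))
            pending.clear()
            last_flush = now
    if pending:
        render("".join(pending))  # final flush so no text is lost


if __name__ == "__main__":
    batched_flush(["Hel", "lo ", "wor", "ld"], lambda text: print(repr(text)))
```

The flush only runs when a new chunk arrives, so a long pause in the stream leaves the pending text unrendered until the next chunk or the end; a timer-driven flush avoids that at the cost of a little more machinery.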
