
Deploying a Claude application: from localhost to production

In brief

Environment variables, rate limits, error handling, costs, and the things that bite you on your first production deploy. A practical checklist.


Getting Claude working locally is one thing. Shipping it to real users is another. The gap is not about code — it is about secrets management, rate limits, cost controls, error handling, and observability. Here is what to sort out before you deploy.

Secrets and environment variables

Never hardcode your API key. This seems obvious, but hardcoded keys remain one of the most common mistakes in Claude apps shipped by first-time builders.

The right pattern:

import os
import anthropic

# Load from environment — never hardcode
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

For deployment platforms:

  • Vercel: Settings → Environment Variables → add ANTHROPIC_API_KEY
  • Railway: Variables tab in your service settings
  • Fly.io: fly secrets set ANTHROPIC_API_KEY=sk-...
  • AWS/GCP/Azure: Use their secrets manager services, not env vars directly for production

Never commit .env files. Add .env, .env.local, .env.production to .gitignore before your first commit, not after.
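Beyond reading the variable, it pays to fail fast at startup rather than at the first API call. A minimal sketch — `require_env` is a hypothetical helper, not part of any SDK:

```python
import os

def require_env(name: str) -> str:
    """Fail fast at startup if a required secret is missing."""
    value = os.environ.get(name)
    if not value:
        # Raising here surfaces the misconfiguration at deploy time,
        # not as a confusing auth error on the first user request.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Call it once at import time (`api_key = require_env("ANTHROPic_API_KEY".upper())` or simply `require_env("ANTHROPIC_API_KEY")`) so a missing secret fails the deploy's health check immediately.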

Rate limits and what happens when you hit them

The Anthropic API has rate limits: requests per minute (RPM), tokens per minute (TPM), and tokens per day (TPD). Your tier determines the limits. When you exceed them, you get a 429 RateLimitError.

Retry with exponential backoff:

import anthropic
import time

def call_with_retry(client, max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # 1, 2, 4, 8 seconds (the final attempt raises instead of waiting)
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                # Server error — retry
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)
            else:
                raise  # 4xx errors: don't retry

For TypeScript:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

async function callWithRetry(params: Anthropic.MessageCreateParamsNonStreaming, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create(params);
    } catch (err) {
      if (err instanceof Anthropic.RateLimitError) {
        if (attempt === maxRetries - 1) throw err;
        const wait = Math.pow(2, attempt) * 1000;
        await new Promise(r => setTimeout(r, wait));
        continue;
      }
      if (err instanceof Anthropic.APIError && err.status >= 500) {
        if (attempt === maxRetries - 1) throw err;
        await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
        continue;
      }
      throw err; // don't retry 4xx
    }
  }
  // The loop always returns or rethrows, but TypeScript cannot prove it,
  // so satisfy the declared return type explicitly.
  throw new Error('unreachable');
}

Cost controls

Without controls, a single runaway request or an attacker hammering your endpoint can generate a large unexpected bill.

Controls to put in place before launch:

1. Set max_tokens tightly. Default to the smallest value that covers your actual outputs. If your app generates summaries under 300 words, use max_tokens=512, not max_tokens=4096.

2. Rate-limit your own users. Implement per-user request throttling before Claude calls. Use Redis or an in-memory counter:

import redis
import time

r = redis.Redis()

def check_rate_limit(user_id: str, limit: int = 20, window_seconds: int = 60) -> bool:
    key = f"ratelimit:{user_id}"
    count = r.incr(key)
    if count == 1:
        # Start the expiry clock only on the first request in the window.
        # Re-setting the TTL on every call would extend the window
        # indefinitely under sustained traffic.
        r.expire(key, window_seconds)
    return count <= limit

3. Set Anthropic spend limits. In the Anthropic console, set a monthly spend limit. This is a hard stop — requests fail once you hit it, but you won't get a surprise bill.

4. Log token usage per request. Capture response.usage.input_tokens and response.usage.output_tokens and store them. You need this data to understand costs by user, by route, and over time.
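As a sketch of point 4, the logged token counts can be rolled up into a per-user cost estimate. The rates below are placeholders, not actual Anthropic pricing — substitute the current published prices for the model you use:

```python
from collections import defaultdict

# PLACEHOLDER rates in USD per million tokens -- look up the real
# pricing for your model; these numbers are illustrative only.
INPUT_RATE_PER_MTOK = 3.00
OUTPUT_RATE_PER_MTOK = 15.00

usage_by_user: dict = defaultdict(lambda: {"input": 0, "output": 0})

def record_usage(user_id: str, input_tokens: int, output_tokens: int) -> None:
    """Accumulate per-user token counts from response.usage."""
    usage_by_user[user_id]["input"] += input_tokens
    usage_by_user[user_id]["output"] += output_tokens

def estimated_cost(user_id: str) -> float:
    """Rough USD cost for one user, given the placeholder rates above."""
    u = usage_by_user[user_id]
    return (u["input"] * INPUT_RATE_PER_MTOK
            + u["output"] * OUTPUT_RATE_PER_MTOK) / 1_000_000
```

In production you would persist these counters (the in-memory dict resets on every deploy), but the arithmetic is the same.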

Observability: what to log

Log enough to debug problems without logging sensitive user data:

import logging
import time

import anthropic

logger = logging.getLogger(__name__)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def logged_claude_call(user_id: str, route: str, **kwargs):
    start = time.time()
    try:
        response = client.messages.create(**kwargs)
        duration_ms = int((time.time() - start) * 1000)
        logger.info({
            "event": "claude_call_success",
            "user_id": user_id,
            "route": route,
            "model": kwargs.get("model"),
            "input_tokens": response.usage.input_tokens,
            "output_tokens": response.usage.output_tokens,
            "duration_ms": duration_ms,
        })
        return response
    except Exception as e:
        logger.error({
            "event": "claude_call_error",
            "user_id": user_id,
            "route": route,
            "error_type": type(e).__name__,
            "error_message": str(e),
        })
        raise

Do not log user message content unless you have a clear business reason and appropriate user consent. Log metadata instead.

The pre-launch checklist

Before your first real users:

  • API key in environment variable, not code
  • API key never committed to git (check your history)
  • Retry logic with exponential backoff in place
  • Per-user rate limiting implemented
  • max_tokens set to realistic values
  • Spend limit set in Anthropic console
  • Token usage logged per request
  • Error responses to users are friendly, not raw API errors
  • Tested what happens when the API is down (graceful degradation)
  • Tested with your actual production environment variables

The Claude API is reliable, but building on any external API means planning for the moments it is not.
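One way to exercise the last two checklist items is to wrap every model call behind a user-facing fallback. A minimal sketch — the injected `call_model` callable stands in for whatever wrapper you have around `client.messages.create`, and the fallback text is illustrative:

```python
from typing import Callable

FALLBACK = "Sorry, the assistant is temporarily unavailable. Please try again in a minute."

def answer_user(question: str, call_model: Callable[[str], str]) -> str:
    """Return the model's answer, or a friendly fallback if the call fails."""
    try:
        return call_model(question)
    except Exception:
        # Never surface a raw API error or stack trace to end users.
        # Log the exception here (see the observability section) before falling back.
        return FALLBACK
```

Injecting the callable also makes the degradation path trivial to test: pass a function that raises, and assert the user sees the fallback, not a traceback.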
