Eliminating AI Agent Cold Start with Pre-Fetch Caching
In brief
The most expensive moment for an AI agent is the first one. A cron job at 3–4 AM that pre-fetches and caches context eliminates cold start entirely — shifting token cost from user sessions to a background API key that runs once, not fifty times a day.
The worst time to fetch data is when the user is waiting for an answer.
Not just because it's slow, though it is. In an AI agent context, fetching at session start means you're paying for that data 50 times a day (once per session, for every user), when you could be paying for it once: overnight, in a batch job the user never sees.
This is the cold start problem. And the fix is straightforward once you see it.
What cold start actually is
Cold start is the latency and cost of populating an agent's context from scratch at session start. The agent needs to know the state of the world — your pipeline, your open tickets, your recent meetings — before it can do anything useful.
If you fetch that data live at session start:
- Latency: The user waits while APIs respond
- Cost: You pay for the same data 50 times (once per session per user)
- Reliability: If HubSpot is slow or Intercom has a hiccup, the agent's startup fails
If you pre-fetch and cache overnight:
- Latency: Zero. The data is already there.
- Cost: You pay once per data source per day, not once per session
- Reliability: API failures don't affect user sessions
The pattern
The pre-fetch pattern has four steps:
1. Identify what data the agent needs at session start.
What does your agent load before it can answer the first question? Pipeline state? Meeting notes from yesterday? Open support tickets? This is your pre-fetch candidate list.
2. Write a cron job that runs at 3–4 AM.
This job calls all your data sources, processes the responses, and writes results to a cache layer (a database table, a Redis key, an S3 object — whatever fits your stack).
3. Agent sessions read from cache.
When a user starts a session, the agent reads from the pre-fetched cache instead of calling APIs live. Session startup becomes a few fast reads instead of a dozen slow API calls.
4. Token cost shifts to the background pool.
Instead of 50 sessions each paying for cold start, one background job pays for it once.
Implementation
Here's what a pre-fetch job looks like in practice:
```typescript
// cron-prefetch.ts — runs at 03:00 UTC daily
import { createClient } from '@supabase/supabase-js'

import { fetchHubspotPipeline } from './connectors/hubspot'
import { fetchOpenTickets } from './connectors/intercom'
import { fetchGongSummaries } from './connectors/gong'
import { fetchGranolaYesterday } from './connectors/granola'

const sb = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!)

async function runPrefetch() {
  const date = new Date().toISOString().split('T')[0]

  // Fetch and process each source in parallel
  const [pipeline, tickets, callSummaries, meetingNotes] = await Promise.all([
    fetchHubspotPipeline(),          // Returns shaped summary, not full objects
    fetchOpenTickets(),              // Returns status + priority, not full threads
    fetchGongSummaries({ days: 7 }), // AI-summarized, not raw transcripts
    fetchGranolaYesterday(),         // Action items extracted, not raw notes
  ])

  // Write to cache table, keyed by date
  await sb.from('agent_context_cache').upsert({
    cache_key: `daily_context_${date}`,
    pipeline,
    tickets,
    call_summaries: callSummaries,
    meeting_notes: meetingNotes,
    generated_at: new Date().toISOString(),
  }, { onConflict: 'cache_key' })

  console.log(`Prefetch complete: ${date}`)
}

runPrefetch().catch((err) => {
  console.error('Prefetch failed:', err)
  process.exit(1) // non-zero exit so the scheduler's monitoring surfaces the failure
})
```
And the session startup becomes:
```typescript
// agent-session.ts
import { createClient } from '@supabase/supabase-js'

const sb = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!)

async function getSessionContext(userId: string): Promise<string> {
  const date = new Date().toISOString().split('T')[0]

  const { data } = await sb
    .from('agent_context_cache')
    .select('*')
    .eq('cache_key', `daily_context_${date}`)
    .single()

  if (!data) {
    // Fallback: fetch live if cache miss (rare — only on first run or failures)
    return await fetchContextLive(userId)
  }

  return formatContextForAgent(data, userId)
}
```
Session startup goes from "call 6 APIs and wait" to "read one database row."
What to cache vs. what to keep live
Not everything belongs in the pre-fetch. The rule is simple: if the data changes faster than your cache refresh cycle and the user needs the freshest version, keep it live.
| Cache it | Keep live |
|---|---|
| Meeting notes from yesterday | Current support conversation |
| CRM pipeline state | Real-time payment status |
| Yesterday's revenue numbers | Live inventory count |
| Roadmap and Notion docs | Active Slack thread |
| Gong call summaries (last 7 days) | Chat message someone just sent |
| Open ticket list (summary) | Whether a payment is processing right now |
| Weekly pipeline health | Real-time stock or pricing data |
The pattern: aggregate or summarized views of past data get cached. Real-time status of active operations stays live.
In practice, this means the pre-fetch handles 80–90% of what agents read at session start. The few live calls that remain are fast (single-record lookups) and necessary (payment status, active conversations).
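The cached-versus-live split can be encoded as an explicit routing table rather than left implicit in each tool. A minimal sketch, where the source names and reasons are illustrative, not a fixed schema:

```typescript
// Illustrative routing table: which sources are served from cache vs. live.
type SourceRoute = { source: string; cached: boolean; reason: string }

const routes: SourceRoute[] = [
  { source: 'crm_pipeline',        cached: true,  reason: 'changes daily, summary view' },
  { source: 'meeting_notes',       cached: true,  reason: "yesterday's data, immutable" },
  { source: 'payment_status',      cached: false, reason: 'user needs real-time state' },
  { source: 'active_conversation', cached: false, reason: 'changes mid-session' },
]

// Decide at tool-call time whether to hit the cache or the live API.
function shouldUseCache(source: string): boolean {
  const route = routes.find((r) => r.source === source)
  return route ? route.cached : false // unknown sources default to live, the safe side
}
```

Defaulting unknown sources to live trades a little cost for correctness: a source you forgot to register is never served stale.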
Staleness handling
The obvious objection: what if the data changes during the day and the cache is stale?
A few approaches, depending on how much freshness matters:
Per-user timestamp checks. When the agent reads from cache, it includes the cache generation time in the context. "As of this morning's data..." This sets expectations and gives Claude a signal to flag when freshness matters.
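Surfacing the generation time is a one-line formatting step on the read path. A sketch, reusing the `generated_at` field from the cache table above:

```typescript
// Prefix the cached context with its generation time so the model can
// hedge answers ("as of this morning's data") when freshness matters.
function withFreshnessHeader(context: string, generatedAt: string): string {
  const ageHours = (Date.now() - new Date(generatedAt).getTime()) / 3_600_000
  return `[Context generated at ${generatedAt}, ~${ageHours.toFixed(1)}h ago]\n${context}`
}
```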
Invalidation hooks. If a deal closes mid-day, a webhook hits your cache invalidation endpoint, which marks that record as stale and triggers a targeted refresh for that customer only.
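The invalidation path can be as small as a stale set plus a targeted refetch on the next read. A sketch, with an in-memory set standing in for whatever stale flag your cache table uses in production:

```typescript
// Records marked stale are refetched on the next read instead of waiting
// for the nightly job. In production this would be a column on the cache
// table, not process memory.
const staleRecords = new Set<string>()

// Called by the webhook handler when, e.g., a deal closes mid-day.
function invalidate(recordKey: string): void {
  staleRecords.add(recordKey)
}

// Read path: serve cache unless the record was invalidated.
async function readRecord(
  key: string,
  fromCache: (k: string) => Promise<string>,
  refetchLive: (k: string) => Promise<string>,
): Promise<string> {
  if (staleRecords.has(key)) {
    const fresh = await refetchLive(key) // targeted refresh, this record only
    staleRecords.delete(key)
    return fresh
  }
  return fromCache(key)
}
```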
Explicit live lookups. Your agent knows which tools are cached vs. live. When a user asks about a specific customer's current status, the tool routes to the live API. When they ask about pipeline trends, it reads from cache. The routing is explicit, not implicit.
Multiple cache windows. Hourly cache for data that changes fast (ticket status), daily cache for data that changes slowly (deal stages, Notion docs), weekly cache for data that almost never changes (historical revenue, old meeting summaries).
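The tiered windows reduce to a TTL map keyed by source. A sketch in which the tier assignments mirror the examples above and the source names are illustrative:

```typescript
// TTL per cache tier, in milliseconds.
const TTL = {
  hourly: 60 * 60 * 1000,
  daily: 24 * 60 * 60 * 1000,
  weekly: 7 * 24 * 60 * 60 * 1000,
} as const

// Illustrative tier assignments following the examples in the text.
const sourceTier: Record<string, keyof typeof TTL> = {
  ticket_status: 'hourly',      // changes fast
  deal_stages: 'daily',         // changes slowly
  notion_docs: 'daily',
  historical_revenue: 'weekly', // almost never changes
}

// A cached entry is fresh if its age is within its tier's TTL.
function isFresh(source: string, generatedAt: number, now: number): boolean {
  const tier = sourceTier[source] ?? 'hourly' // unknown sources get the strictest window
  return now - generatedAt <= TTL[tier]
}
```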
The economics
Let's make this concrete. Assume:
- 50 active users per day
- Each session previously fetched from 6 live sources
- Average cold start cost: 400,000 tokens × $3/M input = $1.20 per session
- Total daily cold start cost (50 users): $60
With pre-fetch caching:
- One background job fetches all 6 sources once
- Cost of the background job: ~100,000 tokens × $3/M input = $0.30
- Per-session startup: reading from cache = near zero tokens
- Total daily cost: $0.30
That's a $59.70 daily savings — or roughly $1,800/month — from one overnight cron job.
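The arithmetic above, written out as a check (the prices and token counts are the stated assumptions, not measured values):

```typescript
// Inputs from the assumptions above.
const sessionsPerDay = 50
const pricePerMillionInputTokens = 3.0
const coldStartTokensPerSession = 400_000
const prefetchJobTokens = 100_000

const tokenCost = (tokens: number) => (tokens / 1_000_000) * pricePerMillionInputTokens

const dailyCostLive = sessionsPerDay * tokenCost(coldStartTokensPerSession) // $60.00/day
const dailyCostPrefetch = tokenCost(prefetchJobTokens)                      // $0.30/day
const dailySavings = dailyCostLive - dailyCostPrefetch                      // $59.70/day
const monthlySavings = dailySavings * 30                                    // ~$1,791/month
```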
The numbers will vary. The pattern doesn't. Pre-fetching converts per-session cost to per-day cost. At any reasonable team size, that arithmetic is compelling.
The hidden benefit: reliability
The cost savings are the headline. The operational benefit that shows up later is reliability.
When your agent cold-starts from live APIs, every API dependency is in the critical path of user sessions. If HubSpot is slow (or down), your agent is slow (or broken). If Gong's API is rate-limited, your session startup fails.
With pre-fetch caching, those APIs are called once a day in a background job at 3 AM. If the background job fails, you catch it in monitoring, fix it, and the user sessions run on yesterday's cache. The user experience degrades gracefully — slightly stale data — rather than failing hard.
Getting started
Week 1: Identify your highest-cost cold start items. Measure the actual token cost of your session startup (log it if you don't already). Find the biggest offenders.
Week 2: Write a pre-fetch job for the top two or three sources. Keep it simple — fetch, process, write to a table.
Week 3: Wire your agent sessions to read from cache first, fall back to live on miss.
Week 4: Add monitoring. Track cache hit rate, cache generation time, staleness at session start. Now you have visibility into something that was previously invisible.
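Cache hit rate is the simplest of these metrics to instrument: a counter pair around the cache read. A sketch, with in-memory counters standing in for your metrics system:

```typescript
// Minimal hit/miss counters around the cache read path.
// In production, emit these to your metrics system instead of process memory.
const stats = { hits: 0, misses: 0 }

function recordCacheRead(hit: boolean): void {
  hit ? stats.hits++ : stats.misses++
}

function cacheHitRate(): number {
  const total = stats.hits + stats.misses
  return total === 0 ? 0 : stats.hits / total
}
```

A hit rate that drops suddenly usually means the overnight job failed and every session is falling back to live fetches, which is exactly the condition you want an alert on.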
The infrastructure investment is a few days of engineering. The payoff runs every day.