Eliminating AI Agent Cold Start with Pre-Fetch Caching
In brief
The most expensive moment for an AI agent is the first one. A cron job at 3–4 AM that pre-fetches and caches context eliminates cold start entirely — shifting token cost from user sessions to a background API key that runs once, not fifty times a day.
The worst time to fetch data is when the user is waiting for an answer.
Not just because it's slow, though it is. In an AI agent context, fetching at session start means you're paying for that data 50 times a day (once per session, for every user), when you could be paying for it once: overnight, in a batch job the user never sees.
This is the cold start problem. And the fix is straightforward once you see it.
What cold start actually is
Cold start is the latency and cost of populating an agent's context from scratch at session start. The agent needs to know the state of the world — your pipeline, your open tickets, your recent meetings — before it can do anything useful.
If you fetch that data live at session start:
- Latency: The user waits while APIs respond
- Cost: You pay for the same data 50 times (once per session per user)
- Reliability: If HubSpot is slow or Intercom has a hiccup, the agent's startup fails
If you pre-fetch and cache overnight:
- Latency: Zero. The data is already there.
- Cost: You pay once per data source per day, not once per session
- Reliability: API failures don't affect user sessions
The pattern
The pre-fetch pattern has four steps:
1. Identify what data the agent needs at session start.
What does your agent load before it can answer the first question? Pipeline state? Meeting notes from yesterday? Open support tickets? This is your pre-fetch candidate list.
2. Write a cron job that runs at 3–4 AM.
This job calls all your data sources, processes the responses, and writes results to a cache layer (a database table, a Redis key, an S3 object — whatever fits your stack).
3. Agent sessions read from cache.
When a user starts a session, the agent reads from the pre-fetched cache instead of calling APIs live. Session startup becomes a few fast reads instead of a dozen slow API calls.
4. Token cost shifts to the background pool.
Instead of 50 sessions each paying for cold start, one background job pays for it once.
Implementation
Here's what a pre-fetch job looks like in practice:
```typescript
// cron-prefetch.ts — runs at 03:00 UTC daily
import { createClient } from '@supabase/supabase-js'

import { fetchHubspotPipeline } from './connectors/hubspot'
import { fetchOpenTickets } from './connectors/intercom'
import { fetchGongSummaries } from './connectors/gong'
import { fetchGranolaYesterday } from './connectors/granola'

const sb = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!)

async function runPrefetch() {
  const date = new Date().toISOString().split('T')[0]

  // Fetch and process each source in parallel
  const [pipeline, tickets, callSummaries, meetingNotes] = await Promise.all([
    fetchHubspotPipeline(),          // Returns shaped summary, not full objects
    fetchOpenTickets(),              // Returns status + priority, not full threads
    fetchGongSummaries({ days: 7 }), // AI-summarized, not raw transcripts
    fetchGranolaYesterday(),         // Action items extracted, not raw notes
  ])

  // Write to cache table, keyed by date
  await sb.from('agent_context_cache').upsert({
    cache_key: `daily_context_${date}`,
    pipeline,
    tickets,
    call_summaries: callSummaries,
    meeting_notes: meetingNotes,
    generated_at: new Date().toISOString(),
  }, { onConflict: 'cache_key' })

  console.log(`Prefetch complete: ${date}`)
}

runPrefetch().catch((err) => {
  console.error('Prefetch failed:', err)
  process.exit(1) // non-zero exit so the scheduler's monitoring surfaces the failure
})
```
And the session startup becomes:
```typescript
// agent-session.ts
import { createClient } from '@supabase/supabase-js'

const sb = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!)

async function getSessionContext(userId: string): Promise<string> {
  const date = new Date().toISOString().split('T')[0]

  const { data } = await sb
    .from('agent_context_cache')
    .select('*')
    .eq('cache_key', `daily_context_${date}`)
    .single()

  if (!data) {
    // Fallback: fetch live if cache miss (rare — only on first run or failures)
    return await fetchContextLive(userId)
  }

  return formatContextForAgent(data, userId)
}
```
Session startup goes from "call 6 APIs and wait" to "read one database row."
What to cache vs. what to keep live
Not everything belongs in the pre-fetch. The rule is simple: if the data changes faster than your cache refresh cycle and the user needs the freshest version, keep it live.
| Cache it | Keep live |
|---|---|
| Meeting notes from yesterday | Current support conversation |
| CRM pipeline state | Real-time payment status |
| Yesterday's revenue numbers | Live inventory count |
| Roadmap and Notion docs | Active Slack thread |
| Gong call summaries (last 7 days) | Chat message someone just sent |
| Open ticket list (summary) | Whether a payment is processing right now |
| Weekly pipeline health | Real-time stock or pricing data |
The pattern: aggregate or summarized views of past data get cached. Real-time status of active operations stays live.
In practice, this means the pre-fetch handles 80–90% of what agents read at session start. The few live calls that remain are fast (single-record lookups) and necessary (payment status, active conversations).
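The cached-versus-live split can be encoded as an explicit routing table rather than left implicit in each tool. A minimal sketch, where the source names and reasons are illustrative, not a fixed schema:

```typescript
// Illustrative routing table: which sources are served from cache vs. live.
type SourceRoute = { source: string; cached: boolean; reason: string }

const routes: SourceRoute[] = [
  { source: 'crm_pipeline',        cached: true,  reason: 'changes daily, summary view' },
  { source: 'meeting_notes',       cached: true,  reason: "yesterday's data, immutable" },
  { source: 'payment_status',      cached: false, reason: 'user needs real-time state' },
  { source: 'active_conversation', cached: false, reason: 'changes mid-session' },
]

// Decide at tool-call time whether to hit the cache or the live API.
function shouldUseCache(source: string): boolean {
  const route = routes.find((r) => r.source === source)
  return route ? route.cached : false // unknown sources default to live, the safe side
}
```

Defaulting unknown sources to live trades a little cost for correctness: a source you forgot to register is never served stale.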
Staleness handling
The obvious objection: what if the data changes during the day and the cache is stale?
A few approaches, depending on how much freshness matters:
Per-user timestamp checks. When the agent reads from cache, it includes the cache generation time in the context. "As of this morning's data..." This sets expectations and gives Claude a signal to flag when freshness matters.
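Surfacing the generation time is a one-line formatting step on the read path. A sketch, reusing the `generated_at` field from the cache table above:

```typescript
// Prefix the cached context with its generation time so the model can
// hedge answers ("as of this morning's data") when freshness matters.
function withFreshnessHeader(context: string, generatedAt: string): string {
  const ageHours = (Date.now() - new Date(generatedAt).getTime()) / 3_600_000
  return `[Context generated at ${generatedAt}, ~${ageHours.toFixed(1)}h ago]\n${context}`
}
```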
Invalidation hooks. If a deal closes mid-day, a webhook hits your cache invalidation endpoint, which marks that record as stale and triggers a targeted refresh for that customer only.
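The invalidation path can be as small as a stale set plus a targeted refetch on the next read. A sketch, with an in-memory set standing in for whatever stale flag your cache table uses in production:

```typescript
// Records marked stale are refetched on the next read instead of waiting
// for the nightly job. In production this would be a column on the cache
// table, not process memory.
const staleRecords = new Set<string>()

// Called by the webhook handler when, e.g., a deal closes mid-day.
function invalidate(recordKey: string): void {
  staleRecords.add(recordKey)
}

// Read path: serve cache unless the record was invalidated.
async function readRecord(
  key: string,
  fromCache: (k: string) => Promise<string>,
  refetchLive: (k: string) => Promise<string>,
): Promise<string> {
  if (staleRecords.has(key)) {
    const fresh = await refetchLive(key) // targeted refresh, this record only
    staleRecords.delete(key)
    return fresh
  }
  return fromCache(key)
}
```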
Explicit live lookups. Your agent knows which tools are cached vs. live. When a user asks about a specific customer's current status, the tool routes to the live API. When they ask about pipeline trends, it reads from cache. The routing is explicit, not implicit.
Multiple cache windows. Hourly cache for data that changes fast (ticket status), daily cache for data that changes slowly (deal stages, Notion docs), weekly cache for data that almost never changes (historical revenue, old meeting summaries).
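The tiered windows reduce to a TTL map keyed by source. A sketch in which the tier assignments mirror the examples above and the source names are illustrative:

```typescript
// TTL per cache tier, in milliseconds.
const TTL = {
  hourly: 60 * 60 * 1000,
  daily: 24 * 60 * 60 * 1000,
  weekly: 7 * 24 * 60 * 60 * 1000,
} as const

// Illustrative tier assignments following the examples in the text.
const sourceTier: Record<string, keyof typeof TTL> = {
  ticket_status: 'hourly',      // changes fast
  deal_stages: 'daily',         // changes slowly
  notion_docs: 'daily',
  historical_revenue: 'weekly', // almost never changes
}

// A cached entry is fresh if its age is within its tier's TTL.
function isFresh(source: string, generatedAt: number, now: number): boolean {
  const tier = sourceTier[source] ?? 'hourly' // unknown sources get the strictest window
  return now - generatedAt <= TTL[tier]
}
```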
The economics
Let's make this concrete. Assume:
- 50 active users per day
- Each session previously fetched from 6 live sources
- Average cold start cost: 400,000 tokens × $3/M input = $1.20 per session
- Total daily cold start cost (50 users): $60
With pre-fetch caching:
- One background job fetches all 6 sources once
- Cost of the background job: ~100,000 tokens × $3/M input = $0.30
- Per-session startup: reading from cache = near zero tokens
- Total daily cost: $0.30
That's a $59.70 daily savings — or roughly $1,800/month — from one overnight cron job.
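The arithmetic above, written out as a check (the prices and token counts are the stated assumptions, not measured values):

```typescript
// Inputs from the assumptions above.
const sessionsPerDay = 50
const pricePerMillionInputTokens = 3.0
const coldStartTokensPerSession = 400_000
const prefetchJobTokens = 100_000

const tokenCost = (tokens: number) => (tokens / 1_000_000) * pricePerMillionInputTokens

const dailyCostLive = sessionsPerDay * tokenCost(coldStartTokensPerSession) // $60.00/day
const dailyCostPrefetch = tokenCost(prefetchJobTokens)                      // $0.30/day
const dailySavings = dailyCostLive - dailyCostPrefetch                      // $59.70/day
const monthlySavings = dailySavings * 30                                    // ~$1,791/month
```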
The numbers will vary. The pattern doesn't. Pre-fetching converts per-session cost to per-day cost. At any reasonable team size, that arithmetic is compelling.
The hidden benefit: reliability
The cost savings are the headline. The operational benefit that shows up later is reliability.
When your agent cold-starts from live APIs, every API dependency is in the critical path of user sessions. If HubSpot is slow (or down), your agent is slow (or broken). If Gong's API is rate-limited, your session startup fails.
With pre-fetch caching, those APIs are called once a day in a background job at 3 AM. If the background job fails, you catch it in monitoring, fix it, and the user sessions run on yesterday's cache. The user experience degrades gracefully — slightly stale data — rather than failing hard.
Getting started
Week 1: Identify your highest-cost cold start items. Measure the actual token cost of your session startup (log it if you don't already). Find the biggest offenders.
Week 2: Write a pre-fetch job for the top two or three sources. Keep it simple — fetch, process, write to a table.
Week 3: Wire your agent sessions to read from cache first, fall back to live on miss.
Week 4: Add monitoring. Track cache hit rate, cache generation time, staleness at session start. Now you have visibility into something that was previously invisible.
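Cache hit rate is the simplest of these metrics to instrument: a counter pair around the cache read. A sketch, with in-memory counters standing in for your metrics system:

```typescript
// Minimal hit/miss counters around the cache read path.
// In production, emit these to your metrics system instead of process memory.
const stats = { hits: 0, misses: 0 }

function recordCacheRead(hit: boolean): void {
  hit ? stats.hits++ : stats.misses++
}

function cacheHitRate(): number {
  const total = stats.hits + stats.misses
  return total === 0 ? 0 : stats.hits / total
}
```

A hit rate that drops suddenly usually means the overnight job failed and every session is falling back to live fetches, which is exactly the condition you want an alert on.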
The infrastructure investment is a few days of engineering. The payoff runs every day.