AI Codex · Building Your Internal AI Stack · Step 2 of 8

The Token Cost of How You Fetch Data — and Why It Matters More Than Your Model Choice

In brief

A morning routine agent that pulls calendar, email, CRM, and support data cold via native connectors can burn 400,000 tokens before the user asks a single question. The same workflow with pre-fetched, shaped data costs near zero on startup. How you fetch data has a 10-50x cost impact — bigger than your model choice.


Here's a number that should bother you: 400,000 tokens.

That's what a morning routine skill can burn before the user types their first message — if you're fetching data the naive way. That's before any reasoning, before any output, before any value is delivered. Four hundred thousand tokens of context loading, just to give the agent what it needs to get started.

At $3 per million input tokens (Sonnet pricing), that's $1.20 per user per morning. For a team of 50, that's $60 a day, $1,500 a month, just for the cold start.

And here's the thing: the morning routine itself isn't expensive. Generating a daily brief, summarizing meetings, flagging priority emails — that's maybe 2,000–5,000 tokens of output. The token cost isn't in what the AI does. It's in how the data gets in.


Three tiers of data access

Not all data fetching is equal. There's a spectrum from cheap to expensive, and where you land on it is mostly an architectural choice.

Tier 1: Warehouse SQL (cheapest)

You write a SQL query against your data warehouse. The summary table is precomputed by your ETL, the query returns exactly the fields you asked for, and the result is a flat table.

SELECT
  customer_name,
  deal_stage,
  arr,
  open_tickets,
  last_payment_status
FROM customer_summary
WHERE owner_id = ?
  AND close_date BETWEEN NOW() AND NOW() + INTERVAL 90 DAY

This returns maybe 20 rows, 6 columns, clean field names. Claude reads a tight table. Tokens: minimal.

The tradeoff: warehouse data is lagged. It reflects wherever your last ETL run landed — typically a few hours behind. For yesterday's revenue numbers or pipeline state, that's fine. For a live support ticket someone just opened 10 minutes ago, it's not.

Tier 2: Internal MCP Live API (middle)

Your internal MCP server calls the upstream API, shapes the response, and returns only what Claude needs. The raw HubSpot deal object has 40+ fields. Your tool returns 6. The raw Intercom ticket has the full conversation thread. Your tool returns the status and last update.

The live API is slower than a warehouse query (you're making an HTTP call) and costs more tokens (JSON still needs to be serialized), but the data is current and you control the shape.

Token cost: much higher than SQL, but far less than raw native connectors. The shaping makes a real difference.
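A minimal sketch of that shaping layer, assuming a Python MCP tool handler. The field names here are illustrative placeholders, not HubSpot's or Intercom's actual property names:

```python
# Hypothetical shaping layer for an internal MCP tool: the raw CRM
# response has dozens of fields; the tool returns only what the agent
# needs. Field names are illustrative, not real API property names.

DEAL_FIELDS = ("customer_name", "deal_stage", "arr",
               "close_date", "owner", "last_activity")

def shape_deal(raw: dict) -> dict:
    """Project a raw deal object down to the six fields the agent uses."""
    return {k: raw.get(k) for k in DEAL_FIELDS}

def shape_ticket(raw: dict) -> dict:
    """Return status and last update instead of the full message thread."""
    return {
        "customer": raw.get("customer"),
        "status": raw.get("status"),
        "last_update": raw.get("updated_at"),
    }
```

The point is where the filtering happens: inside the tool, before serialization, so the extra 34 fields never reach the context window at all.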

Tier 3: Native Connectors (most expensive)

You've connected HubSpot, Intercom, QuickBooks, Gong, Granola, and Slack directly to Claude. Claude loads 8+ tool schemas at session start. When it calls each one, it gets the full API response.

A HubSpot deal API call returns 40+ fields. A Gong recording summary might be 3,000 words. Intercom's conversation history includes every message thread. Slack's API returns the full message objects with metadata. QuickBooks entity objects include the full audit trail.

Claude parses all of it. You pay for all of it.

Token cost per call: 10-50x more than equivalent SQL.

Now multiply that by session initialization. Six connected services, each loading their schemas. The agent populates its context with enough information to be useful — and that population is paid for before a single question is answered.


The morning routine cold start, unpacked

Here's what actually happens when a "morning routine" skill runs cold with native connectors:

  1. Schema loading. Claude loads tool schemas for all connected services: HubSpot, Gong, Intercom, QuickBooks, Google Calendar, Gmail. That's maybe 8 schema definitions, each 500-2,000 tokens. Already at 10,000 tokens before the first API call.

  2. Calendar pull. Fetch today's meetings. Google Calendar returns each event as a full JSON object including attendees, conferencing links, recurrence rules, color codes, and RSVP status for all invitees. Five meetings: ~15,000 tokens.

  3. Email triage. Fetch last 24 hours of email. Each email returns full MIME headers, thread ID, label IDs, history ID, and message body. 40 emails at 2,000 tokens each: 80,000 tokens.

  4. CRM pipeline check. Fetch all open deals in the quarter. HubSpot returns deal objects with 40+ properties each. 25 deals: ~60,000 tokens.

  5. Support tickets. Fetch open tickets assigned to the team. Intercom returns full conversation objects including all message threads. 15 tickets: ~80,000 tokens.

  6. Gong summaries. Fetch recent call recordings. Even "summaries" include full transcript snippets. 8 calls: ~150,000 tokens.

Total before the user says good morning: roughly 400,000 tokens.


What the same workflow costs with proper data architecture

With an internal MCP server and pre-fetched warehouse data:

  1. Schema loading. One MCP server, one schema, a dozen clean tool definitions. Maybe 3,000 tokens.

  2. Calendar pull. Your tool returns a shaped list: meeting name, time, attendees, agenda note. No conferencing metadata, no recurrence rules, no RSVP status. 5 meetings: ~1,500 tokens.

  3. Email triage. Your ETL runs at 2 AM, processes the inbox, and stores a summary table: sender, subject, priority flag (computed), one-line summary. 40 emails: ~4,000 tokens.

  4. CRM pipeline. Your warehouse query returns a clean summary view: customer, stage, ARR, days-to-close. 25 deals: ~2,000 tokens.

  5. Support tickets. Pre-processed ticket summary: customer, issue category, priority, days open. 15 tickets: ~1,500 tokens.

  6. Gong summaries. Pre-processed AI summaries from your own batch run, stored in warehouse. 8 calls: ~6,000 tokens.

Total: roughly 18,000 tokens. The same information, 95% cheaper.
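The two cold starts above can be totalled with a back-of-envelope script. The per-step numbers are this article's rough estimates, not measurements, and the price assumes $3 per million input tokens:

```python
# Back-of-envelope comparison of the two cold starts, using the
# article's rough per-step token estimates (not measurements).

NAIVE = {"schemas": 10_000, "calendar": 15_000, "email": 80_000,
         "crm": 60_000, "tickets": 80_000, "gong": 150_000}
SHAPED = {"schemas": 3_000, "calendar": 1_500, "email": 4_000,
          "crm": 2_000, "tickets": 1_500, "gong": 6_000}

PRICE_PER_TOKEN = 3 / 1_000_000  # assumed USD per input token

naive_total = sum(NAIVE.values())    # 395,000 tokens
shaped_total = sum(SHAPED.values())  # 18,000 tokens
savings = 1 - shaped_total / naive_total

print(f"naive:  {naive_total:,} tokens")
print(f"shaped: {shaped_total:,} tokens")
print(f"savings: {savings:.0%}")
```

Running it confirms the headline: about 395,000 tokens versus 18,000, a roughly 95% reduction per cold start.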


Why this matters more than your model choice

Teams spend a lot of energy debating Claude Sonnet vs. Haiku vs. Opus. The per-token price difference between models is real — Haiku is roughly 20x cheaper per token than Opus.

But if your data access pattern is burning 400,000 tokens on setup, the model choice is a rounding error on a rounding error. You're paying for context, not computation. And context costs scale with architecture, not with model tier.

Fix the architecture first. Then debate models.


The session cost mental model

Think of every agent session as having two cost buckets:

  • Context cost: what you pay to load information into the model's window
  • Reasoning cost: what you pay for the model to think and respond

Most people optimize for reasoning cost (by choosing cheaper models). Most of the waste is in context cost (by fetching data badly).

A few principles that compound:

Shape before you send. If your API returns 40 fields and you need 6, filter at the tool level — not inside the prompt. Every extra field you pass is a token you pay for.

Pre-process what doesn't change. Yesterday's revenue numbers are the same whether you fetch them at 3 AM or 9 AM. Fetch them once, cache them, read from cache.

Summarize before storing. A 2-hour Gong recording transcript is 30,000 words. A good summary is 400. If you're going to use this in AI context repeatedly, the summarization step is worth paying once.

Know your freshness requirements. Not all data needs to be live. Pipeline state from this morning is fine for a daily brief. Real-time payment status matters for a customer call. Build your data access patterns around actual freshness requirements, not "always live is safest."
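One way to make freshness requirements explicit is a simple routing table per data source. This is a sketch under assumed names — the sources and endpoints are illustrative, not a real API:

```python
# Hypothetical freshness routing: "warehouse" means a pre-computed
# table (hours old, cheap); "live" means a shaped API call (current,
# more expensive). Source names are illustrative.

FRESHNESS = {
    "pipeline_summary": "warehouse",  # last night's ETL is fine for a brief
    "email_digest":     "warehouse",
    "call_summaries":   "warehouse",
    "payment_status":   "live",       # must be current on a customer call
    "open_tickets":     "live",
}

def route(source: str) -> str:
    """Decide where a read goes based on its freshness requirement."""
    tier = FRESHNESS.get(source, "live")  # unknown sources default to live
    if tier == "warehouse":
        return f"warehouse:{source}"   # cheap pre-computed read
    return f"api:{source}"             # shaped live call
```

The table forces the conversation the principle asks for: each source gets a deliberate freshness decision instead of defaulting to "always live."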


The practical starting point

You don't have to rebuild everything at once. Start here:

  1. Identify the highest-cost pattern. What does your agent load at session start that it could read from a pre-processed table instead?

  2. Write one warehouse view. Take your most expensive live query and move it into a nightly materialized view. Measure the before/after token cost.

  3. Shape your top-five tools. If you have native connectors pulling full API responses, write wrapper tools that return only the fields you actually use.

  4. Add a pre-fetch job. A cron at 3 AM that runs your most common queries and caches results. User sessions read from cache, and the token cost shifts to a background job.
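Step 4 can be sketched in a few lines, assuming a JSON file cache and placeholder query functions (the path and names are illustrative):

```python
# Minimal pre-fetch job sketch: cron runs run_prefetch at 3 AM and
# writes shaped results to a JSON cache; user sessions call
# read_cached instead of hitting live APIs. Path and names are
# placeholders, not a real deployment layout.

import json
import time
from pathlib import Path

CACHE = Path("/var/cache/agent/morning.json")

def run_prefetch(queries: dict) -> None:
    """Run each named query function and persist the shaped results."""
    results = {name: fn() for name, fn in queries.items()}
    results["_fetched_at"] = time.time()
    CACHE.parent.mkdir(parents=True, exist_ok=True)
    CACHE.write_text(json.dumps(results))

def read_cached(name: str, max_age_s: int = 12 * 3600):
    """Session-time read: near-zero startup cost, fails loudly if stale."""
    data = json.loads(CACHE.read_text())
    if time.time() - data["_fetched_at"] > max_age_s:
        raise RuntimeError("cache is stale; fall back to a live fetch")
    return data[name]
```

The staleness check matters: a cache that silently serves yesterday's data when the job failed is worse than paying for a live call.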

The economics compound. Every tool you shape, every query you pre-process, every API call you replace with a warehouse lookup — it all reduces session startup cost, which is the biggest single line item in most AI agent budgets.
