AI Codex
for engineering leads & ops teams · 8 guides · ~74 min

Building your internal AI stack

Most teams connect Claude to their tools one connector at a time and wonder why it's slow and expensive. These eight guides cover the architecture that fixes it: a single internal MCP server, the right data access patterns, caching that eliminates cold start, access control that actually works, and the skills layer that turns repeated workflows into one-command automation.

01

The concept: one routing layer for all your data

Why connecting Claude directly to each tool (CRM, billing, support, comms) breaks at scale — and how an internal MCP server solves it with a single interface, shaped responses, and access control baked in.

9 min
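The routing-layer idea can be sketched in a few lines. This is a minimal illustration, not a real MCP server: the tool names, the `shape_response` helper, and the registry layout are all hypothetical, but they show the point — one registry, one dispatch path, one place to shape responses and hook in access control.

```python
from typing import Any, Callable

# One registry instead of N direct connectors: every tool the agent can
# call is registered here, and every response passes through one shaper.
TOOLS: dict[str, Callable[[dict], dict]] = {}

def tool(name: str):
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        TOOLS[name] = fn
        return fn
    return register

def shape_response(raw: dict, fields: list[str]) -> dict:
    # Return only the fields the agent needs; raw payloads from
    # CRM/billing APIs are often far larger than what's useful.
    return {k: raw[k] for k in fields if k in raw}

@tool("crm.lookup_account")   # hypothetical tool name
def lookup_account(args: dict) -> dict:
    raw = {"id": args["id"], "name": "Acme", "tier": "enterprise",
           "internal_notes": "...", "raw_html": "<div>...</div>"}
    return shape_response(raw, ["id", "name", "tier"])

def route(tool_name: str, args: dict) -> dict:
    # Single entry point: logging and access control hook in here once,
    # instead of once per connector.
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](args)
```

Because every call funnels through `route`, adding a new data source means registering one handler — the agent-facing interface never changes.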
02

Why how you fetch data costs more than your model choice

The three tiers: warehouse SQL (cheapest), internal MCP live API (middle), native connectors (10-50x more expensive). A morning routine skill can consume 400,000 tokens before a user asks a single question. Here's how that happens and how to prevent it.

10 min
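The 400,000-token figure is easy to reproduce with back-of-envelope arithmetic. The item counts and per-item token sizes below are illustrative assumptions, but they show how verbose, unshaped connector payloads multiply:

```python
# Illustrative only: how a "morning routine" skill can burn 400k tokens
# before the user types anything.

def fetch_tokens(items: int, tokens_per_item: int) -> int:
    return items * tokens_per_item

# Native connectors return verbose, unshaped payloads:
native_calls = {
    "support tickets": fetch_tokens(items=50, tokens_per_item=3000),
    "crm accounts":    fetch_tokens(items=40, tokens_per_item=2500),
    "billing events":  fetch_tokens(items=60, tokens_per_item=2500),
}
native_total = sum(native_calls.values())   # 400,000 tokens

# Same 150 entities via warehouse SQL, pre-aggregated to one compact
# row each:
warehouse_total = fetch_tokens(items=150, tokens_per_item=80)  # 12,000
```

Same data, two fetch paths, a 30x+ difference in tokens — which is why the fetch tier often matters more than the model choice.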
03

Which data goes in a warehouse vs. stays live

The decision framework: freshness under one hour means live API, aggregate or cross-system queries mean warehouse. The key is routing both through the same internal MCP so the caller never needs to know which path was taken.

8 min
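The decision framework reduces to a small routing function. A sketch, with a hypothetical `query` wrapper showing how the internal MCP hides the chosen path from the caller:

```python
def choose_path(freshness_minutes: int, cross_system: bool,
                aggregate: bool) -> str:
    # Aggregates and cross-system joins belong in the warehouse.
    if cross_system or aggregate:
        return "warehouse"
    # Data needed fresher than an hour must come from the live API.
    if freshness_minutes < 60:
        return "live_api"
    # Everything else defaults to the cheaper warehouse path.
    return "warehouse"

def query(request: dict) -> dict:
    path = choose_path(request["freshness_minutes"],
                       request["cross_system"], request["aggregate"])
    # The caller never sees which path ran; the internal MCP returns
    # the same response shape either way.
    return {"path": path, "rows": []}
```

The important design choice is that `choose_path` lives inside the MCP server, not in the skill or the prompt — callers state what they need, not where to get it.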
04

Eliminating the expensive moment at session start

A cron job at 3–4 AM that pre-fetches and caches context shifts token cost from user sessions to a background API key. What to cache, what to keep live, and why this single change can reduce cold-start cost by 95%.

8 min
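The pre-fetch job itself is small. A sketch assuming a file-based cache and a hypothetical `fetch_open_tickets` helper, scheduled via cron (e.g. `0 3 * * * python3 prefetch.py`) under a background API key:

```python
import json
import time
from pathlib import Path

def fetch_open_tickets() -> list[dict]:
    # Hypothetical stand-in for a real API call.
    return [{"id": 1, "status": "open"}]

def prefetch(cache_dir: Path) -> Path:
    cache_dir.mkdir(parents=True, exist_ok=True)
    snapshot = {
        "fetched_at": time.time(),
        # Cache slow-changing context here; anything that needs
        # minute-level freshness stays on the live-API path.
        "open_tickets": fetch_open_tickets(),
    }
    out = cache_dir / "morning.json"
    out.write_text(json.dumps(snapshot))
    return out
```

At session start the agent reads `morning.json` instead of fanning out to every system — the token cost was already paid overnight.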
05

BigQuery vs StarRocks for agent workloads

BigQuery's per-query pricing made sense for analysts running a few large queries a day. AI agents make many small queries all day. StarRocks at ~$200/month flat with sub-second latency is a better fit — here's when to switch.

9 min
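The break-even is simple arithmetic. BigQuery's on-demand tier bills per byte scanned (roughly $6.25/TiB at the time of writing — check current pricing), while a small StarRocks node is approximately flat. The query counts and scan sizes below are illustrative:

```python
BQ_PER_TIB = 6.25                 # on-demand $/TiB scanned (check current pricing)
STARROCKS_FLAT_MONTHLY = 200.0    # approximate flat node cost

def bigquery_monthly(queries_per_day: float,
                     tib_scanned_per_query: float) -> float:
    return queries_per_day * 30 * tib_scanned_per_query * BQ_PER_TIB

# Analyst pattern: a few large queries a day.
analyst = bigquery_monthly(queries_per_day=5, tib_scanned_per_query=0.1)    # ~$94
# Agent pattern: constant small queries, each scanning a slice of a table.
agent = bigquery_monthly(queries_per_day=2000, tib_scanned_per_query=0.01)  # ~$3750
```

Under these assumptions the analyst workload stays under the $200 flat rate and the agent workload blows past it — per-query pricing punishes exactly the high-frequency, small-query pattern agents produce.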
06

Access control: why prompts are not enough

When you give AI access to company data, "everyone gets everything" is not acceptable. Tool-level permissions enforced in middleware — not by the model — are the only reliable solution. How to build it.

9 min
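The middleware pattern is worth seeing concretely. A sketch with hypothetical role and tool names — the key property is deny-by-default enforcement in code that runs *before* dispatch, where no prompt can reach it:

```python
# Role -> set of tools that role may call. Anything absent is forbidden.
PERMISSIONS: dict[str, set[str]] = {
    "support": {"tickets.search", "crm.lookup_account"},
    "finance": {"billing.invoices", "crm.lookup_account"},
}

def authorize(role: str, tool_name: str) -> None:
    # Deny by default: enforced in middleware, not by the model,
    # no matter what the prompt says.
    if tool_name not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role} may not call {tool_name}")

def call_tool(role: str, tool_name: str, args: dict,
              handlers: dict) -> dict:
    authorize(role, tool_name)    # runs before any handler executes
    return handlers[tool_name](args)
```

A jailbroken prompt can make the model *ask* for `billing.invoices`, but the support role's request still dies in `authorize` — the model never held the capability in the first place.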
07

Turning repeated workflows into reusable tools

A skill is a markdown file that encodes a multi-step AI workflow: which system to query, which fields matter, how to format output. Once written, any team member can invoke it by name. How to identify, build, and publish them.

10 min
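Publishing skills can be as simple as a directory of markdown files plus a small registry. A sketch assuming the convention that each file's first line is a heading naming the skill (the directory layout and naming rule are assumptions):

```python
from pathlib import Path

def load_skills(skills_dir: Path) -> dict[str, str]:
    # Map skill name -> full markdown workflow text.
    skills = {}
    for path in sorted(skills_dir.glob("*.md")):
        text = path.read_text()
        # Assume the first line is a heading like "# weekly-churn-report".
        name = text.splitlines()[0].lstrip("#").strip()
        skills[name] = text
    return skills
```

Once loaded, invoking a skill by name means handing its markdown body to the model as the workflow instructions — any team member gets the same multi-step procedure with one command.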
08

What it looks like when you build it properly

The five-layer stack: UI, skills, internal MCP, live API + warehouse, pre-fetch cache. The target metrics: cold-start under 10k tokens, queries under 2s, zero permission violations, 80%+ of weekly repeated workflows automated.

11 min
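The target metrics above can double as an automated health check. The thresholds come straight from the guide; the shape of the `metrics` dict is an assumption:

```python
TARGETS = {
    "cold_start_tokens": 10_000,   # max per session start
    "query_latency_s": 2.0,        # max per query
    "permission_violations": 0,    # max, ever
    "workflow_automation": 0.80,   # min share of repeated weekly workflows
}

def stack_healthy(metrics: dict) -> bool:
    # All four targets must hold simultaneously.
    return (metrics["cold_start_tokens"] <= TARGETS["cold_start_tokens"]
            and metrics["query_latency_s"] <= TARGETS["query_latency_s"]
            and metrics["permission_violations"] <= TARGETS["permission_violations"]
            and metrics["workflow_automation"] >= TARGETS["workflow_automation"])
```

Wiring a check like this into monitoring turns the target metrics from aspirations into a regression alarm for the whole five-layer stack.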