Showing AI ROI to your CEO — what to measure and how to report it
In brief
Your CEO doesn't want a technology update. They want to know if the investment is working and whether to do more. Three types of evidence actually work: time saved, error rate improvement, and throughput. Here's how to measure them and how to present them.
Ninety days in. Your CEO schedules a check-in and asks: "So what has the AI stuff actually delivered?"
You don't want to answer with "it's been really promising" or "the team is excited about the possibilities." Those answers destroy credibility faster than any bad demo. Your CEO came back from a conference, handed you this responsibility, and has been mostly leaving you alone to figure it out. They're not asking because they're curious. They're asking because they want to decide whether to invest more.
You need a real answer. Here's how to have one.
Why traditional ROI doesn't work for agents
Most ROI calculations look for revenue impact: does the investment generate more revenue, or reduce direct costs? Agents can do both, but often indirectly. The contract review agent that catches problematic clauses doesn't generate revenue — it prevents revenue from leaking. The invoice triage agent that saves Maria over 3 hours a week doesn't reduce headcount — it frees Maria to do the work that requires judgment.
This doesn't mean the ROI isn't real. It means the measurement model has to account for the indirect path.
The three types of evidence that actually work with executives each have a different measurement approach — and you need to know which one applies to each agent you're running.
Type 1: Time saved (most credible, most commonly available)
This is the easiest to measure and the most intuitive for any executive to understand.
Before: "Maria spends approximately 4 hours a week on supplier invoice triage — sorting incoming invoices by category, flagging discrepancies, routing to the right approver."
After: "Maria spends about 40 minutes a week on the same function — the agent handles initial categorization and routing. Maria reviews flagged items and handles exceptions."
The number: 3.3 hours/week × $58/hour fully-loaded cost × 50 working weeks = ~$9,600/year for one person.
If you have three people who were doing this task and they're all down to 40 minutes, multiply by three.
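If you want this arithmetic to be reproducible rather than recalculated by hand each quarter, a few lines of Python keep the assumptions (hourly rate, working weeks, headcount) explicit. The figures below are the ones from the example above, not defaults to adopt.

```python
def annual_time_savings(hours_saved_per_week: float,
                        hourly_cost: float,
                        working_weeks: int = 50,
                        people: int = 1) -> float:
    """Annualized dollar value of the time an agent frees up."""
    return hours_saved_per_week * hourly_cost * working_weeks * people

# Figures from the example: 3.3 hours/week saved, $58/hour
# fully-loaded, 50 working weeks, one person.
saved = annual_time_savings(hours_saved_per_week=3.3, hourly_cost=58)
print(f"${saved:,.0f}/year")  # $9,570/year, the ~$9,600 in the text
```

Change `people=3` and the same call gives you the three-person version of the number.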
How to get this number reliably:
The mistake most Agent Operators make is not establishing the baseline. If you deploy the agent and then try to figure out how long it used to take, you're estimating. Your CEO will press on the estimate and you'll have nothing solid.
Before you deploy any agent, ask the people who currently do the task to track their time on that specific task for two weeks. Not every task — just the one the agent will handle. Use a simple shared spreadsheet: name, date, task, minutes. It takes 30 seconds per session and produces numbers you can defend.
After rollout, track for two more weeks. The before/after comparison is now a measurement, not an estimate.
Even rough numbers are better than estimates. "We tracked it for 3 weeks: average dropped from 18 minutes per invoice to 4 minutes" is defensible. "We think it saves about 75% of the time" is not.
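If that shared spreadsheet is exported as CSV, the before/after averages take a few lines to compute. This is a sketch: the four column names match the log described above, and the file names are placeholders.

```python
import csv
from statistics import mean

def average_minutes(path: str) -> float:
    """Mean minutes per logged session in a time-tracking export.

    Assumes columns: name, date, task, minutes (per the log above).
    """
    with open(path, newline="") as f:
        return mean(float(row["minutes"]) for row in csv.DictReader(f))

# Hypothetical file names for the two tracking windows:
# before = average_minutes("triage_before.csv")
# after = average_minutes("triage_after.csv")
# print(f"{before:.0f} min -> {after:.0f} min per session")
```

The output of the two calls is exactly the "18 minutes to 4 minutes" sentence you can defend in the check-in.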
Type 2: Error rate or quality improvement
This works when the value comes from catching mistakes, not saving time. Contract review, compliance checks, quality control, data validation — the ROI is in the downstream cost of errors that didn't happen.
Before: "Our contract review process caught clause issues in about 60% of contracts before signing. We tracked 3 contract disputes in Q4 that could have been caught earlier — combined rework cost was approximately $85K."
After: "The agent flags clause patterns we trained it to recognize. In an independent review of 40 contracts over the last 6 weeks, the agent caught issues in 82% of the contracts that had actual problems, versus our previous 60% catch rate."
Why this works with executives: The cost of the errors you're preventing is almost always much larger than the cost of running the agent. One prevented contract dispute that would have cost $30K in rework pays for 6 months of Claude usage. You don't need to claim you prevented every error — just that you improved the catch rate and the math on prevented errors is favorable.
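The payback framing in that paragraph reduces to one division. The $30K dispute cost comes from the example above; the monthly agent spend is a hypothetical placeholder to substitute with your actual bill.

```python
def payback_months(prevented_error_cost: float,
                   monthly_agent_cost: float) -> float:
    """Months of agent spend covered by one prevented error."""
    return prevented_error_cost / monthly_agent_cost

# Example from the text: a $30K dispute avoided, against a
# hypothetical $5K/month agent bill.
print(payback_months(30_000, 5_000))  # 6.0 months covered
```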
How to get this number:
You need a baseline catch rate, which means you need to track how many issues your current process catches and how many slip through. For most companies, you'll find this in downstream data: contract disputes, invoice discrepancies that required rework, quality failures, compliance flags that should have been caught earlier.
Establish this baseline before you deploy. After rollout, track the agent's performance on the same metric. The comparison becomes the evidence.
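The catch rate itself is just a ratio of issues caught in review to all real issues, including the ones that surfaced downstream. A sketch, with hypothetical counts chosen to reproduce the 60% and 82% figures from the example:

```python
def catch_rate(caught_in_review: int, slipped_downstream: int) -> float:
    """Share of real issues the review process caught in time."""
    return caught_in_review / (caught_in_review + slipped_downstream)

# Hypothetical counts: 12 issues caught in review, 8 surfaced later
baseline = catch_rate(12, 8)    # 0.6, the 60% baseline
with_agent = catch_rate(33, 7)  # 0.825, roughly the 82% figure
print(f"{baseline:.0%} -> {with_agent:.0%}")
```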
Type 3: Throughput and capacity
This works when the value comes from doing more with the same team — handling more volume, processing more requests, serving more customers without adding headcount.
Before: "Our customer service team handles approximately 90 tickets per day manually. Average first-response time is 6 hours."
After: "With the triage agent handling first-line categorization and routing — and drafting responses for standard request types — we're handling about 108 tickets per day with the same team, absorbing the 20% volume increase from the new account additions without adding staff. Complex escalation time is also down 40%, because tier-1 time has been cut significantly."
Why this works: more output at the same cost, or growth absorbed without additional cost. Either is compelling to a CFO.
How to get this number: Pull from your ticketing system, CRM, or whatever system of record tracks volume and cycle time. This data usually already exists — you just need to pull the before and after comparisons.
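If your system of record can export per-ticket timestamps, the cycle-time comparison is a small aggregation. The field names below are assumptions for the sketch; map them to whatever your ticketing export actually calls them.

```python
from datetime import datetime

def first_response_hours(tickets: list[dict]) -> float:
    """Average hours from ticket creation to first response."""
    deltas = [
        (datetime.fromisoformat(t["first_response"])
         - datetime.fromisoformat(t["created"])).total_seconds() / 3600
        for t in tickets
    ]
    return sum(deltas) / len(deltas)

# Hypothetical one-ticket samples from each tracking window:
before = [{"created": "2025-01-06T09:00", "first_response": "2025-01-06T15:00"}]
after = [{"created": "2025-03-03T09:00", "first_response": "2025-03-03T10:30"}]
print(first_response_hours(before), "->", first_response_hours(after))  # 6.0 -> 1.5
```

Run it over the full before and after windows and you have the "6 hours down to X" sentence straight from your own data.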
The monthly update format that actually lands
Once a month, or whenever asked, provide a one-pager with this structure:
What's running: A 2–3 bullet list of active agents and what each one handles. One sentence each. Not technical — functional.
What it's handling: Numbers per agent. Choose whichever metric is most meaningful: tasks handled per week, time saved per person, volume processed.
What it's costing: Total monthly Claude spend. Don't hide it — present it alongside the value.
What we're building next: One sentence on the next agent and why it was prioritized.
The question you're answering: Not "is this ROI positive?" — that's a threshold question that leads to a yes/no. The real question your CEO is asking is: "Should we be doing more of this?" Answer that. "Yes — the next highest-value workflow is supplier onboarding communications. Based on volume, we estimate similar time savings as the invoice triage agent. Here's what it would take to build it."
What kills your credibility with this report
Claiming ROI you didn't measure. "We estimate this saves approximately 40% of the time" — when you didn't track it — reads as a guess. Your CEO has seen enough inflated estimates from technology projects to be skeptical. If you didn't measure it, say so, and describe what you'll track going forward.
Activity metrics instead of outcome metrics. "We ran 1,200 agent queries this month" is an activity metric. It says nothing about value. "The invoice triage agent handled 340 categorizations, saving approximately 6 hours of work per week" is an outcome metric. Use outcome metrics.
Comparing to the cost of premium software. "The Claude bill is lower than what we'd pay for another Salesforce seat" is a comparison that makes no sense to your CEO. The right comparison is to the cost of the work the agent is replacing.
The longer game
Your first ROI report is about defending the investment. Your second and third are about expanding it. The frame that builds long-term credibility:
"We now have 3 agents running. Two are clearly delivering value — I have the numbers. One is still in validation. Here's what I've learned about which types of workflows are best suited to agents, and here's my recommendation for what to build next."
This is the Agent Operator who builds institutional support for the initiative over time. Not because they oversell results, but because they're honest about what they know and what they don't — and they keep delivering reliable results.
Try this today
Go back to your most important running agent. Can you quantify the time it's saving, the quality it's improving, or the volume it's enabling? If yes, write those numbers down. If no, set up the tracking you'll need to have those numbers in 30 days.
Before your next CEO check-in, you'll want at least one agent with a clean before-and-after story. Start building that story now.