The Bill for 'Tokenmaxxing' Arrives: Big Tech Starts Rationing AI Tokens

After a year of rewarding raw AI usage, Meta, Uber and others are capping token budgets and chasing efficiency instead.

use-cases · 2026-06-28 22:00 KST · Lead Editor · 6 min read

A reckoning, not a release

The biggest AI story of the past week isn't a model launch. It's a budget meeting. After roughly two years in which large companies treated AI consumption as a proxy for innovation — paying employees, in effect, to use as much as possible — the bills have arrived, and they are enormous. CNBC reported on June 26 that OpenAI and Anthropic now face a "new AI reality" as their biggest customers pivot from maximizing usage to minimizing it. The shift even has a name: the industry called the old behavior "tokenmaxxing," and the correction "tokenminimizing."

This matters more than any single benchmark. Frontier labs have built their revenue stories on the assumption that token consumption only goes up. If their largest enterprise buyers are now actively engineering it down, the economics of the whole sector look different than they did a quarter ago.

How "tokenmaxxing" got out of hand

The practice grew out of a reasonable instinct gone feral. Through 2025, companies wanted employees to adopt AI fast, so they measured adoption — and what gets measured gets gamed. Several big tech firms ran internal leaderboards ranking staff by token consumption. Per reporting collected by BigGo Finance, Amazon ran one called "Kirorank" that it shut down in May after it "devolved into wasteful token-chasing contests with zero practical value." MLQ News reports Meta tracked usage on a leaderboard nicknamed "Claudeonomics" that, in the company's own framing, "inadvertently incentivized volume over productivity."

The result was a metric that rewarded motion over progress. Engineers learned that running more agents, longer chains, and bigger context windows scored points — regardless of whether anything shipped. Because AI pricing scales directly with usage, that gamified behavior translated into large and unpredictable invoices.
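To see why longer chains inflate the bill faster than intuition suggests, here is a minimal sketch. It assumes an agent loop that resends its entire accumulated context on every step, with no prompt caching. The function names, the per-step growth, and the price per million tokens are illustrative assumptions, not figures from any vendor or from the reporting.

```python
def chain_input_tokens(base_context: int, added_per_step: int, steps: int) -> int:
    """Total input tokens billed when each step resends the whole context so far.

    Assumption: no caching or context truncation, so step i is billed for
    base_context + i * added_per_step input tokens.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context           # this step is billed for everything sent
        context += added_per_step  # tool output and replies grow the context
    return total


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Usage-based pricing: cost is linear in the number of tokens billed."""
    return tokens / 1_000_000 * usd_per_million


short_run = chain_input_tokens(base_context=2_000, added_per_step=1_000, steps=10)
long_run = chain_input_tokens(base_context=2_000, added_per_step=1_000, steps=50)
print(short_run, long_run)
```

Under these assumptions, the 10-step run bills 65,000 input tokens and the 50-step run bills 1,325,000: five times the steps, roughly twenty times the tokens. The price is linear in tokens, but the tokens are not linear in steps.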

The numbers behind the panic

The figures, where sources cite them, are striking. According to MLQ News, an internal Meta memo to 6,000 employees described an "exponential increase" in AI costs, noting that staff consumed 73.7 trillion tokens in roughly 30 days, with internal AI spending approaching billions of dollars in 2026. (Meta separately plans to spend up to $135 billion on AI infrastructure through the year.) Meta's response: an "AI Gateway" dashboard for real-time monitoring, formal token budgets starting in 2027, and a nudge toward its in-house MetaCode tool over Anthropic's Claude. CTO Andrew Bosworth summed up the new posture bluntly: "All motion is not progress and token usage alone is not a measure of impact."

Uber is the cautionary tale everyone cites. MLQ News reports it burned through its entire 2026 AI coding budget in about four months and imposed a cap of $1,500 per month, per employee, per tool; BigGo puts per-engineer costs in the $500–$2,000 monthly range before the cap, with 95% of engineers using the tool monthly. Microsoft, in BigGo's account, revoked most internal Claude Code licenses in one division and pushed engineers back to GitHub Copilot CLI over "unmanageable" billing. Meta, AT&T and Walmart are all described as tightening internal AI spending. These specific figures come from secondary reporting rather than company filings, so treat the exact numbers as indicative rather than audited, but the direction is consistent across every source.
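A spending cap of this kind is simple to sketch. The example below is a hypothetical illustration, not a description of Uber's or anyone else's actual system: the class name, the method names, and the choice to track spend per employee, per tool, per calendar month are all assumptions. Only the $1,500 default comes from the reporting above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MonthlyBudget:
    """Hypothetical gateway-side ledger that rejects requests over a cap."""

    cap_usd: float = 1500.0  # reported Uber figure; the rest is assumed
    spent: dict = field(default_factory=lambda: defaultdict(float))

    def try_charge(self, employee: str, tool: str, month: str, cost_usd: float) -> bool:
        """Record the charge and return True, or return False if it would exceed the cap."""
        key = (employee, tool, month)
        if self.spent[key] + cost_usd > self.cap_usd:
            return False
        self.spent[key] += cost_usd
        return True

    def remaining(self, employee: str, tool: str, month: str) -> float:
        """Budget left for this employee, tool, and month."""
        return self.cap_usd - self.spent[(employee, tool, month)]


budget = MonthlyBudget()
first = budget.try_charge("eng-1", "coding-agent", "2026-06", 1000.0)
second = budget.try_charge("eng-1", "coding-agent", "2026-06", 500.0)
third = budget.try_charge("eng-1", "coding-agent", "2026-06", 0.01)  # over the cap
print(first, second, third)  # True True False
```

A hard rejection is only one possible policy. A real gateway might instead warn, throttle, or downgrade the request to a cheaper model once the cap is near.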

The real problem: usage isn't output

The uncomfortable core of the story is that spending and results have come unglued. CNBC reports that despite roughly 70% of committed code at Uber being AI-generated, COO Andrew Macdonald said the link between token spending and measurable output "is not there yet." BigGo's reporting frames the same gap quantitatively, claiming code commits surged far faster than production releases and that agentic tools can consume on the order of a thousand times more tokens than ordinary chat for a low effective return. Those productivity ratios are BigGo's own modeling and should be read skeptically, but the qualitative point is widely echoed: more tokens did not reliably mean more shipped software.

That gap is why the metric itself is changing. CNBC reports that Salesforce CEO Marc Benioff says the company still plans to spend heavily on AI this year but now tracks "agentic work units" — a measure meant to capture output rather than raw consumption. The reframing is the real news. Companies aren't abandoning AI; they're trying to redefine what success looks like so the bill maps to value.

The fixes: gateways, routers, and cheaper models

A small ecosystem is forming around cost control. CNBC notes rising demand for "gateway" tools and model routers that monitor, cap and optimize spending, with Microsoft and Databricks shipping relevant products and a startup called Factory releasing a router that automatically sends low-complexity tasks to cheaper models. The logic is simple: most prompts don't need a frontier model, and routing them to a smaller one preserves quality where it matters while cutting the bill everywhere else.
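The routing idea can be illustrated with a minimal sketch. This is not how Factory's router or any shipping product works. The model names, prices, keyword list, and threshold are hypothetical, and the complexity score is a deliberately crude stand-in for whatever classifier a real router would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # illustrative prices, not real ones


CHEAP = Model("small-model", 0.50)
FRONTIER = Model("frontier-model", 15.00)

# Hypothetical keywords that hint at a task needing the stronger model.
HARD_HINTS = ("refactor", "architecture", "prove", "debug", "migrate")


def estimate_complexity(prompt: str) -> float:
    """Crude score in [0, 1]: long prompts and 'hard' keywords score higher."""
    words = prompt.lower().split()
    length_score = min(len(words) / 200, 1.0)
    hint_score = 1.0 if any(hint in words for hint in HARD_HINTS) else 0.0
    return max(length_score, hint_score)


def route(prompt: str, threshold: float = 0.5) -> Model:
    """Send low-complexity prompts to the cheap model, the rest to the frontier one."""
    return FRONTIER if estimate_complexity(prompt) >= threshold else CHEAP


print(route("Rename this variable to snake_case").name)               # small-model
print(route("Debug the race condition in the payment service").name)  # frontier-model
```

The threshold is the tuning knob that matters. Set it too high and hard tasks land on a model that fails them, which costs more in retries than routing saved.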

The most pointed example is a defection. CNBC reports that the CEO of AI startup Lindy moved 100% of his company's traffic off Anthropic's Claude to DeepSeek, the Chinese lab known for cheaper open-weight alternatives, expecting to save millions within months. One startup is an anecdote, not a trend — but it's exactly the substitution frontier labs have to fear if their premium pricing isn't matched by premium results.

Hype versus reality

It's worth being precise about what this is and isn't. It is not evidence that enterprise AI demand is collapsing. Benioff still intends to spend hundreds of millions; Meta is still pouring tens of billions into infrastructure; usage is being rationalized, not switched off. Goldman Sachs, per MLQ News, still projects roughly a 24x rise in enterprise token consumption by 2030. The reckoning is about discipline, not retreat.

But it is a genuine repricing of risk. The bull case for OpenAI and Anthropic — both reportedly preparing IPOs — leaned on the idea that consumption growth is effectively unbounded. A customer base that now budgets tokens, routes around premium models, and demands proof of output introduces a ceiling that wasn't in the spreadsheet a few months ago. It also rewards efficiency leaders: cheaper open-weight models and smarter routing benefit directly from the same anxiety that pressures the frontier labs. Much of the granular reporting here is secondary, and primary company numbers are scarce, so the safest read is the shared narrative across sources rather than any one statistic.

The takeaway

For two years the AI industry's favorite metric was "more." Last week's reporting marks the moment that assumption broke in public: the same companies that built token leaderboards are now building token budgets. The substance is real — Meta's memo, Uber's cap, Amazon's killed scoreboard — even if the exact figures come mostly from secondary outlets and deserve caution. The strategic message is clearer than any number. The next phase of enterprise AI won't be won by whoever burns the most tokens, but by whoever turns each token into something that ships. That's a harder game, and it's one that favors efficiency, routing and cheaper models as much as it favors the frontier.

#enterprise-ai #token-economics #agentic-coding #cost-optimization

Primary sources

OpenAI and Anthropic face new AI reality as users shift from 'tokenmaxxing' to efficiency (CNBC)
Meta Caps Internal AI Token Spending After Costs Approach Billions in 2026 (MLQ News)
Silicon Valley's AI Bubble Burst: Three Months to Burn a Year's Budget (BigGo Finance)