Coinbase cuts AI costs by ~50% as token usage hits record highs; CEO Brian Armstrong shares five efficiency tactics, including Chinese LLM defaults and
Coinbase announced that its AI spending has fallen to roughly half of its peak level, even as token consumption climbs to one of the highest points in the company's history, thanks to five cost‑saving tactics unveiled by CEO Brian Armstrong on X【1】. The move aims to keep the firm’s AI infrastructure scalable without throttling engineers’ access to large language models (LLMs).
| At a glance | |
|---|---|
| AI spend | ↓ ≈ 50 % from peak |
| Token usage | Near‑record high |
| Default models | GLM 5.2, Kimi 2.7 (Chinese LLMs) |
| Cost‑saving tactic | Automated model routing, caching, lean context, spend visibility |
Armstrong’s first lever is changing the default LLMs to cheaper open‑weight Chinese models, GLM 5.2 from Z.ai and Kimi 2.7 from Moonshot AI, rather than defaulting to premium offerings from Anthropic or OpenAI【1】. The second routes each prompt to the most appropriate model based on task difficulty, so that “frontier” models handle planning while cheaper models handle execution【1】. The third lowers inference cost through more aggressive caching, and the fourth keeps context lean by starting a fresh session whenever an engineer switches tasks【1】. Finally, the company makes every engineer’s token consumption visible and ties higher spend to higher impact expectations rather than imposing hard caps【1】.
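The routing tactic can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tier names, the per-token prices, and the `route` and `cost_usd` helpers are hypothetical, and Armstrong's post does not describe how Coinbase actually implements routing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A model tier with an illustrative price (hypothetical, not a real quote)."""
    name: str
    usd_per_million_tokens: float


# Placeholder tiers; the prices are invented for the example.
FRONTIER = Model("frontier-model", 15.0)
OPEN_WEIGHT = Model("open-weight-model", 1.0)


def route(task_kind: str) -> Model:
    """Send planning work to the frontier tier and everything else to the cheaper tier."""
    return FRONTIER if task_kind == "planning" else OPEN_WEIGHT


def cost_usd(model: Model, tokens: int) -> float:
    """Cost of processing `tokens` tokens on `model` at its listed price."""
    return model.usd_per_million_tokens * tokens / 1_000_000


# A small planning step followed by a much larger execution step.
plan_cost = cost_usd(route("planning"), 200_000)
exec_cost = cost_usd(route("execution"), 2_000_000)
print(f"planning: ${plan_cost:.2f}, execution: ${exec_cost:.2f}")
```

Under these made-up prices, the bulk of the tokens lands on the cheaper tier, which is the effect the planning/execution split is meant to produce.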
Armstrong attached a graph showing token usage climbing to historic levels while AI spend dropped sharply, though the exact timeline isn’t disclosed【1】. The Decoder reports that the routing and caching upgrades lifted the cache hit rate from 5 % to 60 % and cut Coinbase’s AI bill in half as token usage kept rising【4】. In a separate Business Insider post, Armstrong said the firm has kept costs “roughly flat” despite exponential token growth, and he forecasts that within 12‑18 months, 80 % of workloads will run on models that are 99 % cheaper than today’s frontier options【2】.
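As a rough worked example of what that forecast would imply, the snippet below computes the blended cost if 80 % of workloads moved to models 99 % cheaper. It assumes, purely for illustration, that every workload costs the same at baseline and that total token volume stays constant; neither assumption comes from Armstrong, and the `blended_cost_fraction` helper is hypothetical.

```python
def blended_cost_fraction(cheap_share: float, cheap_discount: float) -> float:
    """Fraction of baseline spend left when `cheap_share` of the work
    moves to a model that is `cheap_discount` cheaper (both in 0..1)."""
    unchanged = 1 - cheap_share                     # still on frontier pricing
    discounted = cheap_share * (1 - cheap_discount)  # moved to the cheap tier
    return unchanged + discounted


fraction = blended_cost_fraction(0.80, 0.99)
print(f"{fraction:.3f}")  # 0.208, i.e. about 21 % of baseline spend
```

Under these simplified assumptions, nearly all of the remaining spend comes from the 20 % of workloads that stay on frontier models.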
Coinbase’s strategy mirrors a broader shift away from the “tokenmaxxing” craze, in which firms encouraged unrestricted token consumption to showcase raw AI power. Companies now impose usage caps or visibility rules to curb runaway costs. Armstrong’s approach aligns with moves by other tech firms, including Lindy’s adoption of DeepSeek v4 and Snowflake’s testing of Chinese models, which add pricing pressure on Western AI labs as they prepare for potential IPOs【4】.
By halving AI costs while allowing token usage to expand, Coinbase demonstrates a scalable model for crypto‑focused firms that need AI‑driven productivity without unsustainable spend. The open question is whether the savings will hold up as AI workloads become more complex and demand higher‑end models.
Coverage is mostly measured — 73 of 84 reports stay neutral.
AI-assisted synthesis by the TrendWatcher Editorial Desk · sourced from 5 outlets · Jun 29, 2026 · How we report
Coinbase is experimenting with open weight models including GLM 5.2 from Z.ai and Kimi 2.7 from Moonshot AI.
The company provides engineers with visibility into their token usage and expects higher impact from those who consume more AI resources.
The layoffs were partly attributed to AI changing how people work and enabling small teams to complete tasks more quickly.