Coinbase is adjusting its internal AI usage strategy, moving away from pursuing only the most powerful models and instead assigning tasks to models at different price points. CEO Brian Armstrong said the approach has helped the company keep overall AI costs roughly stable despite rapid growth in token usage.
Armstrong discusses model routing
Armstrong said in a post on X that not every AI suggestion requires calling a high-end model. He indicated that Coinbase is working to route suitable requests to cheaper models. In some scenarios, this has allowed the company to keep costs roughly flat even as usage has grown exponentially.
The statement reflects a real shift in enterprise AI deployment: as call volume keeps rising, the cost pressure of relying solely on high-performance models is drawing more attention. Where the earlier emphasis was on "using it as much as possible," enterprises now place greater weight on task stratification and cost efficiency.
Tech industry figures weigh in

Armstrong's remarks drew responses from several figures in the tech industry. Venture capitalist Marc Andreessen commented that the approach was "worth watching." Hugging Face co-founder Julien Chaumond said that model routing has recently become one of the fastest-growing areas.
Box CEO Aaron Levie believes Armstrong's data is "somewhat extreme," but he also predicts that AI usage will become more stratified. In his view, high-value, more complex tasks will continue to be handled by leading models, while high-frequency, standardized tasks will increasingly go to low-cost models.
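The stratification Levie describes can be illustrated with a minimal sketch. Everything in it is hypothetical: the tier names, the prices, the complexity score and the sample tasks are assumptions made for illustration, not figures or methods from Coinbase or any model vendor.

```python
from dataclasses import dataclass

# Hypothetical tiers and prices (USD per million tokens); illustrative only.
PRICE_PER_MTOK = {"frontier": 15.0, "low_cost": 0.5}


@dataclass
class Task:
    name: str
    complexity: int   # 1 (routine) to 5 (hard); an assumed scoring scheme
    high_value: bool
    tokens: int


def route(task: Task) -> str:
    """Send high-value or complex tasks to the frontier tier, the rest to the low-cost tier."""
    if task.high_value or task.complexity >= 4:
        return "frontier"
    return "low_cost"


def total_cost(tasks, router=route) -> float:
    """Sum the cost of all tasks under a given routing function."""
    return sum(PRICE_PER_MTOK[router(t)] * t.tokens / 1_000_000 for t in tasks)


# Made-up workload: one complex task and two high-volume routine ones.
tasks = [
    Task("contract review", 5, True, 200_000),
    Task("ticket tagging", 1, False, 2_000_000),
    Task("log summarization", 2, False, 1_800_000),
]

routed = total_cost(tasks)
all_frontier = total_cost(tasks, router=lambda t: "frontier")
print(f"routed: ${routed:.2f}, all frontier: ${all_frontier:.2f}")
```

Under these assumed prices, most of the token volume lands on the cheap tier, so total spend can stay far below the all-frontier figure even when the routine workload grows. A production router would need some way to judge task difficulty, which this sketch simply takes as given.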
AI cost management approaches are evolving
Harvey co-founder Winston Weinberg said that how to allocate "intelligent resources" will become crucial. In the future, companies will compete not only on which models they have integrated, but also on whether they can use those models' capabilities in the most appropriate way.
The article notes that public statements emphasizing efficiency have only recently become common. Previously, tech companies preferred to showcase high token bills or the scale of their use of the latest models. With inference costs and call frequency continuing to rise, however, the market has begun to reassess whether demand can be extrapolated indefinitely from high-end model prices.
Glean co-founder Tony Gentilcore also supports Armstrong's view. He said the tech community has long understood this, and that those still linearly extrapolating demand from high-end model prices are mostly financial market participants.