The Shadow Price of Reasoning: An Economic Perspective on Optimal Budget Allocation for LLMs
Abstract
Inference-time scaling has emerged as a critical avenue for enhancing Large Language Model performance, yet real-world deployment is bound by strict computational budgets. In this work, we formulate inference budget allocation as a global constrained optimization problem governed by economic principles. By modeling reasoning utility as an S-shaped function of allocated compute, we derive a theoretically optimal policy based on a global \textit{shadow price} that dynamically equilibrates resource scarcity. Building on this theory, we propose Difficulty-Aware Budget Allocation (DABA), a market-based mechanism that numerically solves for the exact market-clearing price. Unlike standard methods, DABA implements a Lambert-W policy to execute strategic abandonment, sacrificing insolvent tasks and redistributing their compute to solvable complex queries. Extensive experiments on mathematical reasoning benchmarks demonstrate that DABA significantly improves the Pareto frontier of cost versus accuracy: in resource-scarce regimes, it achieves up to a threefold improvement in global accuracy over uniform allocation.
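The market-clearing mechanism described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: logistic curves stand in for the S-shaped utility, per-task demand is found by grid search rather than the Lambert-W closed form, and the shadow price is located by bisection. All function names and parameters (`k`, `m`, `b_max`) are assumptions for the sketch.

```python
import math

def utility(b, k, m):
    """Toy S-shaped (logistic) utility of budget b: steepness k, midpoint m."""
    return 1.0 / (1.0 + math.exp(-k * (b - m)))

def demand(lam, k, m, b_max=64.0, step=0.25):
    """Budget maximizing utility(b) - lam*b over a grid.

    Returning 0.0 models strategic abandonment: at price lam, no positive
    budget yields more surplus than spending nothing on this task.
    """
    best_b, best_v = 0.0, utility(0.0, k, m)  # surplus of abandoning is u(0)
    b = step
    while b <= b_max:
        v = utility(b, k, m) - lam * b
        if v > best_v:
            best_b, best_v = b, v
        b += step
    return best_b

def clearing_price(tasks, B, iters=60):
    """Bisect on the shadow price lam until total demand fits the budget B.

    tasks: list of (k, m) utility parameters, one per query.
    Returns the smallest price found at which aggregate demand <= B
    (demand is non-increasing in lam, so bisection applies).
    """
    lo, hi = 1e-9, 1.0  # at lam=1 marginal utility k/4 < 1, so demand is 0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        total = sum(demand(lam, k, m) for k, m in tasks)
        if total > B:
            lo = lam  # budget oversubscribed: raise the price
        else:
            hi = lam  # feasible: try a lower price
    return hi
```

Raising the price shrinks each task's demand, and tasks whose surplus goes negative drop to zero budget entirely, which is the discrete analogue of abandoning insolvent queries to free compute for the rest.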