Selling Data as a Digital Good with Scaling Valuations
Abstract
We study mechanism design for selling data as a digital good when the value a buyer derives from training AI models follows a scaling law. The seller incurs a linear cost to produce data, while each buyer’s benefit from additional data exhibits diminishing returns. Unlike classical auction models, allocations are continuous quantities of data rather than binary outcomes. We first analyze an offline setting in which all buyer types are realized simultaneously, characterizing profit-optimal mechanisms and showing how virtual-value methods extend to continuous data allocations. We then consider an online setting in which buyers arrive sequentially and production decisions must be made under demand uncertainty. We show that myopic allocation and fixed production plans can be arbitrarily suboptimal, whereas a simple two-stage algorithm that combines upfront production with adaptive expansion achieves a constant-factor approximation to the offline optimum. Finally, we study bilateral data trading under asymmetric information, where both the buyer’s value and the seller’s cost are private. Although the optimal truthful mechanism has a complex structure, we show that simple, implementable mechanisms recover a constant fraction of the first-best gains from trade. Overall, our results show how scaling laws introduce new algorithmic trade-offs in market design, and they provide performance guarantees for data markets under uncertainty.