Scaling Laws for Precision in High-Dimensional Linear Regression
Abstract
Low-precision training is critical for optimizing the trade-off between model quality and training costs, and it requires model size, dataset size, and numerical precision to be allocated jointly. While empirical scaling laws suggest that quantization either shrinks the effective model and data capacities or acts as an additive error, the theoretical mechanisms governing these effects remain largely unexplored. In this work, we initiate a theoretical study of scaling laws for low-precision training within a high-dimensional sketched linear regression framework. We demonstrate that the impact of quantization is twofold: it introduces an additive error and simultaneously shrinks the effective model and data sizes. Crucially, our analysis reveals distinct behaviors across quantization regimes: multiplicative quantization (where the error variance scales with the signal magnitude) primarily reduces the effective data size, whereas additive quantization (where the error variance is independent of the signal) diminishes both the effective model and data sizes. Numerical experiments validate our theoretical findings. By rigorously characterizing the interplay among model size, dataset size, and quantization error, our work provides a principled theoretical basis for optimizing training protocols under practical hardware constraints.
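To make the regime distinction concrete, the display below gives one minimal formalization, assuming unbiased per-entry quantization of a scalar signal $x$; the notation $Q_{\mathrm{mult}}$, $Q_{\mathrm{add}}$, $\varepsilon_{\mathrm{mult}}$, $\varepsilon_{\mathrm{add}}$, $\sigma_{\mathrm{m}}^{2}$, and $\sigma_{\mathrm{a}}^{2}$ is illustrative rather than the paper's own:
\[
Q_{\mathrm{mult}}(x) = x + \varepsilon_{\mathrm{mult}},
\qquad
\mathbb{E}\!\left[\varepsilon_{\mathrm{mult}} \mid x\right] = 0,
\qquad
\operatorname{Var}\!\left(\varepsilon_{\mathrm{mult}} \mid x\right) = \sigma_{\mathrm{m}}^{2}\, x^{2},
\]
\[
Q_{\mathrm{add}}(x) = x + \varepsilon_{\mathrm{add}},
\qquad
\mathbb{E}\!\left[\varepsilon_{\mathrm{add}}\right] = 0,
\qquad
\operatorname{Var}\!\left(\varepsilon_{\mathrm{add}}\right) = \sigma_{\mathrm{a}}^{2}.
\]
In this sketch, multiplicative noise acts like signal-proportional observation noise (plausibly degrading the effective sample count), while signal-independent additive noise also perturbs small-magnitude coordinates of the representation itself, which is one way to read the abstract's claim that it shrinks both effective sizes.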