Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression
Abstract
Scaling test-time computation during language model inference, such as generating intermediate thoughts or sampling multiple candidate answers, has proven effective for improving model performance. While these techniques rely on stochastic decoding to explore diverse reasoning paths, prior theoretical work typically builds on a deterministic decoding framework, overlooking the randomness inherent in practical language model inference. This work takes an initial step toward bridging this gap by establishing a new theoretical framework that incorporates randomness and sampling directly into the decoding analysis. To demonstrate the framework's effectiveness, we apply it to the canonical in-context linear regression task with continuous and binary coefficients, simulating decoding via noise injection and sampling to analyze widely adopted inference techniques. We validate our theoretical findings through numerical simulations, and additional experiments on real-world tasks substantiate the framework's potential for practical applications.
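To make the setup concrete, the minimal sketch below (ours, not the paper's code) illustrates the abstract's ingredients on in-context linear regression: a least-squares predictor stands in for the trained transformer, Gaussian noise injection simulates stochastic decoding, and averaging k sampled candidates mimics scaling test-time compute via repeated sampling. The dimensions, the noise scale sigma, and the aggregation rule are all assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem setup (assumed dimensions): n in-context examples, d features.
n, d = 32, 8
w_star = rng.standard_normal(d)            # continuous ground-truth coefficients
X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)
x_query = rng.standard_normal(d)

# Stand-in for the trained transformer's in-context prediction:
# ordinary least squares fit on the context examples.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def sample_prediction(sigma=0.5):
    """One stochastic decoding pass, simulated by injecting Gaussian
    noise into the deterministic prediction (sigma is assumed)."""
    return x_query @ w_hat + sigma * rng.standard_normal()

# Scaling test-time compute: draw k candidate answers and aggregate
# (averaging here; majority vote would suit binary coefficients).
k = 64
candidates = [sample_prediction() for _ in range(k)]
aggregated = np.mean(candidates)

print(f"single-sample error: {abs(candidates[0] - x_query @ w_star):.3f}")
print(f"aggregated error:    {abs(aggregated - x_query @ w_star):.3f}")
```

Averaging the k noisy candidates shrinks the injected-noise variance by roughly a factor of k, which is the intuition behind why sampling-based test-time scaling can help despite stochastic decoding.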