On the Power of (Approximate) Reward Models for Inference-Time Scaling
Youheng Zhu ⋅ Yiping Lu
Abstract
Inference-time scaling has recently emerged as a powerful paradigm for improving the reasoning capability of large language models. Among various approaches, \emph{Sequential Monte Carlo (SMC)} has become a particularly important framework, enabling iterative generation, evaluation, rejection, and resampling of intermediate reasoning trajectories. A central component in this process is the \emph{reward model}, which evaluates partial solutions and guides the allocation of computation during inference. However, in practice, true reward models are never available. All deployed systems rely on \emph{approximate reward models}, raising a fundamental question: \emph{Why and when do approximate reward models suffice for effective inference-time scaling?} In this work, we provide a theoretical answer. We identify the \emph{Bellman error} of the approximate reward model as the key quantity governing the effectiveness of SMC-based inference-time scaling. For a reasoning process of length $T$, we show that if the Bellman error of the approximate reward model is bounded by $O(1/T)$, then combining this reward model with SMC reduces the computational complexity of reasoning from exponential in $T$ to polynomial in $T$. This yields an \emph{exponential improvement} in inference efficiency despite using only approximate rewards.
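The generate-evaluate-resample loop the abstract describes can be made concrete with a minimal sketch. The snippet below is illustrative only, not the paper's implementation: `sample_next_step` (a base model's step generator), `approx_reward` (a learned process reward model scoring partial traces), and all parameter values are assumed stand-ins.

```python
# Illustrative sketch of SMC-style inference-time scaling with an
# approximate reward model; not the authors' code.
import math
import random
from typing import Callable, List


def smc_reasoning(
    prompt: str,
    sample_next_step: Callable[[str], str],  # base model: partial trace -> next step (assumed interface)
    approx_reward: Callable[[str], float],   # approximate reward model on partial traces (assumed interface)
    num_particles: int = 16,
    horizon: int = 8,                        # T: number of reasoning steps
    temperature: float = 1.0,
) -> str:
    """Iteratively generate, evaluate, and resample partial reasoning trajectories."""
    particles: List[str] = [prompt] * num_particles
    scores = [approx_reward(p) for p in particles]

    for _ in range(horizon):
        # Generation: extend every particle by one reasoning step.
        extended = [p + sample_next_step(p) for p in particles]

        # Evaluation: score the partial solutions with the approximate reward model.
        new_scores = [approx_reward(p) for p in extended]

        # Weighting: incremental weights from the change in approximate reward.
        weights = [math.exp((ns - s) / temperature) for ns, s in zip(new_scores, scores)]
        total = sum(weights)
        probs = [w / total for w in weights]

        # Rejection / resampling: promising trajectories are duplicated, weak ones dropped.
        idx = random.choices(range(num_particles), weights=probs, k=num_particles)
        particles = [extended[i] for i in idx]
        scores = [new_scores[i] for i in idx]

    # Return the highest-scoring trajectory after T steps.
    return max(zip(scores, particles))[1]
```

In this sketch the approximate reward model plays exactly the role described above: it scores partial solutions and thereby decides how the fixed particle budget is reallocated at each step, rather than expanding all continuations exhaustively.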