Romberg-Extrapolated Zeroth-Order Gradient Estimator: Higher-Order Bias Reduction with Preserved Leading Directional Variance
Hongcheng Dong ⋅ Wenqiang Pu ⋅ Licheng Zhao ⋅ Rui Zhou ⋅ Feng Yin
Abstract
Zeroth-order optimization is widely used when gradients are unavailable, but the standard two-point estimator suffers from $\mathcal{O}(r^2)$ truncation bias at smoothing radius $r$. Existing bias-reduction schemes typically increase the leading directional variance under a fixed number of function evaluations per gradient estimate, while variance-reduction schemes generally do not improve the bias order. We propose Romberg-ZOGE, which forms a Romberg-extrapolated linear combination of two-point differences evaluated at radii $\{r/2^k\}_{k=0}^R$ while reusing the same perturbation direction across all radii. With appropriately chosen weights, Romberg-ZOGE cancels the first $R$ even-order truncation terms and achieves $\mathcal{O}(r^{2R+2})$ bias under $(2R{+}2)$th-order smoothness, while preserving the leading directional variance constant of the two-point estimator up to higher-order residual terms. We further characterize the stochastic-oracle setting by deriving an explicit noise-amplification factor and corresponding bias and variance bounds. Experiments on synthetic benchmarks, simulator-based wireless optimization, and black-box prompt tuning of OPT-1.3B demonstrate faster and more stable convergence of zeroth-order SGD when the number of function evaluations per gradient estimate is fixed.
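A minimal worked sketch of the construction, using standard Richardson–Romberg reasoning rather than the paper's own derivation; the symbols $D_k$, $c_m$, and $w_k$ below are illustrative notation, not taken from the source. Writing $r_k = r/2^k$ and reusing one direction $u$ at every radius, each central two-point difference expands as

$$
D_k \;=\; \frac{f(x + r_k u) - f(x - r_k u)}{2 r_k} \;=\; u^\top \nabla f(x) \;+\; \sum_{m=1}^{R} c_m\, r_k^{2m} \;+\; \mathcal{O}\!\left(r_k^{2R+2}\right),
$$

so choosing weights with $\sum_{k=0}^{R} w_k = 1$ and $\sum_{k=0}^{R} w_k\, 4^{-km} = 0$ for $m = 1, \dots, R$ (a small Vandermonde system, or equivalently the Romberg table recursion $T_{k,j} = \bigl(4^j T_{k,j-1} - T_{k-1,j-1}\bigr)/(4^j - 1)$) cancels the first $R$ even-order terms: the combined estimator $\hat g = \bigl(\sum_{k=0}^{R} w_k D_k\bigr)\, u$ has bias $\mathcal{O}(r^{2R+2})$, while the unit weight sum keeps the coefficient of the leading directional term $u^\top \nabla f(x)$ at exactly $1$, consistent with the preserved leading variance constant. For $R = 1$ this recovers the classical Richardson pair $(w_0, w_1) = (-\tfrac{1}{3}, \tfrac{4}{3})$.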