OVLR: Efficient, Scalable, and Robust Training via Output-Level Variance-Reduced Likelihood Ratio
Abstract
Gradient-based optimization is fundamental to deep learning, yet standard backpropagation (BP) is inherently limited by the requirement of differentiability, rendering it brittle when encountering piecewise-constant objectives with vanishing gradients (e.g., the hard 0-1 loss) or black-box feedback. While likelihood ratio (LR) methods offer a theoretical alternative, their high variance in high-dimensional parameter spaces often undermines training stability and scalability. We propose OVLR (Output-Level Variance-Reduced Likelihood Ratio), a simple yet powerful framework that circumvents the trade-off between the generality of LR estimation and its variance, providing a unified solution for efficient, scalable, and robust gradient estimation. OVLR achieves dramatic variance reduction by performing perturbations and antithetic sampling in the low-dimensional output space. Crucially, the method maintains high computational efficiency: it requires only a single deterministic forward pass through the neural network, with additional costs restricted to evaluating the loss function across multiple samples. Designed as a drop-in replacement, OVLR integrates seamlessly into automatic differentiation frameworks via vector-Jacobian products, enabling the direct optimization of objectives with vanishing or pathological gradients, such as the 0-1 loss for noise-tolerant classification and truncated losses for outlier-resistant regression, where BP fails to provide reliable learning signals. Extensive empirical results across image classification, generative modeling, language modeling, and robot imitation learning demonstrate that OVLR not only matches BP performance on problems with informative gradients, but also provides a decisive advantage on problems with vanishing or inaccessible gradients.
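The mechanism the abstract describes, namely perturbing only the network output with antithetic Gaussian noise, forming an LR estimate of the output gradient from loss evaluations alone, and pulling it back through the network via a vector-Jacobian product, can be sketched as follows. This is a minimal illustration under our own assumptions (a toy linear model `y = W @ x`, a hand-coded VJP, and hypothetical helper names such as `ovlr_output_grad`), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def zero_one_loss(logits, label):
    # Piecewise-constant objective: its gradient is zero almost
    # everywhere, so BP provides no learning signal.
    return float(np.argmax(logits) != label)

def ovlr_output_grad(y, loss_fn, sigma=0.1, n_samples=64, rng=rng):
    """Antithetic LR estimate of dL/dy, perturbing only the output y.

    The network is run once (deterministically); the extra cost is
    2 * n_samples evaluations of the loss function, not the network.
    """
    g = np.zeros_like(y)
    for _ in range(n_samples):
        eps = rng.standard_normal(y.shape)
        l_plus = loss_fn(y + sigma * eps)   # perturb in output space
        l_minus = loss_fn(y - sigma * eps)  # antithetic counterpart
        g += eps * (l_plus - l_minus) / (2.0 * sigma)
    return g / n_samples

# Toy "network": a single linear layer, one deterministic forward pass.
W = rng.standard_normal((3, 5))
x = rng.standard_normal(5)
y = W @ x
label = 2

# Estimated output-space gradient of the (non-differentiable) 0-1 loss.
g_y = ovlr_output_grad(y, lambda z: zero_one_loss(z, label))

# Pull the estimate back to the parameters via the vector-Jacobian
# product; for y = W @ x this is the outer product g_y x^T.
g_W = np.outer(g_y, x)
```

In an autodiff framework, the final step would instead be supplied by the framework itself: treating `g_y` as the upstream gradient and backpropagating it through the (deterministic) forward pass, which is how the abstract's drop-in VJP integration would work.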