Robust Learning via Nested Distributionally Robust Optimization
Jinyi Huang ⋅ Jinlong Lei ⋅ Guodong Shi
Abstract
Distributionally Robust Optimization (DRO) is widely used to improve model robustness. Existing methods effectively handle either geometric perturbations (e.g., input shifts) or statistical contamination (e.g., heavy-tailed noise and outliers), yet in practice these two uncertainty sources often co-exist. Coupling them through a single divergence or optimal-transport constraint conflates geometric displacement with loss-based outlierness, and thus often discards informative high-leverage samples. We introduce nested DRO, a bilevel formulation that combines an inner optimal-transport constraint with an outer $\phi$-divergence constraint to decouple geometric smoothing from statistical robustness. We prove that this structure induces a geometry-invariant, loss-based reweighting mechanism that separates outlier suppression from transport-induced regularization. We derive a tractable strong dual for the resulting non-convex problem and show its equivalence to variance-regularized risk minimization, providing a rigorous theoretical justification for reweighting gains as a natural consequence of dualization. Empirical results on synthetic and real datasets demonstrate that nested DRO consistently outperforms geometry-coupled DRO baselines, particularly under heavy-tailed contamination, where preserving high-leverage structure is crucial.
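To fix ideas, here is a minimal schematic of the bilevel structure described above, assuming an inner transport smoothing and an outer $\phi$-divergence ball; the symbols $\lambda$, $\rho$, the cost $c$, and the empirical distribution $\widehat{P}_n$ are our illustrative notation, not the paper's:
\[
\ell_\lambda(\theta; z) \;=\; \sup_{z' \in \mathcal{Z}} \bigl\{ \ell(\theta; z') - \lambda\, c(z, z') \bigr\},
\qquad
\min_{\theta} \; \sup_{Q :\, D_\phi(Q \,\|\, \widehat{P}_n) \le \rho} \; \mathbb{E}_{Q}\bigl[\ell_\lambda(\theta; Z)\bigr].
\]
The inner supremum absorbs geometric perturbations through the transport cost $c$, while the outer ball reweights samples by the smoothed loss alone, so the reweighting is invariant to how far a sample was transported. Under a $\chi^2$ instantiation of $D_\phi$ (again our choice for illustration), the outer supremum expands, to first order, as $\mathbb{E}_{\widehat{P}_n}[\ell_\lambda] + \sqrt{2\rho\, \mathrm{Var}_{\widehat{P}_n}(\ell_\lambda)}$, which is the variance-regularized risk referenced in the abstract.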