Online Robust Reinforcement Learning with General Function Approximation
Debamita Ghosh ⋅ George Atia ⋅ Yue Wang
Abstract
Reinforcement learning (RL) in real-world tasks often suffers performance degradation due to distribution shift between training and deployment environments. Distributionally Robust RL (DR-RL) addresses this issue by optimizing the worst-case performance over an uncertainty set of transition dynamics, providing an optimized baseline performance upon deployment. However, existing methods typically require strong data access assumptions (e.g., a generative model or comprehensive offline datasets) and mostly focus on tabular settings. In this paper, we introduce a purely online DR-RL algorithm with general function approximation that learns a robust policy directly from interaction, without any prior knowledge or pre-collected data. Our method uses a dual-based fitted robust Bellman update to jointly learn the value function and the robust backup operator. We establish the first regret guarantee for online DR-RL in terms of an intrinsic complexity measure, the robust Bellman–Eluder (BE) dimension, for general $\phi$-divergence uncertainty sets. Our regret bound is sublinear, independent of $|\mathcal{S}|$ and $|\mathcal{A}|$, and recovers sharp rates in structured regimes, providing a scalable method for practical DR-RL.
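For orientation, the robust Bellman backup that such dual-based updates target can be sketched as follows; the notation here (nominal kernel $P^0$, radius $\rho$, discounted formulation) is illustrative and not taken from the paper. Given a $\phi$-divergence ball $\mathcal{P}_\rho(s,a) = \{P : D_\phi(P \,\|\, P^0(\cdot\mid s,a)) \le \rho\}$, the robust backup of a value function $V$ is
\[
(\mathcal{T}V)(s,a) \;=\; r(s,a) \;+\; \gamma \inf_{P \in \mathcal{P}_\rho(s,a)} \mathbb{E}_{s' \sim P}\big[V(s')\big],
\]
and, in the KL special case, the inner minimization admits the standard one-dimensional dual
\[
\inf_{P \in \mathcal{P}_\rho(s,a)} \mathbb{E}_{s' \sim P}\big[V(s')\big] \;=\; \sup_{\lambda \ge 0} \Big\{ -\lambda \log \mathbb{E}_{s' \sim P^0(\cdot\mid s,a)}\big[e^{-V(s')/\lambda}\big] \;-\; \lambda \rho \Big\},
\]
which is what makes a fitted, sample-based implementation of the robust update tractable without enumerating the uncertainty set.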