Q-SAM: Unlocking Sharpness-Aware Minimization for Generalization in Offline Reinforcement Learning
Da Wang ⋅ Yi Ma ⋅ Ting Guo ⋅ Lin Li ⋅ Wei Wei ⋅ Jiye Liang
Abstract
Generalization remains a central challenge in offline reinforcement learning (RL), where policies are trained solely from static datasets and must perform reliably under distribution shift. While most existing offline RL methods focus on reducing training loss using standard optimizers such as Adam, the role of loss landscape geometry, particularly sharpness, has received little attention. Sharpness-Aware Minimization (SAM) has recently shown strong generalization benefits in supervised learning by favoring flatter minima. However, directly applying SAM to offline RL is non-trivial: unlike supervised settings with ground-truth labels, offline RL relies on bootstrapped targets, making sharpness estimation noisy and often destabilizing optimization. In this paper, we revisit offline RL from an optimization perspective and investigate how sharpness-aware optimization can be made effective in this setting. We propose Q-bound-weighted SAM (Q-SAM), a robust and scalable framework that treats sharpness as a weighted objective and selectively prioritizes samples that are most suitable for sharpness-aware optimization based on Q bounds. By aligning the SAM objective with the characteristics of bootstrapped value estimation, Q-SAM amplifies the benefits of sharpness minimization while preserving training stability. Extensive experiments on standard offline RL benchmarks demonstrate that Q-SAM consistently improves generalization performance across diverse datasets and algorithms. Our results highlight the importance of loss sharpness in offline RL and suggest optimizer design as a promising direction for developing more robust offline RL methods.
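As a rough illustration of the optimization pattern the abstract describes, below is a minimal PyTorch sketch of a SAM-style two-pass update with per-sample weighting: the sharpness term is computed as a weighted objective, and samples are weighted before the ascent step. The helpers `td_loss_fn` (per-sample bootstrapped TD losses) and `q_bound_weight_fn` (per-sample weights derived from Q bounds), as well as the perturbation radius `rho`, are hypothetical placeholders assumed for this sketch; the abstract does not specify Q-SAM's exact weighting rule or update equations.

```python
import torch


def qsam_step(model, batch, td_loss_fn, q_bound_weight_fn, optimizer, rho=0.05):
    """One weighted SAM-style update (illustrative sketch, not the paper's exact algorithm).

    td_loss_fn(model, batch)  -> per-sample TD losses, shape (B,)    [hypothetical helper]
    q_bound_weight_fn(batch)  -> per-sample weights in [0, 1]        [hypothetical helper]
    """
    # First pass: weighted loss and its gradient at the current parameters.
    optimizer.zero_grad()
    weights = q_bound_weight_fn(batch).detach()
    loss = (weights * td_loss_fn(model, batch)).mean()
    loss.backward()

    # Ascent step: move parameters by eps = rho * g / ||g|| toward higher loss,
    # as in standard SAM.
    with torch.no_grad():
        grad_norm = torch.sqrt(sum(
            p.grad.pow(2).sum() for p in model.parameters() if p.grad is not None
        ))
        eps = {}
        for p in model.parameters():
            if p.grad is None:
                continue
            eps[p] = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps[p])

    # Second pass: gradient of the weighted loss at the perturbed point.
    optimizer.zero_grad()
    (weights * td_loss_fn(model, batch)).mean().backward()

    # Undo the perturbation, then descend along the sharpness-aware gradient.
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    optimizer.step()
    return loss.item()
```

Down-weighting samples whose bootstrapped targets are unreliable (delegated here to the assumed `q_bound_weight_fn`) reflects the abstract's stated mechanism for keeping sharpness estimation from being dominated by noisy value targets.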