Robust Exploration through Generative Replay
Abstract
Reinforcement learning (RL) methods typically employ replay buffers to store past experiences and reuse them for training policies and value functions. Recent advances in generative models offer a promising alternative to the static replay buffer: a generative model captures online experiences and synthesizes additional transitions beyond them. When guided by conditional generative models, generative replay enables indirect exploration of unseen states without further environment interaction. However, we find that existing approaches based on conditional generative models struggle to balance exploration against the dynamic plausibility of generated samples, and the resulting adversarial samples may exacerbate training instability. In this work, we propose Robust Exploration through Generative Replay (REGR), a novel framework. We first define the target distribution as a novelty-tilted distribution that balances exploration and dynamic plausibility. To sample from this tilted distribution, we fine-tune diffusion models rather than relying on guidance. Through extensive experiments, we confirm that REGR shows more robust exploration than prior approaches across several continuous control tasks, including sparse-reward environments. Finally, we conduct an in-depth analysis of factors such as value-estimation bias and state coverage to explain why REGR achieves robust performance.