Episodic Memory-Guided Controllable Experience Synthesis for Reinforcement Learning
Abstract
In real-world scenarios, data collection for reinforcement learning (RL) is often constrained by safety concerns and high costs, resulting in limited data availability. Diffusion models (DMs) have recently demonstrated remarkable capabilities in capturing complex distributions, making DM-based data augmentation a promising approach for alleviating data scarcity. However, existing DM-based data augmentation methods still suffer from the limited quality of synthesized data for downstream RL tasks. To overcome this limitation, we propose a novel method called episodic memory-guided controllable experience synthesizer (EMCES). EMCES incorporates a controllable DM whose informative yet concise conditions are constructed from episodic memory (EM). To guide the synthesis toward high-quality data, we propose an EM-prioritized condition sampling strategy that leverages EM-based temporal-difference errors to focus generation on the data most helpful for RL. Furthermore, we introduce a hashing-based state representation for EM to improve its efficiency and further boost the quality of synthetic data. To the best of our knowledge, EMCES is the first work to incorporate EM into controllable DMs and to leverage EM for guiding data synthesis in RL. Experimental results across multiple environments demonstrate that EMCES significantly improves the quality of the synthetic data, thereby boosting the performance of several state-of-the-art RL algorithms.
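To make the two mechanisms named above concrete, the following is a minimal, self-contained sketch of a hashing-based episodic memory combined with TD-error-prioritized condition sampling. All names (`EpisodicMemory`, `discretize`, the grid-hash discretization, and the priority scheme) are illustrative assumptions, not the paper's actual implementation; EMCES's own architecture and conditioning details are given in the body of the paper.

```python
import hashlib
import math
import random

def discretize(state, cell=0.5):
    """Hash a continuous state into a coarse cell key.

    Assumed scheme for illustration: a simple grid hash standing in for
    whatever hashing-based representation EMCES actually uses.
    """
    key = tuple(int(math.floor(s / cell)) for s in state)
    return hashlib.md5(repr(key).encode()).hexdigest()

class EpisodicMemory:
    """Maps hashed states to the best return observed from them and tracks
    a TD-error-like priority per entry (hypothetical sketch)."""

    def __init__(self):
        self.best_return = {}  # state hash -> highest observed return
        self.priority = {}     # state hash -> |EM-based TD error|

    def update(self, state, observed_return, value_estimate):
        h = discretize(state)
        # Keep the best return ever observed from this (hashed) state.
        if observed_return > self.best_return.get(h, float("-inf")):
            self.best_return[h] = observed_return
        # EM-based TD error: gap between the memorized value and the
        # current value estimate; larger gap -> more useful to synthesize.
        self.priority[h] = abs(self.best_return[h] - value_estimate)
        return h

    def sample_condition(self, rng):
        """Sample a stored state hash with probability proportional to its
        priority (the EM-prioritized condition sampling idea)."""
        keys = list(self.priority)
        weights = [self.priority[k] for k in keys]
        return rng.choices(keys, weights=weights, k=1)[0]
```

In use, conditions sampled this way would be fed to the controllable DM so that generation concentrates on regions of the state space where the memorized return and the agent's value estimate disagree most.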