Uncertainty-Guided Exploration and Stable Planning for Sparse-Reward Manipulation from Limited Demonstrations
Abstract
Reinforcement learning from demonstrations (RLfD) is a promising approach to robotic manipulation with sparse rewards. However, limited demonstrations often drive agents into out-of-distribution states where learned world models produce poor predictions. In multi-stage tasks, jointly optimizing a learned reward function and a policy introduces a moving-target problem, and the resulting non-stationarity amplifies the impact of uncertainty on policy learning. In this work, we propose QUEST, a model-based RL framework that, guided by uncertainty, adaptively switches between exploration and exploitation to achieve stable and efficient learning. Specifically, our approach employs intrinsic rewards to capture environmental stochasticity, leverages an ensemble of dynamics models for uncertainty-guided planning, and introduces a hybrid sampling strategy that prioritizes rare successful stage transitions. We evaluate QUEST on challenging sparse-reward manipulation tasks with limited expert demonstrations. Results show that QUEST outperforms state-of-the-art methods by 17\% on average, with gains increasing to 60\% on difficult tasks. We further demonstrate successful zero-shot sim-to-real transfer on three real-world tasks.
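To make the uncertainty-guided switching concrete, the following Python sketch uses disagreement across an ensemble of one-step dynamics models as an epistemic-uncertainty signal that toggles between exploration and exploitation. The ensemble interface, the threshold, and the mode names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hedged sketch, not QUEST itself: ensemble disagreement as an
# epistemic-uncertainty signal for switching between exploration
# (intrinsic reward) and exploitation (planning with the learned reward).

class EnsembleDynamics:
    """K independently trained one-step dynamics models (assumed interface)."""

    def __init__(self, models):
        # Each model is a callable: (state, action) -> predicted next state.
        self.models = models

    def uncertainty(self, state, action):
        # Epistemic uncertainty approximated by the variance of the
        # ensemble members' predictions, averaged over state dimensions.
        preds = np.stack([m(state, action) for m in self.models])
        return preds.var(axis=0).mean()


def select_mode(ensemble, state, action, threshold=0.1):
    """Explore where the world model is unreliable, exploit otherwise.

    The threshold value is a hypothetical hyperparameter chosen for
    illustration only.
    """
    if ensemble.uncertainty(state, action) > threshold:
        return "explore"  # e.g., follow an intrinsic (curiosity) reward
    return "exploit"      # e.g., plan greedily under the learned reward
```

The design intuition is that ensemble members agree on in-distribution states (the demonstrations cover them) and diverge on out-of-distribution ones, so the disagreement signal localizes exactly the states where the world model's predictions should not be trusted for planning.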