Poster in Workshop: Reinforcement Learning for Real Life
IV-RL: Leveraging Target Uncertainty Estimation for Sample Efficiency in Deep Reinforcement Learning
Vincent Mai · Kaustubh Mani · Liam Paull
Model-free deep reinforcement learning is well suited to robotics, but it is difficult to deploy in the real world because of the poor sample efficiency of the learning process. In widely used temporal-difference algorithms, part of this inefficiency comes from the noisy supervision caused by bootstrapping. This label noise is heteroscedastic: the target network prediction is subject to epistemic uncertainty that depends both on the input and on the stage of the learning process. We propose Inverse-Variance RL (IV-RL), which uses uncertainty predictions to weight the samples in the mini-batch during the Bellman update, following the Batch Inverse-Variance approach for heteroscedastic regression with neural networks. We show experimentally that this approach improves the sample efficiency of DQN in two environments, and propose directions for further work on this method.
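To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of an inverse-variance weighted Bellman update in PyTorch. It assumes an externally supplied per-sample estimate target_var of the epistemic variance of the target-network predictions (e.g. from an ensemble or dropout), and the constant eps used to bound the weights is a hypothetical choice for illustration.

    # Sketch of an inverse-variance weighted TD loss, assuming a per-sample
    # variance estimate for the bootstrapped targets is available.
    import torch

    def iv_weighted_td_loss(q_values, rewards, next_q_targets, target_var,
                            dones, gamma=0.99, eps=1e-2):
        """Inverse-variance weighted TD loss for a mini-batch.

        q_values:       Q(s, a) from the online network, shape (B,)
        rewards:        immediate rewards, shape (B,)
        next_q_targets: max_a' Q_target(s', a'), shape (B,)
        target_var:     estimated variance of next_q_targets, shape (B,)
        dones:          1.0 if the transition ended the episode, shape (B,)
        eps:            hypothetical minimal-variance constant bounding the weights
        """
        # Bootstrapped TD targets; no gradient flows through the target network.
        with torch.no_grad():
            targets = rewards + gamma * (1.0 - dones) * next_q_targets
            # Inverse-variance weights, normalized to sum to 1 over the batch.
            weights = 1.0 / (target_var + eps)
            weights = weights / weights.sum()

        # Weighted squared TD errors replace the usual uniform batch mean.
        td_errors = q_values - targets
        return (weights * td_errors.pow(2)).sum()

    # Example usage with random tensors standing in for a batch of transitions.
    if __name__ == "__main__":
        B = 32
        loss = iv_weighted_td_loss(
            q_values=torch.randn(B, requires_grad=True),
            rewards=torch.randn(B),
            next_q_targets=torch.randn(B),
            target_var=torch.rand(B),
            dones=torch.zeros(B),
        )
        loss.backward()
        print(loss.item())

The effect of the weighting is that transitions whose bootstrapped targets are deemed more uncertain contribute less to the gradient, which is the mechanism the abstract credits for the sample-efficiency gains.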