Model-Based Active Exploration

Pranav Shyam · Wojciech Jaśkowski · Faustino Gomez

Pacific Ballroom #46

Keywords: [ Robotics ] [ Planning and Control ] [ Deep Reinforcement Learning ] [ Bayesian Deep Learning ] [ Active Learning ]


Efficient exploration is an unsolved problem in Reinforcement Learning which is usually addressed by reactively rewarding the agent for fortuitously encountering novel situations. This paper introduces an efficient *active* exploration algorithm, Model-Based Active eXploration (MAX), which uses an ensemble of forward models to plan to observe novel events. This is carried out by optimizing agent behaviour with respect to a measure of novelty derived from the Bayesian perspective of exploration, which is estimated using the disagreement between the futures predicted by the ensemble members. We show empirically that in semi-random discrete environments where directed exploration is critical to make progress, MAX is at least an order of magnitude more efficient than strong baselines. MAX scales to high-dimensional continuous environments where it builds task-agnostic models that can be used for any downstream task.
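The core idea above — scoring candidate behaviour by how much an ensemble of forward models disagrees about the resulting futures — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and total predictive variance is used here as a simple stand-in for the Bayesian novelty measure the abstract refers to.

```python
import numpy as np

def ensemble_disagreement(predictions):
    """Disagreement among ensemble forward models for one transition.

    predictions: sequence of shape (n_models, state_dim), each row a
    model's predicted next state for the same (state, action) pair.
    As a simple proxy for the novelty measure, we use the total
    variance of the predicted next states across ensemble members.
    """
    predictions = np.asarray(predictions, dtype=float)
    # Variance across ensemble members, summed over state dimensions;
    # zero when all models agree, large when their futures diverge.
    return float(np.var(predictions, axis=0).sum())

def score_actions(state, actions, models):
    """Rank candidate actions by predicted ensemble disagreement.

    `models` is a list of forward models f(state, action) -> next state.
    A planner maximizing these scores steers the agent toward
    transitions the ensemble is most uncertain about, i.e. novel ones.
    """
    return [
        ensemble_disagreement([m(state, a) for m in models])
        for a in actions
    ]
```

For example, with two toy models `lambda s, a: [s + a]` and `lambda s, a: [s - a]`, the action `a = 0` yields identical predictions (disagreement 0), while `a = 1` yields diverging predictions and a positive score, so a planner would prefer it.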
