
Data-Efficient Exploration with Self Play for Atari
Michael Laskin · Catherine Cang · Ryan Rudes · Pieter Abbeel

Most reinforcement learning (RL) algorithms rely on hand-crafted extrinsic rewards to learn skills. However, crafting a reward function for each skill does not scale and yields narrow agents that learn only reward-specific skills. To reduce this reliance on reward engineering, it is important to develop RL algorithms that can efficiently acquire skills without rewards extrinsic to the agent. While much progress has been made on reward-free exploration in RL, current methods struggle to explore efficiently. Self-play has long been a promising approach for acquiring skills, but most of its successful applications have been in multi-agent zero-sum games with extrinsic rewards. In this work, we present SelfPlayer, a data-efficient single-agent self-play exploration algorithm. SelfPlayer samples hard but achievable goals from the agent's past by maximizing a symmetric KL divergence between the visitation distributions of two copies of the agent, Alice and Bob. We show that SelfPlayer outperforms leading prior self-supervised exploration algorithms, such as GoExplore and Curiosity, on the data-efficient Atari benchmark.
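The abstract names a symmetric KL divergence between Alice's and Bob's visitation distributions but does not give the exact objective. As a rough illustration of that quantity only, here is a minimal sketch assuming discrete, hashable states and additive smoothing so that every probability is positive. The function names and the smoothing constant `eps` are hypothetical, the divergence is taken here as the sum of both KL directions (some works use half of it), and this is not SelfPlayer's goal-sampling procedure.

```python
import math
from collections import Counter


def visitation_distribution(states, support, eps=1e-8):
    """Empirical visitation frequencies over a fixed, discrete state support.

    Assumption: states are hashable; eps smoothing keeps every probability
    strictly positive so both KL directions stay finite.
    """
    counts = Counter(states)
    total = len(states) + eps * len(support)
    return [(counts[s] + eps) / total for s in support]


def kl(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetric KL divergence, taken here as KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


if __name__ == "__main__":
    # Hypothetical state sequences visited by the two copies of the agent.
    alice_states = ["s0", "s1", "s1", "s2"]
    bob_states = ["s0", "s0", "s0", "s1"]
    support = sorted(set(alice_states) | set(bob_states))

    p_alice = visitation_distribution(alice_states, support)
    p_bob = visitation_distribution(bob_states, support)
    print(symmetric_kl(p_alice, p_bob))
```

The divergence is zero when the two copies visit states with identical frequencies and grows as their visitation patterns diverge; because it sums both directions, it penalizes states that either copy visits and the other rarely does.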

Author Information

Michael Laskin (UC Berkeley)
Catherine Cang (University of California Berkeley)
Ryan Rudes (Half Hollow Hills High School East)

I'm a rising high school junior who enjoys ML projects, research, and almost anything CS-related. I am a visiting student researcher in the Robot Learning Lab group of BAIR, led by Pieter Abbeel.

Pieter Abbeel (UC Berkeley & Covariant)
