ConQUR: Mitigating Delusional Bias in Deep Q-Learning
DiJia Su · Jayden Ooi · Tyler Lu · Dale Schuurmans · Craig Boutilier

Thu Jul 16 07:00 AM -- 07:45 AM & Thu Jul 16 06:00 PM -- 06:45 PM (PDT) @ Virtual

Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.
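As a rough illustration of the penalization idea, the sketch below adds a hinge-style penalty to a squared TD loss. The penalty is zero when the action assumed in each Q-label is also greedy under the current approximator, and it grows as other actions overtake it. The function names, the exact form of the penalty, and the weight `lam` are assumptions made for this example; they are not taken from the abstract and should not be read as the authors' implementation.

```python
import numpy as np

def consistency_penalty(q_values, assigned_actions):
    """Hinge penalty: sum over states s and actions a' of max(0, Q(s, a') - Q(s, a_s)),
    where a_s is the action assumed when the Q-label for s was built (hypothetical form)."""
    q_values = np.asarray(q_values, dtype=float)        # shape (n_states, n_actions)
    assigned_actions = np.asarray(assigned_actions)     # shape (n_states,)
    rows = np.arange(q_values.shape[0])
    q_assigned = q_values[rows, assigned_actions][:, None]
    # Positive entries mark actions that currently beat the assigned action.
    violations = np.maximum(0.0, q_values - q_assigned)
    return float(violations.sum())

def penalized_loss(q_pred, labels, q_next, assigned_actions, lam=0.5):
    """Mean squared TD error plus lam times the consistency penalty (lam is illustrative)."""
    td_error = np.mean((np.asarray(q_pred, dtype=float) - np.asarray(labels, dtype=float)) ** 2)
    return float(td_error + lam * consistency_penalty(q_next, assigned_actions))

# Two states, two actions: Q-values from the current approximator.
q_next = [[1.0, 2.0],
          [3.0, 0.0]]

consistent = consistency_penalty(q_next, [1, 0])    # assigned actions are greedy
inconsistent = consistency_penalty(q_next, [0, 0])  # state 0 label assumed a non-greedy action
loss = penalized_loss([1.0, 2.0], [1.0, 4.0], q_next, [0, 0])
```

In this toy case the first assignment matches the greedy actions and incurs no penalty, while the second is penalized by the margin with which action 1 beats action 0 in the first state.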

Author Information

DiJia Su (Princeton University)
Jayden Ooi (Google)
Tyler Lu (Google)
Dale Schuurmans (Google / University of Alberta)
Craig Boutilier (Google)