
Efficient Exploration by HyperDQN in Deep Reinforcement Learning
Ziniu Li · Yingru Li · Hao Liang · Tong Zhang
Efficient exploration is crucial to sample-efficient reinforcement learning. In this paper, we present a scalable exploration method called \emph{HyperDQN}, which builds on the famous Deep Q-Network (DQN) \citep{mnih2015human} and extends the idea of the hypermodel \citep{dwaracher20hypermodel} to deep reinforcement learning. In particular, \emph{HyperDQN} maintains a probabilistic meta-model that captures the epistemic uncertainty of the $Q$-value function over the parameter space. This meta-model samples randomized $Q$-value functions, which generate exploratory action sequences for deep exploration. The proposed method requires fewer samples to achieve substantially better performance than DQN and BootstrappedDQN \citep{osband16boostrapdqn} on hard-exploration tasks, including deep sea, grid world, and mountain car. The numerical results demonstrate that the developed approach can lead to efficient exploration with limited computational resources.
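
To make the idea of sampling randomized $Q$-value functions from a meta-model concrete, here is a minimal sketch in Python. It is not the authors' implementation: the class name `LinearHypermodel`, the linear map from a Gaussian index `z` to the weights of a linear $Q$-head, and all dimensions are assumptions chosen for illustration, and no training step is shown.

```python
import numpy as np


class LinearHypermodel:
    """Hypothetical linear hypermodel: maps a random index z to Q-head weights."""

    def __init__(self, feature_dim, num_actions, index_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.index_dim = index_dim
        self.shape = (num_actions, feature_dim)
        n_params = num_actions * feature_dim
        # theta(z) = mean + A @ z, so the matrix A controls the spread over parameters.
        self.mean = np.zeros(n_params)
        self.A = 0.1 * self.rng.standard_normal((n_params, index_dim))

    def sample_index(self):
        # One draw of z corresponds to one randomized Q-value function.
        return self.rng.standard_normal(self.index_dim)

    def q_values(self, features, z):
        # Build the sampled Q-head and evaluate it on the state features.
        theta = (self.mean + self.A @ z).reshape(self.shape)
        return theta @ features


def act_greedy(model, features, z):
    """Act greedily with respect to the Q-function selected by index z."""
    return int(np.argmax(model.q_values(features, z)))


model = LinearHypermodel(feature_dim=4, num_actions=3, index_dim=8)
features = np.array([1.0, 0.5, -0.2, 0.3])  # assumed state features
z = model.sample_index()  # assumption: drawn once per episode, then held fixed
action = act_greedy(model, features, z)
```

Holding `z` fixed across an episode in this sketch means the agent commits to a single sampled $Q$-function for several consecutive steps, which is one common way to obtain temporally consistent, "deep" exploration rather than per-step dithering.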

#### Author Information

##### Tong Zhang (HKUST)

Tong Zhang is a professor of Computer Science and Mathematics at the Hong Kong University of Science and Technology. His research interests are machine learning, big data, and their applications. He obtained a BA in Mathematics and Computer Science from Cornell University and a PhD in Computer Science from Stanford University. Before joining HKUST, Tong Zhang was a professor at Rutgers University, and he previously worked at IBM and Yahoo as a research scientist, at Baidu as the director of the Big Data Lab, and at Tencent as the founding director of the AI Lab. Tong Zhang was an ASA fellow and IMS fellow, has served as chair or area chair of major machine learning conferences such as NIPS, ICML, and COLT, and has served as an associate editor of top machine learning journals such as PAMI, JMLR, and Machine Learning Journal.