Timezone: »
One principled approach for provably efficient exploration is incorporating the upper confidence bound (UCB) into the value function as a bonus. However, UCB is specified to deal with linear and tabular settings and is incompatible with Deep Reinforcement Learning (DRL). In this paper, we propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I). OB2I constructs a general-purpose UCB-bonus through non-parametric bootstrap in DRL. The UCB-bonus estimates the epistemic uncertainty of state-action pairs for optimistic exploration. We build theoretical connections between the proposed UCB-bonus and the LSVI-UCB in linear setting. We propagate future uncertainty in a time-consistent manner through episodic backward update, which exploits the theoretical advantage and empirically improves the sample-efficiency. Our experiments in MNIST maze and Atari suit suggest that OB2I outperforms several state-of-the-art exploration approaches.
Author Information
Chenjia Bai (Harbin Institute of Technology)
Lingxiao Wang (Northwestern University)
Lei Han (Tencent AI Lab)
Jianye Hao (Tianjin University)
Animesh Garg (University of Toronto, Vector Institute, Nvidia)
Peng Liu (Harbin Institute of Technology)
Zhaoran Wang (Northwestern University)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Principled Exploration via Optimistic Bootstrapping and Backward Induction »
Wed. Jul 21st 01:20 -- 01:25 PM Room
More from the Same Authors
-
2021 : Auditing AI models for Verified Deployment under Semantic Specifications »
Homanga Bharadhwaj · De-An Huang · Chaowei Xiao · Anima Anandkumar · Animesh Garg -
2021 : Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning »
Chenjia Bai · Lingxiao Wang · Lei Han · Jianye Hao · Animesh Garg · Peng Liu · Zhaoran Wang -
2021 : Is Pessimism Provably Efficient for Offline RL? »
Ying Jin · Zhuoran Yang · Zhaoran Wang -
2021 : Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings »
Shunshi Zhang · Murat Erdogdu · Animesh Garg -
2021 : Learning by Watching: Physical Imitation of Manipulation Skills from Human Videos »
Haoyu Xiong · Yun-Chun Chen · Homanga Bharadhwaj · Samrath Sinha · Animesh Garg -
2022 : VIPer: Iterative Value-Aware Model Learning on the Value Improvement Path »
Romina Abachi · Claas Voelcker · Animesh Garg · Amir-massoud Farahmand -
2022 : MoCoDA: Model-based Counterfactual Data Augmentation »
Silviu Pitis · Elliot Creager · Ajay Mandlekar · Animesh Garg -
2023 : Boosting Off-policy RL with Policy Representation and Policy-extended Value Function Approximator »
Min Zhang · Jianye Hao · Hongyao Tang · Yan Zheng -
2023 : A Policy-Decoupled Method for High-Quality Data Augmentation in Offline Reinforcement Learning »
Shixi Lian · Yi Ma · Jinyi Liu · Jianye Hao · Yan Zheng · Zhaopeng Meng -
2023 : Improving Offline-to-Online Reinforcement Learning with Q-Ensembles »
Kai Zhao · Yi Ma · Jinyi Liu · Jianye Hao · Yan Zheng · Zhaopeng Meng -
2023 : Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning »
Jinyi Liu · Yi Ma · Jianye Hao · Yujing Hu · Yan Zheng · Tangjie Lv · Changjie Fan -
2023 Poster: Behavior Contrastive Learning for Unsupervised Skill Discovery »
Rushuai Yang · Chenjia Bai · Hongyi Guo · Siyuan Li · Bin Zhao · Zhen Wang · Peng Liu · Xuelong Li -
2023 Poster: RACE: Improve Multi-Agent Reinforcement Learning with Representation Asymmetry and Collaborative Evolution »
Pengyi Li · Jianye Hao · Hongyao Tang · Yan Zheng · Xian Fu -
2023 Poster: MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL »
Fei Ni · Jianye Hao · Yao Mu · Yifu Yuan · Yan Zheng · Bin Wang · Zhixuan Liang -
2023 Poster: Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning »
Yulai Zhao · Zhuoran Yang · Zhaoran Wang · Jason Lee -
2023 Poster: Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments »
Yixuan Wang · Simon Zhan · Ruochen Jiao · Zhilu Wang · Wanxin Jin · Zhuoran Yang · Zhaoran Wang · Chao Huang · Qi Zhu -
2023 Poster: Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics »
Shenao Zhang · Wanxin Jin · Zhaoran Wang -
2023 Poster: ChiPFormer: Transferable Chip Placement via Offline Decision Transformer »
Yao LAI · Jinxin Liu · Zhentao Tang · Bin Wang · Jianye Hao · Ping Luo -
2023 Poster: Achieving Hierarchy-Free Approximation for Bilevel Programs with Equilibrium Constraints »
Jiayang Li · Jing Yu · Boyi Liu · Yu Nie · Zhaoran Wang -
2022 Poster: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation »
ZHIHAN LIU · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang -
2022 Poster: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes »
Hongyi Guo · Qi Cai · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang -
2022 Poster: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets »
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang -
2022 Poster: Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics »
Matthias Weissenbacher · Samrath Sinha · Animesh Garg · Yoshinobu Kawahara -
2022 Spotlight: Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics »
Matthias Weissenbacher · Samrath Sinha · Animesh Garg · Yoshinobu Kawahara -
2022 Spotlight: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes »
Hongyi Guo · Qi Cai · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang -
2022 Spotlight: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation »
ZHIHAN LIU · Yufeng Zhang · Zuyue Fu · Zhuoran Yang · Zhaoran Wang -
2022 Spotlight: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets »
Han Zhong · Wei Xiong · Jiyuan Tan · Liwei Wang · Tong Zhang · Zhaoran Wang · Zhuoran Yang -
2022 Poster: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency »
Qi Cai · Zhuoran Yang · Zhaoran Wang -
2022 Poster: Adaptive Model Design for Markov Decision Process »
Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang -
2022 Spotlight: Adaptive Model Design for Markov Decision Process »
Siyu Chen · Donglin Yang · Jiayang Li · Senmiao Wang · Zhuoran Yang · Zhaoran Wang -
2022 Spotlight: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency »
Qi Cai · Zhuoran Yang · Zhaoran Wang -
2022 Poster: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning »
Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang -
2022 Poster: Individual Reward Assisted Multi-Agent Reinforcement Learning »
Li Wang · Yupeng Zhang · Yujing Hu · Weixun Wang · Chongjie Zhang · Yang Gao · Jianye Hao · Tangjie Lv · Changjie Fan -
2022 Poster: PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration »
Pengyi Li · Hongyao Tang · Tianpei Yang · Xiaotian Hao · Tong Sang · Yan Zheng · Jianye Hao · Matthew Taylor · Wenyuan Tao · Zhen Wang -
2022 Poster: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning »
Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang -
2022 Poster: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy »
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Poster: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation »
Xiaoyu Chen · Han Zhong · Zhuoran Yang · Zhaoran Wang · Liwei Wang -
2022 Spotlight: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy »
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Spotlight: Individual Reward Assisted Multi-Agent Reinforcement Learning »
Li Wang · Yupeng Zhang · Yujing Hu · Weixun Wang · Chongjie Zhang · Yang Gao · Jianye Hao · Tangjie Lv · Changjie Fan -
2022 Spotlight: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning »
Boxiang Lyu · Zhaoran Wang · Mladen Kolar · Zhuoran Yang -
2022 Spotlight: PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration »
Pengyi Li · Hongyao Tang · Tianpei Yang · Xiaotian Hao · Tong Sang · Yan Zheng · Jianye Hao · Matthew Taylor · Wenyuan Tao · Zhen Wang -
2022 Spotlight: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation »
Xiaoyu Chen · Han Zhong · Zhuoran Yang · Zhaoran Wang · Liwei Wang -
2022 Spotlight: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning »
Shuang Qiu · Lingxiao Wang · Chenjia Bai · Zhuoran Yang · Zhaoran Wang -
2021 Poster: Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games »
Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang -
2021 Spotlight: Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games »
Hongyi Guo · Zuyue Fu · Zhuoran Yang · Zhaoran Wang -
2021 Poster: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality »
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin LIANG -
2021 Poster: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang -
2021 Poster: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport »
Lewis Liu · Yufeng Zhang · Zhuoran Yang · Reza Babanezhad · Zhaoran Wang -
2021 Spotlight: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport »
Lewis Liu · Yufeng Zhang · Zhuoran Yang · Reza Babanezhad · Zhaoran Wang -
2021 Spotlight: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality »
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin LIANG -
2021 Spotlight: Randomized Exploration in Reinforcement Learning with General Value Function Approximation »
Haque Ishfaq · Qiwen Cui · Viet Nguyen · Alex Ayoub · Zhuoran Yang · Zhaoran Wang · Doina Precup · Lin Yang -
2021 Poster: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions »
Shuang Qiu · Xiaohan Wei · Jieping Ye · Zhaoran Wang · Zhuoran Yang -
2021 Poster: On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game »
Shuang Qiu · Jieping Ye · Zhaoran Wang · Zhuoran Yang -
2021 Poster: Value Iteration in Continuous Actions, States and Time »
Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg -
2021 Spotlight: Value Iteration in Continuous Actions, States and Time »
Michael Lutter · Shie Mannor · Jan Peters · Dieter Fox · Animesh Garg -
2021 Oral: On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game »
Shuang Qiu · Jieping Ye · Zhaoran Wang · Zhuoran Yang -
2021 Oral: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions »
Shuang Qiu · Xiaohan Wei · Jieping Ye · Zhaoran Wang · Zhuoran Yang -
2021 Poster: Learning While Playing in Mean-Field Games: Convergence and Optimality »
Qiaomin Xie · Zhuoran Yang · Zhaoran Wang · Andreea Minca -
2021 Poster: Is Pessimism Provably Efficient for Offline RL? »
Ying Jin · Zhuoran Yang · Zhaoran Wang -
2021 Spotlight: Is Pessimism Provably Efficient for Offline RL? »
Ying Jin · Zhuoran Yang · Zhaoran Wang -
2021 Spotlight: Learning While Playing in Mean-Field Games: Convergence and Optimality »
Qiaomin Xie · Zhuoran Yang · Zhaoran Wang · Andreea Minca -
2021 Poster: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2021 Poster: Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach »
Yingjie Fei · Zhuoran Yang · Zhaoran Wang -
2021 Poster: Coach-Player Multi-agent Reinforcement Learning for Dynamic Team Composition »
Bo Liu · Qiang Liu · Peter Stone · Animesh Garg · Yuke Zhu · Anima Anandkumar -
2021 Spotlight: Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning »
Anuj Mahajan · Mikayel Samvelyan · Lei Mao · Viktor Makoviychuk · Animesh Garg · Jean Kossaifi · Shimon Whiteson · Yuke Zhu · Anima Anandkumar -
2021 Oral: Coach-Player Multi-agent Reinforcement Learning for Dynamic Team Composition »
Bo Liu · Qiang Liu · Peter Stone · Animesh Garg · Yuke Zhu · Anima Anandkumar -
2021 Oral: Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach »
Yingjie Fei · Zhuoran Yang · Zhaoran Wang -
2020 Poster: Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning »
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang -
2020 Poster: Semi-Supervised StyleGAN for Disentanglement Learning »
Weili Nie · Tero Karras · Animesh Garg · Shoubhik Debnath · Anjul Patney · Ankit Patel · Anima Anandkumar -
2020 Poster: Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Zhaoran Wang -
2020 Poster: Angular Visual Hardness »
Beidi Chen · Weiyang Liu · Zhiding Yu · Jan Kautz · Anshumali Shrivastava · Animesh Garg · Anima Anandkumar -
2020 Poster: Provably Efficient Exploration in Policy Optimization »
Qi Cai · Zhuoran Yang · Chi Jin · Zhaoran Wang -
2020 Poster: On the Global Optimality of Model-Agnostic Meta-Learning »
Lingxiao Wang · Qi Cai · Zhuoran Yang · Zhaoran Wang -
2020 Poster: Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees »
Sen Na · Yuwei Luo · Zhuoran Yang · Zhaoran Wang · Mladen Kolar -
2020 Poster: Q-value Path Decomposition for Deep Multiagent Reinforcement Learning »
Yaodong Yang · Jianye Hao · Guangyong Chen · Hongyao Tang · Yingfeng Chen · Yujing Hu · Changjie Fan · Zhongyu Wei -
2020 Poster: Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising »
Xiaotian Hao · Zhaoqing Peng · Yi Ma · Guan Wang · Junqi Jin · Jianye Hao · Shan Chen · Rongquan Bai · Mingzhou Xie · Miao Xu · Zhenzhe Zheng · Chuan Yu · HAN LI · Jian Xu · Kun Gai -
2019 Poster: On the statistical rate of nonlinear recovery in generative models with heavy-tailed data »
Xiaohan Wei · Zhuoran Yang · Zhaoran Wang -
2019 Oral: On the statistical rate of nonlinear recovery in generative models with heavy-tailed data »
Xiaohan Wei · Zhuoran Yang · Zhaoran Wang -
2019 Poster: Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI »
Lei Han · Peng Sun · Yali Du · Jiechao Xiong · Qing Wang · Xinghai Sun · Han Liu · Tong Zhang -
2019 Oral: Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI »
Lei Han · Peng Sun · Yali Du · Jiechao Xiong · Qing Wang · Xinghai Sun · Han Liu · Tong Zhang -
2018 Poster: The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference »
Hao Lu · Yuan Cao · Junwei Lu · Han Liu · Zhaoran Wang -
2018 Poster: Candidates vs. Noises Estimation for Large Multi-Class Classification Problem »
Lei Han · Yiheng Huang · Tong Zhang -
2018 Oral: Candidates vs. Noises Estimation for Large Multi-Class Classification Problem »
Lei Han · Yiheng Huang · Tong Zhang -
2018 Oral: The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference »
Hao Lu · Yuan Cao · Junwei Lu · Han Liu · Zhaoran Wang