When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov's smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm's sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems.
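As an illustration of the two ingredients named in the abstract, the following is a minimal sketch, not the paper's algorithm. It assumes the smoothing is done by adding an entropy term weighted by a hypothetical parameter `lam`, which turns the hard max over actions into a log-sum-exp. It also shows the Legendre-Fenchel identity x^2 = max over nu of (2*nu*x - nu^2), which lets a squared error be rewritten as a maximization over a dual variable. All function names here are illustrative assumptions.

```python
import math


def smoothed_max(q_values, lam):
    """Entropy-smoothed max: lam * log(sum(exp(q / lam))), computed stably."""
    m = max(q_values)
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def softmax_policy(q_values, lam):
    """Policy proportional to exp(q / lam); it attains the smoothed max."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_regularized_value(q_values, policy, lam):
    """Expected value under the policy plus lam times the policy's entropy."""
    expected = sum(p * q for p, q in zip(policy, q_values))
    entropy = -sum(p * math.log(p) for p in policy if p > 0)
    return expected + lam * entropy


def fenchel_square(x, nu):
    """Lower bound on x**2 for a dual variable nu; tight when nu == x."""
    return 2 * nu * x - nu ** 2


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # hypothetical action values for a single state
    for lam in (1.0, 0.1, 0.01):
        pi = softmax_policy(q, lam)
        print(lam, smoothed_max(q, lam), entropy_regularized_value(q, pi, lam))
```

In this sketch the smoothed value always lies between `max(q)` and `max(q) + lam * log(len(q))`, so it approaches the hard max as `lam` shrinks while remaining differentiable in the action values.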
Author Information
Bo Dai (Georgia Institute of Technology)
Albert Shaw (Georgia Tech)
Lihong Li (Google Inc.)
Lin Xiao (Microsoft Research)
Niao He (UIUC)
Zhen Liu (Georgia Tech)
Jianshu Chen (Microsoft Research)
Le Song (Georgia Institute of Technology)
Related Events (a corresponding poster, oral, or spotlight)
-
2018 Oral: SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation »
Thu. Jul 12th 09:20 -- 09:40 AM Room A1
More from the Same Authors
-
2021 : RL + Recommender Systems Panel »
Alekh Agarwal · Ed Chi · Maria Dimakopoulou · Georgios Theocharous · Minmin Chen · Lihong Li -
2021 : RL Foundation Panel »
Matthew Botvinick · Thomas Dietterich · Leslie Kaelbling · John Langford · Warren B Powell · Csaba Szepesvari · Lihong Li · Yuxi Li -
2021 Workshop: Reinforcement Learning for Real Life »
Yuxi Li · Minmin Chen · Omer Gottesman · Lihong Li · Zongqing Lu · Rupam Mahmood · Niranjani Prasad · Zhiwei (Tony) Qin · Csaba Szepesvari · Matthew Taylor -
2021 Poster: Near-Optimal Representation Learning for Linear Bandits and Linear RL »
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang -
2021 Spotlight: Near-Optimal Representation Learning for Linear Bandits and Linear RL »
Jiachen Hu · Xiaoyu Chen · Chi Jin · Lihong Li · Liwei Wang -
2021 Town Hall: Town Hall »
John Langford · Marina Meila · Tong Zhang · Le Song · Stefanie Jegelka · Csaba Szepesvari -
2021 Poster: On the Optimality of Batch Policy Optimization Algorithms »
Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2021 Spotlight: On the Optimality of Batch Policy Optimization Algorithms »
Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans -
2020 Workshop: Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond »
Jian Tang · Le Song · Jure Leskovec · Renjie Liao · Yujia Li · Sanja Fidler · Richard Zemel · Ruslan Salakhutdinov -
2020 : Opening Remarks: Jian Tang & Le Song »
Jian Tang · Le Song -
2020 : Industry Panel - Talk by Lin Xiao - Statistical Adaptive Stochastic Gradient Methods »
Lin Xiao -
2020 Poster: Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization »
Hadrien Hendrikx · Lin Xiao · Sebastien Bubeck · Francis Bach · Laurent Massoulié -
2020 Poster: Batch Stationary Distribution Estimation »
Junfeng Wen · Bo Dai · Lihong Li · Dale Schuurmans -
2020 Poster: Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search »
Binghong Chen · Chengtao Li · Hanjun Dai · Le Song -
2020 Poster: Temporal Logic Point Processes »
Shuang Li · Lu Wang · Ruizhi Zhang · Xiaofu Chang · Xuqin Liu · Yao Xie · Yuan Qi · Le Song -
2020 Poster: Learning To Stop While Learning To Predict »
Xinshi Chen · Hanjun Dai · Yu Li · Xin Gao · Le Song -
2020 Poster: Neural Contextual Bandits with UCB-based Exploration »
Dongruo Zhou · Lihong Li · Quanquan Gu -
2019 Workshop: Reinforcement Learning for Real Life »
Yuxi Li · Alborz Geramifard · Lihong Li · Csaba Szepesvari · Tao Wang -
2019 Poster: Target-Based Temporal-Difference Learning »
Donghwan Lee · Niao He -
2019 Oral: Target-Based Temporal-Difference Learning »
Donghwan Lee · Niao He -
2019 Poster: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Poster: A Composite Randomized Incremental Gradient Method »
Junyu Zhang · Lin Xiao -
2019 Poster: Particle Flow Bayes' Rule »
Xinshi Chen · Hanjun Dai · Le Song -
2019 Poster: Generative Adversarial User Model for Reinforcement Learning Based Recommendation System »
Xinshi Chen · Shuang Li · Hui Li · Shaohua Jiang · Yuan Qi · Le Song -
2019 Oral: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Oral: Generative Adversarial User Model for Reinforcement Learning Based Recommendation System »
Xinshi Chen · Shuang Li · Hui Li · Shaohua Jiang · Yuan Qi · Le Song -
2019 Oral: Particle Flow Bayes' Rule »
Xinshi Chen · Hanjun Dai · Le Song -
2019 Oral: A Composite Randomized Incremental Gradient Method »
Junyu Zhang · Lin Xiao -
2018 Poster: Adversarial Attack on Graph Structured Data »
Hanjun Dai · Hui Li · Tian Tian · Xin Huang · Lin Wang · Jun Zhu · Le Song -
2018 Poster: Scalable Bilinear Pi Learning Using State and Action Features »
Yichen Chen · Lihong Li · Mengdi Wang -
2018 Poster: Towards Black-box Iterative Machine Teaching »
Weiyang Liu · Bo Dai · Xingguo Li · Zhen Liu · James Rehg · Le Song -
2018 Oral: Towards Black-box Iterative Machine Teaching »
Weiyang Liu · Bo Dai · Xingguo Li · Zhen Liu · James Rehg · Le Song -
2018 Oral: Adversarial Attack on Graph Structured Data »
Hanjun Dai · Hui Li · Tian Tian · Xin Huang · Lin Wang · Jun Zhu · Le Song -
2018 Oral: Scalable Bilinear Pi Learning Using State and Action Features »
Yichen Chen · Lihong Li · Mengdi Wang -
2018 Poster: Learning to Explain: An Information-Theoretic Perspective on Model Interpretation »
Jianbo Chen · Le Song · Martin Wainwright · Michael Jordan -
2018 Poster: Stochastic Training of Graph Convolutional Networks with Variance Reduction »
Jianfei Chen · Jun Zhu · Le Song -
2018 Poster: Learning Steady-States of Iterative Algorithms over Graphs »
Hanjun Dai · Zornitsa Kozareva · Bo Dai · Alex Smola · Le Song -
2018 Oral: Stochastic Training of Graph Convolutional Networks with Variance Reduction »
Jianfei Chen · Jun Zhu · Le Song -
2018 Oral: Learning Steady-States of Iterative Algorithms over Graphs »
Hanjun Dai · Zornitsa Kozareva · Bo Dai · Alex Smola · Le Song -
2018 Oral: Learning to Explain: An Information-Theoretic Perspective on Model Interpretation »
Jianbo Chen · Le Song · Martin Wainwright · Michael Jordan -
2017 Poster: Stochastic Generative Hashing »
Bo Dai · Ruiqi Guo · Sanjiv Kumar · Niao He · Le Song -
2017 Poster: Variational Policy for Guiding Point Processes »
Yichen Wang · Grady Williams · Evangelos Theodorou · Le Song -
2017 Talk: Stochastic Generative Hashing »
Bo Dai · Ruiqi Guo · Sanjiv Kumar · Niao He · Le Song -
2017 Talk: Variational Policy for Guiding Point Processes »
Yichen Wang · Grady Williams · Evangelos Theodorou · Le Song -
2017 Poster: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Poster: Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs »
Rakshit Trivedi · Hanjun Dai · Yichen Wang · Le Song -
2017 Talk: Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs »
Rakshit Trivedi · Hanjun Dai · Yichen Wang · Le Song -
2017 Talk: Stochastic Variance Reduction Methods for Policy Evaluation »
Simon Du · Jianshu Chen · Lihong Li · Lin Xiao · Dengyong Zhou -
2017 Poster: Provably Optimal Algorithms for Generalized Linear Contextual Bandits »
Lihong Li · Yu Lu · Dengyong Zhou -
2017 Poster: Fake News Mitigation via Point Process Based Intervention »
Mehrdad Farajtabar · Jiachen Yang · Xiaojing Ye · Huan Xu · Rakshit Trivedi · Elias Khalil · Shuang Li · Le Song · Hongyuan Zha -
2017 Poster: Iterative Machine Teaching »
Weiyang Liu · Bo Dai · Ahmad Humayun · Charlene Tay · Chen Yu · Linda Smith · James Rehg · Le Song -
2017 Poster: Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms »
Jialei Wang · Lin Xiao -
2017 Talk: Iterative Machine Teaching »
Weiyang Liu · Bo Dai · Ahmad Humayun · Charlene Tay · Chen Yu · Linda Smith · James Rehg · Le Song -
2017 Talk: Provably Optimal Algorithms for Generalized Linear Contextual Bandits »
Lihong Li · Yu Lu · Dengyong Zhou -
2017 Talk: Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms »
Jialei Wang · Lin Xiao -
2017 Talk: Fake News Mitigation via Point Process Based Intervention »
Mehrdad Farajtabar · Jiachen Yang · Xiaojing Ye · Huan Xu · Rakshit Trivedi · Elias Khalil · Shuang Li · Le Song · Hongyuan Zha