The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing, and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. These two classes of algorithms were typically thought to be distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis and value-optimistic algorithms being easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms which have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides being able to capture many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
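To make the two viewpoints mentioned in the abstract concrete, here is a minimal sketch using standard textbook forms of the updates (UCRL-style and UCBVI-style); this is an illustration of the general idea, not the paper's exact construction. A model-optimistic method maximizes the backed-up value over a confidence set $\mathcal{P}_h(s,a)$ of plausible transition models,

$$Q_h(s,a) = r(s,a) + \max_{\tilde{P} \in \mathcal{P}_h(s,a)} \sum_{s'} \tilde{P}(s' \mid s,a)\, V_{h+1}(s'),$$

whereas a value-optimistic dynamic-programming method backs up through the empirical model $\hat{P}$ and adds an exploration bonus $b_h(s,a)$,

$$Q_h(s,a) = r(s,a) + \sum_{s'} \hat{P}(s' \mid s,a)\, V_{h+1}(s') + b_h(s,a), \qquad V_h(s) = \max_a Q_h(s,a).$$

The framework described in the abstract shows, via Lagrangian duality, that algorithms of the first kind admit an equivalent representation of the second kind.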
Author Information
Gergely Neu (Universitat Pompeu Fabra / Google Brain Zürich)
More from the Same Authors
- 2022: Online Learning with Off-Policy Feedback
  Germano Gabbianelli · Matteo Papini · Gergely Neu
- 2023 Poster: Optimistic Planning by Regularized Dynamic Programming
  Antoine Moulin · Gergely Neu
- 2020: Speaker Panel
  Csaba Szepesvari · Martha White · Sham Kakade · Gergely Neu · Shipra Agrawal · Akshay Krishnamurthy
- 2017 Poster: Algorithmic Stability and Hypothesis Complexity
  Tongliang Liu · Gábor Lugosi · Gergely Neu · Dacheng Tao
- 2017 Talk: Algorithmic Stability and Hypothesis Complexity
  Tongliang Liu · Gábor Lugosi · Gergely Neu · Dacheng Tao