Timezone: »
Poster
Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation
Orin Levy · Alon Cohen · Asaf Cassel · Yishay Mansour
We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs. The algorithm operates under the minimal assumptions of realizable function class and access to online least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient online regression oracles), simple and robust to approximation errors. It enjoys an $\widetilde{O}(H^{2.5} \sqrt{ T|S||A| ( \mathcal{R}\_{TH}(\mathcal{O}) + H \log(\delta^{-1}) )})$ regret guarantee, with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon and $\mathcal{R}\_{TH}(\mathcal{O}) = \mathcal{R}\_{TH}(\mathcal{O}\_{sq}^\mathcal{F}) + \mathcal{R}\_{TH}(\mathcal{O}\_{log}^\mathcal{P})$ is the sum of the square and log-loss regression oracles' regret, used to approximate the context-dependent rewards and dynamics, respectively. To the best of our knowledge, our algorithm is the first efficient rate optimal regret minimization algorithm for adversarial CMDPs that operates under the minimal standard assumption of online function approximation.
Author Information
Orin Levy (Tel Aviv University)
Alon Cohen (Technion and Google)
Asaf Cassel (Tel Aviv University)
Yishay Mansour (Google and Tel Aviv University)
More from the Same Authors
-
2021 : Minimax Regret for Stochastic Shortest Path »
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg -
2021 : Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure »
Aviv Rosenberg · Yishay Mansour -
2021 : Learning Adversarial Markov Decision Processes with Delayed Feedback »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 : Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP »
Orin Levy · Yishay Mansour -
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2023 Oral: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2023 Poster: Reinforcement Learning Can Be More Efficient with Multiple Rewards »
Christoph Dann · Yishay Mansour · Mehryar Mohri -
2023 Poster: Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation »
Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Regret Minimization and Convergence to Equilibria in General-sum Markov Games »
Liad Erez · Tal Lancewicki · Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Concurrent Shuffle Differential Privacy Under Continual Observation »
Jay Tenenbaum · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2023 Poster: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2022 Poster: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Poster: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2022 Oral: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Spotlight: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2021 Poster: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Spotlight: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Poster: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Poster: Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √T Regret »
Asaf Cassel · Tomer Koren -
2021 Poster: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Spotlight: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Spotlight: Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √T Regret »
Asaf Cassel · Tomer Koren -
2021 Poster: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2020 Poster: Near-optimal Regret Bounds for Stochastic Shortest Path »
Aviv Rosenberg · Alon Cohen · Yishay Mansour · Haim Kaplan -
2020 Poster: Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently »
Asaf Cassel · Alon Cohen · Tomer Koren -
2019 Poster: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Poster: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Poster: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Oral: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Oral: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour