Reward design is one of the most critical and challenging aspects of formulating a task as a reinforcement learning (RL) problem. In practice, it often takes several rounds of specifying a reward and learning with it to find one that leads to sample-efficient learning of the desired behavior. Instead, in this work, we study whether directly incorporating multiple alternate reward formulations of the same task in a single agent can lead to faster learning. We analyze multi-reward extensions of action-elimination algorithms and prove more favorable instance-dependent regret bounds compared to their single-reward counterparts, both in multi-armed bandits and in tabular Markov decision processes. For each state-action pair, our bounds scale with the inverse of the largest gap among all reward functions. This suggests that learning with multiple rewards can indeed be more sample-efficient, as long as the rewards agree on an optimal policy. We further prove that when rewards do not agree, multi-reward action elimination in multi-armed bandits still learns a policy that is good across all reward functions.
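To make the idea in the abstract concrete, the following is a minimal sketch of multi-reward action elimination in a multi-armed bandit. It is not the authors' algorithm as published: the function name, the Bernoulli reward model, the Hoeffding-style confidence radius, the assumption that every reward signal is observed for the pulled arm, and the fallback used when the rewards disagree are all illustrative assumptions. The sketch only shows the core mechanism: an arm is dropped as soon as any one reward function certifies it as suboptimal, so the reward with the largest gap drives elimination.

```python
import math
import random


def multi_reward_action_elimination(means, horizon, delta=0.05, seed=0):
    """Illustrative sketch (hypothetical helper, not from the paper).

    means[i][a] is the Bernoulli mean of reward function i for arm a.
    Returns the surviving arms and the number of pulls per arm.
    """
    rng = random.Random(seed)
    num_rewards, num_arms = len(means), len(means[0])
    active = list(range(num_arms))
    sums = [[0.0] * num_arms for _ in range(num_rewards)]
    pulls = [0] * num_arms
    t = 0

    def radius(a):
        # Assumed Hoeffding-style confidence radius with a union bound.
        n = pulls[a]
        return math.sqrt(
            math.log(4 * num_rewards * num_arms * n * n / delta) / (2 * n)
        )

    while t < horizon:
        # Pull every active arm once; all reward signals are observed.
        for a in list(active):
            if t >= horizon:
                break
            for i in range(num_rewards):
                sums[i][a] += 1.0 if rng.random() < means[i][a] else 0.0
            pulls[a] += 1
            t += 1
        if any(pulls[a] == 0 for a in active):
            continue  # horizon ended before every arm was pulled once

        # Drop an arm if ANY reward function certifies it as suboptimal.
        eliminated = set()
        for i in range(num_rewards):
            best_lcb = max(sums[i][b] / pulls[b] - radius(b) for b in active)
            for a in active:
                if sums[i][a] / pulls[a] + radius(a) < best_lcb:
                    eliminated.add(a)
        # Assumed fallback: if conflicting rewards would remove every arm,
        # keep the current set instead of emptying it.
        if len(eliminated) < len(active):
            active = [a for a in active if a not in eliminated]
    return active, pulls


if __name__ == "__main__":
    # Reward 0 has a small gap between arms 0 and 1; reward 1 has a large one.
    means = [[0.5, 0.45, 0.2], [0.9, 0.1, 0.1]]
    print(multi_reward_action_elimination(means, horizon=3000))
    print(multi_reward_action_elimination([means[0]], horizon=3000))
```

In this toy instance both reward functions agree that arm 0 is optimal, so the second reward's large gap lets the sketch discard arm 1 after far fewer pulls than the single-reward run needs.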
Author Information
Christoph Dann (Google)
Yishay Mansour (Google and Tel Aviv University)
Mehryar Mohri (Google Research and Courant Institute of Mathematical Sciences)
More from the Same Authors
-
2021 : Learning with User-Level Privacy »
Daniel A Levy · Ziteng Sun · Kareem Amin · Satyen Kale · Alex Kulesza · Mehryar Mohri · Ananda Theertha Suresh -
2021 : Minimax Regret for Stochastic Shortest Path »
Alon Cohen · Yonathan Efroni · Yishay Mansour · Aviv Rosenberg -
2021 : Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure »
Aviv Rosenberg · Yishay Mansour -
2021 : Learning Adversarial Markov Decision Processes with Delayed Feedback »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 : Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP »
Orin Levy · Yishay Mansour -
2022 : Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback »
Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg -
2023 : Ranking with Abstention »
Anqi Mao · Mehryar Mohri · Yutao Zhong -
2023 Poster: Learning in POMDPs is Sample-Efficient with Hindsight Observability »
Jonathan Lee · Alekh Agarwal · Christoph Dann · Tong Zhang -
2023 Oral: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2023 Poster: $H$-Consistency Bounds for Pairwise Misranking Loss Surrogates »
Anqi Mao · Mehryar Mohri · Yutao Zhong -
2023 Poster: Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation »
Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Regret Minimization and Convergence to Equilibria in General-sum Markov Games »
Liad Erez · Tal Lancewicki · Uri Sherman · Tomer Koren · Yishay Mansour -
2023 Poster: Best of Both Worlds Policy Optimization »
Christoph Dann · Chen-Yu Wei · Julian Zimmert -
2023 Poster: Cross-Entropy Loss Functions: Theoretical Analysis and Applications »
Anqi Mao · Mehryar Mohri · Yutao Zhong -
2023 Poster: Concurrent Shuffle Differential Privacy Under Continual Observation »
Jay Tenenbaum · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2023 Poster: Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation »
Orin Levy · Alon Cohen · Asaf Cassel · Yishay Mansour -
2023 Poster: Random Classification Noise does not defeat All Convex Potential Boosters Irrespective of Model Choice »
Yishay Mansour · Richard Nock · Robert C. Williamson -
2023 Oral: Best of Both Worlds Policy Optimization »
Christoph Dann · Chen-Yu Wei · Julian Zimmert -
2022 Poster: Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation »
Chris Dann · Yishay Mansour · Mehryar Mohri · Ayush Sekhari · Karthik Sridharan -
2022 Spotlight: Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation »
Chris Dann · Yishay Mansour · Mehryar Mohri · Ayush Sekhari · Karthik Sridharan -
2022 Poster: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Poster: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2022 Poster: H-Consistency Bounds for Surrogate Loss Minimizers »
Pranjal Awasthi · Anqi Mao · Mehryar Mohri · Yutao Zhong -
2022 Oral: Cooperative Online Learning in Stochastic and Adversarial MDPs »
Tal Lancewicki · Aviv Rosenberg · Yishay Mansour -
2022 Oral: H-Consistency Bounds for Surrogate Loss Minimizers »
Pranjal Awasthi · Anqi Mao · Mehryar Mohri · Yutao Zhong -
2022 Spotlight: FriendlyCore: Practical Differentially Private Aggregation »
Eliad Tsfadia · Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer -
2021 Spotlight: A Discriminative Technique for Multiple-Source Adaptation »
Corinna Cortes · Mehryar Mohri · Ananda Theertha Suresh · Ningshan Zhang -
2021 Poster: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Poster: A Discriminative Technique for Multiple-Source Adaptation »
Corinna Cortes · Mehryar Mohri · Ananda Theertha Suresh · Ningshan Zhang -
2021 Spotlight: Differentially-Private Clustering of Easy Instances »
Edith Cohen · Haim Kaplan · Yishay Mansour · Uri Stemmer · Eliad Tsfadia -
2021 Poster: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Poster: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Relative Deviation Margin Bounds »
Corinna Cortes · Mehryar Mohri · Ananda Theertha Suresh -
2021 Spotlight: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Spotlight: Adversarial Dueling Bandits »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Poster: Relative Deviation Margin Bounds »
Corinna Cortes · Mehryar Mohri · Ananda Theertha Suresh -
2021 Poster: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Spotlight: Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions »
Tal Lancewicki · Shahar Segal · Tomer Koren · Yishay Mansour -
2021 Poster: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2021 Spotlight: Dueling Convex Optimization »
Aadirupa Saha · Tomer Koren · Yishay Mansour -
2020 Poster: Near-optimal Regret Bounds for Stochastic Shortest Path »
Aviv Rosenberg · Alon Cohen · Yishay Mansour · Haim Kaplan -
2020 Poster: Adaptive Region-Based Active Learning »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Ningshan Zhang -
2020 Poster: Online Learning with Dependent Stochastic Feedback Graphs »
Corinna Cortes · Giulia DeSalvo · Claudio Gentile · Mehryar Mohri · Ningshan Zhang -
2020 Poster: SCAFFOLD: Stochastic Controlled Averaging for Federated Learning »
Sai Praneeth Reddy Karimireddy · Satyen Kale · Mehryar Mohri · Sashank Jakkam Reddi · Sebastian Stich · Ananda Theertha Suresh -
2020 Poster: Adversarial Learning Guarantees for Linear Hypotheses and Neural Networks »
Pranjal Awasthi · Natalie Frank · Mehryar Mohri -
2020 Poster: FedBoost: A Communication-Efficient Algorithm for Federated Learning »
Jenny Hamer · Mehryar Mohri · Ananda Theertha Suresh -
2019 : Poster Session 1 (all papers) »
Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel -
2019 Poster: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Poster: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Poster: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret »
Alon Cohen · Tomer Koren · Yishay Mansour -
2019 Oral: Adversarial Online Learning with noise »
Alon Resler · Yishay Mansour -
2019 Oral: Differentially Private Learning of Geometric Concepts »
Haim Kaplan · Yishay Mansour · Yossi Matias · Uri Stemmer -
2019 Oral: Online Convex Optimization in Adversarial Markov Decision Processes »
Aviv Rosenberg · Yishay Mansour -
2019 Poster: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2019 Oral: Policy Certificates: Towards Accountable Reinforcement Learning »
Christoph Dann · Lihong Li · Wei Wei · Emma Brunskill -
2018 Poster: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill -
2018 Oral: Decoupling Gradient-Like Learning Rules from Representations »
Philip Thomas · Christoph Dann · Emma Brunskill