Bad-Policy Density: A Measure of Reinforcement-Learning Hardness

Abstract
Reinforcement learning is hard in general. Yet, in many specific environments, learning is easy. What makes learning easy in one environment but difficult in another? We address this question by proposing a simple measure of reinforcement-learning hardness called the bad-policy density. This quantity measures the fraction of the deterministic stationary policy space whose value falls below a desired threshold. We prove that this simple quantity has many properties one would expect of a measure of learning hardness. Further, we prove that computing the measure is NP-hard in general, but that there are paths to polynomial-time approximation. We conclude by summarizing potential directions and uses for this measure.
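The measure is easy to state, so a brute-force computation makes a useful reference point. Below is a minimal Python sketch, not taken from the paper: the toy MDP and the names `P`, `R`, `gamma`, and `threshold` are all illustrative assumptions. It enumerates every deterministic stationary policy of a small MDP, evaluates each one exactly, and reports the fraction whose start-state value falls below the threshold. Since there are |A|^|S| such policies, the enumeration is exponential in the number of states, consistent with the abstract's note that computing the measure exactly is NP-hard in general.

```python
# Brute-force sketch of the bad-policy density on a tiny tabular MDP.
# All variable names and the example MDP are illustrative assumptions.
import itertools
import numpy as np

def policy_value(P, R, gamma, policy):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    n_states = P.shape[0]
    P_pi = P[np.arange(n_states), policy]  # (S, S) transitions under pi
    r_pi = R[np.arange(n_states), policy]  # (S,) expected rewards under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def bad_policy_density(P, R, gamma, threshold, start_state=0):
    """Fraction of deterministic stationary policies whose start-state
    value falls below the threshold. Enumerates all |A|^|S| policies."""
    n_states, n_actions = R.shape
    bad, total = 0, 0
    for policy in itertools.product(range(n_actions), repeat=n_states):
        v = policy_value(P, R, gamma, np.array(policy))
        bad += v[start_state] < threshold
        total += 1
    return bad / total

# Example: 2 states, 2 actions; only action 1 in state 1 yields reward.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.2, 0.8]]])  # P[s, a, s']
R = np.array([[0.0, 0.0],
              [0.0, 1.0]])                # R[s, a]
print(bad_policy_density(P, R, gamma=0.95, threshold=5.0))  # prints 0.5
```

In this toy example, the two policies that never take action 1 in state 1 earn no reward and fall below the threshold, so the density is 0.5; raising the threshold shrinks the set of acceptable policies and the density rises accordingly.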
Author Information
David Abel (DeepMind)
Cameron Allen (Brown University)
Dilip Arumugam (Stanford University)
D Ellis Hershkowitz (Carnegie Mellon University)
Michael L. Littman (Brown University)
Lawson Wong (Northeastern University)
More from the Same Authors
- 2021: Bad-Policy Density: A Measure of Reinforcement-Learning Hardness »
  David Abel · Cameron Allen · Dilip Arumugam · D Ellis Hershkowitz · Michael L. Littman · Lawson Wong
- 2021: Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback »
  Ishaan Shah · David Halpern · Michael L. Littman · Kavosh Asadi
- 2022: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning »
  Dilip Arumugam · Benjamin Van Roy
- 2023: Specifying Behavior Preference with Tiered Reward Functions »
  Zhiyuan Zhou · Henry Sowerby · Michael L. Littman
- 2023: Can Euclidean Symmetry Help in Reinforcement Learning and Planning »
  Linfeng Zhao · Owen Howell · Jung Yeon Park · Xupeng Zhu · Robin Walters · Lawson Wong
- 2023 Oral: Settling the Reward Hypothesis »
  Michael Bowling · John Martin · David Abel · Will Dabney
- 2023 Poster: Settling the Reward Hypothesis »
  Michael Bowling · John Martin · David Abel · Will Dabney
- 2023 Poster: Meta-learning Parameterized Skills »
  Haotian Fu · Shangqun Yu · Saket Tiwari · Michael L. Littman · George Konidaris
- 2022 Poster: Toward Compositional Generalization in Object-Oriented World Modeling »
  Linfeng Zhao · Lingzhi Kong · Robin Walters · Lawson Wong
- 2022 Oral: Toward Compositional Generalization in Object-Oriented World Modeling »
  Linfeng Zhao · Lingzhi Kong · Robin Walters · Lawson Wong
- 2021 Poster: Deciding What to Learn: A Rate-Distortion Approach »
  Dilip Arumugam · Benjamin Van Roy
- 2021 Spotlight: Deciding What to Learn: A Rate-Distortion Approach »
  Dilip Arumugam · Benjamin Van Roy
- 2021 Poster: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
  Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel
- 2021 Spotlight: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
  Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel
- 2020 Poster: Flexible and Efficient Long-Range Planning Through Curious Exploration »
  Aidan Curtis · Minjian Xin · Dilip Arumugam · Kevin Feigelis · Daniel Yamins
- 2020 Poster: What can I do here? A Theory of Affordances in Reinforcement Learning »
  Khimya Khetarpal · Zafarali Ahmed · Gheorghe Comanici · David Abel · Doina Precup
- 2019 Poster: Finding Options that Minimize Planning Time »
  Yuu Jinnai · David Abel · David Hershkowitz · Michael L. Littman · George Konidaris
- 2019 Oral: Finding Options that Minimize Planning Time »
  Yuu Jinnai · David Abel · David Hershkowitz · Michael L. Littman · George Konidaris
- 2019 Poster: Discovering Options for Exploration by Minimizing Cover Time »
  Yuu Jinnai · Jee Won Park · David Abel · George Konidaris
- 2019 Oral: Discovering Options for Exploration by Minimizing Cover Time »
  Yuu Jinnai · Jee Won Park · David Abel · George Konidaris
- 2018 Poster: State Abstractions for Lifelong Reinforcement Learning »
  David Abel · Dilip S. Arumugam · Lucas Lehnert · Michael L. Littman
- 2018 Oral: State Abstractions for Lifelong Reinforcement Learning »
  David Abel · Dilip S. Arumugam · Lucas Lehnert · Michael L. Littman
- 2018 Poster: Policy and Value Transfer in Lifelong Reinforcement Learning »
  David Abel · Yuu Jinnai · Sophie Guo · George Konidaris · Michael L. Littman
- 2018 Oral: Policy and Value Transfer in Lifelong Reinforcement Learning »
  David Abel · Yuu Jinnai · Sophie Guo · George Konidaris · Michael L. Littman
- 2018 Poster: Lipschitz Continuity in Model-based Reinforcement Learning »
  Kavosh Asadi · Dipendra Misra · Michael L. Littman
- 2018 Oral: Lipschitz Continuity in Model-based Reinforcement Learning »
  Kavosh Asadi · Dipendra Misra · Michael L. Littman
- 2017 Poster: An Alternative Softmax Operator for Reinforcement Learning »
  Kavosh Asadi · Michael L. Littman
- 2017 Poster: Interactive Learning from Policy-Dependent Human Feedback »
  James MacGlashan · Mark Ho · Robert Loftin · Bei Peng · Guan Wang · David L Roberts · Matthew E. Taylor · Michael L. Littman
- 2017 Talk: Interactive Learning from Policy-Dependent Human Feedback »
  James MacGlashan · Mark Ho · Robert Loftin · Bei Peng · Guan Wang · David L Roberts · Matthew E. Taylor · Michael L. Littman
- 2017 Talk: An Alternative Softmax Operator for Reinforcement Learning »
  Kavosh Asadi · Michael L. Littman