Fluid human–agent communication is essential for the future of human-in-the-loop reinforcement learning. An agent must respond appropriately to feedback from its human trainer even before they have significant experience working together. Therefore, it is important that learning agents respond well to various feedback schemes human trainers are likely to provide. This work analyzes the COnvergent Actor–Critic by Humans (COACH) algorithm under three different types of feedback—policy feedback, reward feedback, and advantage feedback. For these three feedback types, we find that COACH can behave sub-optimally. We propose a variant of COACH, episodic COACH (E-COACH), which we prove converges for all three types. We compare our COACH variant with two other reinforcement learning algorithms: Q-learning and TAMER.
Author Information
Ishaan Shah (Brown University)
David Halpern (Brown University)
Michael L. Littman (Brown University)
Kavosh Asadi (Brown University)
More from the Same Authors
2021 : Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback »
Ishaan Shah -
2021 : Continuous Doubly Constrained Batch Reinforcement Learning »
Rasool Fakoor · Jonas Mueller · Kavosh Asadi · Pratik Chaudhari · Alex Smola -
2021 : Bad-Policy Density: A Measure of Reinforcement-Learning Hardness »
David Abel · Cameron Allen · Dilip Arumugam · D Ellis Hershkowitz · Michael L. Littman · Lawson Wong -
2023 : Specifying Behavior Preference with Tiered Reward Functions »
Zhiyuan Zhou · Henry Sowerby · Michael L. Littman -
2023 Poster: Meta-learning Parameterized Skills »
Haotian Fu · Shangqun Yu · Saket Tiwari · Michael L. Littman · George Konidaris -
2021 : Poster »
Shiji Zhou · Nastaran Okati · Wichinpong Sinchaisri · Kim de Bie · Ana Lucic · Mina Khan · Ishaan Shah · JINGHUI LU · Andreas Kirsch · Julius Frost · Ze Gong · Gokul Swamy · Ah Young Kim · Ahmed Baruwa · Ranganath Krishnan -
2019 Poster: Finding Options that Minimize Planning Time »
Yuu Jinnai · David Abel · David Hershkowitz · Michael L. Littman · George Konidaris -
2019 Oral: Finding Options that Minimize Planning Time »
Yuu Jinnai · David Abel · David Hershkowitz · Michael L. Littman · George Konidaris -
2018 Poster: State Abstractions for Lifelong Reinforcement Learning »
David Abel · Dilip S. Arumugam · Lucas Lehnert · Michael L. Littman -
2018 Oral: State Abstractions for Lifelong Reinforcement Learning »
David Abel · Dilip S. Arumugam · Lucas Lehnert · Michael L. Littman -
2018 Poster: Policy and Value Transfer in Lifelong Reinforcement Learning »
David Abel · Yuu Jinnai · Sophie Guo · George Konidaris · Michael L. Littman -
2018 Oral: Policy and Value Transfer in Lifelong Reinforcement Learning »
David Abel · Yuu Jinnai · Sophie Guo · George Konidaris · Michael L. Littman -
2018 Poster: Lipschitz Continuity in Model-based Reinforcement Learning »
Kavosh Asadi · Dipendra Misra · Michael L. Littman -
2018 Oral: Lipschitz Continuity in Model-based Reinforcement Learning »
Kavosh Asadi · Dipendra Misra · Michael L. Littman -
2017 Poster: An Alternative Softmax Operator for Reinforcement Learning »
Kavosh Asadi · Michael L. Littman -
2017 Poster: Interactive Learning from Policy-Dependent Human Feedback »
James MacGlashan · Mark Ho · Robert Loftin · Bei Peng · Guan Wang · David L Roberts · Matthew E. Taylor · Michael L. Littman -
2017 Talk: Interactive Learning from Policy-Dependent Human Feedback »
James MacGlashan · Mark Ho · Robert Loftin · Bei Peng · Guan Wang · David L Roberts · Matthew E. Taylor · Michael L. Littman -
2017 Talk: An Alternative Softmax Operator for Reinforcement Learning »
Kavosh Asadi · Michael L. Littman