Interactive Learning from Policy-Dependent Human Feedback
Abstract

This paper investigates the problem of interactively learning behaviors communicated by a human teacher using positive and negative feedback. Much previous work on this problem has assumed that the feedback people give for a decision depends on the behavior they are teaching but is independent of the learner's current policy. We present empirical results showing this assumption to be false: whether human trainers give positive or negative feedback for a decision is influenced by the learner's current policy. Based on this insight, we introduce Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback that converges to a local optimum. Finally, we demonstrate that COACH can successfully learn multiple behaviors on a physical robot.
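To make the idea concrete, below is a minimal Python sketch of a COACH-style actor update, assuming a tabular softmax policy over discrete states and actions. The class name `CoachAgent`, the hyperparameter values, and the tabular representation are illustrative assumptions, not the paper's released implementation. It reflects the paper's core interpretation: human feedback plays the role of the advantage in a policy-gradient step, accumulated through an eligibility trace.

```python
import numpy as np

class CoachAgent:
    """Sketch of a COACH-style learner (names and defaults are assumptions)."""

    def __init__(self, n_states, n_actions, alpha=0.05, lam=0.9):
        self.theta = np.zeros((n_states, n_actions))  # policy parameters
        self.trace = np.zeros_like(self.theta)        # eligibility trace e
        self.alpha, self.lam = alpha, lam

    def policy(self, s):
        # Softmax over the action preferences for state s (shifted for
        # numerical stability).
        prefs = self.theta[s] - self.theta[s].max()
        probs = np.exp(prefs)
        return probs / probs.sum()

    def act(self, s, rng=np.random):
        return rng.choice(len(self.theta[s]), p=self.policy(s))

    def update(self, s, a, feedback):
        # Gradient of log pi(a|s) for a tabular softmax policy:
        # one-hot(a) minus pi(.|s), placed in row s.
        grad = np.zeros_like(self.theta)
        grad[s] = -self.policy(s)
        grad[s, a] += 1.0
        # Accumulate the trace, then treat the human's feedback as the
        # advantage and step along the feedback-weighted trace:
        #   e <- lam * e + grad_log_pi;  theta <- theta + alpha * f * e
        self.trace = self.lam * self.trace + grad
        self.theta += self.alpha * feedback * self.trace
```

In use, feedback would arrive from the human trainer after an action (e.g., +1 or -1), and `agent.update(s, a, f)` would nudge the policy toward or away from that action in proportion to the trace. Because the feedback stands in for the advantage of the current policy, the update is policy-dependent by construction, which is the property the paper's convergence analysis relies on.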
Author Information
James MacGlashan (Cogitai)
Mark Ho (Brown University)
Robert Loftin (North Carolina State University)
Bei Peng (Washington State University)
Guan Wang (Brown University)
David L Roberts (North Carolina State University)
Matthew E. Taylor (Washington State University)
Michael L. Littman (Brown University)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Talk: Interactive Learning from Policy-Dependent Human Feedback
  Mon. Aug 7th, 05:48 -- 06:06 AM, Room C4.5
More from the Same Authors
- 2021: Bad-Policy Density: A Measure of Reinforcement-Learning Hardness
  David Abel · Cameron Allen · Dilip Arumugam · D Ellis Hershkowitz · Michael L. Littman · Lawson Wong
- 2021: Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback
  Ishaan Shah · David Halpern · Michael L. Littman · Kavosh Asadi
- 2023 Poster: Meta-learning Parameterized Skills
  Haotian Fu · Shangqun Yu · Saket Tiwari · Michael L. Littman · George Konidaris
- 2019 Poster: Finding Options that Minimize Planning Time
  Yuu Jinnai · David Abel · David Hershkowitz · Michael L. Littman · George Konidaris
- 2019 Oral: Finding Options that Minimize Planning Time
  Yuu Jinnai · David Abel · David Hershkowitz · Michael L. Littman · George Konidaris
- 2018 Poster: State Abstractions for Lifelong Reinforcement Learning
  David Abel · Dilip S. Arumugam · Lucas Lehnert · Michael L. Littman
- 2018 Oral: State Abstractions for Lifelong Reinforcement Learning
  David Abel · Dilip S. Arumugam · Lucas Lehnert · Michael L. Littman
- 2018 Poster: Policy and Value Transfer in Lifelong Reinforcement Learning
  David Abel · Yuu Jinnai · Sophie Guo · George Konidaris · Michael L. Littman
- 2018 Oral: Policy and Value Transfer in Lifelong Reinforcement Learning
  David Abel · Yuu Jinnai · Sophie Guo · George Konidaris · Michael L. Littman
- 2018 Poster: Lipschitz Continuity in Model-based Reinforcement Learning
  Kavosh Asadi · Dipendra Misra · Michael L. Littman
- 2018 Oral: Lipschitz Continuity in Model-based Reinforcement Learning
  Kavosh Asadi · Dipendra Misra · Michael L. Littman
- 2017 Poster: An Alternative Softmax Operator for Reinforcement Learning
  Kavosh Asadi · Michael L. Littman
- 2017 Talk: An Alternative Softmax Operator for Reinforcement Learning
  Kavosh Asadi · Michael L. Littman