
Learning Optimal Advantage from Preferences and Mistaking it for Reward
William Knox · Stephane Hatgis-Kessell · Sigurdur Adalgeirsson · Serena Booth · Anca Dragan · Peter Stone · Scott Niekum

Fri Jul 28 05:30 PM -- 05:45 PM (PDT)
Event URL: https://openreview.net/forum?id=euZXhbTmQ7

Most recent work that involves learning reward functions from human preferences over pairs of trajectory segments---as used in reinforcement learning from human feedback (RLHF), including for ChatGPT and many contemporary language models---is built on the assumption that such human preferences are generated based only upon the reward accrued within those segments, which we call their partial return. But if this assumption is false because people base their preferences on information other than partial return, then what type of function is their algorithm learning from preferences? We argue that this function is better thought of as an approximation of the optimal advantage function, not a reward function as previously believed.
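As an illustrative sketch of the distinction the abstract draws (the notation here is assumed for illustration, not taken from this page): the standard partial-return preference model scores a segment by its summed reward, while the alternative the authors argue for scores it by summed optimal advantage, $A^*(s,a) = Q^*(s,a) - V^*(s)$. Both can be written as Boltzmann (logistic) models over segment statistics:

```latex
% Partial-return model (the standard RLHF assumption): preferences
% depend only on the reward accrued within each segment.
P(\sigma_1 \succ \sigma_2)
  = \frac{\exp\!\big(\sum_{t} r(s^1_t, a^1_t)\big)}
         {\exp\!\big(\sum_{t} r(s^1_t, a^1_t)\big)
        + \exp\!\big(\sum_{t} r(s^2_t, a^2_t)\big)}

% Advantage-based alternative: preferences depend on the summed
% optimal advantage, A^*(s,a) = Q^*(s,a) - V^*(s), along each segment.
P(\sigma_1 \succ \sigma_2)
  = \frac{\exp\!\big(\sum_{t} A^*(s^1_t, a^1_t)\big)}
         {\exp\!\big(\sum_{t} A^*(s^1_t, a^1_t)\big)
        + \exp\!\big(\sum_{t} A^*(s^2_t, a^2_t)\big)}
```

Under the first model, fitting a scalar function to preference data recovers a reward function; if human preferences instead follow something like the second model, the same fitting procedure recovers an approximation of $A^*$, which is the abstract's central claim.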

Author Information

William Knox (Bosch / UT Austin)

Brad co-leads the Bosch Learning Agents Lab, which is housed at UT Austin and focuses on the development of machine learning algorithms for autonomous driving. His research has largely had at least one foot in reinforcement learning or human-robot interaction, frequently drawing on machine learning. Brad’s dissertation, “Learning from Human-Generated Reward”, comprised early pioneering work on human-in-the-loop reinforcement learning and won the 2012 best dissertation award from the UT Austin Department of Computer Science. His postdoctoral research at the MIT Media Lab focused on creating interactive characters through machine learning on puppetry-style demonstrations of interaction. Before joining Bosch, Brad founded and sold his startup Bots Alive, working in the toy robotics sector. He has won multiple best paper awards and was named to IEEE Intelligent Systems’ AI’s 10 to Watch in 2013.

Stephane Hatgis-Kessell (University of Texas at Austin)
Sigurdur Adalgeirsson (Google Research)
Serena Booth (Massachusetts Institute of Technology)
Anca Dragan (University of California, Berkeley)
Peter Stone (The University of Texas at Austin and Sony AI)
Scott Niekum (University of Massachusetts at Amherst)
