Poster
On the Effectiveness of Offline RL for Dialogue Response Generation
Paloma Sodhi · Felix Wu · Ethan Elenberg · Kilian Weinberger · Ryan Mcdonald

Wed Jul 26 02:00 PM -- 03:30 PM (PDT) @ Exhibit Hall 1 #321

A common training technique for language models is teacher forcing (TF). TF attempts to match the human reference response exactly, token by token, even though the same meaning can be expressed in many different ways. This motivates the use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods for maximizing such objectives. We present a comprehensive evaluation across multiple datasets, models, and metrics. Offline RL shows a clear performance improvement over teacher forcing without inducing training instability or sacrificing practical training budgets.
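As a toy illustration of why exact matching is a poor fit when the same meaning can be phrased differently, this Python sketch compares a position-by-position token match (the kind of exact agreement teacher forcing targets) with a whole-response score. Unigram F1 is a stand-in chosen for illustration only and is not claimed to be a metric from the paper. The helper names `positional_accuracy`, `unigram_f1`, and `filter_by_reward` are hypothetical. The last helper shows one simple offline recipe: keep only logged responses whose sequence-level score clears a threshold, then fine-tune on them. This recipe is sometimes called filtered behavior cloning, and it is not necessarily one of the methods the authors evaluate.

```python
from collections import Counter


def positional_accuracy(candidate: str, reference: str) -> float:
    """Fraction of positions where candidate and reference tokens agree exactly.

    This mimics the strict, position-by-position agreement that teacher forcing
    rewards: a correct word in a different slot earns nothing.
    """
    cand, ref = candidate.split(), reference.split()
    matches = sum(c == r for c, r in zip(cand, ref))
    return matches / max(len(cand), len(ref), 1)


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 over whole responses (an illustrative sequence-level score).

    Word order is ignored, so a reordered but equivalent response still scores well.
    """
    cand, ref = candidate.split(), reference.split()
    # Counter & Counter keeps the minimum count of each shared token (multiset overlap).
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def filter_by_reward(samples: list[str], reference: str, threshold: float) -> list[str]:
    """Offline step (hypothetical): keep logged responses whose score meets the threshold.

    No new interaction with users is needed; the kept responses would then be used
    as fine-tuning targets.
    """
    return [s for s in samples if unigram_f1(s, reference) >= threshold]
```

For example, the reordered response `"i can help you sure"` scored against the reference `"sure i can help you"` gets 0.0 positional accuracy but 1.0 unigram F1. That gap is what motivates optimizing a sequence-level objective instead of exact token agreement.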

Author Information

Paloma Sodhi (ASAPP)
Felix Wu (ASAPP Inc.)
Ethan Elenberg (ASAPP)
Kilian Weinberger (Cornell University)

Kilian Weinberger is an Associate Professor in the Department of Computer Science at Cornell University. He received his Ph.D. in Machine Learning from the University of Pennsylvania under the supervision of Lawrence Saul, and his undergraduate degree in Mathematics and Computer Science from the University of Oxford. During his career he has won several best paper awards at ICML, CVPR, and AISTATS, as well as a runner-up award at KDD. In 2011 he was awarded the Outstanding AAAI Senior Program Chair Award, and in 2012 he received an NSF CAREER award. He was elected co-Program Chair for ICML 2016 and for AAAI 2018. His research focuses on Machine Learning and its applications, in particular learning under resource constraints, metric learning, machine-learned web-search ranking, computer vision, and deep learning. Before joining Cornell University, he was an Associate Professor at Washington University in St. Louis, and before that he worked as a research scientist at Yahoo! Research in Santa Clara.

Ryan Mcdonald (ASAPP)
