A common training technique for language models is teacher forcing (TF). TF attempts to match human language exactly, even though the same meaning can be expressed in many different ways. This motivates the use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods for maximizing such objectives. We present a comprehensive evaluation across multiple datasets, models, and metrics. Offline RL shows a clear performance improvement over teacher forcing while neither inducing training instability nor sacrificing practical training budgets.
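To make the contrast between token-level and sequence-level training concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's method. The function names (`teacher_forcing_loss`, `unigram_f1`, `reward_weighted_nll`) are hypothetical. Unigram F1 stands in for whatever sequence-level metric is used. Reward-weighted likelihood over a fixed set of previously collected responses stands in as one of the simplest offline-RL-style objectives. The abstract does not say which offline RL methods are evaluated.

```python
import math
from collections import Counter

def teacher_forcing_loss(step_probs, reference):
    # Token-level cross-entropy under teacher forcing: at step t the model is
    # conditioned on the gold prefix and scored only on the gold token.
    # step_probs[t] is a hypothetical dict {token: probability} for step t.
    return -sum(math.log(step_probs[t][tok]) for t, tok in enumerate(reference))

def unigram_f1(candidate, reference):
    # Illustrative sequence-level reward (an assumption, not the paper's metric):
    # bag-of-words F1, which gives partial credit to paraphrases.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def reward_weighted_nll(samples, reference):
    # Hypothetical offline-RL-style objective: a fixed list of (tokens, log_prob)
    # pairs, each weighted by its sequence-level reward. No new responses are
    # sampled during training, which is what makes the setting "offline".
    total = sum(-unigram_f1(tokens, reference) * log_prob
                for tokens, log_prob in samples)
    return total / len(samples)

reference = "sure i can help with that".split()
paraphrase = "i can certainly help with that".split()

print(paraphrase == reference)                        # False: exact token match fails
print(round(unigram_f1(paraphrase, reference), 3))    # 0.833
print(round(teacher_forcing_loss([{"sure": 0.5, "i": 0.5}, {"i": 1.0}],
                                 ["sure", "i"]), 3))  # 0.693
print(round(reward_weighted_nll([(paraphrase, -2.0), (["no"], -1.0)],
                                reference), 3))       # 0.833
```

Under exact matching the paraphrase earns no credit, but the sequence-level reward gives it most of the credit. Minimizing the reward-weighted loss raises the log-probability of high-reward responses in the fixed dataset and ignores zero-reward ones.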
Author Information
Paloma Sodhi (ASAPP)
Felix Wu (ASAPP Inc.)
Ethan Elenberg (ASAPP)
Kilian Weinberger (Cornell University)
Kilian Weinberger is an Associate Professor in the Department of Computer Science at Cornell University. He received his Ph.D. from the University of Pennsylvania in Machine Learning under the supervision of Lawrence Saul and his undergraduate degree in Mathematics and Computer Science from the University of Oxford. During his career he has won several best paper awards at ICML, CVPR, AISTATS and KDD (runner-up award). In 2011 he was awarded the Outstanding AAAI Senior Program Chair Award and in 2012 he received an NSF CAREER award. He was elected co-Program Chair for ICML 2016 and for AAAI 2018. Kilian Weinberger's research focuses on Machine Learning and its applications. In particular, he focuses on learning under resource constraints, metric learning, machine learned web-search ranking, computer vision and deep learning. Before joining Cornell University, he was an Associate Professor at Washington University in St. Louis and before that he worked as a research scientist at Yahoo! Research in Santa Clara.
Ryan McDonald (ASAPP)
More from the Same Authors
2023 Poster: IncDSI: Incrementally Updatable Document Retrieval
Varsha Kishore · Chao Wan · Justin Lovelace · Yoav Artzi · Kilian Weinberger
2023 Poster: Unsupervised Out-of-Distribution Detection with Diffusion Inpainting
Zhenzhen Liu · Jin Zhou · Yufan Wang · Kilian Weinberger
2021 Poster: Making Paper Reviewing Robust to Bid Manipulation Attacks
Ruihan Wu · Chuan Guo · Felix Wu · Rahul Kidambi · Laurens van der Maaten · Kilian Weinberger
2021 Spotlight: Making Paper Reviewing Robust to Bid Manipulation Attacks
Ruihan Wu · Chuan Guo · Felix Wu · Rahul Kidambi · Laurens van der Maaten · Kilian Weinberger
2021 Poster: Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision
Johan Björck · Xiangyu Chen · Christopher De Sa · Carla Gomes · Kilian Weinberger
2021 Spotlight: Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision
Johan Björck · Xiangyu Chen · Christopher De Sa · Carla Gomes · Kilian Weinberger
2019: Panel Discussion (moderator: Tom Dietterich)
Max Welling · Kilian Weinberger · Terrance Boult · Dawn Song · Thomas Dietterich
2019: Keynote by Kilian Weinberger: On Calibration and Fairness
Kilian Weinberger
2019 Poster: Simple Black-box Adversarial Attacks
Chuan Guo · Jacob Gardner · Yurong You · Andrew Wilson · Kilian Weinberger
2019 Oral: Simple Black-box Adversarial Attacks
Chuan Guo · Jacob Gardner · Yurong You · Andrew Wilson · Kilian Weinberger
2019 Poster: Simplifying Graph Convolutional Networks
Felix Wu · Amauri Souza · Tianyi Zhang · Christopher Fifty · Tao Yu · Kilian Weinberger
2019 Oral: Simplifying Graph Convolutional Networks
Felix Wu · Amauri Souza · Tianyi Zhang · Christopher Fifty · Tao Yu · Kilian Weinberger
2018 Poster: Constant-Time Predictive Distributions for Gaussian Processes
Geoff Pleiss · Jacob Gardner · Kilian Weinberger · Andrew Wilson
2018 Oral: Constant-Time Predictive Distributions for Gaussian Processes
Geoff Pleiss · Jacob Gardner · Kilian Weinberger · Andrew Wilson
2017 Poster: On Approximation Guarantees for Greedy Low Rank Optimization
Rajiv Khanna · Ethan R. Elenberg · Alexandros Dimakis · Joydeep Ghosh · Sahand Negahban
2017 Talk: On Approximation Guarantees for Greedy Low Rank Optimization
Rajiv Khanna · Ethan R. Elenberg · Alexandros Dimakis · Joydeep Ghosh · Sahand Negahban
2017 Poster: On Calibration of Modern Neural Networks
Chuan Guo · Geoff Pleiss · Yu Sun · Kilian Weinberger
2017 Talk: On Calibration of Modern Neural Networks
Chuan Guo · Geoff Pleiss · Yu Sun · Kilian Weinberger