

Oral
Tue Jul 23 01:30 AM -- 01:45 AM (PDT) @ Hall C 1-3
Debating with More Persuasive LLMs Leads to More Truthful Answers
Akbir Khan · John Hughes · Dan Valentine · Laura Ruis · Kshitij Sachan · Ansh Radhakrishnan · Edward Grefenstette · Samuel Bowman · Tim Rocktäschel · Ethan Perez
Oral
Tue Jul 23 01:45 AM -- 02:00 AM (PDT) @ Hall C 1-3
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Collin Burns · Pavel Izmailov · Jan Kirchner · Bowen Baker · Leo Gao · Leopold Aschenbrenner · Yining Chen · Adrien Ecoffet · Manas Joglekar · Jan Leike · Ilya Sutskever · Jeffrey K Wu
Oral
Tue Jul 23 02:00 AM -- 02:15 AM (PDT) @ Hall C 1-3
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee · Xiaoyan Bai · Itamar Pres · Martin Wattenberg · Jonathan K. Kummerfeld · Rada Mihalcea
Oral
Tue Jul 23 02:15 AM -- 02:30 AM (PDT) @ Hall C 1-3
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu · Wei Fu · Jiaxuan Gao · Wenjie Ye · Weilin Liu · Zhiyu Mei · Guangju Wang · Chao Yu · Yi Wu