

Poster in Workshop: RLxF: RL from World Feedback

Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision

Yinghui He ⋅ Simran Kaur ⋅ Adithya Bhaskar ⋅ Yongjin Yang ⋅ Jiarui Liu ⋅ Narutatsu Ri ⋅ Liam Fowl ⋅ Abhishek Panigrahi ⋅ Danqi Chen ⋅ Sanjeev Arora

Abstract
