Poster in Workshop: RLxF: RL from World Feedback

Multi-Rollout On-Policy Distillation via Peer Successes and Failures

Weichen Yu ⋅ Xiaomin Li ⋅ Yizhou Zhao ⋅ Xiaoze Liu ⋅ Ruowang Zhang ⋅ Haixin Wang ⋅ Yinyi Luo ⋅ Chen Wu ⋅ Gaurav Mittal ⋅ Matt Fredrikson ⋅ Yu Hu

Abstract
