Poster

Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future

Yidong Wang ⋅ Xin Wang ⋅ Cunxiang Wang ⋅ Junfeng Fang ⋅ Qiufeng Wang ⋅ Jianing Chu ⋅ Xuran Meng ⋅ Shu-Xun Yang ⋅ Andrew Feng ⋅ Libo Qin ⋅ Wei Ye ⋅ Shikun Zhang

Abstract
