Poster

TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT

Rana Khan ⋅ Zijie Liu ⋅ Zhen Tan ⋅ Charles Fleming ⋅ Tianlong Chen

Abstract
