Poster

Reuse your FLOPs: Scaling RL on Hard Problems by Conditioning on Very Off-Policy Prefixes

Amrith Setlur ⋅ Zijian Wang ⋅ Andrew Cohen ⋅ Paria Rashidinejad ⋅ Sang Michael Xie

Abstract
