Quantifying Empirical Compute-Supervision Tradeoffs in RLVR

Ryo Mitsuhashi ⋅ Patrick Chen ⋅ Isabelle Tseng ⋅ Jasin Cekinmez ⋅ Addison J. Wu

Abstract
