Poster in Workshop: RLxF: RL from World Feedback

When The Verifier Is The Only Trustworthy Feedback Source: A Self-Teacher RLVR Pilot, A Confounded Logprob Extension, And Four Corrective Probes

Ethan Y Wang ⋅ Aayan Alwani

Abstract
