

Poster in Workshop: RLxF: RL from World Feedback

EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL

Lunjun Zhang ⋅ Jimmy Ba

Abstract
