Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs

Zhihe Yang ⋅ Xufang Luo ⋅ Zilong Wang ⋅ Dongqi Han ⋅ Zhiyuan He ⋅ Dongsheng Li ⋅ Yunjian Xu

Abstract
