Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs

Zhihe Yang · Xufang Luo · Zilong Wang · Dongqi Han · Zhiyuan He · Dongsheng Li · Yunjian Xu

Abstract
