Act Only When It Pays: Efficient Reinforcement Learning for LLM Reasoning via Selective Rollouts

Haizhong Zheng ⋅ Yang Zhou ⋅ Brian Bartoldson ⋅ Bhavya Kailkhura ⋅ Fan Lai ⋅ Jiawei Zhao ⋅ Beidi Chen

Abstract
