$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training

Jin Zhou ⋅ Kaiwen Wang ⋅ Jonathan Chang ⋅ Zhaolin Gao ⋅ Nathan Kallus ⋅ Kilian Weinberger ⋅ Kianté Brantley ⋅ Wen Sun

Abstract
