Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO

Chen ⋅ Xiaopeng Li ⋅ Ziniu Li ⋅ Xi Chen ⋅ Tianyi Lin

Abstract
