Poster

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Shih-Yang Liu ⋅ Xin Dong ⋅ Ximing Lu ⋅ Shizhe Diao ⋅ Peter Belcak ⋅ Mingjie Liu ⋅ Min-Hung Chen ⋅ Hongxu Yin ⋅ Yu-Chiang Wang ⋅ Kwang-Ting Cheng ⋅ Yejin Choi ⋅ Jan Kautz ⋅ Pavlo Molchanov

Abstract
