Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons

Banghua Zhu ⋅ Michael Jordan ⋅ Jiantao Jiao

Abstract
