Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons

Banghua Zhu · Michael Jordan · Jiantao Jiao

Abstract
