Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Alexandre Rame · Guillaume Couairon · Corentin Dancette · Mustafa Shukor · Jean-Baptiste Gaya · Laure Soulier · Matthieu Cord

Abstract
