

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Alexandre Rame ⋅ Guillaume Couairon ⋅ Corentin Dancette ⋅ Mustafa Shukor ⋅ Jean-Baptiste Gaya ⋅ Laure Soulier ⋅ Matthieu Cord

Abstract
