Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Alexandre Rame ⋅ Guillaume Couairon ⋅ Corentin Dancette ⋅ Jean-Baptiste Gaya ⋅ Mustafa Shukor ⋅ Laure Soulier ⋅ Matthieu Cord
