

Poster

Removing Sandbagging in LLMs by Training with Weak Supervision

Emil Ryd ⋅ Henning Bartsch ⋅ Julian Stastny ⋅ Joe Benton ⋅ Vivek Hebbar

Abstract
