Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models

Lillian Sun ⋅ Martin Pawelczyk ⋅ Zhenting Qi ⋅ Aounon Kumar ⋅ Himabindu Lakkaraju

Abstract
