

Oral presentation in Workshop: Methods and Opportunities at Small Scale (MOSS)

Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models

Lillian Sun ⋅ Martin Pawelczyk ⋅ Zhenting Qi ⋅ Aounon Kumar ⋅ Himabindu Lakkaraju
2025
