Oral presentation in Workshop: Methods and Opportunities at Small Scale (MOSS)

Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models

Lillian Sun · Martin Pawelczyk · Zhenting Qi · Aounon Kumar · Himabindu Lakkaraju
2025
