Poster in Workshop: DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)
Sat, Jul 19, 2025 • 3:00 PM – 3:45 PM PDT

Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models

Lillian Sun · Martin Pawelczyk · Zhenting Qi · Aounon Kumar · Himabindu Lakkaraju

Abstract
