Poster Tue, Jul 15, 2025 • 4:30 PM – 7:00 PM PDT

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

Tingchen Fu · Mrinank Sharma · Phil Torr · Shay Cohen · David Krueger · Fazl Barez

