

Poster

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

Tingchen Fu · Mrinank Sharma · Phil Torr · Shay Cohen · David Krueger · Fazl Barez
2025 Poster

