Poster in Workshop: Next Generation of AI Safety

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models

Bang An ⋅ Sicheng Zhu ⋅ Ruiyi Zhang ⋅ Michael-Andrei Panaitescu-Liess ⋅ Yuancheng Xu ⋅ Furong Huang

Abstract
