Poster in Workshop: Next Generation of AI Safety

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models

Bang An · Sicheng Zhu · Ruiyi Zhang · Michael-Andrei Panaitescu-Liess · Yuancheng Xu · Furong Huang

Abstract
