

Poster in Workshop: Next Generation of AI Safety

Weak-to-Strong Jailbreaking on Large Language Models

Xuandong Zhao ⋅ Xianjun Yang ⋅ Tianyu Pang ⋅ Chao Du ⋅ Lei Li ⋅ Yu-Xiang Wang ⋅ William Wang

Abstract
