Poster

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

Bilgehan Sel ⋅ Xuanli He ⋅ Alwin Peng ⋅ Ming Jin ⋅ Jerry Wei

Abstract