

Search All 2024 Events
14 Results

Poster
Thu 4:30 COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xingang Guo · Fangxu Yu · Huan Zhang · Lianhui Qin · Bin Hu
Poster
Thu 2:30 PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Ziyang Zhang · Qizhen Zhang · Jakob Foerster
Poster
Wed 4:30 Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu · Xiaosen Zheng · Tianyu Pang · Chao Du · Qian Liu · Ye Wang · Jing Jiang · Min Lin
Workshop
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
Liwei Jiang · Kavel Rao · Seungju Han · Allyson Ettinger · Faeze Brahman · Sachin Kumar · Niloofar Mireshghallah · Ximing Lu · Maarten Sap · Nouha Dziri · Yejin Choi
Workshop
Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao · Xianjun Yang · Tianyu Pang · Chao Du · Lei Li · Yu-Xiang Wang · William Wang
Workshop
Merging Improves Self-Critique Against Jailbreak Attacks
Victor Gallego
Workshop
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Patrick Chao · Edoardo Debenedetti · Alex Robey · Maksym Andriushchenko · Francesco Croce · Vikash Sehwag · Edgar Dobriban · Nicolas Flammarion · George J. Pappas · Florian Tramer · Hamed Hassani · Eric Wong
Workshop
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng · Tianyu Pang · Chao Du · Qian Liu · Jing Jiang · Min Lin
Workshop
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko · Francesco Croce · Nicolas Flammarion
Workshop
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra · Manolis Zampetakis · Paul Kassianik · Blaine Nelson · Hyrum Anderson · Yaron Singer · Amin Karbasi
Workshop
Attacking Large Language Models with Projected Gradient Descent
Simon Markus Geisler · Tom Wollschläger · M. Hesham Abdalla · Johannes Gasteiger · Stephan Günnemann
Workshop
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova · James Zou