Keynote
2nd ICML Workshop on New Frontiers in Adversarial Machine Learning
Zico Kolter
Bio: Zico Kolter is an Associate Professor in the Computer Science Department at Carnegie Mellon University, and also serves as chief scientist of AI research for the Bosch Center for Artificial Intelligence. His work spans the intersection of machine learning and optimization, with a large focus on developing more robust and rigorous methods in deep learning. In addition, he has worked in a number of application areas, highlighted by work on sustainability and smart energy systems. He is a recipient of the DARPA Young Faculty Award, a Sloan Fellowship, and best paper awards at NeurIPS, ICML (honorable mention), AISTATS (test of time), IJCAI, KDD, and PESGM.
Title: Adversarial Attacks on Aligned LLMs
Abstract: In this talk, I'll discuss our recent work on generating adversarial attacks against public LLM tools such as ChatGPT and Bard. Using a combination of gradient-based and greedy search on open-source LLMs, we find adversarial suffix strings that cause these models to ignore their "safety alignment" and answer potentially harmful user queries. Most surprisingly, we find that these adversarial prompts transfer remarkably well to closed-source, publicly available models. I'll discuss the methodology and results of this attack, as well as what this may mean for the future of LLM robustness.
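
The suffix search described in the abstract can be illustrated with a short greedy-coordinate-gradient style sketch: take gradients of an adversarial loss with respect to a one-hot encoding of the suffix tokens, use them to propose promising token substitutions, and greedily keep whichever substitution most lowers the loss. The model name ("gpt2"), prompt, target string, and hyperparameters below are illustrative placeholders and not the actual attack configuration from the talk.

```python
# Minimal sketch of a gradient-guided greedy suffix search against a
# HuggingFace causal LM. All concrete strings and settings are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder open-source model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
model.requires_grad_(False)  # we only need gradients w.r.t. the suffix encoding
embed = model.get_input_embeddings()

prompt_ids = tok("User request goes here.", return_tensors="pt").input_ids[0]
target_ids = tok(" Sure, here is", return_tensors="pt").input_ids[0]  # desired affirmative prefix
suffix_ids = tok(" ! ! ! ! !", return_tensors="pt").input_ids[0]      # adversarial suffix to optimize

def suffix_loss(suffix: torch.Tensor) -> torch.Tensor:
    """Cross-entropy of the target continuation given prompt + suffix."""
    ids = torch.cat([prompt_ids, suffix, target_ids]).unsqueeze(0)
    logits = model(ids).logits
    start = prompt_ids.numel() + suffix.numel()
    return F.cross_entropy(logits[0, start - 1:start - 1 + target_ids.numel()], target_ids)

top_k, n_candidates = 32, 64
for step in range(10):  # a real attack runs many more iterations
    # 1) Gradient of the loss w.r.t. a one-hot encoding of the suffix tokens.
    one_hot = F.one_hot(suffix_ids, embed.num_embeddings).float().requires_grad_(True)
    full_embeds = torch.cat(
        [embed(prompt_ids), one_hot @ embed.weight, embed(target_ids)]
    ).unsqueeze(0)
    logits = model(inputs_embeds=full_embeds).logits
    start = prompt_ids.numel() + suffix_ids.numel()
    loss = F.cross_entropy(logits[0, start - 1:start - 1 + target_ids.numel()], target_ids)
    loss.backward()
    # 2) Top-k substitutions per position that most decrease the loss (to first order).
    candidates = (-one_hot.grad).topk(top_k, dim=1).indices
    # 3) Greedy step: try random single-token swaps from the candidates, keep the best.
    with torch.no_grad():
        best_suffix, best_loss = suffix_ids, suffix_loss(suffix_ids)
        for _ in range(n_candidates):
            pos = torch.randint(suffix_ids.numel(), (1,)).item()
            cand = suffix_ids.clone()
            cand[pos] = candidates[pos, torch.randint(top_k, (1,)).item()]
            cand_loss = suffix_loss(cand)
            if cand_loss < best_loss:
                best_suffix, best_loss = cand, cand_loss
    suffix_ids = best_suffix
    print(f"step {step}: loss {best_loss.item():.3f}  suffix {tok.decode(suffix_ids)!r}")
```

In this simplified version the loss is the negative log-likelihood of a fixed affirmative target prefix, so driving it down pushes the model toward beginning a compliant answer; the transfer of such suffixes to closed-source models is an empirical finding discussed in the talk, not something this sketch demonstrates.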