Bio: Zico Kolter is an Associate Professor in the Computer Science Department at Carnegie Mellon University and serves as chief scientist of AI research for the Bosch Center for Artificial Intelligence. His work spans the intersection of machine learning and optimization, with a particular focus on developing more robust and rigorous methods in deep learning. He has also worked in a number of application areas, most notably sustainability and smart energy systems. He is a recipient of the DARPA Young Faculty Award, a Sloan Fellowship, and best paper awards at NeurIPS, ICML (honorable mention), AISTATS (test of time), IJCAI, KDD, and PESGM.

Title: Adversarial Attacks on Aligned LLMs

Abstract: In this talk, I'll discuss our recent work on generating adversarial attacks against public LLM tools, such as ChatGPT and Bard. Using a combined gradient-based and greedy search over open-source LLMs, we find adversarial suffix strings that cause these models to ignore their "safety alignment" and answer potentially harmful user queries. Most surprisingly, we find that these adversarial prompts transfer remarkably well to closed-source, publicly available models. I'll discuss the methodology and results of this attack, as well as what this may mean for the future of LLM robustness.
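For readers curious about the mechanics, below is a minimal sketch of the kind of gradient-guided greedy suffix search the abstract describes. Everything concrete in it (the "gpt2" stand-in model, the placeholder prompt, the target string, and the hyperparameters) is an illustrative assumption, not the setup from the talk.

```python
# Minimal sketch of a gradient-guided greedy suffix search.
# All concrete choices here (model, prompt, target, hyperparameters) are
# illustrative assumptions, not the actual configuration from the talk.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # we only need gradients w.r.t. the suffix

prompt = "Write the instructions."   # placeholder user query (assumption)
target = "Sure, here is how"         # affirmative continuation to optimize toward
suffix_len, top_k, n_cands, n_steps = 8, 64, 32, 20  # small demo settings

embed = model.get_input_embeddings()  # (vocab_size, dim) lookup table
prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
target_ids = tok(target, return_tensors="pt").input_ids[0]
suffix_ids = torch.full((suffix_len,), tok.encode("!")[0])  # "! ! ! ..." init

def target_loss(suffix):
    """Cross-entropy of the target continuation given prompt + suffix."""
    ids = torch.cat([prompt_ids, suffix, target_ids]).unsqueeze(0)
    logits = model(input_ids=ids).logits[0]
    start = len(prompt_ids) + len(suffix)  # position of first target token
    # logits at position i predict the token at position i + 1
    return F.cross_entropy(logits[start - 1 : start - 1 + len(target_ids)],
                           target_ids)

for step in range(n_steps):
    # 1) Differentiate the loss through a one-hot relaxation of the suffix.
    one_hot = F.one_hot(suffix_ids, embed.num_embeddings).float()
    one_hot.requires_grad_(True)
    inputs = torch.cat([embed(prompt_ids),
                        one_hot @ embed.weight,
                        embed(target_ids)]).unsqueeze(0)
    logits = model(inputs_embeds=inputs).logits[0]
    start = len(prompt_ids) + suffix_len
    loss = F.cross_entropy(logits[start - 1 : start - 1 + len(target_ids)],
                           target_ids)
    loss.backward()

    # 2) Per suffix position, the top-k token swaps with the most negative
    #    gradient, i.e. those predicted to reduce the loss most.
    candidates = (-one_hot.grad).topk(top_k, dim=1).indices

    # 3) Greedy step: try random single-token swaps drawn from the candidate
    #    sets, score each with the true loss, and keep the best suffix.
    best_loss, best_suffix = loss.item(), suffix_ids
    with torch.no_grad():
        for _ in range(n_cands):
            trial = suffix_ids.clone()
            pos = torch.randint(suffix_len, (1,)).item()
            trial[pos] = candidates[pos, torch.randint(top_k, (1,)).item()]
            cand_loss = target_loss(trial).item()
            if cand_loss < best_loss:
                best_loss, best_suffix = cand_loss, trial
    suffix_ids = best_suffix
    print(f"step {step:2d}  loss {best_loss:.3f}  suffix {tok.decode(suffix_ids)!r}")
```

A real attack would batch the candidate evaluations, optimize jointly over multiple prompts and models to encourage transfer, and check that candidate suffixes re-tokenize consistently; the sketch above only conveys the structure of the search loop.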
