

Keynote in Workshop: 2nd ICML Workshop on New Frontiers in Adversarial Machine Learning

Lea Schönherr

Bio: Lea Schönherr has been a tenure-track faculty member at CISPA Helmholtz Center for Information Security since 2022. She obtained her PhD from Ruhr-Universität Bochum, Germany, in 2021 and is a recipient of two fellowships, from UbiCrypt (DFG Graduate School) and CASA (DFG Cluster of Excellence). Her research interests are in information security, with a focus on adversarial machine learning and generative models for defending against real-world threats. She is particularly interested in language as an interface to machine learning models and in combining different domains such as audio, text, and images. She has published several papers on threat detection and defense for speech recognition systems and generative models.

Title: Brave New World: Challenges and Threats in Multimodal AI Agent Integrations

Abstract: AI agents are on the rise, becoming ever more integrated into our daily lives, and will soon be indispensable for countless downstream tasks, be it translation, text enhancement, summarisation, or other assistive applications such as code generation. The human-agent interface is no longer limited to plain text: large language models (LLMs) can handle documents, videos, images, audio, and more. In addition, the generation of multimodal outputs is becoming more advanced and realistic in appearance, allowing for more sophisticated communication with AI agents. In the future in particular, interactions with AI agents will rely on a more natural-feeling voice interface. In this presentation, we will take a closer look at the resulting challenges and security threats of integrated multimodal AI agents, which fall into two categories: malicious inputs used to jailbreak LLMs, and computer-generated output that is indistinguishable from human-generated content. In the first case, specially designed inputs are used to exploit an LLM or its embedding system, also referred to as prompt hacking. Existing attacks show that the content filters of LLMs can be easily bypassed with specific inputs and that private information can be leaked. Additional input modalities, such as speech, open a much broader potential attack surface that needs to be investigated and protected. In the second case, generative models are used to produce fake content that is nearly impossible to distinguish from human-generated content. Such fake content is often used for fraudulent and manipulative purposes; impersonation and realistic fake news are already possible using a variety of techniques. As these models continue to evolve, detecting such fraudulent activities will become increasingly difficult, while the attacks themselves will become easier to automate and require less expertise. This creates significant challenges for preventing fraud and the uncontrolled spread of fake news.
