Poster

An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks

Valentyn Boreiko · Alexander Panfilov · Václav Voráček · Matthias Hein · Jonas Geiping
2025 Poster
