A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks

Blake Bullwinkel · Mark Russinovich · Ahmed Salem · Santiago Zanella-Beguelin · Dan Jones · Giorgio Severi · Eugenia Kim · Keegan Hines · Amanda Minnich · Yonatan Zunger · Ram Shankar Siva Kumar

Abstract
