A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks

Blake Bullwinkel ⋅ Mark Russinovich ⋅ Ahmed Salem ⋅ Santiago Zanella-Beguelin ⋅ Dan Jones ⋅ Giorgio Severi ⋅ Eugenia Kim ⋅ Keegan Hines ⋅ Amanda Minnich ⋅ Yonatan Zunger ⋅ Ram Shankar Siva Kumar

Abstract
