

Poster

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning

Nathaniel Li ⋅ Alexander Pan ⋅ Anjali Gopal ⋅ Summer Yue ⋅ Daniel Berrios ⋅ Alice Gatti ⋅ Justin Li ⋅ Ann-Kathrin Dombrowski ⋅ Shashwat Goel ⋅ Gabriel Mukobi ⋅ Nathan Helm-Burger ⋅ Rassin Lababidi ⋅ Lennart Justen ⋅ Andrew Liu ⋅ Michael Chen ⋅ Isabelle Barrass ⋅ Oliver Zhang ⋅ Xiaoyuan Zhu ⋅ Rishub Tamirisa ⋅ Bhrugu Bharathi ⋅ Ariel Herbert-Voss ⋅ Cort Breuer ⋅ Andy Zou ⋅ Mantas Mazeika ⋅ Zifan Wang ⋅ Palash Oswal ⋅ Weiran Lin ⋅ Adam Hunt ⋅ Justin Tienken-Harder ⋅ Kevin Shih ⋅ Kemper Talley ⋅ John Guan ⋅ Ian Steneker ⋅ David Campbell ⋅ Brad Jokubaitis ⋅ Steven Basart ⋅ Stephen Fitz ⋅ Ponnurangam Kumaraguru ⋅ Kallol Karmakar ⋅ Uday Tupakula ⋅ Vijay Varadharajan ⋅ Yan Shoshitaishvili ⋅ Jimmy Ba ⋅ Kevin Esvelt ⋅ Alexandr Wang ⋅ Dan Hendrycks
2024 Poster

Abstract
