Poster

MIB: A Mechanistic Interpretability Benchmark

Aaron Mueller ⋅ Atticus Geiger ⋅ Sarah Wiegreffe ⋅ Dana Arad ⋅ Iván Arcuschin ⋅ Adam Belfki ⋅ Yik Siu Chan ⋅ Jaden Fiotto-Kaufman ⋅ Tal Haklay ⋅ Michael Hanna ⋅ Jing Huang ⋅ Rohan Gupta ⋅ Yaniv Nikankin ⋅ Hadas Orgad ⋅ Nikhil Prakash ⋅ Anja Reusch ⋅ Aruna Sankaranarayanan ⋅ Shun Shao ⋅ Alessandro Stolfo ⋅ Martin Tutek ⋅ Amir Zur ⋅ David Bau ⋅ Yonatan Belinkov
2025 Poster

