

Poster in Workshop: Actionable Interpretability

MIB: A Mechanistic Interpretability Benchmark

Aaron Mueller ⋅ Atticus Geiger ⋅ Sarah Wiegreffe ⋅ Dana Arad ⋅ Iván Arcuschin ⋅ Adam Belfki ⋅ Yik Siu Chan ⋅ Jaden Fiotto-Kaufman ⋅ Tal Haklay ⋅ Michael Hanna ⋅ Jing Huang ⋅ Rohan Gupta ⋅ Yaniv Nikankin ⋅ Hadas Orgad ⋅ Nikhil Prakash ⋅ Anja Reusch ⋅ Aruna Sankaranarayanan ⋅ Shun Shao ⋅ Alessandro Stolfo ⋅ Martin Tutek ⋅ Amir Zur ⋅ David Bau ⋅ Yonatan Belinkov
2025 Poster in Workshop: Actionable Interpretability

Abstract
