

Poster in Workshop: Actionable Interpretability
Sat, Jul 19, 2025 • 10:40 AM – 11:40 AM PDT

MIB: A Mechanistic Interpretability Benchmark

Aaron Mueller · Atticus Geiger · Sarah Wiegreffe · Dana Arad · Iván Arcuschin · Adam Belfki · Yik Siu Chan · Jaden Fiotto-Kaufman · Tal Haklay · Michael Hanna · Jing Huang · Rohan Gupta · Yaniv Nikankin · Hadas Orgad · Nikhil Prakash · Anja Reusch · Aruna Sankaranarayanan · Shun Shao · Alessandro Stolfo · Martin Tutek · Amir Zur · David Bau · Yonatan Belinkov

Abstract
