

Poster in Workshop: Actionable Interpretability

MIB: A Mechanistic Interpretability Benchmark

Aaron Mueller · Atticus Geiger · Sarah Wiegreffe · Dana Arad · Iván Arcuschin · Adam Belfki · Yik Siu Chan · Jaden Fiotto-Kaufman · Tal Haklay · Michael Hanna · Jing Huang · Rohan Gupta · Yaniv Nikankin · Hadas Orgad · Nikhil Prakash · Anja Reusch · Aruna Sankaranarayanan · Shun Shao · Alessandro Stolfo · Martin Tutek · Amir Zur · David Bau · Yonatan Belinkov
2025 Poster

Abstract
