Poster

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Adam Karvonen ⋅ Can Rager ⋅ Johnny Lin ⋅ Curt Tigges ⋅ Joseph Bloom ⋅ David Chanin ⋅ Yeu-Tong Lau ⋅ Eoin Farrell ⋅ Callum McDougall ⋅ Kola Ayonrinde ⋅ Demian Till ⋅ Matthew Wearden ⋅ Arthur Conmy ⋅ Samuel Marks ⋅ Neel Nanda
2025 Poster
