Poster

Codebook Features: Sparse and Discrete Interpretability for Neural Networks

Alex Tamkin ⋅ Mohammad Taufeeque ⋅ Noah Goodman
2024 Poster

Abstract
