Poster

Falsifying Sparse Autoencoder Reasoning Features in Language Models

George Ma ⋅ Zhongyuan Liang ⋅ Irene Y. Chen ⋅ Somayeh Sojoudi

Abstract
