Spotlight Poster

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Zhengxuan Wu ⋅ Aryaman Arora ⋅ Atticus Geiger ⋅ Zheng Wang ⋅ Jing Huang ⋅ Dan Jurafsky ⋅ Christopher Manning ⋅ Christopher Potts
2025 Spotlight Poster

Abstract

Lay Summary

