Poster

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

Adam Karvonen ⋅ James Chua ⋅ Clément Dumas ⋅ Kit Fraser-Taliente ⋅ Subhash Kantamneni ⋅ Julian Minder ⋅ Euan Ong ⋅ Arnab Sen Sharma ⋅ Daniel Wen ⋅ Owain Evans ⋅ Samuel Marks

Abstract
