Poster Wed, Jul 8, 2026 • 10:30 AM – 12:15 PM KST Coex: HALL A

Endogenous Resistance to Activation Steering in Language Models: Evidence for Internal Consistency Monitoring in Llama-3.3-70B

Alex McKenzie ⋅ Keenan Pepper ⋅ Stijn Servaes ⋅ Martin Leitgab ⋅ Murat Cubuktepe ⋅ Michael Vaiana ⋅ Diogo de Lucena ⋅ Judd Rosenblatt ⋅ Michael Graziano

Abstract
