

Poster in Affinity Workshop: New In ML

Scaling laws for activation steering with Llama 2 models and refusal mechanisms

Abstract
