Poster Wed, Jul 16, 2025 • 11:00 AM – 1:30 PM PDT

Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning

Mahavir Dabas · Si Chen · Charles Fleming · Ming Jin · Ruoxi Jia
