Poster

Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning

Mahavir Dabas · Si Chen · Charles Fleming · Ming Jin · Ruoxi Jia
2025 Poster

