

Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets

Lei Hsiung ⋅ Tianyu Pang ⋅ Yung-Chen Tang ⋅ Linyue Song ⋅ Tsung-Yi Ho ⋅ Pin-Yu Chen ⋅ Yaoqing Yang

Abstract
