2025 Poster in Workshop: Actionable Interpretability

Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder

Xianjun Yang ⋅ Shaoliang Nie ⋅ Lijuan Liu ⋅ Suchin Gururangan ⋅ Ujjwal Karn ⋅ Rui Hou ⋅ Madian Khabsa ⋅ Yuning Mao


Abstract

Instruction-tuning data are often saturated in quantity because of large-scale data collection and fast model iteration, which makes data selection important yet underexplored. Existing quality-driven data selection methods, such as LIMA (NeurIPS 2023 \citep{zhou2024lima}) and AlpaGasus (ICLR 2024 \citep{chenalpagasus}), generally overlook data diversity and complexity, which matter as much as quality. In this work, we design a diversity-aware data selection strategy and propose using sparse autoencoders (SAEs) to address the challenge of measuring data diversity. SAEs also make model behavior more interpretable and can help explain, for example, the surprising effectiveness of selecting the longest responses (ICML 2024 \citep{zhaolong}). Experiments show that models trained on data selected by our method can outperform those trained with data from other selection methods in model capability, while reducing training cost and potentially offering more control over model behavior. We demonstrate that SAEs are a good alternative for measuring diversity, design our method to scale to large-scale industrial data pruning, and will release our trained SAEs for use by the broader community.
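The abstract does not spell out how SAE features become a selection rule. As a hedged illustration only, one plausible instantiation encodes each example's embedding with a ReLU SAE and treats the set of active latents as that example's features. It then greedily picks the examples that add the most not-yet-covered features, which is a standard greedy max-coverage heuristic. The SAE weights and embeddings below are random stand-ins, and the helper names (`sae_encode`, `active_features`, `greedy_diverse_select`) are hypothetical. This is a sketch under those assumptions, not the authors' released method or code.

```python
import random
from typing import List, Sequence, Set, Tuple


def sae_encode(
    x: Sequence[float],
    w_enc: Sequence[Sequence[float]],
    b_enc: Sequence[float],
    b_dec: Sequence[float],
) -> List[float]:
    """Common ReLU SAE encoder: f = ReLU(W_enc (x - b_dec) + b_enc)."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [
        max(0.0, sum(w * c for w, c in zip(row, centered)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def active_features(codes: Sequence[float], threshold: float = 0.0) -> Set[int]:
    """Indices of SAE latents whose activation exceeds the threshold."""
    return {i for i, a in enumerate(codes) if a > threshold}


def greedy_diverse_select(
    feature_sets: Sequence[Set[int]], k: int
) -> Tuple[List[int], Set[int]]:
    """Greedy max-coverage (hypothetical selection rule, not the paper's).

    Repeatedly picks the example that adds the most unseen SAE features.
    Ties are broken by the lower index so the result is deterministic.
    """
    covered: Set[int] = set()
    chosen: List[int] = []
    remaining = set(range(len(feature_sets)))
    for _ in range(min(k, len(feature_sets))):
        best = max(remaining, key=lambda i: (len(feature_sets[i] - covered), -i))
        chosen.append(best)
        covered |= feature_sets[best]
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_sae, n_examples = 4, 16, 6
    # Random stand-in weights; a real SAE would be trained to reconstruct LM activations.
    w_enc = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_sae)]
    b_enc = [-0.5] * d_sae  # a negative bias keeps only strong activations
    b_dec = [0.0] * d_model
    # Toy vectors standing in for per-example model activations.
    embeddings = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_examples)]
    feats = [active_features(sae_encode(e, w_enc, b_enc, b_dec)) for e in embeddings]
    chosen, covered = greedy_diverse_select(feats, k=3)
    print("selected:", chosen, "features covered:", len(covered))
```

Greedy coverage is used here because it directly rewards examples that activate features absent from the current selection, which is one simple way to turn "diversity" into a concrete objective. Other choices, such as penalizing pairwise similarity between SAE feature vectors, would be equally plausible readings of the abstract.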
