Group-wise Data Ordering: Enhancing Instruction Tuning of Large Language Models via Embedding Proximity
Abstract
Instruction tuning (IT) is a central mechanism for aligning large language models (LLMs) with user intent. In practice, randomly shuffling the training set is a simple yet surprisingly strong baseline. However, random shuffling overlooks latent structure, such as domain and reasoning depth, and thus interleaves heterogeneous objectives, which can induce gradient conflicts and diminish effective optimization progress. To address this, we propose EP-Order, an embedding-proximity-based data-ordering paradigm for IT of LLMs. Unlike previous paradigms that derive an order from per-example scores, EP-Order explicitly accounts for inter-sample correlations by operating in representation space. EP-Order trains a warm-up model on a small subset of the data (e.g., 10%), uses it to embed all training samples, clusters the embeddings, and orders the clusters by embedding proximity. To handle sharp gradient changes at cluster transitions and to alleviate catastrophic forgetting under cluster-based training, we introduce mixed regions that interleave samples from the previous, current, and next clusters, thereby stabilizing learning. We evaluate EP-Order on seven popular multimodal LLM benchmarks and demonstrate that it is both more effective and more efficient than competing data-ordering paradigms. We further apply EP-Order to a hybrid-thinking, text-only scenario, in which think and no-think samples induce substantial optimization conflict, and evaluate it on three benchmarks, where EP-Order obtains nearly consistent improvements over random mixing. These results highlight embedding-proximity-based ordering as a promising direction for complex, high-conflict training data in IT.
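For concreteness, the following is a minimal Python sketch of the ordering step, assuming the warm-up-model embeddings are already computed. The choice of k-means, the greedy nearest-centroid chaining, and the `mix_frac` knob are illustrative assumptions for exposition, not the exact EP-Order procedure.

```python
import numpy as np
from sklearn.cluster import KMeans


def ep_order(embeddings: np.ndarray, n_clusters: int = 8,
             mix_frac: float = 0.1, seed: int = 0) -> np.ndarray:
    """Return a training order over sample indices (a sketch, not the exact method).

    embeddings: (N, D) features of all training samples from the warm-up model.
    mix_frac:   fraction of each cluster reserved for its mixed region
                (an illustrative knob).
    """
    rng = np.random.default_rng(seed)

    # Step 1: cluster the warm-up embeddings (k-means as one plausible choice).
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(embeddings)
    clusters = [np.flatnonzero(km.labels_ == c) for c in range(n_clusters)]
    centroids = km.cluster_centers_

    # Step 2: order clusters by embedding proximity -- here, a greedy chain
    # in which each next cluster has the centroid closest to the current one.
    order, remaining = [0], set(range(1, n_clusters))
    while remaining:
        cur = centroids[order[-1]]
        nxt = min(remaining, key=lambda c: float(np.linalg.norm(centroids[c] - cur)))
        order.append(nxt)
        remaining.remove(nxt)

    # Step 3: emit each cluster's core, then a mixed region that interleaves
    # held-out samples of the current cluster with samples drawn from the
    # previous and next clusters, smoothing the transition between clusters.
    schedule = []
    for i, c in enumerate(order):
        idx = rng.permutation(clusters[c])
        k = max(1, int(mix_frac * len(idx)))
        schedule.extend(idx[:-k].tolist())  # core of cluster c
        mix = [idx[-k:]]
        if i > 0:
            prev = clusters[order[i - 1]]
            mix.append(rng.choice(prev, size=min(k, len(prev)), replace=False))
        if i + 1 < len(order):
            nxt_c = clusters[order[i + 1]]
            mix.append(rng.choice(nxt_c, size=min(k, len(nxt_c)), replace=False))
        # Neighbor samples are revisited here; this simplification trades a
        # few duplicates for a smoother gradient transition.
        schedule.extend(rng.permutation(np.concatenate(mix)).tolist())
    return np.asarray(schedule)
```

Under these assumptions, one would iterate over `dataset[ep_order(embeddings)]` during IT, optionally recomputing the order each epoch.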