Hard Labels In! Rethinking the Role of Hard Labels in Mitigating Local Semantic Drift
Jiacheng Cui ⋅ Bingkui Tong ⋅ Xinyue Bi ⋅ Xiaohan Zhao ⋅ Jiacheng Liu ⋅ Zhiqiang Shen
Abstract
Soft labels from teacher models are a $\textit{de facto}$ practice for knowledge transfer and large-scale dataset distillation (e.g., SRe$^2$L, RDED, LPLD). However, when we limit the number of crops per image to reduce the substantial cost of storing precomputed soft labels, these methods suffer severely from $\textit{local semantic drift}$: visually ambiguous crops can cause soft supervision to deviate from the image-level ground-truth semantics, leading to systematic errors and a train–test distribution mismatch. We revisit the overlooked role of hard labels and show that, when properly integrated, they act as a content-agnostic semantic anchor that calibrates such drift. We theoretically analyze the emergence of drift under sparse soft-label supervision and demonstrate that hybridizing hard and soft labels restores the alignment between visual content and semantic supervision. Building on this insight, we propose a new training paradigm, $\textbf{H}$ard Label for $\textbf{A}$lleviating $\textbf{L}$ocal Semantic $\textbf{D}$rift (HALD), which uses hard labels as intermediate corrective signals while preserving the fine-grained benefits of soft labels. Extensive experiments on dataset distillation and large-scale classification benchmarks show consistent generalization improvements. On ImageNet-1K, our method achieves 42.7% accuracy with only 285M of soft-label storage (a ${\bf 100\times}$ reduction), outperforming the prior state-of-the-art LPLD by 9.0%.
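The hybrid supervision the abstract describes can be sketched as a loss that blends soft-label distillation with a hard-label cross-entropy anchor. This is a generic illustration, not the paper's exact method: the function name `hybrid_label_loss` and the fixed mixing weight `alpha` are assumptions, and HALD's actual schedule for applying hard labels as intermediate corrective signals is not reproduced here.

```python
import torch
import torch.nn.functional as F

def hybrid_label_loss(student_logits, teacher_logits, hard_labels,
                      alpha=0.5, temperature=4.0):
    """Blend soft-label distillation with a hard-label anchor.

    Illustrative sketch only: `alpha` statically mixes the two terms,
    whereas HALD applies hard labels as intermediate corrective signals.
    """
    # Soft supervision: KL divergence between the student's and the
    # teacher's temperature-scaled distributions (standard KD form).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard supervision: image-level ground truth serves as a
    # content-agnostic semantic anchor that calibrates local drift.
    hard_loss = F.cross_entropy(student_logits, hard_labels)
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

With `alpha=0`, this reduces to pure soft-label distillation; with `alpha=1`, to ordinary hard-label training.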