Neural Collapse by Design: Learning Class Prototypes on the Hypersphere
Panagiotis Koromilas ⋅ Theodoros Giannakopoulos ⋅ Mihalis Nicolaou ⋅ Yannis Panagakis
Abstract
Supervised classifier learning has a theoretical optimum — Neural Collapse (NC) — yet standard training does not reach it in practice. We trace this failure to a geometric limitation: cross-entropy is invariant to joint rescaling of features and weights, leaving radial degrees of freedom unconstrained and the loss landscape degenerate. Projecting optimization onto the unit hypersphere eliminates this degeneracy and exposes a hidden equivalence: normalized softmax classification and supervised contrastive learning are conceptually the same, both optimizing angular similarity to class prototypes. We formalize this unification by proving that supervised contrastive learning already produces an optimal classifier during training, the prototype classifier whose weights are given by class-wise feature means, rendering subsequent classifier learning through linear probing redundant. Building on this framework, we identify two computational bottlenecks that slow convergence to NC: the small effective negative set in classifier learning (limited to K class prototypes), and the coupling of competing optimization terms through a shared normalization. We address these with NTCE, which expands the negative set from K classes to M batch instances, and NONL, which normalizes only over negatives to decouple intra-class alignment from inter-class repulsion. Empirically, our methods surpass cross-entropy accuracy on four benchmarks including ImageNet-1K, achieve $\ge$95\% NC across all metrics, and yield consistent gains in transfer learning (+5.5\% mean relative improvement), long-tailed classification (up to +8.7\%), and robustness (lower mCE), while eliminating hours of post-hoc classifier training.
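The abstract's central object is the prototype classifier: a classifier whose weight for each class is the (normalized) mean of that class's features, with prediction by angular similarity on the unit hypersphere. The following is a minimal illustrative sketch of that idea, not the paper's implementation; the function names and the toy data are assumptions made here for exposition.

```python
import numpy as np

def prototype_classifier(features, labels, num_classes):
    """Build prototype weights: each class weight is the L2-normalized
    mean of that class's unit-normalized features (a sketch of the
    'class-wise feature means' classifier described in the abstract)."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    prototypes = np.empty((num_classes, z.shape[1]))
    for c in range(num_classes):
        mean_c = z[labels == c].mean(axis=0)
        prototypes[c] = mean_c / np.linalg.norm(mean_c)
    return prototypes

def predict(prototypes, features):
    """Classify by maximal cosine (angular) similarity to the prototypes."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    return (z @ prototypes.T).argmax(axis=1)

# Toy usage with random data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
y = rng.integers(0, 5, size=100)
W = prototype_classifier(X, y, num_classes=5)
preds = predict(W, X)
```

Under the abstract's claim, such prototypes are already optimal at the end of supervised contrastive training, so no separate linear-probing stage is needed to obtain a classifier.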