Modern deep neural networks have achieved impressive performance on tasks ranging from image classification to natural language processing. Surprisingly, these complex systems with a massive number of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when trained until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and that those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse (NC). Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified ``unconstrained feature model''. In this context, we take a step further and prove that NC occurs in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit NC properties across the linear layers. Furthermore, we extend our study to imbalanced data for the MSE loss and present the first geometric analysis of NC under the bias-free setting. Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors whose lengths depend on the amount of data in their corresponding classes. Finally, we empirically validate our theoretical analyses on synthetic and practical network architectures under both balanced and imbalanced scenarios.
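To make the simplex ETF geometry concrete, here is a minimal NumPy sketch (not from the paper) that constructs a K-class simplex ETF in R^d and numerically verifies its two defining properties: equal-norm vertices whose pairwise cosine similarity is exactly -1/(K - 1). The function name `simplex_etf` and the choices K = 10, d = 512 are illustrative assumptions.

```python
import numpy as np

def simplex_etf(K, d, seed=0):
    """Construct a K-class simplex ETF embedded in R^d (this construction
    assumes d >= K). Columns are the ETF vertices: equal-norm vectors with
    pairwise cosine similarity exactly -1/(K - 1)."""
    rng = np.random.default_rng(seed)
    # Random d x K matrix with orthonormal columns (a partial rotation U).
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))
    # Center the identity to obtain the simplex geometry, then rotate into R^d.
    return np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)

K, d = 10, 512
M = simplex_etf(K, d)

# Property 1: all vertices have equal norm.
norms = np.linalg.norm(M, axis=0)
assert np.allclose(norms, norms[0])

# Property 2: all pairwise angles are equal, with cosine -1/(K - 1).
G = (M / norms).T @ (M / norms)          # cosine-similarity Gram matrix
off_diag = G[~np.eye(K, dtype=bool)]
assert np.allclose(off_diag, -1.0 / (K - 1))
```

Under NC on balanced data, the centered class-means of the last-layer features align (up to rescaling and rotation) with the columns of such a matrix; in the paper's imbalanced, bias-free setting the limiting geometry is instead a set of orthogonal vectors with class-size-dependent lengths.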
Author Information
Hien Dang (FPT Software Company Limited, FPT Cau Giay building, Duy Tan street, Dich Vong Hau ward, Cau Giay district, Hanoi)
Tho Tran Huu (FPT Software Company Limited, FPT Cau Giay building, Duy Tan street, Dich Vong Hau ward, Cau Giay district, Hanoi)
Stanley Osher (UCLA)
Hung Tran-The (Deakin University)
Nhat Ho (University of Texas at Austin)
Tan Nguyen (UCLA)