FD-Loss: Supervised Feature Decorrelation as a Scale-Invariant Replacement for Random Dropout
Abstract
Standard random dropout regularizes neural networks by stochastically deactivating units, yet it remains blind to representational redundancy: when two neurons converge on identical features, masking one of them generates no corrective gradient toward diversity. We propose Feature Decorrelation Loss (FD-Loss), a supervised regularization objective that explicitly penalizes the off-diagonal entries of the per-feature-normalized cross-correlation matrix of hidden activations. A mandatory per-feature ℓ2 normalization step bounds all correlation values to [−1, +1] and resolves the gradient instability that caused prior covariance penalties (e.g., DeCov) to diverge on unscaled tabular data. Evaluation across 20 datasets spanning tabular, image, and text domains shows that FD-Loss achieves a 65% win rate over dropout, with accuracy improvements of up to +5.35 pp on correlated tabular benchmarks and +4.12 pp on complex visual hierarchies, while incurring negligible computational overhead.
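To make the penalty concrete, the following is a minimal NumPy sketch of one way the objective described above could be computed for a single batch of hidden activations. It is an illustration under stated assumptions, not the reference implementation: the function name `fd_loss`, the mean-centering step, the `eps` stabilizer, and the averaging of squared entries over the d(d−1) off-diagonal positions are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def fd_loss(h, eps=1e-8, center=True):
    """Sketch of a feature decorrelation penalty for a batch of activations.

    h: array of shape (batch, features).
    Returns the mean squared off-diagonal entry of the normalized
    cross-correlation matrix (the averaging is an assumption).
    """
    h = np.asarray(h, dtype=np.float64)
    if center:
        # Assumption: remove each feature's batch mean before normalizing.
        h = h - h.mean(axis=0, keepdims=True)

    # Per-feature l2 normalization: each column gets (near) unit norm, so every
    # pairwise dot product lies in [-1, +1] by the Cauchy-Schwarz inequality.
    norms = np.linalg.norm(h, axis=0, keepdims=True)
    z = h / (norms + eps)

    corr = z.T @ z  # (features, features) cross-correlation matrix
    d = corr.shape[0]
    if d < 2:
        return 0.0  # a single feature has no off-diagonal entries

    # Zero out the diagonal and penalize what remains.
    off_diag = corr - np.diag(np.diag(corr))
    return float((off_diag ** 2).sum() / (d * (d - 1)))


# Two neurons that learned the same feature up to scale: maximal penalty.
a = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
redundant = np.hstack([a, 2.0 * a])

# Two orthogonal, zero-mean features: no penalty.
diverse = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])

loss_redundant = fd_loss(redundant)
loss_diverse = fd_loss(diverse)
loss_rescaled = fd_loss(redundant * np.array([1.0, 1000.0]))
```

In this sketch, rescaling a feature column leaves the penalty essentially unchanged, because the normalization divides out each column's magnitude. This is the scale-invariance property that the title refers to. In training, such a term would be added to the supervised loss with a weighting coefficient, the value of which is not given here.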