LassoFlexNet: A Flexible Neural Architecture for Tabular Data
Abstract
Deep neural networks excel in vision, language, and audio, yet they continue to underperform tree-based models on tabular data. We identify and extend the inductive biases crucial for tabular learning (robustness to irrelevant features, axis alignment, localized irregularities, feature heterogeneity, and training stability) and propose LassoFlexNet, a novel architecture coupled with a new training algorithm. LassoFlexNet employs a Tied Group Lasso mechanism that sparsely selects raw inputs based on nonlinear per-feature embeddings. This design encourages a raw input variable to contribute, jointly with the others, only when it adds predictive value beyond them, whether linearly or nonlinearly. The resulting non-homogeneity and localized irregularities pose optimization challenges that defeat standard stochastic and proximal-gradient methods. To address this, we develop a Sequential Hierarchical Proximal Gradient optimizer with exponential moving averages (EMA) that enables stable training. Across 52 datasets from three recent benchmarks, LassoFlexNet matches or surpasses state-of-the-art tree-based models, achieving up to 10% relative gains while improving interpretability. We further validate our design through ablation studies and prove enhanced expressivity for a key architectural component.
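The abstract does not detail the optimizer, but the operations it names (a group-lasso penalty over tied per-feature weight groups, proximal updates, and an EMA of the parameters) have standard building blocks. Below is a minimal NumPy sketch of a generic proximal-gradient step with group soft-thresholding followed by an EMA update; the function names, the row-per-feature weight layout, and the hyperparameters lr, lam, and decay are illustrative assumptions, not the paper's Sequential Hierarchical Proximal Gradient algorithm.

import numpy as np

def group_soft_threshold(W, tau):
    """Proximal operator of tau * sum_j ||W[j]||_2 (group lasso).

    Assumed layout: row W[j] holds the tied embedding weights of raw
    feature j, so shrinking a whole row to zero drops that feature.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * W

def prox_step_with_ema(W, grad, W_ema, lr=1e-2, lam=1e-3, decay=0.99):
    """One proximal gradient step on the smooth loss, then an EMA update.

    The EMA copy would be used at evaluation time to damp the
    oscillations that sparsity-inducing updates tend to produce.
    """
    W = group_soft_threshold(W - lr * grad, lr * lam)  # descend, then prox
    W_ema = decay * W_ema + (1.0 - decay) * W          # smoothed weights
    return W, W_ema

# Toy usage: 5 raw features, each with a 4-dimensional tied embedding.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))
W_ema = W.copy()
grad = rng.normal(size=W.shape)  # stand-in for a real loss gradient
W, W_ema = prox_step_with_ema(W, grad, W_ema)

Because the threshold lr * lam acts on whole rows, entire per-feature groups reach exactly zero, which is the mechanism by which a group lasso performs feature selection rather than mere shrinkage.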