MiniX: Mitigating Low-Rank Collapse and Attention Bottlenecks in Tabular Foundation Models
Yuanrui Wang ⋅ Xingxuan Zhang ⋅ Han Yu ⋅ Mingchao Hao ⋅ Gang Ren ⋅ Hao Yuan ⋅ Li Mao ⋅ Yunjia Zhang ⋅ Chun Yuan ⋅ Peng Cui
Abstract
Recent tabular foundation models routinely match or surpass strong tree ensembles and specialized deep architectures, yet their numeric embeddings remain a bottleneck. We diagnose a low-rank collapse induced by the prevalent linear+ID scheme and introduce RaBEL, a compact Radial Basis Embedding Layer that front-loads nonlinearity via localized RBF features. RaBEL increases shallow-layer effective rank and improves conditioning without requiring deeper stacks, and it is complementary to periodic mappings. We further identify a permutation-order pathology in bidirectional attention (feature$\rightarrow$sample) and propose a reordered stack, sample-attention $\rightarrow$ FFN $\rightarrow$ feature-attention, which ensures that column-level context precedes feature mixing and that every attention computation influences the readout. Combining both ideas yields MiniX, a 2M-parameter model that surpasses the 7M-parameter TabPFN-v2 and 27M-parameter TabICL baselines on popular benchmarks while reducing training and inference cost. Our results highlight principled nonlinear embeddings and attention-order redesign as key enablers of accuracy and efficiency gains in tabular foundation models.
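To make the embedding idea concrete, the following is a minimal PyTorch sketch of an RBF-based numeric embedding in the spirit of RaBEL. The abstract does not specify the parameterization; the class name `RBFEmbedding`, the number of centers, the learnable per-column centers and bandwidths, and the residual linear path are all illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of a localized-RBF numeric embedding (RaBEL-style).
# Center count, bandwidth parameterization, and the residual linear path
# are assumptions for illustration only.
import torch
import torch.nn as nn

class RBFEmbedding(nn.Module):
    def __init__(self, num_features: int, d_model: int, num_centers: int = 16):
        super().__init__()
        # Learnable centers and log-bandwidths, one set per numeric column.
        self.centers = nn.Parameter(
            torch.linspace(-2.0, 2.0, num_centers).repeat(num_features, 1))  # (F, K)
        self.log_sigma = nn.Parameter(torch.zeros(num_features, num_centers))
        # Project the K localized features of each column to the model width.
        self.proj = nn.Linear(num_centers, d_model)
        # Residual linear path so identity-like maps remain representable.
        self.linear = nn.Linear(1, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, F) scalar numeric features.
        diff = x.unsqueeze(-1) - self.centers                       # (B, F, K)
        phi = torch.exp(-0.5 * (diff / self.log_sigma.exp()) ** 2)  # localized RBF features
        return self.proj(phi) + self.linear(x.unsqueeze(-1))        # (B, F, d_model)
```

Because each RBF feature responds only near its center, the nonlinearity is applied at the embedding stage rather than deferred to deeper layers, which is the mechanism the abstract credits for the higher shallow-layer effective rank.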
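The reordered stack can likewise be sketched as a single transformer-style block over a table of cell embeddings. The block below applies attention across samples (rows) within each column, then an FFN, then attention across features (columns) within each row; the module name `ReorderedBlock`, the pre-norm residual layout, and the layer sizes are assumptions, since the abstract fixes only the ordering.

```python
# Hypothetical sketch of the reordered block:
# sample-attention -> FFN -> feature-attention,
# so column-level context is aggregated before features are mixed.
import torch
import torch.nn as nn

class ReorderedBlock(nn.Module):
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        self.sample_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.feature_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.n1, self.n2, self.n3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (rows, cols, d_model) cell embeddings for one table.
        # 1) Attention over samples (rows) within each column.
        h = x.transpose(0, 1)                                    # (C, R, D): columns as batch
        q = self.n1(h)
        h = h + self.sample_attn(q, q, q)[0]
        # 2) FFN on the sample-contextualized cells.
        h = h + self.ffn(self.n2(h))
        # 3) Attention over features (columns) within each row.
        h = h.transpose(0, 1)                                    # (R, C, D): rows as batch
        q = self.n3(h)
        h = h + self.feature_attn(q, q, q)[0]
        return h

# Usage on a synthetic 64-row, 10-column table:
# out = ReorderedBlock()(torch.randn(64, 10, 128))
```

Placing feature attention last means its output feeds the readout directly, matching the abstract's requirement that all attention computations influence the prediction rather than being discarded by a later permutation.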