Q-Tab: Quantized Tabular Data Generator
Abstract
Codebook-based generators built on masked language model (MLM) transformers have become highly effective in text and vision, yet remain underused for tabular data. This is because codebooks typically act as information bottlenecks, whereas tabular generation requires them to generalize. We address this gap with Q-Tab, a codebook-based tabular generator built on lookup-free quantization (LFQ) with residual corruption. The resulting corruption kernel induces a moving Nadaraya–Watson-style kernel regression over a large discrete code space, turning codebook learning into a moving-target problem. We derive necessary conditions for the learnability of such moving codebooks and show how the residual LFQ construction aligns with these conditions. Q-Tab achieves state-of-the-art downstream predictive utility and missing-value imputation while matching the distributional fidelity of diffusion-based generators, notably without any post-hoc temperature tuning.
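The abstract does not spell out the quantizer, so the following is a minimal sketch of plain lookup-free quantization as used in vision tokenizers (sign-based binary codes with an implicit codebook), not Q-Tab's residual construction; the function name, latent dimension, and straight-through trick are illustrative assumptions.

```python
import torch


def lfq_quantize(z: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Sketch of lookup-free quantization: each latent dimension is binarized
    by its sign, so the code index is read off the bit pattern directly
    instead of a nearest-neighbour search over a stored codebook table.

    Args:
        z: latent tensor of shape (batch, d); implicit codebook size is 2**d.
    Returns:
        q:   quantized latents in {-1, +1}, same shape as z.
        idx: integer code indices of shape (batch,).
    """
    q = torch.where(z > 0, torch.ones_like(z), -torch.ones_like(z))  # per-dim sign
    bits = (q > 0).long()                                            # {-1,+1} -> {0,1}
    weights = 2 ** torch.arange(z.shape[-1], device=z.device)        # binary place values
    idx = (bits * weights).sum(dim=-1)                               # implicit code index
    q = z + (q - z).detach()                                         # straight-through gradient
    return q, idx


if __name__ == "__main__":
    z = torch.randn(4, 10)          # 10 latent dims -> 2**10 implicit codes
    q, idx = lfq_quantize(z)
    print(q.shape, idx)
```

Because the index is just the bit pattern, the effective codebook grows as 2**d without storing an embedding table, which is what makes the large discrete code space referred to above cheap to realize.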