$\mathcal{O}(\log N)$ Latent Dimension Suffices for Universal Approximation of Permutation-invariant Functions
Min ZHOU ⋅ Enming Liang ⋅ Minghua Chen
Abstract
Learning permutation-invariant functions over sets of $N$ elements is fundamental to many deep learning applications. While sum-decomposable architectures like DeepSets theoretically offer universal approximation capabilities for such functions, existing constructive bounds suggest that the latent dimension must grow linearly with the set size, i.e., $\mathcal{O}(N)$. This linear scaling poses a significant bottleneck for scalability. In this paper, we break this theoretical barrier by proving that a latent dimension of $\mathcal{O}(\log N)$ suffices for the universal approximation of Wasserstein-stable permutation-invariant functions. We establish this by first showing that the covering number of the Wasserstein space scales linearly with $N$. Then, we show that random embeddings, specifically Random Fourier Features, with a latent dimension logarithmic in the covering number, can preserve the geometry with high probability, thereby guaranteeing the existence of deterministic embeddings of the same width. This result advances the understanding of the expressivity of set-based neural network architectures.
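To illustrate the abstract's construction, the following is a minimal sketch of a sum-decomposable (DeepSets-style) embedding built from Random Fourier Features, where the latent dimension is set to scale as $\mathcal{O}(\log N)$. The Gaussian kernel bandwidth `sigma`, the constant `8` in the dimension, and the test sets are illustrative assumptions, not the paper's construction; the paper proves existence of deterministic embeddings of this width, whereas this sketch only demonstrates the random variant empirically.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_embed(X, W, b):
    """Embed a set X (N x d) into a latent vector via Random Fourier Features.

    phi(x) = sqrt(2/m) * cos(W x + b) approximates a Gaussian-kernel feature
    map; averaging phi over the set's elements yields a permutation-invariant,
    sum-decomposable embedding of the set's empirical measure.
    """
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.cos(X @ W.T + b).mean(axis=0)

d = 3                        # dimension of each set element
N = 1024                     # set size
m = int(8 * np.log(N))       # latent dimension ~ O(log N); constant 8 is an assumption

# Shared random projection: rows of W ~ N(0, sigma^{-2} I), b ~ U[0, 2*pi)
sigma = 1.0                  # assumed kernel bandwidth
W = rng.normal(scale=1.0 / sigma, size=(m, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)

# Two nearby sets and one distant set (in Wasserstein distance)
A = rng.normal(size=(N, d))
B = A + 0.01 * rng.normal(size=(N, d))   # small perturbation of A
C = rng.normal(loc=5.0, size=(N, d))     # far from A

zA, zB, zC = (rff_embed(S, W, b) for S in (A, B, C))
print("||z(A) - z(B)|| =", np.linalg.norm(zA - zB))  # small: geometry preserved
print("||z(A) - z(C)|| =", np.linalg.norm(zA - zC))  # large: distant sets stay apart
```

With high probability over the draw of `W` and `b`, nearby sets map to nearby latent vectors and distant sets stay separated, which is the geometry-preservation property the abstract invokes to derandomize the embedding.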