Particles Don’t Care About Z: Towards Scaling Entropy Estimation of Unnormalized Densities
Safa Messaoud ⋅ Skander Charni ⋅ Elaa Bouazza ⋅ Ali Pourghasemi ⋅ Halima Bensmail
Abstract
Computing the differential entropy of distributions known only up to a normalization constant is a long-standing challenge with broad theoretical and practical significance. While variational inference is the most scalable approach for density approximation _from samples_, its potential in settings where _only the unnormalized density_ is available remains largely under-explored. The central difficulty lies in constructing variational distributions that simultaneously ($i$) exploit the structure of the unnormalized density, ($ii$) are expressive enough to capture complex target distributions, ($iii$) remain computationally tractable, and ($iv$) support efficient sampling. Recently, \citet{messaoud2024s} introduced _P-SVGD_, a particle-based variational method that leverages Stein Variational Gradient Descent dynamics, satisfies all of these constraints, and demonstrates promising results in low-dimensional settings. We show, however, that _P-SVGD_ does not scale to high dimensions due to _fundamental algorithmic flaws_: ($i$) misdiagnosed sensitivity to _SVGD_ hyperparameters, ($ii$) violation of the global invertibility assumption in the entropy derivation, ($iii$) omission of a critical trace-of-Hessian term, and ($iv$) suboptimal heuristics, including a divergence-based sampling check that induces mode collapse and loose, informal bounds with no practical value. These issues severely limit both the correctness and the scalability of the approach. We propose _MET-SVGD_, a principled extension of _P-SVGD_ that addresses these flaws by providing a general framework for _SVGD_ hyperparameter selection with global invertibility and convergence guarantees. This enables more accurate and scalable entropy estimation in high-dimensional settings. Empirically, on entropy estimation benchmarks, _MET-SVGD_ achieves accuracy improvements of up to 12$\times$ and 16$\times$ over _P-SVGD_ and baselines from the _SVGD_ literature, respectively. On CIFAR-10 Energy-Based image generation, it improves FID by $80.4$% compared to _P-SVGD_ and achieves 64$\times$ higher training stability. In Maximum-Entropy reinforcement learning, _MET-SVGD_ yields up to $16$% better returns than _P-SVGD_. We will make our code publicly available at https://tinyurl.com/2esyfx8j.
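To make the underlying mechanism concrete, the following is a minimal, self-contained sketch (not the authors' _P-SVGD_ or _MET-SVGD_ code) of the generic idea the abstract builds on: particles are moved by an _SVGD_ step, and the entropy of their distribution is tracked through the (assumed invertible) transport map $x \mapsto x + \epsilon\,\phi(x)$ via a first-order log-det-Jacobian correction. Function names, the RBF kernel choice, and the finite step-size approximation are illustrative assumptions; only the gradient of the unnormalized log-density is required, so the normalization constant $Z$ never appears.

```python
# Hypothetical sketch of one SVGD step with entropy tracking; not the paper's implementation.
import numpy as np

def svgd_step_with_entropy(X, grad_log_p, H, eps=1e-2, h=1.0):
    """One SVGD update with an RBF kernel, plus a first-order entropy update.

    X          : (n, d) array of particle positions
    grad_log_p : callable mapping (n, d) positions to (n, d) gradients of the unnormalized log-density
    H          : current differential-entropy estimate of the particle distribution
    eps, h     : step size and kernel bandwidth (the SVGD hyperparameters the abstract refers to)
    """
    n, d = X.shape
    diffs = X[:, None, :] - X[None, :, :]            # diffs[i, j] = x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * h ** 2))           # RBF kernel matrix k(x_i, x_j)
    G = grad_log_p(X)                                # scores of the unnormalized density (Z-free)

    # Stein variational direction:
    # phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    repulsion = np.sum(diffs / h ** 2 * K[:, :, None], axis=1)
    phi = (K @ G + repulsion) / n

    # Trace of the Jacobian d(phi)/dx at each particle (other particles held fixed),
    # in closed form for the RBF kernel.
    drift_tr = -np.einsum('ijk,jk->ij', diffs, G) / h ** 2
    repulsion_tr = d / h ** 2 - sq_dists / h ** 4
    tr_jac = np.sum(K * (drift_tr + repulsion_tr), axis=1) / n

    # Change-of-variables entropy update for the map T(x) = x + eps * phi(x):
    # H[T#q] = H[q] + E_q[ log|det(I + eps * dphi/dx)| ] ~= H[q] + eps * E_q[ tr(dphi/dx) ].
    # This first-order correction is only valid when T is invertible -- the assumption
    # whose violation the abstract identifies as a flaw of P-SVGD.
    H_new = H + eps * np.mean(tr_jac)
    return X + eps * phi, H_new
```

Iterating this step and accumulating the Jacobian corrections yields an entropy estimate of the final particle distribution; how to pick `eps` and `h` so the map stays invertible and the estimate remains accurate in high dimensions is precisely the question the paper addresses.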