How high is ‘high’? Rethinking the roles of dimensionality in topological data analysis and manifold learning
Hannah Sansford ⋅ Nick Whiteley ⋅ Patrick Rubin-Delanchy
Abstract
High dimensionality of data is often regarded as a fundamental statistical impediment in machine learning and AI. The purpose of this paper is to clarify, on the contrary, when and how high dimensionality may be beneficial. In the setting of a general random function model of data, we distinguish between three notions of dimensionality: *effective dimension* $p_{\mathrm{eff}}$, measuring total variability across feature directions; *correlation rank* $r$, measuring functional complexity across samples; and *latent intrinsic dimension* $d$ of manifold structure hidden in data. Via a generalized Hanson-Wright inequality, we show that increasing $p_{\mathrm{eff}}$ drives a *blessing of dimensionality* phenomenon, whereby data dot-products concentrate about their expectations. In turn, we show that, under mild continuity assumptions (ensuring that features bring additional information as dimension grows), persistence diagrams recover latent homology when $p_{\mathrm{eff}} \in \omega(\log n)$ as $n\to\infty$. Informed by our theory, we revisit the ground-breaking neuroscience discovery of toroidal structure in grid-cell activity made by Gardner et al. (2022): our findings provide the first empirical evidence that this structure is *isometric* to a flat torus model of physical space, suggesting that grid-cell activity conveys a geometrically faithful representation of the real world.
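As an informal illustration of the concentration phenomenon described in the abstract, the following minimal sketch simulates a toy instance of a random function model: $n$ latent points on a circle (intrinsic dimension $d = 1$), with each of $p$ features a random linear function of the latent position plus independent noise (so the correlation rank here is $r = 2$). The names `n`, `p`, `Z`, `A` and the specific model are illustrative assumptions, not the paper's construction in full generality; with the $1/\sqrt{p}$ scaling used below, off-diagonal dot products have expectation $\langle Z_i, Z_j \rangle$, and their maximum deviation should shrink as $p$ grows, consistent with a Hanson-Wright-type bound.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 40
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
# Latent positions on a circle (intrinsic dimension d = 1).
Z = np.column_stack([np.cos(theta), np.sin(theta)])

for p in (100, 1_000, 10_000, 100_000):
    # Features: random linear functions of Z plus independent noise.
    # With entries scaled by 1/sqrt(p), E[<Y_i, Y_j> | Z] = <Z_i, Z_j> for i != j.
    A = rng.standard_normal((2, p)) / np.sqrt(p)
    Y = Z @ A + rng.standard_normal((n, p)) / np.sqrt(p)

    G = Y @ Y.T                    # observed dot products
    G_star = Z @ Z.T               # their population counterparts
    off = ~np.eye(n, dtype=bool)   # ignore the diagonal (it also picks up noise energy)
    err = np.abs(G - G_star)[off].max()
    print(f"p = {p:>7}: max off-diagonal |<Y_i,Y_j> - <Z_i,Z_j>| = {err:.4f}")
```

Running this, the reported maximum deviation decreases as $p$ increases (roughly like $1/\sqrt{p}$ in this toy setting), which is the "blessing of dimensionality" in miniature: more informative features make the latent geometry more, not less, visible in the data's dot products.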