InfoGlobe: Local-and-Global Information-Preserving Statistical Manifold Learning for Single-Cell Transcriptomics
Abstract
Geometry-preserving dimension reduction is critical for single-cell transcriptomics, where distances in the low-dimensional embedding should reflect the biological divergence between cell types along the transcriptomic manifold. Because standard dimension reduction methods rely on inadequate metrics, they do not sufficiently preserve global structure in the low-dimensional manifold. We model RNA counts as Multinomial samples and exploit their hierarchical closure property: gene-level counts refine functional gene-group counts through nested Multinomial distributions. Extending Chentsov's theorem, we show that the coarse (gene-group) and fine (gene) statistical manifolds are isometric under the Fisher-Rao metric. Building on this isometry, we propose InfoGlobe, an information-preserving statistical manifold learning framework that projects cells from high-dimensional hyperspheres (full transcriptome) to low-dimensional hyperspheres (functional gene groups) while preserving information geometry. Each embedding on the low-dimensional sphere explicitly represents a Multinomial distribution over functional gene groups. Benchmarks demonstrate superior preservation of local and global geodesic distances between cell types, automatic and robust gene-group discovery, resolution of nuanced cell subtypes without manual feature engineering, and natural mitigation of batch effects without explicit alignment.
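The invariance behind this isometry can be checked numerically on a toy example. The sketch below illustrates the underlying geometry under stated assumptions and is not InfoGlobe's projection or gene-group discovery procedure. It uses the standard square-root map p -> sqrt(p), which sends categorical distributions to the positive orthant of a unit sphere. There, the Fisher-Rao distance is 2 arccos(sum_i sqrt(p_i q_i)); this is a common normalization, and some authors drop the factor 2. The gene-to-group assignment, the within-group proportions, the two "cells", and the helper names `sqrt_embed`, `fisher_rao_distance`, `coarsen`, and `refine` are all hypothetical. Refining group-level distributions with fixed within-group proportions (a Markov embedding in Chentsov's sense) leaves the Fisher-Rao distance unchanged. Summing gene-level probabilities within each group recovers the group-level distribution.

```python
import numpy as np

# Hypothetical helpers illustrating the Fisher-Rao geometry of categorical
# (Multinomial) distributions; not InfoGlobe's actual implementation.

def sqrt_embed(p):
    """Normalize p and map it to the positive orthant of the unit sphere via p -> sqrt(p)."""
    p = np.asarray(p, dtype=float)
    return np.sqrt(p / p.sum())

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance: 2 * arccos of the Bhattacharyya coefficient."""
    bc = np.dot(sqrt_embed(p), sqrt_embed(q))
    # Clip to guard against rounding pushing the coefficient slightly above 1.
    return 2.0 * np.arccos(np.clip(bc, -1.0, 1.0))

def coarsen(p_gene, groups):
    """Sum gene-level probabilities (or counts) within each gene group."""
    groups = np.asarray(groups)
    return np.bincount(groups, weights=np.asarray(p_gene, dtype=float),
                       minlength=groups.max() + 1)

def refine(p_group, groups, within):
    """Markov embedding: split each group's mass over its genes using
    fixed within-group proportions (each group's proportions sum to 1)."""
    p_group = np.asarray(p_group, dtype=float)
    return p_group[np.asarray(groups)] * np.asarray(within, dtype=float)

# Toy setup (made up): 6 genes assigned to 3 functional groups.
groups = np.array([0, 0, 1, 1, 1, 2])
within = np.array([0.3, 0.7, 0.2, 0.5, 0.3, 1.0])

# Two hypothetical cells described at the gene-group level.
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.5, 0.3])

# Lift both cells to the gene level with the same within-group proportions.
p_fine, q_fine = refine(P, groups, within), refine(Q, groups, within)

print("coarse distance:", fisher_rao_distance(P, Q))
print("fine distance:  ", fisher_rao_distance(p_fine, q_fine))
print("coarsened fine distribution:", coarsen(p_fine, groups))
```

The two printed distances agree because each group contributes sqrt(P_g Q_g) times the sum of its within-group proportions, which is 1. The isometry relies on both cells sharing the same within-group proportions. When those proportions differ, coarsening can only shrink the distance: by the Cauchy-Schwarz inequality, the group-level Bhattacharyya coefficient is at least the gene-level one.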