Much of the data that is fueling current rapid advances in machine learning is high dimensional, structurally complex, and strongly nonlinear. This poses challenges for researcher intuition when they ask (i) how and why current algorithms work and (ii) what tools will lead to the next big breakthough. Mathematicians working in topology, algebra, and geometry have more than a hundred years worth of finelydeveloped machinery whose purpose is to give structure to, help build intuition about, and generally better understand spaces and structures beyond those that we can naturally understand. Following on the success of the first TAGML workshop in 2022, this workshop will showcase work which brings methods from topology, algebra, and geometry and uses them to help answer challenging questions in machine learning. Topics include mathematical machine learning, explainability, training schemes, novel algorithms, performance metrics, and performance guarantees. All accepted papers will be included in an associated PMLR volume.
Fri 12:00 p.m.  12:10 p.m.

Open Remarks
link »
SlidesLive Video » 
Tim Doster · Tegan Emerson · Henry Kvinge · Sophia Sanborn · Nina Miolane · Bastian Rieck · Mathilde Papillon 🔗 
Fri 12:10 p.m.  12:50 p.m.

Discrete Curvature and Applications in GraphBased Learning
(
Keynote
)
SlidesLive Video » The problem of identifying geometric structure in heterogeneous, highdimensional data is a cornerstone of Representation Learning. In this talk, we study the problem of data geometry from the perspective of Discrete Geometry. We start by reviewing discrete notions of curvature with a focus on discrete Ricci curvature. Then we discuss how curvature is linked to mesoscale structure in graphs, which gives rise to applications of discrete curvature in node clustering and community detection. For downstream machine learning and data science applications, it is often beneficial to represent graphstructured data in a continuous space, which may be Euclidean or NonEuclidean. We show that discrete curvature allows for characterizing the geometry of a suitable embedding space both locally and in the sense of global curvature bounds, which have implications for graphbased learning. 
Melanie Weber 🔗 
Fri 12:50 p.m.  1:00 p.m.

A new perspective on building efficient and expressive 3D equivariant graph neural networks
(
Spotlight
)
link »
SlidesLive Video » Geometric deep learning enables the encoding of physical symmetries in modeling 3D objects. Despite rapid progress in encoding 3D symmetries into Graph Neural Networks (GNNs), a comprehensive evaluation of the expressiveness of these networks through a localtoglobal analysis lacks today. In this paper, we propose a local hierarchy of 3D isomorphism to evaluate the expressive power of equivariant GNNs and investigate the process of representing global geometric information from local patches. Our work leads to two crucial modules for designing expressive and efficient geometric GNNs; namely local substructure encoding (\textbf{LSE}) and frame transition encoding (\textbf{FTE}). To demonstrate the applicability of our theory, we propose LEFTNet which effectively implements these modules and achieves stateoftheart performance on both scalarvalued and vectorvalued molecular property prediction tasks. We further point out the design space for future developments of equivariant graph neural networks. 
Yuanqi Du 🔗 
Fri 1:00 p.m.  1:30 p.m.

Coffee Break

🔗 
Fri 1:30 p.m.  2:10 p.m.

Neural Approaches for Geometric Problems
(
Keynote
)
SlidesLive Video » Machine learning, especially the use of neural netowrks have shown great success in a broad range of applications. Recently, neural approaches have also shown promise in tackling (combinatorial) optimization problems in a datadriven manner. On the other hand, for many problems, especially geometric optimization problems, many beautiful geometric ideas and algorithmic insights have been developed in fields such as theoretical computer science and computational geometry. Our goal is to infuse geometric and algorithmic ideas to the design of neural frameworks so that they can be more effective and generalize better. In this talk, I will give two examples in this direction. The first one is what we call a mixed Neuralalgorithmic framework for the Steiner Tree problem in the Euclidean space, leveraging the celebrated PTAS algorithm by Arora. Interestingly, here the model complexity can be made independent of the input point set size. The second one is an neural architecture for approximating the Wasserstein distance between point sets, whose design /analysis uses a geometric coreset idea. 
Yusu Wang 🔗 
Fri 2:10 p.m.  3:00 p.m.

Edge Importance Scores for Editing Graph Topology to Preserve Fairness
(
Poster
)
link »
Graph neural networks have shown promising performance on graph analytical tasks such as node classification and link prediction, contributing to great advances in many graphbased applications. Despite the success of graph neural networks, most of them lack fairness considerations. Consequently, they could yield discriminatory results towards certain populations when such algorithms are exploited in high stakes applications. In this work, we study the problem of predictive bias propagated by relational information, and subsequently propose an intraining edge editing approach to promote fairness. We introduce the notions of faithfulness and unfairness for an edge in a graph, and use it as prior knowledge for editing graph topology and improving fairness. 
Sree Harsha Tanneru 🔗 
Fri 2:10 p.m.  3:00 p.m.

Infusing invariances in neural representations
(
Poster
)
link »
It has been observed that inner representations learned by different neural networks conceal structural similarities when the networks are trained under similar inductive biases. Exploring the geometric structure of latent spaces within these networks offers insights into the underlying similarity among different neural models and facilitates reasoning about the transformations that connect them. Identifying and estimating these transformations presents a challenging task, but it holds significant potential for various downstream tasks, including merging and stitching different neural architectures for model reuse. In this study, drawing on the geometrical structure of latent spaces, we show how it is possible to define representations that incorporate invariances to the targeted transformations, in a single framework. We experimentally analyze how inducing different invariances in the representations affects downstream performances on classification and reconstruction tasks, demonstrating that the classes of transformations that relate independent latent spaces depend on the task at hand. We analyze models in a variety of settings including different initializations, architectural changes, and trained on multiple modalities (e.g., text, images), testing our framework on 8 different benchmarks. 
Irene Cannistraci · Marco Fumero · Luca Moschella · Valentino Maiorca · Emanuele Rodola 🔗 
Fri 2:10 p.m.  3:00 p.m.

Expressive Sign Equivariant Networks for Spectral Geometric Learning
(
Poster
)
link »
Recent work has shown the utility of developing machine learning models that respect the structure and symmetries of eigenvectors. These works promote sign invariance, since for any eigenvector $v$ the negation $v$ is also an eigenvector. However, we show that sign invariance is theoretically limited for tasks such as building orthogonally equivariant models and learning node positional encodings for link prediction in graphs. In this work, we demonstrate the benefits of sign equivariance for these tasks. To obtain these benefits, we develop novel sign equivariant neural network architectures. Our models are based on a new analytic characterization of sign equivariant polynomials and thus inherit provable expressiveness properties. Controlled synthetic experiments show that our networks can achieve the theoretically predicted benefits of sign equivariant models.

Derek Lim · Joshua Robinson · Stefanie Jegelka · Haggai Maron 🔗 
Fri 2:10 p.m.  3:00 p.m.

Desiderata for Representation Learning from Identifiability, Disentanglement, and GroupStructuredness
(
Poster
)
link »
Machine learning subfields define useful representations differently: disentanglement strives for semantic meaning and symmetries, identifiability for recovering the groundtruth factors of the (unobservable) data generating process, groupstructured representations for equivariance. We demonstrate that despite their merits, each approach has shortcomings. Surprisingly, joining forces helps overcome the limitations: we use insights from latent space statistics, geometry, and topology in our examples to elucidate how combining the desiderata of identifiability, disentanglement, and group structure yields more useful representations. 
Hamza Keurti · Patrik Reizinger · Bernhard Schölkopf · Wieland Brendel 🔗 
Fri 2:10 p.m.  3:00 p.m.

Neural Networks Are Graphs! Graph Neural Networks for Equivariant Processing of Neural Networks
(
Poster
)
link »
Neural networks that can process the parameters of other neural networks find applications in diverse domains, including processing implicit neural representations, domain adaptation of pretrained networks, generating neural network weights, and predicting generalization errors. However, existing approaches either overlook the inherent permutation symmetry in the weight space or rely on intricate weightsharing patterns to achieve equivariance. In this work, we propose representing neural networks as computation graphs, enabling the use of standard graph neural networks to preserve permutation symmetry. We also introduce probe features computed from the forward pass of the input neural network. Our proposed solution improves over prior methods from 86\% to 97\% accuracy on the challenging MNIST INR classification benchmark, showcasing the effectiveness of our approach. 
David Zhang · Miltiadis (Miltos) Kofinas · Yan Zhang · Yunlu Chen · Gertjan Burghouts · Cees Snoek 🔗 
Fri 2:10 p.m.  3:00 p.m.

Hyperbolic VAE via Latent Gaussian Distributions
(
Poster
)
link »
We propose a Gaussian manifold variational autoencoder (GMVAE) whose latent space consists of a set of Gaussian distributions. It is known that the set of the univariate Gaussian distributions with the Fisher information metric form a hyperbolic space, which we call a Gaussian manifold. To learn the VAE endowed with the Gaussian manifolds, we propose a pseudoGaussian manifold normal distribution based on the KullbackLeibler divergence, a local approximation of the squared FisherRao distance, to define a density over the latent space. In experiments, we demonstrate the efficacy of GMVAE on two different tasks: density estimation of image datasets and environment modeling in modelbased reinforcement learning. GMVAE outperforms the other variants of hyperbolic and EuclideanVAEs on density estimation tasks and shows competitive performance in modelbased reinforcement learning. We observe that our model provides strong numerical stability, addressing a common limitation reported in previous hyperbolicVAEs. 
Seunghyuk Cho · Juyong Lee · Dongwoo Kim 🔗 
Fri 2:10 p.m.  3:00 p.m.

Positional Encodings as Group Representations: A Unified Framework
(
Poster
)
link »
Positional encodings are ubiquitous as an input featurization tool in language modeling, computer vision, and graph representation learning, enabling neural networks to capture important geometric structure of the input. Traditionally, positional encodings have been defined anew for each data domain. In this work, we reinterpret positional encodings for disparate data types  including sequences, grids, graphs, and manifolds  in the unifying framework of group representations. We show how to express existing positional encodings as group representations, and conversely, propose new positional encodings by choosing suitable groups and representations. We validate our framework with experiments on implicit neural representations of images and vector fields, highlighting the practical utility of such positional encodings for encouraging approximate equivariance and capturing geometric structure. 
Derek Lim · Hannah Lawrence · Ningyuan Huang · Erik Thiede 🔗 
Fri 2:10 p.m.  3:00 p.m.

Fast computation of permutation equivariant layers with the partition algebra
(
Poster
)
link »
Linear neural network layers that are either equivariant or invariant to permutations of their inputs form core building blocks of modern deep learning architectures. Examples include the layers of DeepSets, as well as linear layers occurring in attention blocks of transformers and some graph neural networks. The space of permutation equivariant linear layers can be identified as the invariant subspace of a certain symmetric group representation, and recent work parameterized this space by exhibiting a basis whose vectors are sums over orbits of standard basis elements with respect to the symmetric group action. A parameterization opens up the possibility of learning the weights of permutation equivariant linear layers via gradient descent. The space of permutation equivariant linear layers is a generalization of the partition algebra, an object first discovered in statistical physics with deep connections to the representation theory of the symmetric group, and the basis described above generalizes the socalled orbit basis of the partition algebra. We exhibit an alternative basis, generalizing the diagram basis of the partition algebra, with computational benefits stemming from the fact that the tensors making up the basis are low rank in the sense that they naturally factorize into Kronecker products. Just as multiplication by a rank one matrix is far less expensive than multiplication by an arbitrary matrix, multiplication with these low rank tensors is far less expensive than multiplication with elements of the orbit basis. Finally, we describe an algorithm implementing multiplication with these basis elements. 
Charles Godfrey · Michael Rawson · Davis Brown · Henry Kvinge 🔗 
Fri 2:10 p.m.  3:00 p.m.

Evolving Computation Graphs
(
Poster
)
link »
Graph neural networks (GNNs) have demonstrated success in modeling relational data, especially for data that exhibits homophily: when a connection between nodes tends to imply that they belong to the same class. However, while this assumption is true in many relevant situations, there are important realworld scenarios that violate this assumption. In this work, we propose Evolving Computation Graphs (ECGs), a novel method for enhancing GNNs on heterophilic datasets without requiring prior domain knowledge. Our approachbuilds on prior theoretical insights linking node degree, high homophily, and inter vs intraclass embedding similarity by rewiring the GNNs’ computation graph towards adding edges that connect nodes that are likely to be in the same class. We utilise weaker classifiers to identify these edges and evaluate ECGs on a diverse set of recentlyproposed heterophilous datasets, demonstrating improvements over 95% of the relevant baselines. 
Andreea Deac · Jian Tang 🔗 
Fri 2:10 p.m.  3:00 p.m.

A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
(
Poster
)
link »
Diffusionbased manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high dimensional, high throughput, noisy datasets. Such datasets are especially present in fields like biology and physics. While it is thought that these methods preserve underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoretical links have been established. Here, we establish such a link via results in Riemannian geometry explicitly connecting heat diffusion to manifold distances. In this process, we also formulate a more general heat kernel based manifold embedding method that we call heat geodesic embeddings. This novel perspective makes clearer the choices available in manifold learning and denoising. Results show that our method outperforms existing state of the art in preserving ground truth manifold distances, and preserving cluster structure in toy datasets. We also showcase our method on single cell RNAsequencing datasets with both continuum and cluster structure, where our method enables interpolation of withheld timepoints of data. 
Guillaume Huguet · Alexander Tong · Edward De Brouwer · Yanlei Zhang · Guy Wolf · Ian Adelstein · Smita Krishnaswamy 🔗 
Fri 2:10 p.m.  3:00 p.m.

UGSL: A Unified Framework for Benchmarking Graph Structure Learning
(
Poster
)
link »
Graph neural networks (GNNs) demonstrate outstanding performance in a broad range of applications. While the majority of GNN applications assume that a graph structure is given, some recent methods substantially expanded the applicability of GNNs by showing that they may be effective even when no graph structure is explicitly provided. The GNN parameters and a graph structure are jointly learned. Previous studies adopt different experimentation setups, making it difficult to compare their merits. In this paper, we propose a benchmarking strategy for graph structure learning using a unified framework. Our framework, called Unified Graph Structure Learning (UGSL), reformulates existing models into a single model. We implement a wide range of existing models in our framework and conduct extensive analyses of the effectiveness of different components in the framework. Our results provide a clear and concise understanding of the different methods in this area as well as their strengths and weaknesses. 
Bahare Fatemi · Sami AbuElHaija · Anton Tsitsulin · Mehran Kazemi · Dustin Zelle · Neslihan Bulut · Jonathan Halcrow · Bryan Perozzi 🔗 
Fri 2:10 p.m.  3:00 p.m.

Caveats of neural persistence in deep neural networks
(
Poster
)
link »
Neural Persistence is a prominent measure for quantifying neural network complexity, proposed in the emerging field of topological data analysis in deep learning. In this work, however, we find both theoretically and empirically that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. First, we prove tighter bounds on neural persistence that motivate this claim theoretically. Then, we confirm that our interpretation holds in practise by calculating neural persistence for synthetic weight matrices and for trained deep neural networks. This raises the question if the benefits of neural persistence can be achieved by simpler means, since already calculating 0order persistent homology for large matrices is costly. 
Leander Girrbach · Anders Christensen · A. Koepke · Ole Winther · Zeynep Akata 🔗 
Fri 2:10 p.m.  3:00 p.m.

Group Invariant Global Pooling
(
Poster
)
link »
Much work has been devoted to devising architectures that build groupequivariant representations, while invariance is often induced using simple global pooling mechanisms. Little work has been done on creating expressive layers that are invariant to given symmetries, despite the success of permutation invariant pooling in various tasks. In this work, we present Group Invariant Global Pooling (GIGP), an invariant pooling layer that is provably sufficiently expressive to represent a large class of invariant functions. We validate GIGP on rotated MNIST and QM9, showing improvements for the latter while attaining identical results for the former. By making the pooling process over group orbits, this invariant aggregation method leads to improved performance, while performing wellprincipled group aggregations. 
Kamil Bujel · Yonatan Gideoni · Chaitanya Joshi · Pietro Lió 🔗 
Fri 2:10 p.m.  3:00 p.m.

Regression on Latent Spaces for the Analysis of MultiCondition SingleCell RNASeq Data
(
Poster
)
link »
Multicondition singlecell data reveals expression differences between corresponding cell subpopulations in different conditions. Here, we propose to use regression on latent spaces to simultaneously account for variance from known and latent factors. Our approach is built around multivariate regression on Grassmann manifolds. We use the method to analyze a drug treatment experiment on brain tumor biopsies. The method is a versatile new approach for identifying differentially expressed genes from singlecell data of heterogeneous cell subpopulations under arbitrary experimental designs without clustering. 
Constantin AhlmannEltze · Wolfgang Huber 🔗 
Fri 2:10 p.m.  3:00 p.m.

Learning Polynomial Problems with SL(2)Equivariance
(
Poster
)
link »
We introduce a set of polynomial learning problems that are equivariant to the noncompact group $SL(2,\mathbb{R})$. $SL(2,\mathbb{R})$ consists of areapreserving linear transformations, and captures the symmetries of a variety of polynomialbased problems not previously studied in the machine learning community, such as verifying positivity (for e.g. sumofsquares optimization) and minimization. While compact groups admit many architectural building blocks, such as group convolutions, noncompact groups do not fit within this paradigm and are therefore more challenging. We consider several equivariancebased learning approaches for solving polynomial problems, including both data augmentation and a fully $SL(2,\mathbb{R})$equivariant architecture for solving polynomial problems. In experiments, we broadly demonstrate that machine learning provides a promising alternative to traditional SDPbased baselines, achieving tenfold speedups while retaining high accuracy. Surprisingly, the most successful approaches incorporate only a wellconditioned subset of $SL(2,\mathbb{R})$, rather than the entire group. This provides a rare example of a symmetric problem where data augmentation outperforms full equivariance, and provides interesting lessons for other problems with noncompact symmetries.

Hannah Lawrence · Mitchell Harris 🔗 
Fri 2:10 p.m.  3:00 p.m.

Enriching Disentanglement: Definitions to Metrics
(
Poster
)
link »
A multitude of metrics for learning and evaluating disentangled representations has been proposed. However, it remains unclear what these metrics truly quantify and how to compare them. To solve this problem, we introduce a systematic approach for transforming an equational definition into a quantitative metric via enriched category theory. We show how to replace (i) equality with metric, (ii) logical connectives with order operations, (iii) universal quantifier with aggregation, and (iv) existential quantifier with the best approximation. Using this approach, we can derive useful metrics for measuring the modularity and informativeness of a disentangled representation extractor. 
Yivan Zhang · Masashi Sugiyama 🔗 
Fri 2:10 p.m.  3:00 p.m.

Geometric Algebra Transformers
(
Poster
)
link »
Problems involving geometric data arise in a variety of fields, including computer vision, robotics, chemistry, and physics. Such data can take numerous forms, such as points, direction vectors, planes, or transformations, but to date there is no single architecture that can be applied to such a wide variety of geometric types while respecting their symmetries. In this paper we introduce the Geometric Algebra Transformer (GATr), a generalpurpose architecture for geometric data. GATr represents inputs, outputs, and hidden states in the projective geometric algebra, which offers an efficient 16dimensional vector space representation of common geometric objects and operators acting on them. GATr is equivariant with respect to E(3), the symmetry group of 3D Euclidean space. As a transformer, GATr is scalable, expressive, and versatile. In experiments with nbody modeling and robotic planning, GATr shows strong improvements over nongeometric baselines. 
Johann Brehmer · Pim de Haan · Sönke Behrends · Taco Cohen 🔗 
Fri 2:10 p.m.  3:00 p.m.

Conditional Bisimulation for Generalization in Reinforcement Learning
(
Poster
)
link »
Learning policies that are robust to changes in the environment are critical for real world deployment of Reinforcement Learning (RL) agents. They are also necessary for achieving good generalization across environment shifts.Bisimulation provides a powerful means for abstracting task relevant components of the observation and learning a succinct representation space for training the RL agent in high dimensional spaces by exploiting the rich metric structure induced by the RL dynamics. In this work, we extend the bisimulation framework to also account for context dependent observation shifts. We use simulator based learning as an exemplary setting to demonstrate the use alternate observations to learn a representation space which is invariant to observation shifts using a novel bisimulation based objective. This allows us to deploy the agent to varying observation settings during test time and generalize to unseen scenarios. Empirical analysis on the highdimensional image based control domains demonstrates the efficacy of our method. 
Anuj Mahajan · Amy Zhang 🔗 
Fri 2:10 p.m.  3:00 p.m.

On the Expressive Power of OllivierRicci Curvature on Graphs
(
Poster
)
link »
Discrete curvature has recently been used in graph machine learning to improve performance, understand messagepassing and assess structural differences between graphs. Despite these advancements, the theoretical properties of discrete curvature measures, such as their representational power and their relationship to graph features is yet to be fully explored. This paper studies OllivierRicci curvature on graphs, providing both a discussion and empirical analysis of its expressivity, i.e. the ability to distinguish nonisomorphic graphs. 
Josh Southern · Jeremy Wayland · Michael Bronstein · Bastian Rieck 🔗 
Fri 2:10 p.m.  3:00 p.m.

Can Euclidean Symmetry Help in Reinforcement Learning and Planning
(
Poster
)
link »
In robotic tasks, changes of reference frames typically do not affect the underlying physical meaning. These are isometric transformations, including translations, rotations, and reflections, called Euclidean group. In this work, we study reinforcement learning and planning tasks that have Euclidean group symmetry. We provide a theory that extends prior work (on symmetry in reinforcement learning, planning, and optimal control) to compact Lie groups and covers them as special cases, and show examples to explain the benefits of equivariance to Euclidean symmetry. We extend the 2D path planning with valuebased planning to continuous MDPs and propose a pipeline for equivariant samplingbased planning algorithm with empirical evidence. 
Linfeng Zhao · Owen Howell · Jung Yeon Park · Xupeng Zhu · Robin Walters · Lawson Wong 🔗 
Fri 2:10 p.m.  3:00 p.m.

Lie Point Symmetry and Physics Informed Networks
(
Poster
)
link »
Physicsinformed neural networks (PINNs) are computationally efficient alternatives to traditional partial differential equation (PDE) solvers.However, their reliability is dependent on the accuracy of the trained neural network. We introduce a mechanism for leveraging the symmetries of a given PDE to improve the neural solver. In particular, we propose a loss function that informs the network about Lie point symmetries in the same way that PINN models try to enforce the underlying PDE. Intuitively, our symmetry loss tries to ensure that infinitesimal generators of the Lie group preserve solutions of the PDE. This means that once the network learns a solution, it also learns the neighbouring solutions generated by Lie point symmetries.Our results show that symmetry is an effective inductive bias for PINNs and lead to a significant increase in sample efficiency. 
Tara AkhoundSadegh · Laurence PerreaultLevasseur · Johannes Brandstetter · Max Welling · Siamak Ravanbakhsh 🔗 
Fri 2:10 p.m.  3:00 p.m.

Equivariant Selfsupervised Deep Pose Estimation for Cryo EM
(
Poster
)
link »
Reconstructing the 3D volume of a molecule from its differently oriented 2D projections is the central problem of cryoEM, one of the main techniques for macromolecule imaging. Because the orientations are unknown, the estimation of the images' poses is essential to solve this inverse problem.Typical methods either rely on synchronization, which leverages the estimated relative poses of the images to constrain their absolute ones, or jointly estimate the poses and the 3D density of the molecule in an iterative fashion.Unfortunately, synchronization methods don't account for the complete images' generative process and, therefore, achieve lower noise robustness.In the second case, the iterative joint optimization suffers from convergence issues and a higher computational cost, due to the 3D reconstruction steps.In this work, we directly estimate individual poses with equivariant deep graph networks trained using a selfsupervised loss, which enforces agreement in Fourier domain of images pairs along the common lines defined by their poses.In particular, the equivariant design turns out essential for the proper convergence.As a result, our method can leverage the synchronization constraints  encoded by the synchronization graph structure  to improve convergence as well as the images generative process  via the common lines loss , with no need to perform intermediate reconstructions. 
Gabriele Cesa · Kumar Pratik · Arash Behboodi 🔗 
Fri 2:10 p.m.  3:00 p.m.

Learning Lie Group Symmetry Transformations with Neural Networks
(
Poster
)
link »
The problem of detecting and quantifying the presence of symmetries in datasets is useful for model selection, generative modeling, and data analysis, amongst others. While existing methods for hardcoding transformations in neural networks require prior knowledge of the symmetries of the task at hand, this work focuses on discovering and characterising unknown symmetries present in the dataset, namely, Lie group symmetry transformations beyond the traditional ones usually considered in the field (rotation, scaling, and translation). Specifically, we consider a scenario in which a dataset has been transformed by a oneparameter subgroup of transformations with different parameter values for each data point. Our goal is to characterise the transformation group and the distribution of the parameter values, even when they aren’t small or the transformation group isn’t one of the traditional ones. The results showcase the effectiveness of the approach in both these settings. 
Riccardo Valperga · Victoria Klein · Alex Gabel · Efstratios Gavves 🔗 
Fri 2:10 p.m.  3:00 p.m.

Sumformer: Universal Approximation for Efficient Transformers
(
Poster
)
link »
Natural language processing (NLP) made an impressive jump with the introduction of Transformers. ChatGPT is one of the most famous examples, changing the perception of the possibilities of AI even outside the research community. However, besides the impressive performance, the quadratic time and space complexity of Transformers with respect to sequence length pose significant limitations for handling long sequences. While efficient Transformer architectures like Linformer and Performer with linear complexity have emerged as promising solutions, their theoretical understanding remains limited. In this paper, we introduce Sumformer, a novel and simple architecture capable of universally approximating equivariant sequencetosequence functions. We use Sumformer to give the first universal approximation results for Linformer and Performer and a new proof for Transformer, where one attention layer suffices for the approximation. 
Silas Alberti · Niclas Dern · Laura Thesing · Gitta Kutyniok 🔗 
Fri 2:10 p.m.  3:00 p.m.

Deep Networks as Paths on the Manifold of Neural Representations
(
Poster
)
link »
Deep neural networks implement a sequence of layerbylayer operations that are each relatively easy to understand, but the resulting overall computation is generally difficult to understand. An intuitive hypothesis is that the role of each layer is to reformat information to reduce the "distance" to the desired outputs. With this spatial analogy, the layerwise computation implemented by a deep neural network can be viewed as a path along a highdimensional manifold of neural representations. With this framework, each hidden layer transforms its inputs by taking a step of a particular size and direction along the manifold, ideally moving towards the desired network outputs. We formalize this intuitive idea by leveraging recent advances in metric representational similarity. We extend existing representational distance methods by defining and characterizing the manifold that neural representations live on, allowing us to calculate quantities like the shortest path or tangent direction separating representations between hidden layers of a network or across different networks. We then demonstrate these tools by visualizing and comparing the paths taken by a collection of trained neural networks with a variety of architectures, finding systematic relationships between model depth and width, and properties of their paths. 
Richard Lange · Devin Kwok · Jordan Matelsky · Xinyue Wang · David Rolnick · Konrad Kording 🔗 
Fri 2:10 p.m.  3:00 p.m.

Explaining Graph Neural Networks Using Interpretable Local Surrogates
(
Poster
)
link »
We propose an interpretable local surrogate (ILS) method for understanding the predictions of blackbox graph models. Explainability methods are commonly employed to gain insights into blackbox models and, given the widespread adoption of GNNs in diverse applications, understanding the underlying reasoning behind their decisionmaking processes becomes crucial.Our ILS method approximates the behavior of a blackbox graph model by fitting a simple surrogate model in the local neighborhood of a given input example. Leveraging the interpretability of the surrogate, ILS is able to identify the most relevant nodes contributing to a specific prediction. To efficiently identify these nodes, we utilize group sparse linear models as local surrogates. Through empirical evaluations on explainability benchmarks, our method consistently outperforms stateoftheart graph explainability methods. This demonstrates the effectiveness of our approach in providing enhanced interpretability for GNN predictions. 
Farzaneh Heidari · Guillaume Rabusseau · Perouz Taslakian 🔗 
Fri 2:10 p.m.  3:00 p.m.

An Exact Kernel Equivalence for Finite Classification Models
(
Poster
)
link »
We explore the equivalence between neural networks and kernel methods by deriving the first exact representation of any finitesize parametric classification model trained with gradient descent as a kernel machine. We compare our exact representation to the wellknown Neural Tangent Kernel (NTK) and discuss approximation error relative to the NTK and other nonexact path kernel formulations. We experimentally demonstrate that the kernel can be computed for realistic networks up to machine precision. We use this exact kernel to show that our theoretical contribution can provide useful insights into the predictions made by neural networks, particularly the way in which they generalize. 
Brian Bell · Michael Geyer · David Glickenstein · Amanda Fernandez · Juston Moore 🔗 
Fri 2:10 p.m.  3:00 p.m.

Learning to Explain Hypergraph Neural Networks
(
Poster
)
link »
Hypergraphs are expressive structures for describing higherorder relationships among entities, with widespread applications across biology and drug discovery. Hypergraph neuralnetworks (HGNNs) have recently emerged as apromising representation learning approach onthese structures for clustering, classification, andmore. However, despite their promising performance, HGNNs remain a black box, and explaining how they make predictions remains an open challenge. To address this problem, we proposeHyperEX, a posthoc explainability frameworkfor hypergraphs that can be applied to any trainedHGNN. HyperEX computes nodehyperedge pairimportance to identify subhypergraphs as explanations. Our experiments demonstrate how HyperEX learns important subhypergraphs responsible for driving node classification to give usefulinsight into HGNNs. 
Sepideh Maleki · Ehsan Hajiramezanali · Gabriele Scalia · Tommaso Biancalani · Kangway Chuang 🔗 
Fri 2:10 p.m.  3:00 p.m.

Homological Neural Networks: A Sparse Architecture for Multivariate Complexity
(
Poster
)
link »
The rapid progress of Artificial Intelligence research came with the development of increasingly complex deep learning models, leading to growing challenges in terms of computational complexity, energy efficiency and interpretability. In this study, we apply advanced networkbased information filtering techniques to design a novel deep neural network unit characterized by a sparse higherorder graphical architecture built over the homological structure of underlying data. We demonstrate its effectiveness in two application domains which are traditionally challenging for deep learning: tabular data and time series regression problems. Results demonstrate the advantages of this novel design which can tie or overcome the results of stateoftheart machine learning and deep learning models using only a fraction of parameters. 
Yuanrong Wang · Antonio Briola · Tomaso Aste 🔗 
Fri 2:10 p.m.  3:00 p.m.

Can strong structural encoding reduce the importance of Message Passing?
(
Poster
)
link »
The most prevalent class of neural networks operating on graphs are message passing neural networks (MPNNs), in which the representation of a node is updated iteratively by aggregating information in the 1hop neighborhood. Since this paradigm for computing node embeddings may prevent the model from learning coarse topological structures, the initial features are often augmented with structural information of the graph, typically in the form of Laplacian eigenvectors or Random Walk transition probabilities. In this work, we explore the contribution of message passing when strong structural encodings are provided. We introduce a novel way of modeling the interaction between feature and structural information based on their tensor product rather than the standard concatenation. The choice of interaction is compared in common scenarios and in settings where the capacity of the messagepassing layer is severely reduced and ultimately the messagepassing phase is removed altogether. Our results indicate that using tensorbased encodings is always at least on par with the concatenationbased encoding and that it makes the model much more robust when the message passing layers are removed, on some tasks incurring almost no drop in performance. This suggests that the importance of message passing is limited when the model can construct strong structural encodings. 
Floor Eijkelboom · Erik Bekkers · Michael Bronstein · Francesco Di Giovanni 🔗 
Fri 2:10 p.m.  3:00 p.m.

Linear Regression on Manifold Structured Data: the Impact of Extrinsic Geometry on Solutions
(
Poster
)
link »
In this paper, we study linear regression applied to data structured on a manifold. We assume that the data manifold is smooth and is embedded in a Euclidean space, and our objective is to reveal the impact of the data manifold's extrinsic geometry on the regression. Specifically, we analyze the impact of the manifold's curvatures (or higher order nonlinearity in the parameterization when the curvatures are locally zero) on the uniqueness of the regression solution. Our findings suggest that the corresponding linear regression does not have a unique solution when the manifold is flat. Otherwise, the manifold's curvature (or higher order nonlinearity in the embedding) may contribute significantly, particularly in the solution associated with the normal directions of the manifold. Our findings thus reveal the role of data manifold geometry in ensuring the stability of regression models for outofdistribution inferences. 
Liangchen Liu · Juncai He · YenHsi Tsai 🔗 
Fri 2:10 p.m.  3:00 p.m.

kMeans Clustering with DistanceBased Privacy
(
Poster
)
link »
In this paper, we initiate the study of Euclidean clustering with Distancebased differential privacy. Distancebased privacy is motivated by the fact that it is often only needed to protect the privacy of exact, rather than approximate, locations. We provide constantapproximate algorithms for $k$means and $k$median clustering, with additive error depending only on the attacker's precision bound $\rho$, rather than the radius $\Lambda$ of the space. In addition, we empirically demonstrate that our algorithm performs significantly better than previous differentially private clustering algorithms, as well as naive distancebased private clustering baselines.

Alessandro Epasto · Vahab Mirrokni · Shyam Narayanan · Peilin Zhong 🔗 
Fri 2:10 p.m.  3:00 p.m.

Episodic Memory Theory of Recurrent Neural Networks: Insights into LongTerm Information Storage and Manipulation
(
Poster
)
link »
Recurrent neural networks (RNNs) have emerged as powerful models capable of storing and manipulating external information over long periods in various domains. Yet, the mechanisms that underly this behavior remain a mystery due to the blackbox nature of these models. This paper addresses this question by proposing an episodic memory theory of RNN dynamics, enabling a more comprehensive understanding of the RNN weights as memories and intermemory interactions. This approach sheds light on the inner workings of RNNs and connects to existing research on memory representation and organization. The theory extends the current linearization approaches by providing alternative interpretations of the eigenspectrum and its connection to the longterm storage and manipulation of information. We discuss how the segregation, representation, and composition of the variable binding problem—a fundamental question in cognitive science and artificial intelligence—can be mechanistically interpreted within the theory. Using an elementary task  repeat copy, we demonstrate the validity of the theory in experimental settings. Our work represents a step towards opening the black box of RNNs, offering new insights into their functionality and bridging the gap between recurrent neural networks and memory models. 
Arjun Karuvally · Peter DelMastro · Hava Siegelmann 🔗 
Fri 2:10 p.m.  3:00 p.m.

A new perspective on building efficient and expressive 3D equivariant graph neural networks
(
Poster
)
link »
Geometric deep learning enables the encoding of physical symmetries in modeling 3D objects. Despite rapid progress in encoding 3D symmetries into Graph Neural Networks (GNNs), a comprehensive evaluation of the expressiveness of these networks through a localtoglobal analysis lacks today. In this paper, we propose a local hierarchy of 3D isomorphism to evaluate the expressive power of equivariant GNNs and investigate the process of representing global geometric information from local patches. Our work leads to two crucial modules for designing expressive and efficient geometric GNNs; namely local substructure encoding (\textbf{LSE}) and frame transition encoding (\textbf{FTE}). To demonstrate the applicability of our theory, we propose LEFTNet which effectively implements these modules and achieves stateoftheart performance on both scalarvalued and vectorvalued molecular property prediction tasks. We further point out the design space for future developments of equivariant graph neural networks. 
weitao du · Yuanqi Du · Limei Wang · Dieqiao Feng · Guifeng Wang · Shuiwang Ji · Carla Gomes · Zhiming Ma 🔗 
Fri 2:10 p.m.  3:00 p.m.

Geometrically Regularized Wasserstein Dictionary Learning
(
Poster
)
link »
Wasserstein dictionary learning is an unsupervised approach to learning a collection of probability distributions that generate observed distributions as Wasserstein barycentric combinations. Existing methods solve an optimization problem that only seeks a dictionary and weights that minimize the reconstruction accuracy. However, there is no a priori reason to believe there are unique solutions in general to this problem. Moreover, the learned dictionary is, by design, optimized to represent the observed data set, and may not be useful for classification tasks or generative modeling. Just as regularization plays a key role in linear dictionary learning, we propose a geometric regularizer for Wasserstein space that promotes representations of a data distributions using nearby dictionary elements. We show that this regularizer leads to barycentric weights that concentrate on dictionary atoms local to each data distribution. When data are generated as Wasserstein barycenters of fixed distributions, this regularizer facilitates the recovery of the generating distributions in cases that are illposed for unregularized Wasserstein dictionary learning. Through experimentation on synthetic and real data, we show that our geometrically regularized approach yields more interpretable dictionaries in Wasserstein space which perform better in downstream applications. 
Marshall Mueller · Shuchin Aeron · James Murphy · Abiy Tasissa 🔗 
Fri 2:10 p.m.  3:00 p.m.

The WeisfeilerLehman Distance: Reinterpretation and Connection with GNNs
(
Poster
)
link »
In this paper, we present a novel interpretation of the WeisfeilerLehman (WL) distance introduced by \cite{chen2022weisfeilerlehman} using concepts from stochastic processes. The WL distance aims compares graphs with node features, has the same discriminative power as the classic WeisfeilerLehman graph isomorphism test and has deep connections to the GromovWasserstein distance. Our interpretation connects the WL distance to the literature on distances for stochastic processes, which also makes the interpretation of the distance more accessible and intuitive. We further explore the connections between the WL distance and certain Message Passing Neural Networks, and discuss the implications of the WL distance for understanding the Lipschitz property and the universal approximation results for these networks. 
Samantha Chen · Sunhyuk Lim · Facundo Memoli · Zhengchao Wan · Yusu Wang 🔗 
Fri 2:10 p.m.  3:00 p.m.

Learning To See Topological Properties In 4D Using Convolutional Neural Networks
(
Poster
)
link »
Topology describes the essential structure of a space, and in 4D, a larger variety of topologically distinct manifolds can be embedded versus 2D or 3D. The present study investigates an endtoend visual approach, which couples data generation software and convolutional neural networks (CNNs) to estimate the topology of 4D data. A synthetic 4D training data set is generated with the use of several manifolds, and then labelled by using techniques from algebraic topology. Several approaches to implementing a 4D convolution layer are compared. Experiments demonstrate that already a basic CNN can be trained to provide estimates for the Betti numbers associated with the number of one, two, and threedimensional holes in the data. Some of the intricacies of topological data analysis in the 4D setting are also put on view, including aspects of persistent homology. 
Khalil Mathieu Hannouch · Stephan Chalup 🔗 
Fri 2:10 p.m.  3:00 p.m.

Learning Large Graph Property Prediction via Graph Segment Training
(
Poster
)
link »
Learning to predict properties of large graphs is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that utilizes a divideandconquer approach to allow learning large graph property prediction with a constant memory footprint. GST first divides a large graph into segments and then backpropagates through only a few segments sampled per training iteration. We refine the GST paradigm by introducing a historical embedding table to efficiently obtain embeddings for segments not sampled for backpropagation. To mitigate the staleness of historical embeddings, we design two novel techniques. First, we finetune the prediction head to fix the input distribution shift. Second, we introduce Stale Embedding Dropout to drop some stale embeddings during training to reduce bias. We evaluate our complete method GSTEFD (with all the techniques together) on two large graph property prediction benchmarks: MalNet and TpuGraphs. Our experiments show that GSTEFD is both memoryefficient and fast, while offering a slight boost on test accuracy over a typical full graph training regime. 
Kaidi Cao · Phitchaya Phothilimthana · Sami AbuElHaija · Dustin Zelle · Yanqi Zhou · Charith Mendis · Jure Leskovec · Bryan Perozzi 🔗 
Fri 2:10 p.m.  3:00 p.m.

FisherRao and pullback Hilbert cone distances on the multivariate Gaussian manifold with applications to simplification and quantization of mixtures
(
Poster
)
link »
Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor medical imaging, structure tensor computer vision, radar signal processing, machine learning, etc.In order to process those data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities and paths joining normal distributions. The FisherRao distance defined as the Riemannian geodesic distance induced by the Fisher information is such a principled distance which however is not known in closedform excepts on a few particular cases.We first report a fast and robust method to approximate arbitrarily finely the FisherRao distance between normal distributions.Second, we introduce a distance based on a diffeomorphic embedding of the Gaussian manifold into a submanifold of the higherdimensional symmetric positivedefinite cone. We show that the projective Hilbert distance on the cone is a metric on the embedded Gaussian submanifold and pullback that distance with the straight line Hilbert cone geodesics to obtain a distance and paths between normal distributions.Compared to the FisherRao distance approximation, the pullback Hilbert cone distance is computationally light since it requires to compute only extreme eigenvalues of matrices.Finally, we show how to use those distances in clustering tasks. 
Frank Nielsen 🔗 
Fri 2:10 p.m.  3:00 p.m.

MASIL: Towards Maximum Separable Class Representation for Few Shot Class Incremental Learning
(
Poster
)
link »
Few Shot Class Incremental Learning (FSCIL) with few examples per class for each incremental session is the realistic setting of continual learning since obtaining large number of annotated samples is not feasible and cost effective. We present the framework MASIL as a step towards learning the maximal separable classifier. It addresses the common problem i.e forgetting of old classes and overfitting to novel classes by learning the classifier weights to be maximally separable between classes forming a simplex Equiangular Tight Frame. We propose the idea of concept factorization explaining the collapsed features for base session classes in terms of concept basis and use these to induce classifier simplex for few shot classes. We further adds fine tuning to reduce any error occurred during factorization and train the classifier jointly on base and novel classes without retaining any base class samples in memory. Experimental results on miniImageNet, CIFAR100 and CUB200 demonstrate that MASIL outperforms all the benchmarks. 
Anant Khandelwal 🔗 
Fri 2:10 p.m.  3:00 p.m.

Poster Session 1
(
Poster Session
)

🔗 
Fri 3:00 p.m.  4:00 p.m.

Lunch

🔗 
Fri 4:00 p.m.  4:40 p.m.

Graph Rewiring in GNNs
(
Keynote
)
SlidesLive Video » 
Michael M. Bronstein 🔗 
Fri 4:40 p.m.  4:50 p.m.

GRIL: A $2$parameter Persistence Based Vectorization for Machine Learning
(
Spotlight
)
link »
SlidesLive Video »
$1$parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$parameter persistence modules induced by bifiltration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape (Gril) for $2$parameter persistence modules. We show that this vector representation is $1$Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment encoding topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets and compare the results with previous vector representations of $1$parameter and $2$parameter persistence modules. Further, we augment GNNs with Gril features and observe an increase in performance indicating that Gril can capture additional features enriching GNNs.

Shreyas N. Samaga 🔗 
Fri 4:50 p.m.  5:00 p.m.

Breaking the Curse of Depth in Graph Convolutional Networks via Refined Initialization Strategy
(
Spotlight
)
link »
SlidesLive Video » Graph convolutional networks (GCNs) suffer from the curse of depth, a phenomenon where performance degrades significantly as network depth increases. While oversmoothing has been considered the primary cause of this issue, we discover that gradient vanishing or exploding under commonlyused initialization methods also contributes to the curse of depth. To this end, we propose to evaluate GCN initialization quality from three aspects: forwardpropagation, backwardpropagation, and output diversity. We theoretically prove that conventional initialization methods fail to simultaneously maintain reasonable forward propagation and output diversity. To tackle this problem, We develop a new GCN initialization method called Signal Propagation on Graph (SPoGInit). By carefully designing and optimizing initial weight metrics, SPoGInit effectively alleviates performance degradation in deep GCNs. We further introduce a new architecture termed ReZeroGCN, which simultaneously addresses the three aspects at initialization. This architecture achieves performance gains on node classification tasks when increasing the depth from 4 to 64, e.g., 10\% gain in training and 3\% gain in test accuracy on OGBNArxiv. To the best of our knowledge, this is the first result to fully resolve the curse of depth on OGBNArxiv over such a range of depths. 
Senmiao Wang 🔗 
Fri 5:00 p.m.  5:40 p.m.

Intuition for the Data Types and Interactions of Euclidean Neural Networks
(
Keynote
)
SlidesLive Video » 3D Euclidean symmetryequivariant neural networks (E(3)NNs) are emerging as an effective machine learning paradigm in molecular modeling, protein design, computer graphics, and beyond. In this talk, I'll discuss the fundamental building blocks of E(3)NNs and how these pieces are combined to create the growing zoo of E(3)NNs available today. 
Tess Smidt 🔗 
Fri 5:40 p.m.  5:50 p.m.

Unsupervised Embedding Quality Evaluation
(
Spotlight
)
link »
SlidesLive Video » Unsupervised learning has recently significantly gained in popularity, especially with deep learningbased approaches. Despite numerous successes and approaching supervisedlevel performance on a variety of academic benchmarks, it is still hard to train and evaluate SSL models in practice due to the unsupervised nature of the problem. Even with networks trained in a supervised fashion, it is often unclear whether they will perform well when transferred to another domain.Past works are generally limited to assessing the amount of information contained in embeddings, which is most relevant for selfsupervised learning of deep neural networks. This works chooses to follow a different approach: can we quantify how easy it is to linearly separate the data in a stable way? We survey the literature and uncover three methods that could be potentially used for evaluating quality of representations. We also introduce one novel method based on recent advances in understanding the highdimensional geometric structure selfsupervised learning.We conduct extensive experiments and study the properties of these metrics and ones introduced in the previous work. Our results suggest that while there is no free lunch, there are metrics that can robustly estimate embedding quality in an unsupervised way. 
Anton Tsitsulin 🔗 
Fri 5:50 p.m.  6:00 p.m.

Topologically Attributed Graphs for Shape Discrimination
(
Spotlight
)
link »
SlidesLive Video » In this paper we introduce a novel family of attributed graphs for the purpose of shape discrimination. Our graphs typically arise from variations on the Mapper graph construction, which is an an approximation of the Reeb graph for point cloud data. Our attributions enrich these constructions with (persistent) homology in ways that are provably stable, thereby recording extra topological information that is typically lost in these graph constructions. We provide experiments which illustrate the use of these invariants for shape representation and classification. In particular, we obtain competitive shape classification results when using our topologically attributed graphs as inputs to a simple graph neural network classifier. 
Florian Russold 🔗 
Fri 6:00 p.m.  6:30 p.m.

Coffee Break

🔗 
Fri 6:30 p.m.  7:00 p.m.

Topological Deep Learning Challenge
(
Presentation
)
link »
SlidesLive Video » 
Mathilde Papillon · Nina Miolane · Sophia Sanborn 🔗 
Fri 7:00 p.m.  8:00 p.m.

MixedCurvature Transformers for Graph Representation Learning
(
Poster
)
link »
Realworld graphs naturally exhibit hierarchical or cyclical structures that are unfit for the typical Euclidean space. While there exist graph neural networks that leverage nonEuclidean spaces to embed such structures more accurately, these methods are confined under the messagepassing paradigm, making the models vulnerable against sideeffects such as oversmoothing. More recent work have proposed attentionbased graph Transformers that can easily model longrange interactions, but their extensions towards nonEuclidean geometry are yet unexplored. To bridge this gap, we propose Fully ProductStereographic Transformer, a generalization of Transformers towards operating entirely on the product of constant curvature spaces. Our model can learn the curvature appropriate for the input graph in an endtoend fashion, without the need of additional tuning on different curvature initializations. We also provide a kernelized approach to nonEuclidean attention, which enables our model to run in cost linear to the number of nodes and edges while respecting the underlying geometry. Experiments on graph reconstruction and node classification demonstrate the benefits of our approach. 
Sungjun Cho · Seunghyuk Cho · Sungwoo Park · Hankook Lee · Honglak Lee · Moontae Lee 🔗 
Fri 7:00 p.m.  8:00 p.m.

Sample Complexity Bounds for Estimating the Wasserstein Distance under Invariances
(
Poster
)
link »
Groupinvariant probability distributions appear in many datagenerative models in machine learning, such as graphs, point clouds, and images. In practice, one often needs to estimate divergences between such distributions. In this work, we study how the inherent invariances with respect to any smooth action of a Lie group on a manifold improve sample complexity when estimating the Wasserstein distance. Our result indicates a twofold gain: (1) reducing the sample complexity by a multiplicative factor corresponding to the group size (for finite groups) or the normalized volume of the quotient space (for groups of positive dimension), (2) improving the exponent in the convergence rate (for groups of positive dimension). These results are completely new for groups of positive dimension and tighten recent bounds for finite group actions. 
Behrooz Tahmasebi · Stefanie Jegelka 🔗 
Fri 7:00 p.m.  8:00 p.m.

Which Features are Learned by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression
(
Poster
)
link »
Contrastive learning (CL) has emerged as a powerful technique for representation learning, with or without label supervision. However, supervised CL is prone to collapsing representations of subclasses within a class by not capturing all their features, and unsupervised CL may suppress harder classrelevant features by focusing on learning easy classirrelevant features; both significantly compromise representation quality. Yet, there is no theoretical understanding of \textit{class collapse} or \textit{feature suppression} at \textit{test} time. We provide the first unified theoretically rigorous framework to determine \textit{which} features are learnt by CL. Our analysis indicate that, perhaps surprisingly, bias of (stochastic) gradient descent towards finding simpler solutions is a key factor in collapsing subclass representations and suppressing harder classrelevant features. We also provide the first theoretical explanation for why employingsupervised and unsupervised CL together yields higherquality representations, even when using commonlyused stochastic gradient methods. 
Yihao Xue · Siddharth Joshi · Eric Gan · PinYu Chen · Baharan Mirzasoleiman 🔗 
Fri 7:00 p.m.  8:00 p.m.

Unsupervised Learning of 3colorings using Simplicial HigherOrder Neural Networks
(
Poster
)
link »
We propose HigherOrder Networks (HONs) for historically challenging problems for Graph Neural Networks (GNNs), such as Constraint Satisfaction Problems (CSPs). We apply a simple extension of GNNs to HONs and show its advantages for solving 3coloring. 
Lucas Laird · Robin Walters · Wolfgang Gatterbauer 🔗 
Fri 7:00 p.m.  8:00 p.m.

Equivariant Representation Learning with Equivariant Convolutional Kernel Networks
(
Poster
)
link »
Convolutional Kernel Networks (CKNs) were proposed as multilayered representation learning models that are based on stacking multiple Reproducing Kernel Hilbert Spaces (RKHSs) in a hierarchical manner. CKN has been studied to understand the (near) group invariance and (geometric) deformation stability properties of deep convolutional representations by exploiting the geometry of corresponding RKHSs. The objective of this paper is twofold: (1) Analyzing the construction of group equivariant Convolutional Kernel Networks (equivCKNs) that induce in the model symmetries like translation, rotation etc., (2) Understandingthe deformation stability of equivCKNs that takes into account the geometry of inductive biases and that of RKHSs. Multiple kernel based construction ofequivariant representations might be helpful in understanding the geometric model complexity of equivariant CNNs as well as shed lights on the construction practicalities of robust equivariant networks. 
Soutrik Roy Chowdhury · Johan Suykens 🔗 
Fri 7:00 p.m.  8:00 p.m.

On the Relationship Between Data Manifolds and Adversarial Examples
(
Poster
)
link »
In this work we study adversarial examples in deep neural networks through the lens of a predefined data manifold.By forcing certain geometric properties of this manifold, we are able to analyze the behavior of the learned decision boundaries.It has been shown previously that training to be robust against adversarial attacks produces models with gradients aligned to a small set of principal variations in the data. We demonstrate the converse of this statement; aligning model gradients with a select set of principal variations improves robustness against gradient based adversarial attacks. Our analysis shows that this also makes data more orthogonal to decision boundaries. We conclude that robust training methods make the problem better posed by focusing the model on more important dimensions of variation. 
Michael Geyer · Brian Bell · Amanda Fernandez · Juston Moore 🔗 
Fri 7:00 p.m.  8:00 p.m.

Asynchronous Algorithmic Alignment with Cocycles
(
Poster
)
link »
Stateoftheart neural algorithmic reasoners make use of message passing in graph neural networks (GNNs). But typical GNNs blur the distinction between the definition and invocation of the message function, forcing a node to send messages to its neighbours at every layer, synchronously. When applying GNNs to learn to execute dynamic programming algorithms, however, on most steps only a handful of the nodes would have meaningful updates to send. One, hence, runs the risk of inefficiencies by sending too much irrelevant data across the graphwith many intermediate GNN steps having to learn identity functions. In this work, we explicitly separate the concepts of node state update and message function invocation. With this separation, we obtain a mathematical formulation that allows us to reason about asynchronous computation in both algorithms and neural networks. 
Andrew Dudzik · Tamara von Glehn · Razvan Pascanu · Petar Veličković 🔗 
Fri 7:00 p.m.  8:00 p.m.

DeepEMD: A Transformerbased Fast Estimation of the Earth Mover's Distance
(
Poster
)
link »
The Earth Mover's Distance (EMD) is the measure of choice to assess similarity between point clouds. However the computational cost of standard algorithms to compute it makes it prohibitive as a training loss, and the standard approach is to use a surrogate such as the Chamfer distance. We propose instead to use a deep model dubbed DeepEMD to directly get an estimate of the EMD. We formulate casting the prediction of the bipartite matching as that of an attention matrix, from which we get an accurate estimate of both the EMD, and its gradient. Experiments demonstrate not only the accuracy of this model, in particular even when test and train data are from different origins. Moreover, in our experiments, the model performs accurately when processing point clouds which are several times larger than those seen during training. Computationwise, while the complexity of the exact Hungarian algorithm is $O(N^3)$, DeepEMD scales as $O(N^2)$, where $N$ is the total number of points. This leads to a $100\times$ wallclock speedup with $1024$ points. DeepEMD also achieves better performance than the standard Sinkhorn algorithm, with about $40\times$ speedup. The availability of gradients allows DeepEMD to be used for training a VAE, leading to a model with lower reconstruction EMD than a standard baseline trained with Chamfer distance.

Atul Kumar Sinha · François Fleuret 🔗 
Fri 7:00 p.m.  8:00 p.m.

Polyhedral Complex Extraction from ReLU Networks using Edge Subdivision
(
Poster
)
link »
A NN consisting of piecewise affine building blocks, such as fullyconnected layers and ReLU activations, is itself a piecewise affine function supported on a polyhedral complex. This complex has been studied to characterize theoretical properties of NNs and linked to geometry representations, but, in practice, extracting it remains a challenge. Previous works subdivide the regions via intersections with hyperplanes induced by each neuron. Instead, we propose to subdivide the edges, leading to a novel method for polyhedral complex extraction.This alleviates computational redundancy and affords efficient datastructures. A key to this are signvectors, which encode the combinatorial structure of the complex. Our implementation (available on GitHub) uses standard tensor operations and can run exclusively on the GPU, taking seconds for millions of cells on a consumer grade machine. Motivated by the growing interest in neural shape representation, we use the speed and differentiablility of our method to optimize geometric properties of the complex. 
Arturs Berzins 🔗 
Fri 7:00 p.m.  8:00 p.m.

The Exact Sample Complexity Gain from Invariances for Kernel Regression
(
Poster
)
link »
In practice, encoding invariances into models improves sample complexity. In this work, we study this phenomenon from a theoretical perspective. In particular, we provide minimax optimal rates for kernel ridge regression on compact manifolds, with a target function that is invariant to a group action on the manifold. Our results hold for any smooth compact Lie group action, even groups of positive dimension. For a finite group, the gain effectively multiplies the number of samples by the group size. For groups of positive dimension, the gain is observed by a reduction in the manifold's dimension, in addition to a factor proportional to the volume of the quotient space. Our proof takes the viewpoint of differential geometry, in contrast to the more common strategy of using invariant polynomials. This new geometric viewpoint on learning with invariances may be of independent interest. 
Behrooz Tahmasebi · Stefanie Jegelka 🔗 
Fri 7:00 p.m.  8:00 p.m.

Implicitly Learned Invariance and Equivariance in Linear Regression
(
Poster
)
link »
Can deep learning models generalize if their problem's underlying structure is unknown a priori? We analyze this theoretically and empirically in an idealistic setting for linear regression with invariant/equivariant data. We prove that linear regression models learn to become invariant/equivariant, with their weights being decomposed into a component that respects the symmetry and one that does not. These two components evolve independently over time, with the asymmetric component decaying exponentially given sufficient data. Extending these results to complex systems will be pursued in future work. 
Yonatan Gideoni 🔗 
Fri 7:00 p.m.  8:00 p.m.

Learning Structured Representations with Equivariant Contrastive Learning
(
Poster
)
link »
Selfsupervised learning converts raw perceptual data such as images to a compact space where simple Euclidean distances measure meaningful variations in data. In this paper, we extend this formulation by adding additional geometric structure to the embedding space by enforcing transformations of input space to correspond to simple (i.e., linear) transformations of embedding space. Specifically, in the contrastive learning setting, we introduce an equivariance objective and theoretically prove that its minima forces augmentations on input space to correspond to rotations on the spherical embedding space. We show that merely combining our equivariant loss with a noncollapse term results in nontrivial representations, without requiring invariance to data augmentations. Optimal performance is achieved by also encouraging approximate invariance, where input augmentations correspond to small rotations. Our method, CARE: Contrastive Augmentationinduced Rotational Equivariance, leads to improved performance on downstream tasks and ensures sensitivity in embedding space to important variations in data (e.g., color) that standard contrastive methods do not achieve. 
Sharut Gupta · Joshua Robinson · Derek Lim · Soledad Villar · Stefanie Jegelka 🔗 
Fri 7:00 p.m.  8:00 p.m.

Nonisotropic Persistent Homology
(
Poster
)
link »
Persistent Homology is a widely used topological data analysis tool that creates a concise description of the topological properties of a point cloud based on a specified filtration.Most of these filtrations used for persistent homology depend (implicitly) on a chosen metric, which is typically agnostically chosen as the standard euclidean metric on $\mathbb{R}^n$.Recent work has tried to uncover the "true" metric on the point cloud using distancetomeasure functions, in order to obtain more meaningful persistent homology results.Here we propose an alternative look at this problem:we posit that information on the point cloud is lost when restricting persistent homology to a single (correct) distance function. Instead, we show how by varying the distance function on the underlying space and analysing the corresponding shifts in the persistence diagrams, we can extract additional topological and geometrical information.Finally, we show in synthetic experiments that nonisotropic persistent homology (NIPH) can extract information on orientation, orientational variance, and scaling of randomly generated point clouds with good accuracy.

Vincent Grande · Michael Schaub 🔗 
Fri 7:00 p.m.  8:00 p.m.

GromovHausdorff Distances for Comparing Product Manifolds of Model Spaces
(
Poster
)
link »
Recent studies propose enhancing machine learning models by aligning the geometric characteristics of the latent space with the underlying data structure. Instead of relying solely on Euclidean space, researchers suggest using hyperbolic and spherical spaces with constant curvature, or their combinations (product manifolds), to improve model performance. However, determining the best latent product manifold signature, which refers to the choice and dimensionality of manifold components, lacks a principled technique. To address this, we introduce a novel notion of distance between candidate latent geometries using GromovHausdorff from metric geometry. We propose using a graph search space that utilizes computed GromovHausdorff distances to search for the optimal latent geometry. In this work we focus on providing a description of an algorithm to compute the GromovHausdorff distance between model spaces and its computational implementation. 
Haitz Sáez de Ocáriz Borde · Alvaro Arroyo · Ismael Morales · Ingmar Posner · Xiaowen Dong 🔗 
Fri 7:00 p.m.  8:00 p.m.

Latent Space Symmetry Discovery
(
Poster
)
link »
Existing equivariant neural networks require explicit knowledge of the symmetry group before model implementation. Various symmetry discovery methods have been developed to learn invariance and equivariance from data, but their search spaces are limited to linear symmetries. We propose to discover arbitrary nonlinear symmetries by factorizing the group action into nonlinear transformations parameterized by an autoencoder network and linear symmetries generated by an existing symmetry discovery framework, LieGAN. Our method can capture the intrinsic symmetry in highdimensional observations, which also results in a wellstructured latent space that is useful for other downstream tasks, including longterm prediction and latent space equation discovery. 
Jianke Yang · Nima Dehmamy · Robin Walters · Rose Yu 🔗 
Fri 7:00 p.m.  8:00 p.m.

Assessing Neural Network Representations During Training Using Data Diffusion Spectra
(
Poster
)
link »
Here we present information theoretic measures based on the data diffusion operator as characterisations of the representations learned by neural networks. Specifically, we define diffusion spectral entropy (DSE), i.e., entropy of the diffusion operator computed on the neural representation of a dataset as well as diffusion spectral mutual information (DSMI), which assesses the relationship between different sets of variables representing data. First, we show that these definitions form robust measures of intrinsic dimensionality and relationship strength respectively on toy data, outperforming binned Shannon entropy in terms of accuracy. Then we study the evolution of representations within classification networks and networks with selfsupervised losses. In both cases, we see that generalizable training results in decrease in DSE over epochs  starting from a random initialization. We also see that there is an increase in DSMI with the class label over time. On the other hand, training with corrupt labels results in a maintenance or increase in entropy and nearzero DSMI with labels. We also assess DSMI with the input and observe differing trends. On MNIST it grows until plateaus, whereas on CIFAR it increases and then decreases. Overall results show that these measures can elucidate characteristics of network performance as well as data complexity. Code is available at https://github.com/ChenLiu1996/DiffusionSpectralEntropy. 
Danqi Liao · Chen Liu · Alexander Tong · Guillaume Huguet · Guy Wolf · Maximilian Nickel · Ian Adelstein · Smita Krishnaswamy 🔗 
Fri 7:00 p.m.  8:00 p.m.

Learned Gridification for Efficient Point Cloud Processing
(
Poster
)
link »
Neural operations that rely on neighborhood information are much more expensive when deployed on point clouds than on grid data due to the irregular distances between points in a point cloud. For example, in a convolution, the convolutional kernel must be recomputed for every point in a point cloud to consider the distances to all other points in its neighbourhood. In a grid, on the other hand, we can compute the kernel only once and reuse it for all query positions. As a result, operations that rely on neighborhood information scale much worse for point clouds than for grid data, specially for large inputs and large neighborhoods.In this work, we address the scalability issue of point cloud methods by tackling its root cause: the irregularity of the data. To this end, we propose learnable gridification as the first step in a point cloud processing pipeline to transform the point cloud into a compact, regular grid. Thanks to gridification, subsequent layers can use operations defined on regular grids, e.g., Conv3D, which scale much better than native point cloud methods. We then extend gridification to point cloud to point cloud tasks, e.g., segmentation, by adding a earnable degridification step at the end of the point cloud processing pipeline to map the compact, regular grid back to its original point cloud form. Through theoretical and empirical analysis, we show that gridified networks scale better in terms of memory and time than networks directly applied on raw point cloud data, while being able to achieve competitive results. 
Putri van der Linden · David Romero · Erik Bekkers 🔗 
Fri 7:00 p.m.  8:00 p.m.

FAM: Relative Flatness Aware Minimization
(
Poster
)
link »
Flatness of the loss curve around a model at hand has been shown to empirically correlate with its generalization ability. Optimizing for flatness has been proposed as early as 1994 by Hochreiter and Schmidthuber, and followed by more recent successful sharpnessaware optimization techniques. Their widespread adoption in practice, though, is dubious because of the lack of theoretically grounded connection between flatness and generalization, in particular in light of the reparameterization cursecertain reparameterizations of a neural network change most flatness measures but do not change generalization. Recent theoretical work suggests that a particular relative flatness measure can be connected to generalization and solves the reparameterization curse. In this paper, we derive a regularizer based on this relative flatness that is easy to compute, fast, efficient, and works with arbitrary loss functions. It requires computing the Hessian only of a single layer of the network, which makes it applicable to large neural networks, and with it avoids an expensive mapping of the loss surface in the vicinity of the model. In an extensive empirical evaluation we show that this relative flatness aware minimization (FAM) improves generalization in a multitude of applications and models, both in finetuning and standard training. 
Linara Adilova · Amr Abourayya · Jianning Li · Amin Dada · Henning Petzka · Jan Egger · Jens Kleesiek · Michael Kamp 🔗 
Fri 7:00 p.m.  8:00 p.m.

OneShot Neural Network Pruning via Spectral Graph Sparsification
(
Poster
)
link »
Neural network pruning has gained significant attention for its potential to reduce computational resources required for training and inference. A large body of research has shown that networks can be pruned both after training and at initialisation, while maintaining competitive accuracy compared to dense networks. However, current methods rely on iteratively pruning or repairing the network to avoid overpruning and layer collapse. Recent work has found that by treating neural networks as a sequence of bipartite graphs, pruning can be studied through the lens of spectral graph theory. Therefore, in this work, we propose a novel pruning approach using spectral sparsification, which aims to preserve meaningful properties of a dense graph with a sparse subgraph, by preserving the spectrum of the dense graph's adjacency matrix. We empirically validate and investigate our method, and show that oneshot pruning using spectral sparsification preserves performance at higher levels of sparsity compared to its oneshot counterparts. Additionally, we theoretically analyse our method with respect to local and global connectivity. 
Steinar Laenen 🔗 
Fri 7:00 p.m.  8:00 p.m.

Topologically Attributed Graphs for Shape Discrimination
(
Poster
)
link »
In this paper we introduce a novel family of attributed graphs for the purpose of shape discrimination. Our graphs typically arise from variations on the Mapper graph construction, which is an an approximation of the Reeb graph for point cloud data. Our attributions enrich these constructions with (persistent) homology in ways that are provably stable, thereby recording extra topological information that is typically lost in these graph constructions. We provide experiments which illustrate the use of these invariants for shape representation and classification. In particular, we obtain competitive shape classification results when using our topologically attributed graphs as inputs to a simple graph neural network classifier. 
Justin Curry · Washington Mio · Tom Needham · Osman Okutan · Florian Russold 🔗 
Fri 7:00 p.m.  8:00 p.m.

Visualizing and Analyzing the Topology of Neuron Activations in Deep Adversarial Training
(
Poster
)
link »
Deep models are known to be vulnerable to data adversarial attacks, and many adversarial training techniques have been developed to improve their adversarial robustness. While data adversaries attack model predictions through modifying data, little is known about their impact on the neuron activations produced by the model, which play a crucial role in determining the model's predictions and interpretability. In this work, we aim to develop a topological understanding of adversarial training to enhance its interpretability. We analyze the topological structure  in particular, mapper graphs  of neuron activations of data samples produced by deep adversarial training.Each node of a mapper graph represents a cluster of activations, and two nodes are connected by an edge if their corresponding clusters have a nonempty intersection.We provide an interactive visualization tool that demonstrates the utility of our topological framework in exploring the activation space. We found that stronger attacks make the data samples more indistinguishable in the neuron activation space that leads to a lower accuracy. Our tool also provides a natural way to identify the vulnerable data samples that may be useful in improving model robustness. 
Youjia Zhou · Yi Zhou · Jie Ding · Bei Wang 🔗 
Fri 7:00 p.m.  8:00 p.m.

Bridging Equational Properties and Patterns on Graphs: an AIBased Approach
(
Poster
)
link »
AIassisted solutions have recently proven successful when applied to Mathematics and have opened new possibilities for exploring unsolved problems that have eluded traditional approaches for years or even centuries.Following this direction, this paper presents an innovative approach aiming at establishing correlations between equational properties of algebraic structures that can be represented through graphs and specific subportions of their topological representation. The methodology incorporates the utilization of graph neural architectures to validate theorems or conjectures, complemented by Explainability (XAI) metrics that lend support to these statements.In particular, we examine the distributive and modular properties of algebraic lattices, whose characterization is wellknown in universal algebra, hence using these properties as an experimental test bench. The findings of this study demonstrate the effectiveness of the proposed approach in identifying and retrieving established subpatterns that characterize the equational properties under investigation. Moreover, the approach exhibits the capability to generate novel and noteworthy candidates as theorem suggesters, thereby offering valuable prospects for further exploration by mathematicians. 
Oguzhan Keskin · Alisia Lupidi · Stefano Fioravanti · Lucie Charlotte Magister · Pietro Barbiero · Pietro Lió · Francesco Giannini 🔗 
Fri 7:00 p.m.  8:00 p.m.

Unsupervised Embedding Quality Evaluation
(
Poster
)
link »
Unsupervised learning has recently significantly gained in popularity, especially with deep learningbased approaches. Despite numerous successes and approaching supervisedlevel performance on a variety of academic benchmarks, it is still hard to train and evaluate SSL models in practice due to the unsupervised nature of the problem. Even with networks trained in a supervised fashion, it is often unclear whether they will perform well when transferred to another domain.Past works are generally limited to assessing the amount of information contained in embeddings, which is most relevant for selfsupervised learning of deep neural networks. This works chooses to follow a different approach: can we quantify how easy it is to linearly separate the data in a stable way? We survey the literature and uncover three methods that could be potentially used for evaluating quality of representations. We also introduce one novel method based on recent advances in understanding the highdimensional geometric structure selfsupervised learning.We conduct extensive experiments and study the properties of these metrics and ones introduced in the previous work. Our results suggest that while there is no free lunch, there are metrics that can robustly estimate embedding quality in an unsupervised way. 
Anton Tsitsulin · Marina Munkhoeva · Bryan Perozzi 🔗 
Fri 7:00 p.m.  8:00 p.m.

A marginbased multiclass generalization bound via geometric complexity
(
Poster
)
link »
There has been considerable effort to better understand the generalization capabilities of deep neural networks both as a means to unlock a theoretical understanding of their success as well as providing directions for further improvements. In this paper we investigate marginbased multiclass generalization bounds for neural networks which rely on a recent complexity measure, the geometric complexity, developed for neural networks and which measures the variability of the model function.We derive a new upper bound on the generalization error which scale with the marginnormalized geometric complexity of the network and which hold for a broad family of data distributions and model classes. Our generalization bound is empirically investigated for a ResNet18 model trained with SGD on the CIFAR10 and CIFAR100 datasets with both original and random labels. 
Michael Munn · Benoit Dherin · Xavi Gonzalvo 🔗 
Fri 7:00 p.m.  8:00 p.m.

On genuine invariance learning without weighttying
(
Poster
)
link »
This paper investigates the properties and limitations of learned invariance in neural networks as opposed to builtin invariance achieved through the invariant weighttying. We demonstrate that learned invariance heavily relies on the input data and degrades quickly when moving away from the training data. Next, we address the challenge of aligning datadriven invariance learning with the genuine invariance of weighttying networks. We show that with a simple invariance regularization it is possible for networks to learn the invariance which closely reassembles the rigid invariance of the weighttying networks, but at the cost of the downstream task performance. We further investigate the performance decay under the learned invariance and we demonstrate that it is due to the implicit regularization of datadriven invariance learning, which constrains the sensitivity of a networks to input any perturbations. This presents a new challenging problem of achieving genuine invariance by learning, while also maintaining the downstream task performance. 
Artem Moskalev · Anna Sepliarskaia · Erik Bekkers · Arnold Smeulders 🔗 
Fri 7:00 p.m.  8:00 p.m.

Breaking the Curse of Depth in Graph Convolutional Networks via Refined Initialization Strategy
(
Poster
)
link »
Graph convolutional networks (GCNs) suffer from the curse of depth, a phenomenon where performance degrades significantly as network depth increases. While oversmoothing has been considered the primary cause of this issue, we discover that gradient vanishing or exploding under commonlyused initialization methods also contributes to the curse of depth. To this end, we propose to evaluate GCN initialization quality from three aspects: forwardpropagation, backwardpropagation, and output diversity. We theoretically prove that conventional initialization methods fail to simultaneously maintain reasonable forward propagation and output diversity. To tackle this problem, We develop a new GCN initialization method called Signal Propagation on Graph (SPoGInit). By carefully designing and optimizing initial weight metrics, SPoGInit effectively alleviates performance degradation in deep GCNs. We further introduce a new architecture termed ReZeroGCN, which simultaneously addresses the three aspects at initialization. This architecture achieves performance gains on node classification tasks when increasing the depth from 4 to 64, e.g., 10\% gain in training and 3\% gain in test accuracy on OGBNArxiv. To the best of our knowledge, this is the first result to fully resolve the curse of depth on OGBNArxiv over such a range of depths. 
Senmiao Wang · Yupeng Chen · Yushun Zhang · Tian Ding · Ruoyu Sun 🔗 
Fri 7:00 p.m.  8:00 p.m.

Metric Space Magnitude and Generalisation in Neural Networks
(
Poster
)
link »
Deep learning models have seen significant successes in numerous applications, but their inner workings remain elusive. The purpose of this work is to quantify the learning process of deep neural networks through the lens of a novel topological invariant called magnitude. Magnitude is an isometry invariant; its properties are an active area of research as it encodes many known invariants of a metric space. We use magnitude to study the internal representations of neural networks and propose a new method for determining their generalisation capabilities. Moreover, we theoretically connect magnitude dimension and the generalisation error, and demonstrate experimentally that the proposed framework can be a good indicator of the latter. 
Rayna Andreeva · Katharina Limbeck · Bastian Rieck · Rik Sarkar 🔗 
Fri 7:00 p.m.  8:00 p.m.

Nonlinear Embeddings in Hilbert Simplex Geometry
(
Poster
)
link »
A key technique of machine learning and computer vision is to embed discrete weighted graphs into continuous spaces for further downstream analysis. Embedding discrete hierarchical structures in hyperbolic geometry has proven very successful since it was shown that any weighted tree can be embedded in that geometry with arbitrary low distortion. Various optimization methods for hyperbolic embeddings based on common models of hyperbolic geometry have been studied. In this paper, we consider Hilbert geometry for the standard simplex which is isometric to a vector space equipped with the variation polytope norm. We study the representation power of this Hilbert simplex geometry by embedding distance matrices of graphs. Our findings demonstrate that Hilbert simplex geometry is competitive to alternative geometries such as the Poincar\'e hyperbolic ball or the Euclidean geometry for embedding tasks while being fast and numerically robust. 
Frank Nielsen · Ke Sun 🔗 
Fri 7:00 p.m.  8:00 p.m.

Product Manifold Learning with Independent Coordinate Selection
(
Poster
)
link »
In many dimensionality reduction tasks, we wish to identify the constituent components that explain our observations. For manifold learning, this can be formalized as factoring a Riemannian product manifold. Recovering this factorization, however, may suffer from certain difficulties in practice, especially when data is sparse or noisy, or when one factor is distorted by the other. To address these limitations, we propose identifying nonredundant coordinates on the product manifold before applying product manifold learning to identify which coordinates correspond to different factor manifolds. We demonstrate our approach on both synthetic and realworld data. 
Jesse He · Tristan Brugère · Gal Mishne 🔗 
Fri 7:00 p.m.  8:00 p.m.

Breaking the Structure of Multilayer Perceptrons with Complex Topologies
(
Poster
)
link »
Recent advances in neural network (NN) architectures have demonstrated that complex topologies possess the potential to surpass the performance of conventional feedforward networks. Nonetheless, previous studies investigating the relationship between network topology and model performance have yielded inconsistent results, complicating their applicability in contexts beyond those scrutinized. In this study, we examine the utility of directed acyclic graphs (DAGs) for modeling intricate relationships among neurons within NNs. We introduce a novel algorithm for the efficient training of DAGbased networks and assess their performance relative to multilayer perceptrons (MLPs). Through experimentation on synthetic datasets featuring varying levels of difficulty and noise, we observe that complex networks founded on pertinent graphs outperform MLPs in terms of accuracy, particularly within highdifficulty scenarios. Additionally, we explore the theoretical underpinnings of these observations and explore the potential tradeoffs associated with employing complex networks. Our research offers valuable insights into the capabilities and constraints of sophisticated NN architectures, thus contributing to the ongoing pursuit of designing more potent and efficient deep learning models. 
Tommaso Boccato · Matteo Ferrante · Andrea Duggento · Nicola Toschi 🔗 
Fri 7:00 p.m.  8:00 p.m.

Data, Geometry and Homology
(
Poster
)
link »
Homologybased invariants can be used to characterize the geometry of datasets and thereby gain some understanding of the processes generating those datasets. In this work we investigate how the geometry of a dataset changes when it is subsampled in various ways. In our framework the dataset serves as a reference object; we then consider different points in the ambient space and endow them with a geometry defined in relation to the reference object, for instance by subsampling the dataset proportionally to the distance between its elements and the point under consideration. We illustrate how this process can be used to extract rich geometrical information, allowing for example to classify points coming from different data distributions. 
Jens Agerberg · Wojciech Chacholski · Ryan Ramanujam 🔗 
Fri 7:00 p.m.  8:00 p.m.

Differentially Private Clustering in Data Streams
(
Poster
)
link »
Clustering problems (such as kmeans and kmedian) are fundamental unsupervised machine learning primitives. Recently, these problems have been subject to large interest in the privacy literature. All prior work on private clustering, however, has been devoted to the \emph{offline} case where the entire dataset is known in advance.In this work, we focus on the more challenging private data stream setting where the aim is to design memoryefficient algorithms that process a large stream \emph{incrementally} as points arrive in a private way. Our main contribution is to provide the first differentially private algorithms for $k$means and $k$median clustering in data streams. In particular, our algorithms are the first to guarantee differential privacy both in the continual release and in the oneshot setting while achieving space sublinear in the stream size. We complement our theoretical results with an empirical analysis of our algorithms on real data.

Alessandro Epasto · Tamalika Mukherjee · Peilin Zhong 🔗 
Fri 7:00 p.m.  8:00 p.m.

GRIL: A $2$parameter Persistence Based Vectorization for Machine Learning
(
Poster
)
link »
$1$parameter persistent homology, a cornerstone in Topological Data Analysis (TDA), studies the evolution of topological features such as connected components and cycles hidden in data. It has been applied to enhance the representation power of deep learning models, such as Graph Neural Networks (GNNs). To enrich the representations of topological features, here we propose to study $2$parameter persistence modules induced by bifiltration functions. In order to incorporate these representations into machine learning models, we introduce a novel vector representation called Generalized Rank Invariant Landscape (Gril) for $2$parameter persistence modules. We show that this vector representation is $1$Lipschitz stable and differentiable with respect to underlying filtration functions and can be easily integrated into machine learning models to augment encoding topological features. We present an algorithm to compute the vector representation efficiently. We also test our methods on synthetic and benchmark graph datasets and compare the results with previous vector representations of $1$parameter and $2$parameter persistence modules. Further, we augment GNNs with Gril features and observe an increase in performance indicating that Gril can capture additional features enriching GNNs.

Cheng Xin · Soham Mukherjee · Shreyas N. Samaga · Tamal Dey 🔗 
Fri 7:00 p.m.  8:00 p.m.

Hyperbolic SlicedWasserstein via Geodesic and Horospherical Projections
(
Poster
)
link »
Hyperbolic space embeddings have been shown beneficial for many learning tasks where data have an underlying hierarchical structure. Consequently, many machine learning tools were extended to such spaces, but only few discrepancies to compare probability distributions defined over those spaces exist. Among the possible candidates, optimal transport distances are well defined on such Riemannian manifolds and enjoy strong theoretical properties, but suffer from high computational cost. On Euclidean spaces, slicedWasserstein distances, which leverage a closedform solution of the Wasserstein distance in one dimension, are more computationally efficient, but are not readily available on hyperbolic spaces. In this work, we propose to derive novel hyperbolic slicedWasserstein discrepancies. These constructions use projections on the underlying geodesics either along horospheres or geodesics. We study and compare them on different tasks where hyperbolic representations are relevant, such as sampling or image classification. 
Clément Bonet · Laetitia Chapel · Lucas Drumetz · Nicolas Courty 🔗 
Fri 7:00 p.m.  8:00 p.m.

On the Ability of Graph Neural Networks to Model Interactions Between Vertices
(
Poster
)
link »
Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e. between the sides of a given partition of input vertices. Our results reveal that the ability to model interaction is primarily determined by the partition's walk index  a graphtheoretical characteristic defined by the number of walks originating from the boundary of the partition. Experiments with common GNN architectures corroborate this finding. As a practical application of our theory, we design an edge sparsification algorithm named Walk Index Sparsification (WIS), which preserves the ability of a GNN to model interactions when input edges are removed. WIS is simple, computationally efficient, and in our experiments has markedly outperformed alternative methods in terms of induced prediction accuracy. More broadly, it showcases the potential of improving GNNs by theoretically analyzing the interactions they can model. 
Noam Razin · Tom Verbin · Nadav Cohen 🔗 
Fri 7:00 p.m.  8:00 p.m.

A Geometric Insight into Equivariant Message Passing Neural Networks on Riemannian Manifolds
(
Poster
)
link »
In this work, we propose a geometric insight into equivariant message passing on Riemannian manifolds. As previously proposed, numerical features on Riemannian manifolds are represented as coordinate independent feature fields. To any coordinateindependent feature field on a manifold comes canonically attached an equivariant embedding of the principal bundle to the space of numerical features. The minimization of a twisted form of the Polyakov action with respect to the graph of this embedding yields an equivariant diffusion process on the associated vector bundle. By discretizing the diffusion equation flow for a fixed time step, we obtain a message passing scheme on the manifold. We propose a higherorder equivariant diffusion process equivalent to diffusion on the cartesian product of the base manifold. The discretization of the higherorder diffusion process on a graph yields a new general class of equivariant GNN generalizing the ACE and MACE formalism to Riemannian manifolds 
Ilyes Batatia 🔗 
Fri 7:00 p.m.  8:00 p.m.

ReLU Neural Networks, Polyhedral Decompositions, and Persistent Homology
(
Poster
)
link »
A ReLU neural network leads to a finite polyhedral decomposition of input space and a corresponding finite dual graph. We show that while this dual graph is a coarse quantization of input space, it is sufficiently robust that it can be combined with persistent homology to detect homological signals of manifolds in the input space from samples. This property holds for a wide range of networks trained for a wide range of purposes that have nothing to do with this topological application. We found this feature to be surprising and interesting; we hope it will also be useful. 
Yajing Liu · Christina Cole · Chris Peterson · Michael Kirby 🔗 
Fri 7:00 p.m.  8:00 p.m.

An ML approach to resolution of singularities
(
Poster
)
link »
The solution set of a system of polynomial equations typically contains illbehaved, singular points. Resolution is a fundamental process in geometry in which we replace singular points with smooth points, while keeping the rest of the solution set unchanged. Resolutions are not unique: the usual way to describe them involves repeatedly performing a fundamental operation known as "blowingup", and the complexity of the resolution highly depends on certain choices. The process can be translated into various versions of a 2player game, the socalled Hironaka game, and a winning strategy for the first player provides a solution to the resolution problem. In this paper we introduce a new approach to the Hironaka game that uses reinforcement learning agents to find optimal resolutions of singularities.In certain domains, the trained model outperforms stateoftheart selection heuristics in total number of polynomial additions performed, which provides a proofofconcept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation. 
Gergely Berczi · Honglu Fan · Mingcong Zeng 🔗 
Fri 7:00 p.m.  8:00 p.m.

On Explicit Curvature Regularization in Deep Generative Models
(
Poster
)
link »
We propose a family of curvaturebased regularization terms for deep generative model learning. Explicit coordinateinvariant formulas for both intrinsic and extrinsic curvature measures are derived for the case of arbitrary data manifolds embedded in higherdimensional Euclidean space. Because computing the curvature is a highly computationintensive process involving the evaluation of secondorder derivatives, efficient formulas are derived for approximately evaluating intrinsic and extrinsic curvatures. Comparative studies are conducted that compare the relative efficacy of intrinsic versus extrinsic curvaturebased regularization measures, as well as performance comparisons against existing autoencoder training methods. Experiments involving noisy motion capture data confirm that curvaturebased methods outperform existing autoencoder regularization methods, with intrinsic curvature measures slightly more effective than extrinsic curvature measures. 
Yonghyeon Lee · Frank Chongwoo Park 🔗 
Fri 7:00 p.m.  8:00 p.m.

Topological Feature Selection
(
Poster
)
link »
In this paper, we introduce a novel unsupervised, graphbased filter feature selection technique which exploits the power of topologically constrained network representations. We model dependency structures among features using a family of chordal graphs (i.e. the Triangulated Maximally Filtered Graph), and we maximise the likelihood of features' relevance by studying their relative position inside the network. Such an approach presents three aspects that are particularly satisfactory compared to its alternatives: (i) it is highly tunable and easily adaptable to the nature of input data; (ii) it is fully explainable, maintaining, at the same time, a remarkable level of simplicity; (iii) it is computationally cheap. We test our algorithm on $16$ benchmark datasets from different application domains showing that it outperforms or matches the current stateoftheart under heterogeneous evaluation conditions.

Antonio Briola · Tomaso Aste 🔗 
Fri 7:00 p.m.  8:00 p.m.

Poster Session 2
(
Poster Session
)

🔗 