Much of the data fueling the current rapid advances in machine learning is high dimensional, structurally complex, and strongly nonlinear. This poses challenges for researchers' intuition when they ask (i) how and why current algorithms work and (ii) what tools will lead to the next big breakthrough. Mathematicians working in topology, algebra, and geometry have more than a hundred years' worth of finely developed machinery whose purpose is to give structure to, help build intuition about, and generally better understand spaces and structures beyond those we can grasp naturally. This workshop will showcase work that brings methods from topology, algebra, and geometry to bear on challenging questions in machine learning. It will serve as a vehicle for disseminating machine learning techniques that utilize rich mathematics and address core challenges described in the ICML call for papers. Additionally, the workshop creates an opportunity to present approaches that address critical, domain-specific ML challenges but do not necessarily demonstrate improved performance on mainstream, data-rich benchmarks. To this end, the workshop will open ICML up to researchers who in the past had no venue to discuss their novel but dataset-dependent analysis methods. We interpret topology, algebra, and geometry broadly and welcome submissions ranging from manifold methods to optimal transport to topological data analysis to mathematically informed deep learning. Through intellectual cross-pollination between the data-driven and mathematically inspired communities, we believe this workshop will support the continued development of both groups and enable new solutions to problems in machine learning.
Fri 5:45 a.m. - 6:00 a.m. | Welcome and Comments from the Organizer (Welcome)
Tegan Emerson · Henry Kvinge · Tim Doster · Sarah Tymochko · Alexander Cloninger
Fri 6:00 a.m. - 7:00 a.m. | The Memory of Persistence (Keynote)
As we pass the 20th anniversary of persistent homology and the 10th anniversary of the deep learning revolution, it is time to take stock. In this talk, I will look back and discuss some of the milestones, i.e. work that embodies the intricate interplay of geometry, topology, and machine learning techniques, with a particular eye towards applications. I hope to convince the audience that, indeed, there is an unreasonable effectiveness to topological methods in general and persistent homology in particular. Next to this retrospective, I will also outline areas of improvement and 'unmet needs', with the goal of providing a vision for an even better future. To this end, I will highlight new research directions that aim to advance geometry, topology, and machine learning.
Bastian Rieck
Fri 7:00 a.m. - 7:15 a.m. | Invariance-adapted Decomposition and Lasso-type Contrastive Learning (Spotlight Presentation)
Recent years have witnessed the effectiveness of contrastive learning in obtaining representations of a dataset that are useful for interpretation and downstream tasks. However, the mechanism behind this effectiveness has not been thoroughly analyzed, and many studies have been conducted to investigate the data structures captured by contrastive learning. In particular, the recent study of von Kügelgen et al. (2021) has shown that contrastive learning is capable of decomposing the data space into the space that is invariant to all augmentations and its complement. In this paper, we introduce the notion of an invariance-adapted latent space that decomposes the data space into the intersections of the invariant spaces of each augmentation and their complements. This decomposition generalizes the one introduced in von Kügelgen et al. (2021), and describes a structure that is analogous to the frequencies in the harmonic analysis of a group. We experimentally show that contrastive learning with a lasso-type metric can be used to find an invariance-adapted latent space, thereby suggesting a new potential for contrastive learning. We also investigate when such a latent space can be identified up to mixings within each component.
Masanori Koyama
Fri 7:15 a.m. - 7:45 a.m. | Break
Fri 7:45 a.m. - 8:45 a.m. | A Brief History of Geometric Data Science (Keynote)
The article "50 years of Data Science" by David Donoho has been influential in characterizing the workflow of Data Science research. It also outlines the recent history of Data Science from the perspective of the interplay between computer science and statistics. The focus of this talk will be on geometry and the applications of geometric frameworks to Data Science, and how these ideas provide insight into understanding fundamental tools such as dimension-reducing mappings. We start with optimal linear transformations, which we trace to Beltrami and Jordan. We proceed to nonlinear data-reducing mappings which are rooted in artificial neural networks. We will include at least five intriguing applications of the singular value decomposition and advocate that mathematical theory, e.g., Whitney's theorem, plays a central role in the Foundations of Data Science.
Michael Kirby
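Editor's note: to make the "optimal linear transformations" thread concrete, here is a minimal, hedged numpy sketch of the truncated SVD as the optimal rank-k approximation (Eckart-Young-Mirsky), the kind of dimension-reducing map the talk traces back to Beltrami and Jordan. The data matrix and rank are made up for illustration; this is not material from the talk itself.

```python
# Minimal sketch (editor's illustration): the truncated SVD gives the best rank-k
# linear approximation of a data matrix and a k-dimensional linear embedding of its rows.
import numpy as np

rng = np.random.default_rng(0)
# synthetic data matrix of (numerical) rank 10
X = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 10)) @ rng.standard_normal((10, 50))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 10
X_k = U[:, :k] * s[:k] @ Vt[:k, :]        # best rank-k approximation in Frobenius norm
Z = X @ Vt[:k, :].T                       # k-dimensional linear embedding of the rows

print(np.linalg.norm(X - X_k) / np.linalg.norm(X))   # near zero here, since rank(X) = 10
```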
Fri 8:45 a.m. - 9:00 a.m. | Sign and Basis Invariant Networks for Spectral Graph Representation Learning (Spotlight Presentation)
We introduce SignNet and BasisNet—new neural architectures that are invariant to two key symmetries displayed by eigenvectors: (i) sign flips, since if v is an eigenvector then so is −v; and (ii) more general basis symmetries, which occur in higher dimensional eigenspaces with infinitely many choices of basis eigenvectors. We prove that our networks are universal, i.e., they can approximate any continuous function of eigenvectors with proper invariances. When used with Laplacian eigenvectors, our architectures are also theoretically expressive for graph representation learning, in that they can approximate any spectral graph convolution, can compute spectral invariants that go beyond message passing neural networks, and can provably simulate previously proposed graph positional encodings. Experiments show the strength of our networks for processing geometric data, in tasks including: molecular graph regression, learning expressive graph representations, and learning neural fields on triangle meshes. Our code is available at https://github.com/cptq/SignNet-BasisNet.
Derek Lim · Joshua Robinson
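Editor's note: a minimal sketch of the sign-invariance idea described in the abstract, i.e. symmetrizing one network over v and -v. The class name, layer sizes, and encoder shape are editorial assumptions, not the authors' SignNet/BasisNet implementation (see their repository for that).

```python
# Editor's sketch of sign invariance only (not the authors' code): applying the same
# network to v and -v and summing makes the output invariant to the sign of the eigenvector.
import torch
import torch.nn as nn

class SignInvariantEncoder(nn.Module):
    def __init__(self, n_nodes, hidden=64, out=16):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(n_nodes, hidden), nn.ReLU(), nn.Linear(hidden, out))

    def forward(self, v):                   # v: (batch, n_nodes) eigenvector(s)
        return self.phi(v) + self.phi(-v)   # swapping v -> -v leaves the output unchanged

enc = SignInvariantEncoder(n_nodes=10)
v = torch.randn(4, 10)
assert torch.allclose(enc(v), enc(-v))      # sign invariance holds by construction
```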
Fri 9:00 a.m. - 10:30 a.m. | Lunch
Fri 10:30 a.m. - 11:30 a.m. | Equivariant Machine Learning with Classical Invariant Theory (Keynote)
There has been enormous progress in the last few years in designing neural networks that respect the fundamental symmetries and coordinate freedoms of physical law. Some of these frameworks make use of irreducible representations, some make use of high-order tensor objects, and some apply symmetry-enforcing constraints. Different physical laws obey different combinations of fundamental symmetries, but a large fraction (possibly all) of classical physics is equivariant to translation, rotation, reflection (parity), boost (relativity), units scalings, and permutations. Here we show that it is simple to parameterize universally approximating polynomial functions that are equivariant under these symmetries, or under the Euclidean, Lorentz, and Poincaré groups, at any dimensionality d. The key observation is that nonlinear O(d)-equivariant (and related-group-equivariant) functions can be universally expressed in terms of a lightweight collection of (dimensionless) scalars: scalar products and scalar contractions of the scalar, vector, and tensor inputs. We complement our theory with numerical examples that show that the scalar-based method is simple, efficient, and scalable.
Soledad Villar
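Editor's note: a minimal sketch of the key observation stated in the abstract, for the simplest vector-valued case: an O(d)-equivariant map of input vectors can be written as a sum of the inputs weighted by functions of their pairwise scalar products. The MLP, sizes, and check are editorial assumptions, not the speaker's code.

```python
# Editor's sketch: build an O(d)-equivariant vector-valued function as sum_i f_i(Gram(V)) * v_i,
# where the Gram matrix of scalar products is O(d)-invariant.
import torch
import torch.nn as nn

class ScalarEquivariant(nn.Module):
    def __init__(self, n_vectors, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(n_vectors * n_vectors, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_vectors))

    def forward(self, V):                        # V: (n_vectors, d)
        G = V @ V.T                              # scalar products: invariant under V -> V Q^T
        w = self.mlp(G.reshape(1, -1)).squeeze(0)
        return (w[:, None] * V).sum(dim=0)       # rotating the inputs rotates the output

d, n = 3, 5
model = ScalarEquivariant(n)
V = torch.randn(n, d)
Q, _ = torch.linalg.qr(torch.randn(d, d))        # a random orthogonal matrix
print(torch.allclose(model(V @ Q.T), model(V) @ Q.T, atol=1e-4))   # equivariance check
```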
Fri 11:30 a.m. - 11:45 a.m. | GALE: Globally Assessing Local Explanations (Spotlight Presentation)
Local explainability methods, those which seek to generate an explanation for each prediction, are increasingly prevalent. However, results from different local explainability methods are difficult to compare since they may be parameter-dependent, unstable due to sampling variability, or in various scales and dimensions. We propose GALE, a topology-based framework to extract a simplified representation from a set of local explanations. GALE models the relationship between the explanation space and model predictions to generate a topological skeleton, which we use to compare local explanation outputs. We demonstrate that GALE can not only reliably identify differences between explainability techniques but also provide stable representations. Then, we show how our framework can be used to identify appropriate parameters for local explainability methods. Our framework is simple, does not require complex optimizations, and can be broadly applied to most local explanation methods.
Peter Xenopoulos
Fri 11:45 a.m. - 12:15 p.m. | Break
Fri 12:15 p.m. - 1:15 p.m. | Recent Advances in Equivariant Learning (Keynote)
Originally inspired by convolutional neural networks in computer vision, equivariant networks have now emerged as a successful class of models in a wide variety of domains such as protein design, drug discovery, reinforcement learning, learning physics, etc. The development of such networks, however, requires a careful examination of the underlying symmetries/geometric structure of the problem. In the first part of this talk, I will give an overview of some theoretical results (including ongoing work) in the area that have unified and often guided some of the development of such networks. Then I will present an efficient and very general framework to construct such networks, with an example application to spherical image recognition, and summarize some open questions. Next, I will discuss the construction of such networks in the context of partial symmetries (groupoids) and dynamical systems. Finally, I will very briefly discuss some work on quantifying the expressivity of equivariant networks and its implications for future network design.
Shubhendu Trivedi
Fri 1:15 p.m. - 1:30 p.m. | Zeroth-order Topological Insights into Iterative Magnitude Pruning (Spotlight Presentation)
Modern-day neural networks are famously large, yet also highly redundant and compressible; there exist numerous pruning strategies in the deep learning literature that yield over 90% sparser sub-networks of fully-trained, dense architectures while still maintaining their original accuracies. Amongst these many methods though – thanks to its conceptual simplicity, ease of implementation, and efficacy – Iterative Magnitude Pruning (IMP) dominates in practice and is the de facto baseline to beat in the pruning community. However, theoretical explanations as to why a simplistic method such as IMP works at all are few and limited. In this work, we leverage the notion of persistent homology to gain insights into the workings of IMP and show that it inherently encourages retention of those weights which preserve topological information in a trained network. Subsequently, we also provide bounds on how much different networks can be pruned while perfectly preserving their zeroth order topological features, and present a modified version of IMP to do the same.
Aishwarya H. Balwani
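Editor's note: a minimal sketch of the vanilla iterative magnitude pruning loop the paper analyses (the topological analysis itself is not reproduced). The `train` callback, round count, and per-round pruning fraction are editorial assumptions.

```python
# Editor's sketch of plain IMP: alternately train, then zero out the smallest-magnitude
# surviving weights. `train` is a hypothetical user-supplied function that trains the
# currently masked model for a few epochs; weight rewinding is omitted for brevity.
import torch

def iterative_magnitude_pruning(model, train, rounds=5, prune_frac=0.2):
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}
    for _ in range(rounds):
        train(model)                                  # train (or fine-tune) the current subnetwork
        surviving = torch.cat([p.abs().flatten()[masks[n].flatten() > 0]
                               for n, p in model.named_parameters() if n in masks])
        threshold = torch.quantile(surviving, prune_frac)
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    masks[n] *= (p.abs() > threshold).float()   # drop the smallest surviving weights
                    p *= masks[n]                               # zero them out in place
    return masks
```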
Fri 1:30 p.m. - 1:45 p.m. | Closing Remarks and Transition to Poster Session
Tegan Emerson · Henry Kvinge · Tim Doster · Sarah Tymochko · Alexander Cloninger
Fri 1:45 p.m. - 3:00 p.m. | Poster Session
Fri 1:45 p.m. - 3:00 p.m. | Fast Proximal Gradient Descent for Support Regularized Sparse Graph (Poster)
Sparse graphs built by sparse representation have been demonstrated to be effective in clustering high-dimensional data. Despite the compelling empirical performance, the vanilla sparse graph ignores the geometric information of the data by performing sparse representation for each datum separately. In order to obtain a sparse graph aligned with the local geometric structure of data, we propose a novel Support Regularized Sparse Graph, abbreviated as SRSG, for data clustering. SRSG encourages local smoothness on the neighborhoods of nearby data points by a well-defined support regularization term. We propose a fast proximal gradient descent method to solve the non-convex optimization problem of SRSG, with convergence matching Nesterov's optimal convergence rate of first-order methods on smooth and convex objective functions with Lipschitz continuous gradient. Extensive experimental results on various real data sets demonstrate the superiority of SRSG over other competing clustering methods.
Dongfang Sun · Yingzhen Yang
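Editor's note: a minimal sketch of the generic proximal gradient template (ISTA-style gradient step followed by soft-thresholding) on an l1-regularized least-squares problem. This shows only the algorithmic building block; SRSG's support regularizer and non-convex objective are not reproduced, and the problem sizes are made up.

```python
# Editor's sketch of a proximal gradient step with soft-thresholding (ISTA) for
# min_x 0.5*||Ax - b||^2 + lam*||x||_1 ; the gradient step uses step size 1/L.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(A, b, lam, n_iters=500):
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)   # gradient step, then proximal (shrinkage) step
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
x_hat = proximal_gradient(A, A @ x_true, lam=0.1)
print(np.nonzero(np.abs(x_hat) > 1e-3)[0])          # recovers (roughly) the true support
```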
Fri 1:45 p.m. - 3:00 p.m. | The Shape of Words - topological structure in natural language data (Poster)
This paper presents a novel method, based on the ideas from algebraic topology, for the analysis of raw natural language text. The paper introduces the notion of a word manifold - a simplicial complex, whose topology encodes grammatical structure expressed by the corpus. Results of experiments with a variety of natural and synthetic languages are presented, showing that the homotopy type of the word manifold is influenced by linguistic structure. The analysis includes a new approach to the Voynich Manuscript - an unsolved puzzle in corpus linguistics. In contrast to existing topological data analysis approaches, we do not rely on the apparatus of persistent homology. Instead, we develop a method of generating topological structure directly from strings of words. |
Stephen Fitz
Fri 1:45 p.m. - 3:00 p.m. | Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering (Poster)
Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only a few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than 1), slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and, in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute.
Elise van der Pol · Ian Gemp · Yoram Bachrach · Richard Everett
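Editor's note: a small numerical illustration of the fact the abstract relies on: a polynomial of a symmetric matrix shares its eigenvectors, so a well-chosen polynomial can spread the small eigenvalues without disturbing the clustering structure. The particular polynomial below is arbitrary and only illustrative; the paper's choice and its stochastic, parallel evaluation are not shown.

```python
# Editor's illustration: for symmetric L, p(L) has the same eigenvectors as L,
# with eigenvalues transformed by p. Here p(x) = 3x^3 - 2x, chosen arbitrarily.
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((8, 8))
L = S + S.T                                   # a symmetric matrix stands in for a graph Laplacian

p_of_L = 3 * L @ L @ L - 2 * L                # the polynomial evaluated on the matrix

w, V = np.linalg.eigh(L)                      # columns of V are eigenvectors of L
print(np.allclose(p_of_L @ V, V * (3 * w**3 - 2 * w)))   # same eigenvectors, transformed eigenvalues
```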
Fri 1:45 p.m. - 3:00 p.m. | Multiresolution Matrix Factorization and Wavelet Networks on Graphs (Poster)
Multiresolution Matrix Factorization (MMF) is unusual amongst fast matrix factorization algorithms in that it does not make a low rank assumption. This makes MMF especially well suited to modeling certain types of graphs with complex multiscale or hierarchical structure. While MMF promises to yield a useful wavelet basis, finding the factorization itself is hard, and existing greedy methods tend to be brittle. In this paper, we propose a "learnable" version of MMF that carefully optimizes the factorization with a combination of reinforcement learning and Stiefel manifold optimization through backpropagating errors. We show that the resulting wavelet basis far outperforms prior MMF algorithms and provides the first version of this type of factorization that can be robustly deployed on standard learning tasks. Furthermore, we construct wavelet neural networks (WNNs) that learn on graphs in the spectral domain using the wavelet basis produced by our MMF learning algorithm. Our wavelet networks are competitive against other state-of-the-art methods in molecular graph classification and node classification on citation graphs.
Truong Son Hy · Risi Kondor
Fri 1:45 p.m. - 3:00 p.m. | A simple and universal rotation equivariant point-cloud network (Poster)
Equivariance to permutations and rigid motions is an important inductive bias for various 3D learning problems. Recently it has been shown that the equivariant Tensor Field Network architecture is universal: it can approximate any equivariant function. In this paper we suggest a much simpler architecture, prove that it enjoys the same universality guarantees, and evaluate its performance on ModelNet40. The code to reproduce our experiments is available at https://github.com/simpleinvariance/UniversalNetwork
Ben Finkelshtein · Chaim Baskin · Haggai Maron · Nadav Dym
Fri 1:45 p.m. - 3:00 p.m. | Robust Graph Representation Learning for Local Corruption Recovery (Poster)
Real-world graph observations may contain local corruptions by abnormal behaviors. While existing research usually pursues global smoothness in graph embedding, these rarely observed anomalies are harmful to an accurate prediction. This work establishes a graph learning scheme that automatically detects corrupted node attributes and recovers robust embedding for prediction tasks. The detection operation does not make any assumptions about the distribution of the local corruptions. It pinpoints the positions of the anomalous node attributes in an unbiased mask matrix, where robust estimations are recovered with sparsity promoting regularizer. We alleviate an inertial alternating direction method of multipliers to approach a new embedding that is sparse in the framelet domain and conditionally close to input observations. Extensive experiments validate the model recovers robust graph representations from black-box poisoning and achieves excellent performance. |
Bingxin Zhou · · Yu Guang Wang · Jingwei Liang · Junbin Gao · Shirui Pan · Xiaoqun Zhang
Fri 1:45 p.m. - 3:00 p.m. | Invariance-adapted decomposition and Lasso-type contrastive learning (Poster)
Recent years have witnessed the effectiveness of contrastive learning in obtaining representations of a dataset that are useful for interpretation and downstream tasks. However, the mechanism behind this effectiveness has not been thoroughly analyzed, and many studies have been conducted to investigate the data structures captured by contrastive learning. In particular, the recent study of von Kügelgen et al. (2021) has shown that contrastive learning is capable of decomposing the data space into the space that is invariant to all augmentations and its complement. In this paper, we introduce the notion of an invariance-adapted latent space that decomposes the data space into the intersections of the invariant spaces of each augmentation and their complements. This decomposition generalizes the one introduced in von Kügelgen et al. (2021), and describes a structure that is analogous to the frequencies in the harmonic analysis of a group. We experimentally show that contrastive learning with a lasso-type metric can be used to find an invariance-adapted latent space, thereby suggesting a new potential for contrastive learning. We also investigate when such a latent space can be identified up to mixings within each component.
Masanori Koyama · Takeru Miyato · Kenji Fukumizu
Fri 1:45 p.m. - 3:00 p.m. | EXACT: How to Train Your Accuracy (Poster)
Classification tasks are usually evaluated in terms of accuracy. However, accuracy is discontinuous and cannot be directly optimized using gradient ascent. Popular methods minimize cross-entropy, hinge loss, or other surrogate losses, which can lead to suboptimal results. In this paper, we propose a new optimization framework by introducing stochasticity to a model's output and optimizing expected accuracy, i.e., the accuracy of the stochastic model. Extensive experiments on image classification show that the proposed optimization method is a powerful alternative to widely used classification losses.
Ivan Karpukhin · Stanislav Dereka · Sergey Kolesnikov
Fri 1:45 p.m. - 3:00 p.m. | On the Surprising Behaviour of node2vec (Poster)
Graph embedding techniques are a staple of modern graph learning research. When using embeddings for downstream tasks such as classification, information about their stability and robustness, i.e. their susceptibility to sources of noise, stochastic effects, or specific parameter choices, becomes increasingly important. As one of the most prominent graph embedding schemes, we focus on node2vec and analyse its embedding quality under multiple aspects. Our findings indicate that embedding quality is unstable with respect to parameter choices, and we propose strategies to remedy this in practice. |
Celia Hacker · Bastian Rieck
Fri 1:45 p.m. - 3:00 p.m. | Sign and Basis Invariant Networks for Spectral Graph Representation Learning (Poster)
We introduce SignNet and BasisNet---new neural architectures that are invariant to two key symmetries displayed by eigenvectors: (i) sign flips, since if v is an eigenvector then so is -v; and (ii) more general basis symmetries, which occur in higher dimensional eigenspaces with infinitely many choices of basis eigenvectors. We prove that our networks are universal, i.e., they can approximate any continuous function of eigenvectors with proper invariances. When used with Laplacian eigenvectors, our architectures are also theoretically expressive for graph representation learning, in that they can approximate any spectral graph convolution, can compute spectral invariants that go beyond message passing neural networks, and can provably simulate previously proposed graph positional encodings. Experiments show the strength of our networks for processing geometric data, in tasks including: molecular graph regression, learning expressive graph representations, and learning neural fields on triangle meshes. |
Derek Lim · Joshua Robinson · Lingxiao Zhao · Tess Smidt · Suvrit Sra · Haggai Maron · Stefanie Jegelka
Fri 1:45 p.m. - 3:00 p.m. | Riemannian Residual Neural Networks (Poster)
Recent methods in geometric deep learning have introduced various neural networks to operate over data that lie on Riemannian manifolds. These methods are often inspired by and directly generalize standard Euclidean neural networks. However, extending Euclidean networks is difficult and has only been done for a select few manifolds. In this work, we examine the residual neural network (ResNet) and show how to extend this construction to general Riemannian manifolds in a geometrically principled manner. Originally introduced to help solve the vanishing gradient problem, ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks. We find that our Riemannian ResNets mirror these desirable properties: when compared to existing manifold neural networks designed to learn over hyperbolic space and the manifold of symmetric positive definite matrices, we outperform both kinds of networks in terms of relevant testing metrics and training dynamics. |
Isay Katsman · Eric Chen · Sidhanth Holalkere · Aaron Lou · Ser Nam Lim · Christopher De Sa
Fri 1:45 p.m. - 3:00 p.m. | The PWLR graph representation: A Persistent Weisfeiler-Lehman scheme with Random walks for graph classification (Poster)
This paper presents the Persistent Weisfeiler-Lehman Random walk scheme (abbreviated as PWLR) for graph representations, a novel mathematical framework which produces a collection of explainable low-dimensional representations of graphs with discrete and continuous node features. The proposed scheme effectively incorporates the normalized Weisfeiler-Lehman procedure, random walks on graphs, and persistent homology. We thereby integrate three distinct properties of graphs, namely local topological features, node degrees, and global topological invariants, while preserving stability under graph perturbations. This generalizes many variants of Weisfeiler-Lehman procedures, which are primarily used to embed graphs with discrete node labels. Empirical results suggest that these representations can be efficiently utilized to produce comparable results to state-of-the-art techniques in classifying graphs with discrete node labels, and enhanced performance in classifying those with continuous node features.
Sun Woo Park · YUN YOUNG CHOI · · U Jin Choi · Youngho Woo
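Editor's note: a minimal sketch of plain Weisfeiler-Lehman colour refinement, the base procedure the PWLR scheme extends; the random-walk normalisation and persistent homology components are not shown. The graph encoding and number of rounds are editorial choices.

```python
# Editor's sketch of 1-WL colour refinement: repeatedly relabel each node by a compressed
# signature of its own label and the multiset of its neighbours' labels.
def wl_refine(adjacency, labels, n_rounds=3):
    """adjacency: dict node -> iterable of neighbours; labels: dict node -> initial integer label."""
    labels = dict(labels)
    for _ in range(n_rounds):
        signatures = {v: (labels[v], tuple(sorted(labels[u] for u in adjacency[v])))
                      for v in adjacency}
        # compress each (own label, neighbour-label multiset) signature to a fresh integer label
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        labels = {v: palette[signatures[v]] for v in adjacency}
    return labels

# a 4-cycle with uniform initial labels stays uniform; a 4-node path does not
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wl_refine(cycle, {v: 0 for v in cycle}))
print(wl_refine(path, {v: 0 for v in path}))
```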
Fri 1:45 p.m. - 3:00 p.m. | Higher-order Clustering and Pooling for Graph Neural Networks (Poster)
Graph Neural Networks achieve state-of-the-art performance on a plethora of graph classification tasks, especially due to pooling operators, which aggregate learned node embeddings hierarchically into a final graph representation. However, they completely ignore higher-order connectivity patterns, even though these are essential for GNNs. To tackle this issue, we propose HoscPool, a clustering-based graph pooling operator that captures higher-order information hierarchically, leading to richer graph representations. In fact, we learn a probabilistic cluster assignment matrix end-to-end by minimising a relaxed formulation of motif spectral clustering in our objective function, and then extend it to a pooling operator. We evaluate both HoscPool on graph classification tasks and its clustering component on graph data with ground-truth community structure, achieving increased performance on multiple datasets.
ALEXANDRE DUVAL · Fragkiskos Malliaros
Fri 1:45 p.m. - 3:00 p.m. | Hypergraph Convolutional Networks via Equivalence Between Hypergraphs and Undirected Graphs (Poster)
As a powerful tool for modeling complex relationships, hypergraphs are gaining popularity from the graph learning community. However, commonly used frameworks in deep hypergraph learning focus on hypergraphs with edge-independent vertex weights (EIVWs), without considering hypergraphs with edge-dependent vertex weights (EDVWs) that have more modeling power. To compensate for this, we present General Hypergraph Spectral Convolution (GHSC), a general learning framework that not only handles EDVW and EIVW hypergraphs but, more importantly, enables explicit theoretical use of existing powerful Graph Convolutional Neural Networks (GCNNs), largely easing the design of hypergraph neural networks. In this framework, the graph Laplacian of the given undirected GCNNs is replaced with a unified hypergraph Laplacian that incorporates vertex weight information from a random walk perspective by equating our defined generalized hypergraphs with simple undirected graphs. Extensive experiments from various domains including social network analysis, visual object classification, and protein learning demonstrate the state-of-the-art performance of the proposed framework.
Jiying Zhang · fuyang li · Xi Xiao · Tingyang Xu · Yu Rong · Junzhou Huang · Yatao Bian
Fri 1:45 p.m. - 3:00 p.m. | A Geometrical Approach to Finding Difficult Examples in Language (Poster)
A growing body of evidence has suggested that metrics like accuracy overestimate the classifier's generalization ability. Several state-of-the-art NLP classifiers like BERT and LSTM rely on superficial cue words (e.g., if a movie review has the word "romantic", the review tends to be positive), or unnecessary words (e.g., learning a proper noun to classify a movie as positive or negative). One approach to test NLP classifiers for such fragilities is analogous to how teachers discover gaps in a student's understanding: by finding problems where small perturbations confuse the student. While several perturbation strategies like contrast sets or random word substitutions have been proposed, they are typically based on heuristics and/or require expensive human involvement. In this work, using tools from information geometry, we propose a principled way to quantify the fragility of an example for an NLP classifier. By discovering such fragile examples for several state-of-the-art NLP models like BERT, LSTM, and CNN, we demonstrate their susceptibility to meaningless perturbations like noun/synonym substitution, causing their accuracy to drop down to 20 percent in some cases. Our approach is simple, architecture agnostic, and can be used to study the fragilities of text classification models. All the code used will be made publicly available, including a tool to explore the fragile examples for multiple datasets and models.
Debo Datta · Shashwat Kumar · Laura Barnes · P. Thomas Fletcher
Fri 1:45 p.m. - 3:00 p.m. | Rethinking Persistent Homology for Visual Recognition (Poster)
Persistent topological properties of an image serve as an additional descriptor providing an insight that might not be discovered by traditional neural networks. The existing research in this area focuses primarily on efficiently integrating topological properties of the data in the learning process in order to enhance the performance. However, there is no existing study to demonstrate all possible scenarios where introducing topological properties can boost or harm the performance. This paper performs a detailed analysis of the effectiveness of topological properties for image classification in various training scenarios, defined by: the number of training samples, the complexity of the training data and the complexity of the backbone network. We identify the scenarios that benefit the most from topological features, e.g., training simple networks on small datasets. Additionally, we discuss the problem of topological consistency of the datasets which is one of the major bottlenecks for using topological features for classification. We further demonstrate how the topological inconsistency can harm the performance for certain scenarios.
Ekaterina Khramtsova · Guido Zuccon · Xi Wang · Mahsa Baktashmotlagh
Fri 1:45 p.m. - 3:00 p.m. | Sheaf Neural Networks with Connection Laplacians (Poster)
A Sheaf Neural Network (SNN) is a type of Graph Neural Network (GNN) that operates on a sheaf, an object that equips a graph with vector spaces over its nodes and edges and linear maps between these spaces. SNNs have been shown to have useful theoretical properties that help tackle issues arising from heterophily and over-smoothing. One complication intrinsic to these models is finding a good sheaf for the task to be solved. Previous works proposed two diametrically opposed approaches: manually constructing the sheaf based on domain knowledge and learning the sheaf end-to-end using gradient-based methods. However, domain knowledge is often insufficient, while learning a sheaf could lead to overfitting and significant computational overhead. In this work, we propose a novel way of computing sheaves drawing inspiration from Riemannian geometry: we leverage the manifold assumption to compute manifold-and-graph-aware orthogonal maps, which optimally align the tangent spaces of neighbouring data points. We show that this approach achieves promising results with less computational overhead when compared to previous SNN models. Overall, this work provides an interesting connection between algebraic topology and differential geometry, and we hope that it will spark future research in this direction. |
Federico Barbero · Cristian Bodnar · Haitz Sáez de Ocáriz Borde · Michael Bronstein · Petar Veličković · Pietro Lió
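Editor's note: a minimal sketch of the geometric ingredient described in the abstract: estimate tangent bases by local PCA and align neighbouring tangent spaces with the orthogonal (Procrustes) map obtained from an SVD. The helper names, neighbourhood size, and toy circle data are editorial assumptions; the sheaf network built on top of these maps is not reproduced.

```python
# Editor's sketch: local-PCA tangent bases and SVD-based orthogonal alignment maps.
import numpy as np

def tangent_basis(points, center, k, dim):
    d = np.linalg.norm(points - center, axis=1)
    nbrs = points[np.argsort(d)[1:k + 1]] - center          # k nearest neighbours, centred
    _, _, Vt = np.linalg.svd(nbrs, full_matrices=False)
    return Vt[:dim].T                                        # (ambient_dim, dim) orthonormal basis

def connection_map(B_i, B_j):
    U, _, Vt = np.linalg.svd(B_i.T @ B_j)
    return U @ Vt                                            # orthogonal map aligning the two bases

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.standard_normal((200, 2))  # noisy circle
j = np.argsort(np.linalg.norm(X - X[0], axis=1))[1]          # nearest neighbour of point 0
B0 = tangent_basis(X, X[0], k=10, dim=1)
Bj = tangent_basis(X, X[j], k=10, dim=1)
O = connection_map(B0, Bj)
print(np.linalg.norm(B0 @ O - Bj))                           # small: nearby tangent spaces align well
```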
Fri 1:45 p.m. - 3:00 p.m. | Score Matching for Truncated Density Estimation of Spherical Distributions (Poster)
When observations are truncated, we are limited to an incomplete picture of our dataset. Recent methods deal with truncated density estimation problems by turning to score matching, where access to the intractable normalising constant is not required. We present a novel extension of truncated score matching to a Riemannian manifold. Applications are presented for the von Mises-Fisher and Kent distributions on a two-dimensional sphere in $\mathbb{R}^3$, as well as a real-world application of extreme storm observations in the USA. In simulated data experiments, our score matching estimator is able to approximate the true parameter values with a low estimation error and shows improvements over a maximum likelihood estimator.
Daniel Williams · Song Liu
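Editor's note: for readers unfamiliar with why score matching avoids the normalising constant, here is the basic (Euclidean, untruncated) score-matching objective together with the von Mises-Fisher log-density. This is only background; the paper's truncated, Riemannian refinement on the sphere is not reproduced here, and $Z(\kappa)$ simply denotes the normalising constant.

\[
J(\theta) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[ \tfrac{1}{2}\,\lVert \nabla_x \log p_\theta(x) \rVert^2 \;+\; \Delta_x \log p_\theta(x) \right] + \text{const},
\]
\[
\log p_\theta(x) \;=\; \kappa\,\mu^\top x \;-\; \log Z(\kappa) \quad \text{(von Mises-Fisher)}, \qquad \nabla_x \log p_\theta(x) \;=\; \kappa\,\mu ,
\]

so the score of the log-density, and hence $J(\theta)$, never involves the intractable normaliser $Z(\kappa)$.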
Fri 1:45 p.m. - 3:00 p.m. | Local distance preserving autoencoders using continuous kNN graphs (Poster)
In this paper, we introduce several auto-encoder models that preserve local distances in the latent space. We use a local distance-preserving loss that is based on the continuous k-nearest neighbour graph, which is known to capture topological features at all scales simultaneously. To improve training performance, we formulate learning as a constrained optimisation problem with local distance preservation as the main objective and reconstruction accuracy as a constraint. We generalise this approach to hierarchical variational auto-encoders, thus learning generative models with geometrically consistent latent and data spaces. Our method provides state-of-the-art performance across several standard datasets and evaluation metrics.
Nutan Chen · Patrick van der Smagt · Botond Cseke
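Editor's note: a minimal sketch of the general recipe described above: an autoencoder trained with a reconstruction loss plus a penalty keeping latent distances of k-nearest-neighbour pairs close to input-space distances. The discrete kNN graph, network sizes, and penalty weight are editorial simplifications; the paper's continuous kNN graph and constrained formulation are not shown.

```python
# Editor's sketch: autoencoder with a local distance preserving penalty on kNN pairs.
import torch
import torch.nn as nn

def knn_pairs(X, k=5):
    D = torch.cdist(X, X)
    idx = D.topk(k + 1, largest=False).indices[:, 1:]        # k nearest neighbours, skipping self
    rows = torch.arange(X.size(0)).repeat_interleave(k)
    return rows, idx.reshape(-1)

def local_distance_loss(X, Z, rows, cols):
    d_x = (X[rows] - X[cols]).norm(dim=1)
    d_z = (Z[rows] - Z[cols]).norm(dim=1)
    return ((d_x - d_z) ** 2).mean()

enc = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
dec = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 20))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

X = torch.randn(256, 20)
rows, cols = knn_pairs(X)
for _ in range(200):
    Z = enc(X)
    loss = nn.functional.mse_loss(dec(Z), X) + 0.1 * local_distance_loss(X, Z, rows, cols)
    opt.zero_grad(); loss.backward(); opt.step()
```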
Fri 1:45 p.m. - 3:00 p.m. | Geometric Properties of Graph Convolutional Networks from the Perspective of Sheaves and the Neural Tangent Kernel (Poster)
Graph convolutional networks are a popular class of deep neural network algorithms which have shown success in a number of relational learning tasks. Despite their success, graph convolutional networks exhibit a number of peculiar features, including a bias towards learning oversmoothed and homophilic functions, which are not easily diagnosed due to the complex nature of these algorithms. We propose to bridge this gap in understanding by studying the neural tangent kernel of sheaf convolutional networks, a topological generalization of graph convolutional networks. To this end, we derive a parameterization of the neural tangent kernel for sheaf convolutional networks which separates the function into two parts: one driven by a forward diffusion process determined by the graph, and the other determined by the composite effect of nodes' activations on the output layer. This geometrically-focused derivation produces a number of immediate insights which we discuss in detail.
Thomas Gebhart
Fri 1:45 p.m. - 3:00 p.m. | Riemannian CUR Decompositions for Robust Principal Component Analysis (Poster)
Robust Principal Component Analysis (PCA) has received massive attention in recent years. It aims to recover a low-rank matrix and a sparse matrix from their sum. This paper proposes a novel nonconvex Robust PCA algorithm, coined Riemannian CUR (RieCUR), which utilizes the ideas of Riemannian optimization and robust CUR decompositions. This algorithm has the same computational complexity as Iterated Robust CUR, which is currently state-of-the-art, but is more robust to outliers. RieCUR is also able to tolerate a significant amount of outliers, and is comparable to Accelerated Alternating Projections, which has high outlier tolerance but worse computational complexity than the proposed method. Thus, the proposed algorithm achieves state-of-the-art performance on Robust PCA both in terms of computational complexity and outlier tolerance. |
Keaton Hamm · Mohamed Meskini · HanQin Cai
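Editor's note: a minimal sketch of a plain CUR decomposition with uniformly sampled rows and columns, the building block named in the abstract. The sampling scheme and sizes are editorial simplifications; the robust, Riemannian-optimisation variant proposed in the paper is not shown.

```python
# Editor's sketch of CUR: A ≈ C U R with C, R sampled columns/rows of A and U = pinv(A[rows, cols]).
import numpy as np

def cur(A, n_rows, n_cols, rng):
    rows = rng.choice(A.shape[0], n_rows, replace=False)
    cols = rng.choice(A.shape[1], n_cols, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(A[np.ix_(rows, cols)])    # exact recovery when the sampled core has full rank
    return C, U, R

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 8)) @ rng.standard_normal((8, 80))   # a rank-8 matrix
C, U, R = cur(A, n_rows=16, n_cols=16, rng=rng)
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))           # small relative error
```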
Fri 1:45 p.m. - 3:00 p.m. | For Manifold Learning, Deep Neural Networks can be Locality Sensitive Hash Functions (Poster)
It is well established that training deep neural networks gives useful representations that capture essential features of the inputs. However, these representations are poorly understood in theory and practice. An important question for supervised learning is whether these representations capture features informative for classification, while filtering out non-informative noisy ones. We study this question formally by considering a generative process where each class is associated with a high-dimensional manifold and different classes define different manifolds. Each input of a class is produced using two latent vectors: (i) a "manifold identifier" $\gamma$ and (ii) a "transformation parameter" $\theta$ that shifts examples along the surface of a manifold. E.g., $\gamma$ might represent a canonical image of a dog, and $\theta$ might stand for variations in pose or lighting. We provide theoretical evidence that neural representations can be viewed as LSH-like functions that map each input to an embedding that is a function of solely the informative $\gamma$ and invariant to $\theta$, effectively recovering the manifold identifier $\gamma$. We prove that we get one-shot learning to unseen classes as one of the desirable consequences of this behavior.
Nishanth Dikkala · Gal Kaplun · Rina Panigrahy
Fri 1:45 p.m. - 3:00 p.m. | Nearest Class-Center Simplification through Intermediate Layers (Poster)
Recent advances in theoretical Deep Learning have introduced geometric properties that occur during training, past the Interpolation Threshold, where the training error reaches zero. We inquire into the phenomenon coined Neural Collapse in the intermediate layers of the networks, and emphasize the inner workings of Nearest Class-Center Mismatch inside the deepnet. We further show that these processes occur both in vision and language model architectures. Lastly, we propose a Stochastic Variability-Simplification Loss (SVSL) that encourages better geometrical features in intermediate layers, and improves both train metrics and generalization.
Ido Ben Shaul · Shai Dekel
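Editor's note: a minimal sketch of the nearest class-center (NCC) rule itself, the quantity whose mismatch with the network's own predictions is tracked in the paper. The toy data and function names are editorial; the paper's intermediate-layer analysis and SVSL loss are not reproduced.

```python
# Editor's sketch of the NCC rule: classify a feature vector by the closest class mean.
import numpy as np

def ncc_fit(features, labels):
    return np.stack([features[labels == c].mean(axis=0) for c in np.unique(labels)])

def ncc_predict(features, centers):
    d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0, 1, (50, 16)), rng.normal(4, 1, (50, 16))])
labels = np.array([0] * 50 + [1] * 50)
centers = ncc_fit(feats, labels)
print((ncc_predict(feats, centers) == labels).mean())    # ~1.0 on well-separated clusters
```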
Fri 1:45 p.m. - 3:00 p.m. | Deoscillated Adaptive Graph Collaborative Filtering (Poster)
Collaborative Filtering (CF) signals are crucial for a Recommender System (RS) model to learn user and item embeddings. High-order information can alleviate the cold-start issue of CF-based methods, which is modeled through propagating the information over the user-item bipartite graph. Recent Graph Neural Networks (GNNs) propose to stack multiple aggregation layers to propagate high-order signals. However, there are three challenges, the oscillation problem, varying locality of bipartite graphs, and the fixed propagation pattern, which spoil the ability of the multi-layer structure to propagate information. In this paper, we theoretically prove the existence and boundary of the oscillation problem, and empirically study the varying locality and layer-fixed propagation problems. We propose a new RS model, named Deoscillated adaptive Graph Collaborative Filtering (DGCF), which is constituted by stacking multiple CHP layers and LA layers. We conduct extensive experiments on real-world datasets to verify the effectiveness of DGCF. Detailed analyses indicate that DGCF solves oscillation problems, adaptively learns local factors, and has layer-wise propagation patterns.
Zhiwei Liu · Lin Meng · Fei Jiang · Jiawei Zhang · Philip Yu
Fri 1:45 p.m. - 3:00 p.m. | Robust Lp-Norm Linear Discriminant Analysis with Proxy Matrix Optimization (Poster)
Linear Discriminant Analysis (LDA) is an established supervised dimensionality reduction method that is traditionally based on the L2-norm. However, the standard L2-norm LDA is susceptible to outliers in the data that often contribute to a drop in accuracy. Using the L1 or fractional p-norms makes LDA more robust to outliers, but it is a harder problem to solve due to the nature of the corresponding objective functions. In this paper, we leverage the orthogonal constraint of the Grassmann manifold to iteratively obtain the optimal projection matrix for the data in a lower dimensional space. Instead of optimizing the matrix directly on the manifold, we use the proxy matrix optimization (PMO) method, utilizing an auxiliary matrix in ambient space that is retracted to the closest location on the manifold along the loss minimizing geodesic. The Lp-LDA-PMO learning is based on backpropagation, which allows easy integration in a neural network and flexibility to change the value of the p-norm. Our experiments on synthetic and real data show that using fractional p-norms for LDA leads to an improvement in accuracy compared to the traditional L2-based LDA. |
Navya Nagananda · Breton Minnehan · Andreas Savakis
Fri 1:45 p.m. - 3:00 p.m. | A Topological characterisation of Weisfeiler-Leman equivalence classes (Poster)
Graph Neural Networks (GNNs) are learning models aimed at processing graphs and signals on graphs. The most popular and successful GNNs are based on message passing schemes. Such schemes inherently have limited expressive power when it comes to distinguishing two non-isomorphic graphs. In this article, we rely on the theory of covering spaces to fully characterize the classes of graphs that GNNs cannot distinguish. We then generate arbitrarily many non-isomorphic graphs that cannot be distinguished by GNNs, leading to the GraphCovers dataset. We also show that the number of indistinguishable graphs in our dataset grows super-exponentially with the number of nodes. Finally, we test the GraphCovers dataset on several GNN architectures, showing that none of them can distinguish any two graphs it contains. |
Jacob Bamberger
Fri 1:45 p.m. - 3:00 p.m. | GALE: Globally Assessing Local Explanations (Poster)
Local explainability methods, those which seek to generate an explanation for each prediction, are increasingly prevalent. However, results from different local explainability methods are difficult to compare since they may be parameter-dependent, unstable due to sampling variability, or in various scales and dimensions. We propose GALE, a topology-based framework to extract a simplified representation from a set of local explanations. GALE models the relationship between the explanation space and model predictions to generate a topological skeleton, which we use to compare local explanation outputs. We demonstrate that GALE can not only reliably identify differences between explainability techniques but also provide stable representations. Then, we show how our framework can be used to identify appropriate parameters for local explainability methods. Our framework is simple, does not require complex optimizations, and can be broadly applied to most local explanation methods.
Peter Xenopoulos · Gromit Yeuk-Yin Chan · Harish Doraiswamy · Luis Nonato · Brian Barr · Claudio Silva
Fri 1:45 p.m. - 3:00 p.m. | Neural Geometric Embedding Flows (Poster)
Embedding flows are novel normalizing flows which relax the bijectivity requirement while retaining many benefits such as left invertibility and exact log-likelihoods. As such, they can learn distributions residing on lower-dimensional submanifolds. However, previous research models data with simple normalizing flows like RealNVP applied to Euclidean subspace, ignoring much of the core geometric machinery and resulting in subpar modelling and pathological behavior. In this paper, we address these issues by connecting embedding flows and the field of extrinsic geometric flows. Using these insights, we introduce two geometrically-motivated embedding flows. First, to partially overcome topological mismatch, we show how to apply these models to arbitrary submanifolds. Second, we construct a continuous time embedding flow and show that the increased expressivity produces more accurate results. |
Aaron Lou · Yang Song · Jiaming Song · Stefano Ermon
Fri 1:45 p.m. - 3:00 p.m. | Neural Implicit Manifold Learning for Topology-Aware Generative Modelling (Poster)
Natural data observed in R^n is often constrained to an m-dimensional manifold M, where m < n. Current generative models represent this manifold by mapping an m-dimensional latent variable through a neural network f : R^m → R^n. Such procedures, which we call pushforward models, incur a straightforward limitation: manifolds cannot in general be represented with a single parameterization, meaning that attempts to do so will incur either computational instability or the inability to learn probability densities within the manifold. To remedy this problem, we propose to model M as a "neural implicit manifold": the set of zeros of a neural network. To learn the data distribution within M, we introduce the "constrained energy-based model," which uses a constrained variant of Langevin dynamics to train and sample within the learned manifold. The resulting model can be manipulated with an "arithmetic of manifolds" which allows practitioners to take unions and intersections of model manifolds. In experiments on synthetic and natural data, we show that constrained EBMs can learn manifold-supported distributions with complex topologies more accurately than pushforward models.
Brendan Ross · Gabriel Loaiza-Ganem · Anthony Caterini · Jesse Cresswell
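Editor's note: a minimal sketch of the "zero set of a neural network" idea only: a network F defines the manifold {x : F(x) = 0}, and points are pushed onto it by minimising ||F(x)||^2. A randomly initialised F stands in for a trained one, and the optimiser settings are arbitrary; the paper's constrained energy-based model and Langevin sampler are not reproduced.

```python
# Editor's sketch: project points onto the zero set of a network F by gradient descent.
import torch
import torch.nn as nn

F = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))   # implicit manifold {x : F(x) = 0}

def project_to_manifold(x, steps=200, lr=0.05):
    x = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        loss = (F(x) ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return x.detach()

x0 = torch.randn(16, 3)
x_proj = project_to_manifold(x0)
print(F(x_proj).abs().max())      # near zero: the points now approximately lie on the zero set
```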
Fri 1:45 p.m. - 3:00 p.m. | Geodesic Properties of a Generalized Wasserstein Embedding for Time Series Analysis (Poster)
Transport-based metrics and related embeddings (transforms) have recently been used to model signal classes where nonlinear structures or variations are present. In this paper, we study the geodesic properties of time series data with a generalized Wasserstein metric and the geometry related to their signed cumulative distribution transforms in the embedding space. Moreover, we show how understanding such geometric characteristics can provide added interpretability to certain time series classifiers, and be an inspiration for more robust classifiers. |
Shiying Li · Abu Hasnat Mohammad Rubaiyat · Gustavo Rohde
Fri 1:45 p.m. - 3:00 p.m. | Evaluating Disentanglement in Generative Models Without Knowledge of Latent Factors (Poster)
Probabilistic generative models provide a flexible and systematic framework for learning the underlying geometry of data. However, model selection in this setting is challenging, particularly when selecting for ill-defined qualities such as disentanglement or interpretability. In this work, we address this gap by introducing a method for ranking generative models based on the training dynamics exhibited during learning. Our method is inspired by recent theoretical characterizations of disentanglement. Furthermore, our method does not require supervision of the underlying latent factors. We evaluate our approach by demonstrating the need for disentanglement metrics which do not require labels, i.e., the underlying generative factors. We additionally demonstrate that our approach correlates with baseline supervised methods for evaluating disentanglement. Finally, we show that our method can be used as an unsupervised indicator for downstream performance on simple reinforcement learning and fairness-classification problems.
Chester Holtz · Gal Mishne · Alexander Cloninger
Fri 1:45 p.m. - 3:00 p.m. | The Power of Recursion in Graph Neural Networks for Counting Substructures (Poster)
To achieve a graph representation, most Graph Neural Networks (GNNs) follow two steps: first, each graph is decomposed into a number of subgraphs (which we call the recursion step), and then the collection of subgraphs is encoded. While recently proposed higher-order networks show a remarkable increase in the expressive power through the idea of deeper neighborhoods (i.e., deeper recursion), the power of (deeper) recursion in GNNs is still not fully understood. In this paper, we study a fundamental question: what is the power of recursion by itself, without the use of any other pooling operation? To make it concrete, we consider a pure recursion-based GNN which we call Recursive Neighborhood Pooling GNN (RNP-GNN). The expressive power of an RNP-GNN and its computational cost quantifies the (purified) power of recursion for a graph representation network. We quantify the power by means of counting substructures which is one main limitation of the Message Passing graph Neural Networks (MPNNs), and show how they can exploit the sparsity of the underlying graph to achieve low-cost powerful representations. We also compare the recent lower bounds on the time complexity and show how recursive-based networks are near optimal. |
Behrooz Tahmasebi · Derek Lim · Stefanie Jegelka
Fri 1:45 p.m. - 3:00 p.m. | The Manifold Scattering Transform for High-Dimensional Point Cloud Data (Poster)
The manifold scattering transform is a multiscale feature extractor for data defined on a Riemannian manifold. The initial work on this model focused primarily on its theoretical stability and invariance properties but did not provide methods for its numerical implementation except in the case of two-dimensional surfaces with predefined meshes. In this work, we present practical schemes, based on the theory of diffusion maps, for implementing the manifold scattering transform to datasets arising in, e.g., single cell genetics, where the data is a high-dimensional point cloud modeled as lying on a low-dimensional manifold. We then show that our methods are effective for signal classification and manifold classification tasks.
Joyce Chew · Holly Steach · Siddharth Viswanath · Deanna Needell · Smita Krishnaswamy · Michael Perlmutter
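Editor's note: a minimal sketch in the spirit of the construction described above: build a diffusion operator from a point cloud and form multiscale wavelet-style coefficients of a signal on it. The kernel bandwidth, dyadic scales, and aggregation are editorial simplifications, not the authors' exact scheme.

```python
# Editor's sketch: diffusion operator from a point cloud, dyadic diffusion wavelets
# W_j = P^(2^(j-1)) - P^(2^j), and first-order scattering-style coefficients |W_j f| averaged.
import numpy as np

def diffusion_operator(X, eps=0.5):
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / eps)
    return K / K.sum(axis=1, keepdims=True)          # row-stochastic diffusion matrix P

def scattering_features(P, f, n_scales=4):
    powers = [np.linalg.matrix_power(P, 2 ** j) for j in range(n_scales + 1)]
    feats = [np.abs(f).mean()]                        # zeroth-order coefficient
    for j in range(1, n_scales + 1):
        Wj_f = (powers[j - 1] - powers[j]) @ f        # wavelet at dyadic scale 2^j applied to f
        feats.append(np.abs(Wj_f).mean())             # first-order coefficient
    return np.array(feats)

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 300)
X = np.c_[np.cos(theta), np.sin(theta)]               # point cloud sampled from a circle
f = np.sin(3 * theta)                                 # a signal on the point cloud
print(scattering_features(diffusion_operator(X), f))
```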
Fri 1:45 p.m. - 3:00 p.m. | Zeroth-Order Topological Insights into Iterative Magnitude Pruning (Poster)
Modern-day neural networks are famously large, yet also highly redundant and compressible; there exist numerous pruning strategies in the deep learning literature that yield over 90% sparser sub-networks of fully-trained, dense architectures while still maintaining their original accuracies. Amongst these many methods though – thanks to its conceptual simplicity, ease of implementation, and efficacy – Iterative Magnitude Pruning (IMP) dominates in practice and is the de facto baseline to beat for the present-day deep neural network pruning community. However, theoretical explanations as to why a simplistic method such as IMP works at all are few and limited. In this work, we leverage the notion of persistent homology to gain insights into the workings of IMP and show that it inherently encourages retention of those weights which preserve topological information in a trained network. Subsequently, we also provide bounds on how much different networks can be pruned while perfectly preserving their zeroth order topological features, and present a modified version of IMP to do the same. |
Aishwarya H. Balwani · Jakob Krzyston
Fri 1:45 p.m. - 3:00 p.m. | Approximate Equivariance SO(3) Needlet Convolution (Poster)
This paper develops a rotation-invariant needlet convolution for the rotation group SO(3) to distill multiscale information from spherical signals. The spherical needlet transform is generalized from S2 onto the SO(3) group, decomposing a spherical signal into approximate and detailed spectral coefficients by a set of tight framelet operators. The spherical signal achieves rotation invariance during the decomposition and reconstruction. Based on needlet transforms, we form a Needlet approximate Equivariance Spherical CNN (NES) with multiple SO(3) needlet convolution layers. The network establishes a powerful tool to extract geometric-invariant features of spherical signals. The model allows sufficient network scalability with multi-resolution representation. A robust signal embedding is learned with a wavelet shrinkage activation function, which filters out redundant high-pass representation while maintaining approximate rotation invariance. The NES achieves state-of-the-art performance for quantum chemistry regression and Cosmic Microwave Background (CMB) delensing reconstruction, which shows great potential for solving scientific challenges with high-resolution and multi-scale spherical signal representation.
Kai Yi · Yu Guang Wang · Bingxin Zhou · Pietro Lió · Yanan Fan · Jan Hamann