Session
Representation Learning
Adversarially Learned Representations for Information Obfuscation and Inference
Martin A Bertran · Natalia Martinez Gil · Afroditi Papadaki · Qiang Qiu · Miguel Rodrigues · Galen Reeves · Guillermo Sapiro
Data collection and sharing are pervasive aspects of modern society. This process can either be voluntary, as in the case of a person taking a facial image to unlock his/her phone, or incidental, such as traffic cameras collecting videos of pedestrians. An undesirable side effect of these processes is that shared data can carry information about attributes that users might consider sensitive, even when such information is of limited use for the task. It is therefore desirable for both data collectors and users to design procedures that minimize sensitive information leakage. Balancing the competing objectives of providing meaningful individualized service levels and inference while obfuscating sensitive information is still an open problem. In this work, we take an information-theoretic approach that is implemented as an unconstrained adversarial game between deep neural networks in a principled, data-driven manner. This approach enables us to learn domain-preserving stochastic transformations that maintain performance on existing algorithms while minimizing sensitive information leakage.
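The adversarial game can be summarized in a few lines of PyTorch. The sketch below is illustrative rather than the authors' implementation: the network shapes, noise scale, and loss weighting are all placeholder choices. An obfuscator learns a stochastic transformation of the input; an adversary is trained to recover the sensitive attribute from the transformed data, while the obfuscator is trained to support the utility task and defeat the adversary.

    import torch
    import torch.nn as nn

    obfuscator = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
    task_head = nn.Linear(64, 10)    # utility task head (placeholder)
    adversary = nn.Linear(64, 2)     # tries to recover the sensitive attribute

    opt_main = torch.optim.Adam(
        list(obfuscator.parameters()) + list(task_head.parameters()), lr=1e-3)
    opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
    ce = nn.CrossEntropyLoss()

    x = torch.randn(32, 64)                # toy batch of features
    y_task = torch.randint(0, 10, (32,))   # utility labels
    y_sens = torch.randint(0, 2, (32,))    # sensitive labels

    for step in range(100):
        z = obfuscator(x) + 0.1 * torch.randn(32, 64)  # stochastic transformation

        # Adversary step: learn to extract the sensitive attribute from z.
        adv_loss = ce(adversary(z.detach()), y_sens)
        opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

        # Obfuscator step: preserve utility, minimize sensitive leakage.
        main_loss = ce(task_head(z), y_task) - ce(adversary(z), y_sens)
        opt_main.zero_grad(); main_loss.backward(); opt_main.step()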
Adaptive Neural Trees
Ryutaro Tanno · Kai Arulkumaran · Daniel Alexander · Antonio Criminisi · Aditya Nori
Deep neural networks and decision trees operate on largely separate paradigms; typically, the former performs representation learning with pre-specified architectures, while the latter is characterised by learning hierarchies over pre-specified features with data-driven architectures. We unite the two via adaptive neural trees (ANTs), a model that incorporates representation learning into the edges, routing functions, and leaf nodes of a decision tree, along with a backpropagation-based training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers). We demonstrate that, whilst achieving competitive performance on classification and regression datasets, ANTs benefit from (i) lightweight inference via conditional computation, (ii) hierarchical separation of features useful to the predictive task, e.g., learning meaningful class associations such as separating natural vs. man-made objects, and (iii) a mechanism to adapt the architecture to the size and complexity of the training dataset.
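A depth-one toy version of the idea, sketched in PyTorch with invented module names: an edge carries a learned transformer, the internal node carries a learned router, and the leaves carry solvers. Soft (probability-weighted) routing keeps everything trainable by backpropagation; the paper's architecture-growth procedure and hard conditional routing at test time are omitted here.

    import torch
    import torch.nn as nn

    class TinyANT(nn.Module):
        """Depth-one adaptive-neural-tree caricature: learned edge transform,
        learned router, two neural leaves, trained end to end."""
        def __init__(self, dim, n_classes):
            super().__init__()
            self.edge = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # transformer on the edge
            self.router = nn.Linear(dim, 1)                            # routing function
            self.left = nn.Linear(dim, n_classes)                      # leaf solvers
            self.right = nn.Linear(dim, n_classes)

        def forward(self, x):
            h = self.edge(x)
            p = torch.sigmoid(self.router(h))  # probability of routing left
            return p * self.left(h).softmax(-1) + (1 - p) * self.right(h).softmax(-1)

    model = TinyANT(dim=16, n_classes=3)
    print(model(torch.randn(4, 16)).shape)  # torch.Size([4, 3])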
Connectivity-Optimized Representation Learning via Persistent Homology
Christoph Hofer · Roland Kwitt · Marc Niethammer · Mandar Dixit
We study the problem of learning representations with controllable connectivity properties. This is beneficial in situations where the imposed structure can be leveraged upstream. In particular, we control the connectivity of an autoencoder's latent space via a novel type of loss, operating on information from persistent homology. Under mild conditions, this loss is differentiable and we present a theoretical analysis of the properties induced by the loss. We choose one-class learning as our upstream task and demonstrate that the imposed structure enables informed parameter selection for modeling the in-class distribution via kernel density estimators. Evaluated on computer vision data, these one-class models exhibit competitive performance and, in a low sample size regime, outperform other methods by a large margin. Notably, our results indicate that a single autoencoder, trained on auxiliary (unlabeled) data, yields a mapping into latent space that can be reused across datasets for one-class learning.
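For point clouds, 0-dimensional persistent homology (connectivity) can be read off the minimum spanning tree of the pairwise distances: each MST edge length is the scale at which two connected components merge. The numpy/scipy sketch below only illustrates the quantity such a loss operates on; unlike the paper's loss it is not differentiable, and the target scale eta is a placeholder.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import minimum_spanning_tree

    def connectivity_loss(z, eta=1.0):
        """0-dim persistence of a point cloud: each minimum-spanning-tree edge
        records the scale ('death time') at which two components merge.
        Penalize deviation of these scales from a target eta."""
        mst = minimum_spanning_tree(squareform(pdist(z))).toarray()
        deaths = mst[mst > 0]              # n - 1 merge scales for n points
        return np.abs(deaths - eta).sum()

    z = np.random.randn(50, 8)             # stand-in for latent codes
    print(connectivity_loss(z, eta=1.0))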
Minimal Achievable Sufficient Statistic Learning
Milan Cvitkovic · Günther Koliander
We introduce Minimal Achievable Sufficient Statistic (MASS) Learning, a training objective for machine learning models whose minimizers are minimal sufficient statistics with respect to the class of functions being optimized over (e.g., deep networks). In deriving MASS Learning, we also introduce Conserved Differential Information (CDI), an information-theoretic quantity that, unlike standard mutual information, can be usefully applied to deterministically-dependent continuous random variables such as the input and output of a deep network. In a series of experiments, we show that deep networks trained with MASS Learning match state-of-the-art performance on supervised learning, uncertainty quantification, and adversarial robustness benchmarks.
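One way to see why standard mutual information fails here: for a deterministic map Y = f(X) of continuous variables, I(X; Y) is infinite, so it cannot distinguish between networks. CDI instead involves a generalized Jacobian determinant of the map. The PyTorch sketch below computes such a Jacobian term at a single input; it illustrates that ingredient only, not the MASS objective itself, and the network f is a placeholder.

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(8, 3), nn.Tanh())  # placeholder network f

    def log_gen_jacobian(f, x):
        """log sqrt(det(Df(x) Df(x)^T)) for f: R^n -> R^m with m <= n,
        evaluated at a single input x (no batching, for clarity)."""
        J = torch.autograd.functional.jacobian(f, x)  # shape (m, n)
        return 0.5 * torch.logdet(J @ J.T)

    x = torch.randn(8)
    print(log_gen_jacobian(lambda v: net(v), x))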
Learning to Route in Similarity Graphs
Dmitry Baranchuk · Dmitry Persiyanov · Anton Sinitsin · Artem Babenko
Recently, similarity graphs have become the leading paradigm for efficient nearest neighbor search, outperforming traditional tree-based and LSH-based methods. Similarity graphs perform the search via greedy routing: a query traverses the graph, moving at each vertex to the adjacent vertex closest to the query. In practice, similarity graphs are often susceptible to local minima: queries get stuck in suboptimal vertices without reaching their nearest neighbors. In this paper we propose to learn a routing function that overcomes local minima by incorporating information about the global structure of the graph. In particular, we augment the vertices of a given graph with additional representations that are learned to provide optimal routing from the start vertex to the query's nearest neighbor. Through thorough experiments, we demonstrate that the proposed learnable routing successfully diminishes the local minima problem and significantly improves overall search performance.
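For reference, the baseline greedy routing that suffers from local minima is only a few lines. In this numpy sketch, vectors and neighbors form a toy database and graph; the paper's idea amounts to routing with learned vertex representations instead of the raw vectors used here.

    import numpy as np

    def greedy_route(vectors, neighbors, query, start):
        """Move from the current vertex to whichever adjacent vertex is
        closest to the query; stop when no neighbor improves (a local
        minimum, which need not be the true nearest neighbor)."""
        v = start
        while True:
            best = min(neighbors[v] + [v],
                       key=lambda u: np.linalg.norm(vectors[u] - query))
            if best == v:
                return v
            v = best

    vectors = np.random.randn(100, 16)     # toy database
    neighbors = {i: list(np.random.choice(100, 8, replace=False))
                 for i in range(100)}      # toy similarity graph
    print(greedy_route(vectors, neighbors, np.random.randn(16), start=0))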
Invariant-Equivariant Representation Learning for Multi-Class Data
Ilya Feige
Representations learnt through deep neural networks tend to be highly informative, but opaque in terms of what information they learn to encode. We introduce an approach to probabilistic modelling that learns to represent data with two separate deep representations: an invariant representation that encodes the information of the class to which the data belongs, and an equivariant representation that encodes the symmetry transformation defining the particular data point within the class manifold (equivariant in the sense that the representation varies naturally with symmetry transformations). This approach is based primarily on the strategic routing of data through the two latent variables, and thus is conceptually transparent, easy to implement, and in principle generally applicable to any data comprised of discrete classes of continuous distributions (e.g., objects in images, topics in language, individuals in behavioural data). We demonstrate qualitatively compelling representation learning and competitive quantitative performance, in both supervised and semi-supervised settings, versus comparable modelling approaches in the literature, with little fine-tuning.
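A minimal PyTorch caricature of the routing idea, with invented shapes and names: the classifier reads only the invariant code, while reconstruction needs both codes, which pressures class information into one latent and within-class variation into the other. The paper's full probabilistic treatment is omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoLatentAE(nn.Module):
        """Classifier reads only the invariant code; the decoder needs both
        codes, so within-class variation is pushed into the equivariant one."""
        def __init__(self, dim=784, n_classes=10, d_inv=16, d_eq=16):
            super().__init__()
            self.enc_inv = nn.Linear(dim, d_inv)   # class-level code
            self.enc_eq = nn.Linear(dim, d_eq)     # within-class code
            self.cls = nn.Linear(d_inv, n_classes)
            self.dec = nn.Linear(d_inv + d_eq, dim)

        def forward(self, x):
            z_inv, z_eq = self.enc_inv(x), self.enc_eq(x)
            return self.dec(torch.cat([z_inv, z_eq], -1)), self.cls(z_inv)

    model = TwoLatentAE()
    x, y = torch.randn(4, 784), torch.randint(0, 10, (4,))
    recon, logits = model(x)
    loss = F.mse_loss(recon, x) + F.cross_entropy(logits, y)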
Infinite Mixture Prototypes for Few-shot Learning
Kelsey Allen · Evan Shelhamer · Hanul Shin · Josh Tenenbaum
We propose infinite mixture prototypes to adaptively represent both simple and complex data distributions for few-shot learning. Our infinite mixture prototypes represent each class by a set of clusters, unlike existing prototypical methods that represent each class by a single cluster. By inferring the number of clusters, infinite mixture prototypes interpolate between nearest neighbor and prototypical representations, which improves accuracy and robustness in the few-shot regime. We show the importance of adaptive capacity for capturing complex data distributions such as alphabets, with 25% absolute accuracy improvements over prototypical networks, while still maintaining or improving accuracy on the standard Omniglot and mini-ImageNet benchmarks. In clustering labeled and unlabeled data by the same clustering rule, infinite mixture prototypes achieve state-of-the-art semi-supervised accuracy. As a further capability, we show that infinite mixture prototypes can perform purely unsupervised clustering, unlike existing prototypical methods.
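The interpolation can be illustrated with a DP-means-style sketch in numpy (the threshold lam is a placeholder; this is not the authors' inference procedure): a class spawns a new prototype whenever a support point is far from all existing ones, so a large threshold recovers one prototype per class and a small one approaches nearest-neighbor behavior.

    import numpy as np

    def infinite_prototypes(support, labels, lam):
        """Per class, spawn a new prototype whenever a point lies farther
        than lam from all existing prototypes. Large lam -> one prototype
        per class (prototypical networks); small lam -> one per point
        (nearest neighbor)."""
        protos = {}
        for x, y in zip(support, labels):
            cs = protos.setdefault(y, [])
            if not cs or min(np.linalg.norm(x - c) for c in cs) > lam:
                cs.append(x.copy())
        return protos

    X, y = np.random.randn(20, 5), np.random.randint(0, 2, 20)
    print({k: len(v) for k, v in infinite_prototypes(X, y, lam=2.0).items()})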
MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
Sami Abu-El-Haija · Bryan Perozzi · Amol Kapoor · Nazanin Alipourfard · Kristina Lerman · Hrayr Harutyunyan · Greg Ver Steeg · Aram Galstyan
In this work, we show that popular methods for semi-supervised learning with Graph Neural Networks (such as the Graph Convolutional Network) do not model and cannot learn a class of general neighborhood mixing relationships. To address this weakness, we propose a new model, MixHop, that can capture these relationships by learning mixed feature representations of neighbors at various distances. MixHop requires no additional memory or computational complexity, and outperforms challenging baselines on several graph datasets. In addition, we propose a sparsity regularization that allows us to visualize how the network prioritizes neighborhood information across different graph datasets. Our analysis of the learned parameters reveals that different datasets utilize neighborhood mixing in different ways.
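A sketch of the core operation in PyTorch, with invented names: one layer mixes neighborhoods at several distances by concatenating learned transforms of A^j X over a set of adjacency powers j (j = 0 being the node's own features). The normalization of A, the choice of powers, and the paper's sparsity regularization are omitted or placeholders.

    import torch
    import torch.nn as nn

    class MixHopLayer(nn.Module):
        """Concatenate learned transforms of A^j X for several adjacency
        powers j (assumed consecutive from 0, so A^j X builds incrementally)."""
        def __init__(self, in_dim, out_dim, powers=(0, 1, 2)):
            super().__init__()
            self.powers = powers
            self.lins = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in powers)

        def forward(self, A, X):
            outs, H = [], X
            for j, lin in zip(self.powers, self.lins):
                H = X if j == 0 else A @ H    # A^j X, one hop at a time
                outs.append(torch.relu(lin(H)))
            return torch.cat(outs, dim=-1)    # width = out_dim * len(powers)

    A, X = torch.eye(5), torch.randn(5, 8)    # toy (already-normalized) adjacency
    print(MixHopLayer(8, 4)(A, X).shape)      # torch.Size([5, 12])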
Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting
Xilai Li · Yingbo Zhou · Tianfu Wu · Richard Socher · Caiming Xiong
Addressing catastrophic forgetting is one of the key challenges in continual learning, where machine learning systems are trained with sequential or streaming tasks. Despite recent remarkable progress in state-of-the-art deep learning, deep neural networks (DNNs) are still plagued by the catastrophic forgetting problem. This paper presents a conceptually simple yet general and effective framework for handling catastrophic forgetting in continual learning with DNNs. The proposed method consists of two components: a neural structure optimization component and a parameter learning and/or fine-tuning component. The former learns the best neural structure for the current task on top of the current DNN trained on previous tasks: under a differentiable neural architecture search framework, it learns whether to reuse or adapt building blocks in the current DNN, or to create new ones if needed. The latter estimates parameters for newly introduced structures, and fine-tunes the old ones if preferred. By separating explicit neural structure learning from parameter estimation, the proposed method not only evolves neural structures in an intuitively meaningful way, but also shows a strong ability to alleviate catastrophic forgetting in experiments. Furthermore, the proposed method outperforms all other baselines on the permuted MNIST dataset, the split CIFAR100 dataset, and the Visual Domain Decathlon dataset in the continual learning setting.
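The structure-optimization component can be caricatured in PyTorch as a soft, DARTS-style selection among reuse/adapt/new options per layer, relaxed by a softmax over architecture logits. Everything here (shapes, the adapter form, the per-layer option set) is an illustrative assumption, not the authors' search space.

    import torch
    import torch.nn as nn

    class GrowableLayer(nn.Module):
        """Soft selection among three options for a new task: reuse the
        frozen old layer, adapt it with a small additive delta, or use a
        brand-new layer. Architecture logits alpha are learned per task."""
        def __init__(self, old_layer, dim):
            super().__init__()
            self.reuse = old_layer
            for p in self.reuse.parameters():
                p.requires_grad_(False)         # previous task's weights stay intact
            self.adapter = nn.Linear(dim, dim)  # 'adapt' = frozen output + delta
            self.new = nn.Linear(dim, dim)      # 'new' = fully task-specific
            self.alpha = nn.Parameter(torch.zeros(3))

        def forward(self, x):
            w = torch.softmax(self.alpha, dim=0)
            old = self.reuse(x)
            return w[0] * old + w[1] * (old + self.adapter(x)) + w[2] * self.new(x)

    layer = GrowableLayer(nn.Linear(16, 16), dim=16)
    print(layer(torch.randn(2, 16)).shape)      # torch.Size([2, 16])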