

Session

Deep Learning Algorithms


Tue 11 June 11:00 - 11:20 PDT

SelectiveNet: A Deep Neural Network with an Integrated Reject Option

Yonatan Geifman · Ran El-Yaniv

We consider the problem of selective prediction (also known as the reject option) in deep neural networks, and introduce SelectiveNet, a deep neural architecture with an integrated reject option. Existing rejection mechanisms are based mostly on a threshold over the prediction confidence of a pre-trained network. In contrast, SelectiveNet is trained to optimize both classification (or regression) and rejection simultaneously, end-to-end. The result is a deep neural network that is optimized over the covered domain. In our experiments, we show a consistently improved risk-coverage trade-off on several well-known classification and regression datasets, thus reaching new state-of-the-art results for deep selective classification.
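
As a rough illustration of optimizing risk and coverage jointly, the minimal PyTorch sketch below combines a selective cross-entropy with a quadratic coverage penalty; the selection head, target coverage, and penalty weight are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def selective_loss(logits, select, targets, target_coverage=0.8, lam=32.0):
    """Selective risk plus a quadratic coverage penalty.  `select` is the
    per-example output of a selection head, already squashed to (0, 1);
    target_coverage and lam are illustrative hyper-parameters."""
    per_example = F.cross_entropy(logits, targets, reduction="none")
    coverage = select.mean()                                   # empirical coverage
    risk = (per_example * select).sum() / select.sum().clamp_min(1e-8)
    penalty = lam * torch.clamp(target_coverage - coverage, min=0.0) ** 2
    return risk + penalty
```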

Tue 11 June 11:20 - 11:25 PDT

Manifold Mixup: Better Representations by Interpolating Hidden States

Vikas Verma · Alex Lamb · Christopher Beckham · Amir Najafi · Ioannis Mitliagkas · David Lopez-Paz · Yoshua Bengio

Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup leverages semantic interpolations as an additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn flatter class representations, that is, representations with fewer directions of variance. We provide theory on why this flattening happens under ideal conditions, validate it empirically in practical situations, and connect it to previous work on information theory and generalization. Despite adding no significant computation and being implementable in a few lines of code, Manifold Mixup improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.
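
A minimal sketch of the core mechanism, assuming a PyTorch model expressed as a list of blocks ending in a classifier; the Beta parameter and the set of eligible layers are illustrative choices rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def manifold_mixup_loss(layers, x, y, num_classes, alpha=2.0):
    """Single training step with Manifold Mixup: pick a layer at random, mix
    the hidden states (and one-hot targets) of a shuffled pairing of examples,
    and train on the interpolated targets."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    k = torch.randint(len(layers), (1,)).item()      # k == 0 mixes the input itself
    perm = torch.randperm(x.size(0))

    h = x
    for i, layer in enumerate(layers):
        if i == k:
            h = lam * h + (1 - lam) * h[perm]        # interpolate hidden states
        h = layer(h)

    y1 = F.one_hot(y, num_classes).float()
    y_mix = lam * y1 + (1 - lam) * y1[perm]          # interpolate targets the same way
    return -(y_mix * F.log_softmax(h, dim=1)).sum(dim=1).mean()
```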

Tue 11 June 11:25 - 11:30 PDT

Processing Megapixel Images with Deep Attention-Sampling Models

Angelos Katharopoulos · Francois Fleuret

Existing deep architectures cannot operate on very large signals such as megapixel images due to computational and memory constraints. To tackle this limitation, we propose a fully differentiable end-to-end trainable model that samples and processes only a fraction of the full resolution input image.

The locations to process are sampled from an attention distribution computed from a low-resolution view of the input. We refer to our method as attention sampling; it can process images of several megapixels on a single standard GPU.

We show that sampling from the attention distribution results in an unbiased estimator of the full model with minimal variance, and we derive an unbiased estimator of the gradient that we use to train our model end-to-end with a normal SGD procedure.

This new method is evaluated on three classification tasks, where we show that it reduces the computation and memory footprint by an order of magnitude while matching the accuracy of classical architectures. We also show that the sampling is consistent and indeed focuses on informative parts of the input images.
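
A minimal sketch of the forward pass, assuming PyTorch, a hypothetical `extract_patch` helper that crops the high-resolution patch at a low-resolution grid position, and small placeholder networks; the paper's unbiased gradient estimator for the attention network is omitted.

```python
import torch
import torch.nn.functional as F

def attention_sample_forward(attn_net, feat_net, x_low, x_high, extract_patch, k=10):
    """Score a low-resolution view, draw k patch locations from the resulting
    distribution, and average the features of the corresponding high-resolution
    patches.  In expectation this Monte Carlo average equals the full
    attention-weighted sum over all patches."""
    scores = attn_net(x_low).flatten(1)              # (B, H*W) over the low-res grid
    probs = F.softmax(scores, dim=1)
    idx = torch.multinomial(probs, k, replacement=True)
    feats = []
    for j in range(k):                               # process only the sampled patches
        patches = extract_patch(x_high, idx[:, j])   # hypothetical cropping helper
        feats.append(feat_net(patches))
    return torch.stack(feats, dim=0).mean(dim=0)     # (B, n_classes)
```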

Tue 11 June 11:30 - 11:35 PDT

TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning

Sung Whan Yoon · Jun Seo · Jaekyun Moon

Handling previously unseen tasks after being given only a few training examples continues to be a tough challenge in machine learning. We propose TapNet, a neural network augmented with task-adaptive projection for improved few-shot learning. Here, employing a meta-learning strategy with episode-based training, a network and a set of per-class reference vectors are learned slowly over widely varying tasks. At the same time, for every episode, features in the embedding space are linearly projected into a new space as a form of quick task-specific conditioning. The training loss is obtained from a distance metric between the query and the reference vectors in the projection space. This yields excellent generalization. When tested on the Omniglot, miniImageNet and tieredImageNet datasets, we obtain state-of-the-art classification accuracies under different few-shot scenarios.
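
One plausible reading of the per-episode projection, sketched in PyTorch; the null-space construction and the distance-based logits are an approximation for illustration, not a faithful reimplementation.

```python
import torch
import torch.nn.functional as F

def episode_logits(embed, refs, support_x, support_y, query_x, n_way):
    """Task-adaptive projection for one episode.  `embed` is an embedding
    network and `refs` an (n_way, d) tensor of learned per-class reference
    vectors; assumes d > n_way."""
    s = embed(support_x)                                              # (n_shot*n_way, d)
    class_means = torch.stack([s[support_y == k].mean(0) for k in range(n_way)])
    # Per-class error directions between normalised references and class means.
    err = F.normalize(refs, dim=1) - F.normalize(class_means, dim=1)  # (n_way, d)
    # Project onto the null space of the error directions (quick conditioning).
    _, _, vh = torch.linalg.svd(err, full_matrices=True)
    proj = vh[n_way:]                                                 # (d - n_way, d)
    q = embed(query_x) @ proj.t()                                     # project queries
    r = refs @ proj.t()                                               # project references
    return -torch.cdist(q, r) ** 2     # negative squared distance as classification logits
```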

Tue 11 June 11:35 - 11:40 PDT

Online Meta-Learning

Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine

A central capability of intelligent systems is the ability to continuously build upon previous experiences to speed up and enhance learning of new tasks. Two distinct research paradigms have studied this question. Meta-learning views this problem as learning a prior over model parameters that is amenable to fast adaptation on a new task, but typically assumes the set of tasks is available together as a batch. In contrast, online (regret-based) learning considers a sequential setting in which problems are revealed one after the other, but conventionally trains only a single model without any task-specific adaptation. This work introduces an online meta-learning problem setting, which merges ideas from both paradigms in order to better capture the spirit and practice of continual lifelong learning. We propose the follow the meta leader (FTML) algorithm, which extends the MAML algorithm to this setting. Theoretically, this work provides an O(log T) regret guarantee with only an additional higher-order smoothness assumption (in comparison to the standard online setting). Our experimental evaluation on three different large-scale tasks suggests that the proposed algorithm significantly outperforms alternatives based on traditional online learning approaches.
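
A skeleton of the follow the meta leader loop, sketched in PyTorch with a first-order approximation of the inner MAML step and illustrative hyper-parameters; the full algorithm uses the exact MAML objective over all tasks seen so far.

```python
import copy
import random
import torch

def ftml(model, loss_fn, task_stream, meta_lr=1e-3, inner_lr=0.01, meta_steps=20):
    """As each task arrives, take meta-gradient steps on the (approximate) MAML
    objective over the buffer of all tasks seen so far."""
    seen = []
    meta_opt = torch.optim.Adam(model.parameters(), lr=meta_lr)
    for task in task_stream:                        # task = ((xs, ys), (xq, yq))
        seen.append(task)
        for _ in range(meta_steps):                 # meta-update on the buffer of seen tasks
            (xs, ys), (xq, yq) = random.choice(seen)
            adapted = copy.deepcopy(model)
            grads = torch.autograd.grad(loss_fn(adapted(xs), ys), list(adapted.parameters()))
            with torch.no_grad():                   # one inner gradient step on the support set
                for p, g in zip(adapted.parameters(), grads):
                    p -= inner_lr * g
            loss_fn(adapted(xq), yq).backward()     # outer loss on the query set
            meta_opt.zero_grad()
            for p, ap in zip(model.parameters(), adapted.parameters()):
                p.grad = ap.grad.clone()            # first-order: copy adapted grads back
            meta_opt.step()
```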

Tue 11 June 11:40 - 12:00 PDT

Training Neural Networks with Local Error Signals

Arild Nøkland · Lars Hiller Eidnes

Supervised training of neural networks for classification is typically performed with a global loss function. The loss function provides a gradient for the output layer, and this gradient is back-propagated to hidden layers to dictate an update direction for the weights. An alternative approach is to train the network with layer-wise loss functions. In this paper we demonstrate, for the first time, that layer-wise training can approach the state of the art on a variety of image datasets. We use single-layer sub-networks and two different supervised loss functions to generate local error signals for the hidden layers, and we show that the combination of these losses helps with optimization in the context of local learning. Using local errors could be a step towards more biologically plausible deep learning because the global error does not have to be transported back to hidden layers. A completely backprop-free variant outperforms previously reported results among methods aiming for higher biological plausibility.
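
A minimal PyTorch sketch of layer-wise training with a local prediction loss only (the paper additionally uses a similarity-matching loss); the sizes and single-linear local head are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyTrainedBlock(nn.Module):
    """One hidden block trained purely by its own local loss; detaching the
    output keeps any error signal from reaching earlier blocks."""

    def __init__(self, in_dim, out_dim, n_classes):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.local_head = nn.Linear(out_dim, n_classes)   # single-layer local sub-network

    def forward(self, x, y=None):
        h = self.block(x)
        local_loss = F.cross_entropy(self.local_head(h), y) if y is not None else None
        return h.detach(), local_loss                     # block the global gradient path

# A network is then a stack of such blocks plus a normal output layer; each
# block's local loss (and the output loss) back-propagates only within its
# own block, e.g.:
#   h, losses = x, []
#   for blk in blocks:
#       h, l = blk(h, y)
#       losses.append(l)
#   total = sum(losses) + F.cross_entropy(classifier(h), y)
```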

Tue 11 June 12:00 - 12:05 PDT

GMNN: Graph Markov Neural Networks

Meng Qu · Yoshua Bengio · Jian Tang

This paper studies semi-supervised object classification in relational data, which is a fundamental problem in relational data modeling. The problem has been extensively studied in the literature of both statistical relational learning (e.g. Relational Markov Networks) and graph neural networks (e.g. Graph Convolutional Networks). Statistical relational learning methods can effectively model the dependency of object labels through conditional random fields for collective classification, whereas graph neural networks learn effective object representations for classification through end-to-end training. In this paper, we propose the Graph Markov Neural Network (GMNN), which combines the advantages of both worlds. GMNN models the joint distribution of object labels with a conditional random field, which can be effectively trained with the variational EM algorithm. In the E-step, one graph neural network learns effective object representations for approximating the posterior distributions of object labels. In the M-step, another graph neural network is used to model the local label dependency. Experiments on the tasks of object classification, link classification, and unsupervised node representation learning show that GMNN achieves state-of-the-art results.
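
The alternation can be sketched as follows in PyTorch, assuming two placeholder GNNs and soft-target losses chosen for illustration rather than the paper's exact objectives.

```python
import torch
import torch.nn.functional as F

def gmnn_em_round(gnn_q, gnn_p, x, adj, y, labeled, opt_q, opt_p):
    """One variational-EM round in the spirit of GMNN.  `gnn_q(x, adj)` infers
    labels from node features; `gnn_p(y_soft, adj)` predicts each node's label
    from its neighbours' (soft) labels.  `labeled` is a boolean node mask."""
    # E-step: fit q on the labelled nodes and on p's predictions for the rest.
    with torch.no_grad():
        q_soft = F.softmax(gnn_q(x, adj), dim=1)
        p_soft = F.softmax(gnn_p(q_soft, adj), dim=1)
    log_q = F.log_softmax(gnn_q(x, adj), dim=1)
    loss_q = F.nll_loss(log_q[labeled], y[labeled]) \
        - (p_soft[~labeled] * log_q[~labeled]).sum(dim=1).mean()
    opt_q.zero_grad()
    loss_q.backward()
    opt_q.step()

    # M-step: fit p to the local label dependency, using q's soft labels as inputs.
    with torch.no_grad():
        q_soft = F.softmax(gnn_q(x, adj), dim=1)
        q_soft[labeled] = F.one_hot(y[labeled], q_soft.size(1)).float()
    log_p = F.log_softmax(gnn_p(q_soft, adj), dim=1)
    loss_p = F.nll_loss(log_p[labeled], y[labeled]) \
        - (q_soft[~labeled] * log_p[~labeled]).sum(dim=1).mean()
    opt_p.zero_grad()
    loss_p.backward()
    opt_p.step()
```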

Tue 11 June 12:05 - 12:10 PDT

Self-Attention Graph Pooling

Junhyun Lee · Inyeop Lee · Jaewoo Kang

Advanced methods of applying deep learning to structured data such as graphs have been proposed in recent years. In particular, studies have focused on generalizing convolutional neural networks to graph data, which includes redefining the convolution and the downsampling (pooling) operations for graphs. The method of generalizing the convolution operation to graphs has been proven to improve performance and is widely used. However, downsampling on graphs remains difficult to perform well and has room for improvement. In this paper, we propose a graph pooling method based on self-attention. Self-attention using graph convolution allows our pooling method to consider both node features and graph topology. To ensure a fair comparison, the same training procedures and model architectures were used for the existing pooling methods and our method. The experimental results demonstrate that our method achieves superior graph classification performance on the benchmark datasets using a reasonable number of parameters.
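
A dense-adjacency PyTorch sketch of the pooling step, with a simplified A·X·W scoring convolution standing in for the exact GCN layer used in the paper.

```python
import torch
import torch.nn as nn

class SelfAttentionPool(nn.Module):
    """One-channel graph convolution scores every node, the top-k nodes are
    kept, and the scores gate the surviving node features."""

    def __init__(self, in_dim, ratio=0.5):
        super().__init__()
        self.ratio = ratio
        self.score = nn.Linear(in_dim, 1)        # weights of the scoring graph conv

    def forward(self, x, adj):
        # attention scores from node features *and* topology (A X W)
        z = torch.tanh(self.score(adj @ x)).squeeze(-1)      # (N,)
        k = max(1, int(self.ratio * x.size(0)))
        idx = torch.topk(z, k).indices
        x_pooled = x[idx] * z[idx].unsqueeze(-1)             # gate kept features by score
        adj_pooled = adj[idx][:, idx]                        # induced subgraph
        return x_pooled, adj_pooled
```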

Tue 11 June 12:10 - 12:15 PDT

Combating Label Noise in Deep Learning using Abstention

Sunil Thulasidasan · Tanmoy Bhattacharya · Jeff Bilmes · Gopinath Chennupati · Jamal Mohd-Yusof

We introduce a novel method to combat label noise when training deep neural networks for classification. We propose a loss function that permits abstention during training, thereby allowing the DNN to abstain on confusing samples while continuing to learn and improve classification performance on the non-abstained samples. We show how such a deep abstaining classifier (DAC) can be used for robust learning in the presence of different types of label noise. In the case of structured or systematic label noise -- where noisy training labels or confusing examples are correlated with underlying features of the data -- training with abstention enables representation learning for features that are associated with unreliable labels. In the case of unstructured (arbitrary) label noise, abstention during training enables the DAC to be used as a very effective data cleaner by identifying samples that are likely to have label noise. We provide analytical results on the loss function behavior that enable dynamic adaptation of abstention rates based on learning progress during training. We demonstrate the utility of the deep abstaining classifier for various image classification tasks under different types of label noise; in the case of arbitrary label noise, we show significant improvements over previously published results on multiple image benchmarks.
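
One reading of such an abstention loss, sketched in PyTorch with a fixed abstention weight alpha (which the paper adapts dynamically during training); details are illustrative, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def abstention_loss(logits, targets, alpha=1.0, eps=1e-8):
    """The network has K+1 outputs, the extra class meaning 'abstain'.  The
    cross-entropy on the real classes is down-weighted by the abstention
    probability, and a penalty discourages abstaining on everything."""
    probs = F.softmax(logits, dim=1)
    p_abstain = probs[:, -1].clamp(max=1 - eps)
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1).clamp_min(eps)
    # cross-entropy over the non-abstained probability mass, scaled by (1 - p_abstain)
    selective_ce = (1 - p_abstain) * (-torch.log(p_true / (1 - p_abstain)))
    abstain_penalty = -alpha * torch.log(1 - p_abstain)   # grows as p_abstain -> 1
    return (selective_ce + abstain_penalty).mean()
```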

Tue 11 June 12:15 - 12:20 PDT

LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning

Huaiyu Li · Weiming Dong · Xing Mei · Chongyang Ma · Feiyue Huang · Bao-Gang Hu

In this paper, we propose a novel meta-learning approach, namely LGM-Net, for few-shot classification. The approach learns transferable prior knowledge across tasks and directly produces network parameters for similar unseen tasks from their training samples. LGM-Net includes two key modules: TargetNet and MetaNet. The TargetNet module is a neural network for solving a specific task. The MetaNet module aims at learning to generate functional weights for TargetNet by observing training samples. A new intertask normalization strategy that makes use of common information shared across tasks is utilized during training. Experimental results demonstrate that LGM-Net adapts well to similar unseen tasks and achieves state-of-the-art performance on the Omniglot and \textit{mini}ImageNet datasets. Experiments on synthetic datasets further show that the MetaNet learns transferable prior knowledge that helps solve unseen tasks by mapping training data to functional weights. The proposed approach achieves fast learning and adaptation since, unlike other existing meta-learning approaches, no further tuning steps are required.
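
A toy PyTorch sketch of the weight-generation idea, with a single generated linear layer and illustrative sizes standing in for the full MetaNet/TargetNet modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGenerator(nn.Module):
    """A meta network observes the embedded support set of a task and directly
    emits the weights of a small target classifier, so no gradient steps are
    needed at test time."""

    def __init__(self, embed_dim, n_way):
        super().__init__()
        self.n_way = n_way
        self.meta = nn.Sequential(
            nn.Linear(embed_dim, 128), nn.ReLU(),
            nn.Linear(128, n_way * embed_dim + n_way))   # -> target weights and biases

    def forward(self, support_emb, support_y, query_emb):
        # task context: mean support embedding per class, averaged over classes
        ctx = torch.stack([support_emb[support_y == k].mean(0)
                           for k in range(self.n_way)]).mean(0)
        theta = self.meta(ctx)
        d = support_emb.size(1)
        w = theta[: self.n_way * d].view(self.n_way, d)  # generated TargetNet weights
        b = theta[self.n_way * d:]
        return F.linear(query_emb, w, b)                 # logits for the query examples
```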