

Session

Deep Learning (Neural Network Architectures) 10


Thu 12 July 8:00 - 8:20 PDT

ContextNet: Deep learning for Star Galaxy Classification

Noble Kennamer (University of California) · David Kirkby · Alexander Ihler (University of California) · Francisco Javier Sanchez-Lopez

We present a framework to compose artificial neural networks (ANNs) in cases where the data cannot be treated as independent events. Our particular motivation is star-galaxy classification for ground-based optical surveys. Due to a turbulent atmosphere and imperfect instruments, a single image of an astronomical object is not enough to definitively classify it as a star or a galaxy. Instead, the context of the surrounding objects imaged at the same time needs to be considered in order to make an optimal classification. The model we present is divided into three distinct ANNs: the first captures local features of each object, the second compares these features across all objects in an image, and the third makes a final prediction for each object based on the local and compared features. By replicating the weights of an ANN across objects, the model can handle an arbitrary and variable number of individual objects embedded in a larger exposure. We train and test our model on simulations of a large upcoming ground-based survey, the Large Synoptic Survey Telescope (LSST). We compare against the state-of-the-art approach, showing improved overall performance as well as better performance for a specific class of objects that is important for the LSST.
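The three-network decomposition can be illustrated in a few lines of pure Python. This is a minimal sketch under assumptions, not the authors' ContextNet: the class name `ContextSketch`, the layer sizes, the tanh activations, and especially the use of a mean over per-object embeddings as the "comparison" step are our own choices, picked because a mean is order-independent and works for any number of objects. What it is meant to show is weight replication: the same local and comparison weights are applied to every object, so one model handles exposures with different object counts.

```python
import math
import random

def dense(weights, bias, x):
    """Affine map followed by tanh: y_j = tanh(sum_i W[j][i] * x[i] + b[j])."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class ContextSketch:
    """Hypothetical three-part model: local features, cross-object context, per-object head."""

    def __init__(self, n_in, n_local, n_ctx, seed=0):
        rng = random.Random(seed)
        self.local = init_layer(n_local, n_in, rng)      # part 1: per-object features
        self.compare = init_layer(n_ctx, n_local, rng)   # part 2: cross-object comparison
        self.head = init_layer(1, n_local + n_ctx, rng)  # part 3: final prediction

    def __call__(self, objects):
        """Return one score in (0, 1) per object; `objects` is a list of feature lists."""
        # Part 1: the same weights are replicated across every object in the exposure.
        local = [dense(*self.local, obj) for obj in objects]
        # Part 2 (assumed form): embed each object, then average over all objects,
        # giving a context vector whose size does not depend on the object count.
        embedded = [dense(*self.compare, h) for h in local]
        context = [sum(col) / len(embedded) for col in zip(*embedded)]
        # Part 3: each object's local features are concatenated with the shared context.
        scores = [dense(*self.head, h + context)[0] for h in local]
        # Rescale tanh output from (-1, 1) to (0, 1), e.g. a star/galaxy score in this sketch.
        return [(s + 1) / 2 for s in scores]
```

For example, `ContextSketch(n_in=4, n_local=3, n_ctx=2)` returns one score per object whether an exposure holds two objects or fifty, and reordering the objects only reorders the scores.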

Thu 12 July 8:20 - 8:30 PDT

Autoregressive Convolutional Neural Networks for Asynchronous Time Series

Mikolaj Binkowski · Gautier Marti · Philippe Donnat

We propose the Significance-Offset Convolutional Neural Network, a deep convolutional network architecture for regression of multivariate asynchronous time series. The model is inspired by standard autoregressive (AR) models and by gating mechanisms used in recurrent neural networks. It involves an AR-like weighting system, where the final predictor is obtained as a weighted sum of adjusted regressors, while the weights are data-dependent functions learnt through a convolutional network. The architecture was designed for applications on asynchronous time series and is evaluated on such datasets: a proprietary hedge fund dataset of over 2 million quotes for a credit derivative index, an artificially generated noisy autoregressive series, and the UCI household electricity consumption dataset. The proposed architecture achieves promising results compared to convolutional and recurrent neural networks.
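A single-step, univariate sketch of the significance-offset idea follows. It illustrates the weighting scheme under stated assumptions rather than reproducing the published architecture: the significance weights come from one hypothetical 1-D convolution followed by a softmax over lags, and the offset is a single affine map, whereas the paper uses deeper convolutional sub-networks on multivariate, asynchronous inputs. The function names `conv1d`, `softmax`, and `socnn_predict` are our own.

```python
import math

def conv1d(signal, kernel, bias=0.0):
    """'Same'-length 1-D cross-correlation with zero padding."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * (k - 1 - pad)
    return [sum(kernel[j] * padded[i + j] for j in range(k)) + bias
            for i in range(len(signal))]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def socnn_predict(window, sig_kernel, off_scale, off_bias):
    """Predict the next value from the last M observations in `window`."""
    # Significance branch (assumed form): data-dependent weights from a convolution
    # over the window, normalized across lags so that they sum to one.
    weights = softmax(conv1d(window, sig_kernel))
    # Offset branch (assumed form): adjust each regressor x_{t-m} by an affine offset.
    adjusted = [x + (off_scale * x + off_bias) for x in window]
    # AR-like output: weighted sum of the adjusted regressors.
    return sum(w * a for w, a in zip(weights, adjusted))
```

With an all-zero kernel and zero offset the weights are uniform and the prediction reduces to the window mean, so `socnn_predict([1, 2, 3, 4], [0, 0, 0], 0.0, 0.0)` gives 2.5; training would adjust the kernel so that more informative lags receive more weight.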

Thu 12 July 8:30 - 8:40 PDT

Hierarchical Multi-Label Classification Networks

Jonatas Wehrmann · Ricardo Cerri · Rodrigo Barros

One of the most challenging machine learning problems is a particular case of data classification in which classes are hierarchically structured and objects can be assigned to multiple paths of the class hierarchy at the same time. This task is known as hierarchical multi-label classification (HMC), with applications in text classification, image annotation, and bioinformatics problems such as protein function prediction. In this paper, we propose novel neural network architectures for HMC, called HMCN, capable of simultaneously optimizing local and global loss functions for discovering local hierarchical class relationships and global information from the entire class hierarchy while penalizing hierarchical violations. We evaluate their performance on 21 datasets from four distinct domains and compare them against the current state-of-the-art HMC approaches. Results show that HMCN substantially outperforms all baselines with statistical significance, establishing a new state of the art for HMC.
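The combination of local losses, a global loss, and a hierarchy-violation penalty can be written as a single objective. The sketch below is a simplified, hypothetical version, not the HMCN loss as published: it uses binary cross-entropy for every term, applies the penalty (the squared amount by which a child's score exceeds its parent's) to the global outputs only, and fixes the weight `lam` arbitrarily. The function names are our own.

```python
import math

def bce(probs, targets, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(probs, targets)) / len(probs)

def violation_penalty(probs, parent):
    """Sum over child classes of max(0, p_child - p_parent)^2."""
    return sum(max(0.0, probs[c] - probs[p]) ** 2 for c, p in parent.items())

def hmc_loss(global_probs, local_probs, targets, levels, parent, lam=0.1):
    """Hypothetical HMC objective: global + per-level local losses + violation penalty.

    global_probs, local_probs, targets: lists indexed by class id.
    levels: list of class-id lists, one per hierarchy level.
    parent: dict mapping each non-root class id to its parent's id.
    """
    # Global term: one sigmoid output per class in the whole hierarchy.
    loss = bce(global_probs, targets)
    # Local terms: one loss per level, restricted to that level's classes.
    for level in levels:
        loss += bce([local_probs[c] for c in level], [targets[c] for c in level])
    # Hierarchy penalty: a child should never score higher than its parent.
    return loss + lam * violation_penalty(global_probs, parent)
```

For a two-level hierarchy with root class 0 and children 1 and 2 (`parent = {1: 0, 2: 0}`), global scores `[0.2, 0.7, 0.1]` incur a penalty of about 0.25, because class 1 outscores its parent by 0.5.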

Thu 12 July 8:40 - 8:50 PDT

Nonparametric variable importance using an augmented neural network with multi-task learning

Jean Feng · Brian Williamson · Noah Simon · Marco Carone

In predictive modeling applications, it is often of interest to determine the relative contribution of subsets of features in explaining the variability of an outcome. It is useful to consider this variable importance as a function of the unknown, underlying data-generating mechanism rather than of the specific predictive algorithm used to fit the data. In this paper, we connect these ideas in nonparametric variable importance to machine learning, and provide a method for efficiently estimating variable importance when building a predictive model with a neural network. We show how a single augmented neural network with multi-task learning simultaneously estimates the importance of many feature subsets, improving on previous procedures for estimating importance. We demonstrate on simulated data that our method is both accurate and computationally efficient, and apply it to a study of heart disease and to predicting mortality in ICU patients.
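To make the target quantity concrete, one common population definition of the importance of a feature subset s is the drop in explained variance when s is removed: (E[(Y - mu_s(X))^2] - E[(Y - mu(X))^2]) / Var(Y), where mu(X) = E[Y | X] and mu_s(X) is the conditional mean of Y given all features except those in s. The sketch below is a plug-in illustration of that definition only, under the assumption of discrete features so that both conditional means can be estimated by exact grouping; it does not reproduce the augmented neural network or the efficient estimator described in the abstract. The function names are our own.

```python
from collections import defaultdict
from statistics import fmean, pvariance

def conditional_mean(rows, y, keep):
    """Empirical E[Y | X_keep] for discrete features: average y within each group."""
    keep = list(keep)
    groups = defaultdict(list)
    for row, target in zip(rows, y):
        groups[tuple(row[j] for j in keep)].append(target)
    means = {key: fmean(vals) for key, vals in groups.items()}
    return [means[tuple(row[j] for j in keep)] for row in rows]

def variable_importance(rows, y, subset):
    """Drop in explained variance when the features in `subset` are removed."""
    p = len(rows[0])
    full = conditional_mean(rows, y, range(p))
    reduced = conditional_mean(rows, y, [j for j in range(p) if j not in subset])
    mse_full = fmean((t - m) ** 2 for t, m in zip(y, full))
    mse_reduced = fmean((t - m) ** 2 for t, m in zip(y, reduced))
    return (mse_reduced - mse_full) / pvariance(y)
```

On the noiseless toy data Y = 2*X1 + X2, with X1 and X2 independent and uniform on {-1, 1}, removing X1 gives 0.8 and removing X2 gives 0.2, which are exactly the shares 4/5 and 1/5 of Var(Y) = 5 contributed by each term.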

Thu 12 July 8:50 - 9:00 PDT

Knowledge Transfer with Jacobian Matching

Suraj Srinivas · Francois Fleuret

Classical distillation methods transfer representations from a "teacher" neural network to a "student" network by matching their output activations. Recent methods also match the Jacobians, i.e., the gradients of the output activations with respect to the input. However, this involves making some ad hoc decisions, in particular the choice of the loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning, by establishing the equivalence of a recent transfer learning procedure to distillation. Finally, we show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning.
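The link between Jacobian matching and distillation with input noise can be checked numerically. For isotropic Gaussian noise xi with standard deviation sigma, a first-order Taylor expansion gives E[(f(x + xi) - g(x + xi))^2] ≈ (f(x) - g(x))^2 + sigma^2 * ||grad f(x) - grad g(x)||^2, that is, output matching plus a squared-error Jacobian penalty. The sketch below is our own simplification, not the paper's training setup: it uses scalar-output functions, central-difference gradients, and Monte Carlo sampling, and the function names are hypothetical.

```python
import random

def numerical_grad(fn, x, h=1e-5):
    """Central-difference estimate of the gradient of a scalar function."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((fn(up) - fn(down)) / (2 * h))
    return grad

def jacobian_matching_loss(teacher, student, x, sigma):
    """Output matching plus a squared-error Jacobian penalty weighted by sigma^2."""
    out = (teacher(x) - student(x)) ** 2
    g_t, g_s = numerical_grad(teacher, x), numerical_grad(student, x)
    return out + sigma ** 2 * sum((a - b) ** 2 for a, b in zip(g_t, g_s))

def noisy_distillation_loss(teacher, student, x, sigma, n=100_000, seed=0):
    """Monte Carlo estimate of E[(teacher(x+xi) - student(x+xi))^2], xi ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += (teacher(noisy) - student(noisy)) ** 2
    return total / n
```

For linear functions the expansion is exact: with teacher 3*x0 - x1, student 2*x0 + x1, x = (1, 1), and sigma = 0.1, both losses come out at about 1 + 0.01 * 5 = 1.05. For nonlinear networks the two agree only approximately, with the gap shrinking as sigma decreases.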