Session

Deep Learning (Neural Network Architectures) 9


Thu 12 July 7:00 - 7:20 PDT

Using Inherent Structures to design Lean 2-layer RBMs

Abhishek Bansal · Abhinav Anand · Chiranjib Bhattacharyya

The representational power of Restricted Boltzmann Machines (RBMs) with multiple layers is poorly understood and is an area of active research. Motivated by the Inherent Structure formalism (Stillinger & Weber, 1982), extensively used in analysing spin glasses, we propose a novel measure called Inherent Structure Capacity (ISC), which characterizes the representation capacity of a fixed-architecture RBM by the expected number of modes of the distributions emanating from the RBM with parameters drawn from a prior distribution. Though ISC is intractable, we show that for a single-layer RBM architecture the ISC approaches a finite constant as the number of hidden units is increased, and that to further improve the ISC one needs to add a second layer. Furthermore, we introduce Lean RBMs, which are multi-layer RBMs where each layer can have at most O(n) units, n being the number of visible units. We show that for every single-layer RBM with Omega(n^{2+r}), r >= 0, hidden units there exists a two-layer lean RBM with Theta(n^2) parameters and the same ISC, establishing that two-layer RBMs can achieve the same representational power as single-layer RBMs while using far fewer parameters. To the best of our knowledge, this is the first result that quantitatively establishes the need for layering.
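
As a rough formalisation of the definition quoted in the abstract (the symbols and the exact choice of prior are assumptions here, not taken from the paper), the ISC of an architecture can be written as an expectation over parameters:

\[
\mathrm{ISC}(\mathcal{A}) \;=\; \mathbb{E}_{\theta \sim \pi}\!\left[\,\#\,\mathrm{modes}\!\left(p_{\mathcal{A},\theta}\right)\right],
\]

where p_{\mathcal{A},\theta} is the distribution over visible units defined by the RBM with architecture \mathcal{A} and parameters \theta, and \pi is a prior over the parameters.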

Thu 12 July 7:20 - 7:30 PDT

Deep Asymmetric Multi-task Feature Learning

Hae Beom Lee · Eunho Yang · Sung Ju Hwang

We propose Deep Asymmetric Multitask Feature Learning (Deep-AMTFL) which can learn deep representations shared across multiple tasks while effectively preventing negative transfer that may happen in the feature sharing process. Specifically, we introduce an asymmetric autoencoder term that allows reliable predictors for the easy tasks to contribute strongly to the feature learning while suppressing the influences of unreliable predictors for more difficult tasks. This allows the learning of less noisy representations, and enables unreliable predictors to exploit knowledge from the reliable predictors via the shared latent features. Such asymmetric knowledge transfer through shared features is also more scalable and efficient than inter-task asymmetric transfer. We validate our Deep-AMTFL model on multiple benchmark datasets for multitask learning and image classification, on which it significantly outperforms existing symmetric and asymmetric multitask learning models, by effectively preventing negative transfer in deep feature learning.
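
One plausible reading of the asymmetric autoencoder idea is sketched below in plain numpy; the function name amtfl_style_loss, the exponential reliability weighting alpha, and the hyperparameters lam and gamma are illustrative assumptions, not the paper's actual formulation.

import numpy as np

def amtfl_style_loss(Z, Y, W, A, task_losses, lam=0.1, gamma=1.0):
    # Z: (N, d) shared features, Y: (N, T) task targets,
    # W: (d, T) per-task linear predictors,
    # A: (T, d) decoder reconstructing features from task predictions,
    # task_losses: (T,) current per-task training losses (low = reliable/easy task).
    preds = Z @ W                                     # (N, T) task predictions
    pred_loss = np.mean((preds - Y) ** 2)
    # Asymmetry: reliable (low-loss) tasks get larger weights, so they dominate
    # the reconstruction of the shared features; unreliable tasks are suppressed.
    alpha = np.exp(-gamma * np.asarray(task_losses))  # (T,) reliability weights
    recon = (preds * alpha) @ A                       # (N, d) reconstructed features
    recon_loss = np.mean((recon - Z) ** 2)
    return pred_loss + lam * recon_loss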

Thu 12 July 7:30 - 7:40 PDT

Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations

Yiping Lu · Aoxiao Zhong · Quanzheng Li · Bin Dong

Deep neural networks have become the state-of-the-art models in numerous machine learning tasks. However, general guidance for network architecture design is still missing. In our work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding brings us a brand new perspective on the design of effective deep architectures. We can take advantage of the rich knowledge in numerical analysis to guide us in designing new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture), which is inspired by the linear multi-step method for solving ordinary differential equations. The LM-architecture is an effective structure that can be used on any ResNet-like network. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture to ResNet and ResNeXt respectively) can achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. Moreover, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress (>50%) the original networks while maintaining similar performance. This can be explained mathematically using the concept of the modified equation from numerical analysis. Last but not least, we also establish a connection between stochastic control and noise injection in the training process, which helps to improve the generalization of the networks. Furthermore, by relating the stochastic training strategy to stochastic dynamical systems, we can easily apply stochastic training to networks with the LM-architecture. As an example, we introduce stochastic depth to LM-ResNet and achieve a significant improvement over the original LM-ResNet on CIFAR10.
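
To make the analogy concrete (the exact coefficients of the paper's LM-architecture are not given in the abstract, so the two-step form below with a mixing weight k_n is an illustrative assumption):

\[
\underbrace{x_{n+1} = x_n + f(x_n)}_{\text{ResNet block} \;\approx\; \text{forward Euler step of } \dot{x} = f(x)}
\qquad\qquad
\underbrace{x_{n+1} = (1 - k_n)\,x_n + k_n\,x_{n-1} + f(x_n)}_{\text{linear two-step (LM-style) block}}
\]

Here x_n is the feature map after the n-th block and f is the residual branch; the two-step update reuses the previous state x_{n-1} in the same way a linear multi-step ODE solver reuses earlier iterates.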

Thu 12 July 7:40 - 7:50 PDT

Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples

Gail Weiss · Yoav Goldberg · Eran Yahav

We present a novel algorithm that uses exact learning and abstraction to extract a deterministic finite automaton describing the state dynamics of a given trained RNN. We do this using Angluin's L* algorithm as a learner and the trained RNN as an oracle. Our technique efficiently extracts accurate automata from trained RNNs, even when the state vectors are large and require fine differentiation.
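
The learner/oracle split can be pictured with the small sketch below. RNNOracle, rnn_accepts and the sampling-based equivalence check are illustrative assumptions: the paper answers equivalence queries through an abstraction of the RNN's state space and derives counterexamples from its refinement, not by random sampling.

import random

class RNNOracle:
    # Teacher interface an L*-style learner needs when the teacher is a trained RNN acceptor.
    def __init__(self, rnn_accepts, alphabet, max_len=12, n_samples=2000):
        self.rnn_accepts = rnn_accepts   # callable: word (sequence of symbols) -> bool
        self.alphabet = list(alphabet)
        self.max_len = max_len
        self.n_samples = n_samples

    def membership_query(self, word):
        # Run the RNN on the word and return its accept/reject decision.
        return self.rnn_accepts(word)

    def equivalence_query(self, dfa_accepts):
        # Search for a word on which the hypothesis DFA and the RNN disagree;
        # return it as a counterexample, or None if no disagreement is found.
        for _ in range(self.n_samples):
            length = random.randint(0, self.max_len)
            word = tuple(random.choice(self.alphabet) for _ in range(length))
            if dfa_accepts(word) != self.rnn_accepts(word):
                return word
        return None

# Toy usage with a stand-in "RNN" that accepts words containing an even number of 'a':
if __name__ == "__main__":
    rnn = lambda w: sum(1 for s in w if s == "a") % 2 == 0
    oracle = RNNOracle(rnn, alphabet="ab")
    print(oracle.membership_query(("a", "b")))        # False (one 'a')
    print(oracle.equivalence_query(lambda w: True))   # a counterexample such as ('a',)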

Thu 12 July 7:50 - 8:00 PDT

High Performance Zero-Memory Overhead Direct Convolutions

Jiyuan Zhang · Franz Franchetti · Tze Meng Low

The computation of convolution layers in deep neural networks typically relies on high-performance routines that trade space for time by using additional memory (either for packing purposes or as required by the algorithm) to improve performance. The problems with such an approach are two-fold. First, these routines incur additional memory overhead, which reduces the overall size of the network that can fit on embedded devices with limited memory capacity. Second, these high-performance routines were not optimized for performing convolution, which means that the performance obtained is usually less than conventionally expected. In this paper, we demonstrate that direct convolution, when implemented correctly, eliminates all memory overhead and yields performance that is between 10% and 400% better than existing high-performance implementations of convolution layers on conventional and embedded CPU architectures. We also show that a high-performance direct convolution exhibits better scaling performance, i.e. suffers less of a performance drop, when the number of threads is increased.
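
To illustrate what "direct" means here, the sketch below is a plain Python loop nest, not the paper's optimized kernels (whose loop ordering, blocking and vectorization are architecture-specific). The only buffer allocated is the output itself, in contrast to im2col/GEMM-based routines that materialize extra packed copies of the input.

import numpy as np

def direct_conv2d(x, w, stride=1):
    # x: (C_in, H, W) input, w: (C_out, C_in, KH, KW) weights -> (C_out, H_out, W_out).
    c_in, h, wd = x.shape
    c_out, _, kh, kw = w.shape
    h_out = (h - kh) // stride + 1
    w_out = (wd - kw) // stride + 1
    y = np.zeros((c_out, h_out, w_out), dtype=x.dtype)   # only the output is allocated
    for co in range(c_out):
        for oh in range(h_out):
            for ow in range(w_out):
                for ci in range(c_in):
                    for p in range(kh):
                        for q in range(kw):
                            y[co, oh, ow] += (w[co, ci, p, q] *
                                              x[ci, oh * stride + p, ow * stride + q])
    return y

# Quick check (valid correlation, stride 1):
if __name__ == "__main__":
    x = np.random.rand(3, 8, 8).astype(np.float32)
    w = np.random.rand(4, 3, 3, 3).astype(np.float32)
    print(direct_conv2d(x, w).shape)  # (4, 6, 6)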