Spotlight in Workshop: Dynamic Neural Networks
Triangular Dropout: Variable Network Width without Retraining
Ted Staley · Jared Markowitz
One of the most fundamental choices in neural network design is layer width: it affects the capacity of the network (what it can learn) and determines the complexity of the solution. The latter is often exploited when introducing information bottlenecks, forcing a network to learn compressed representations. Unfortunately, network architecture is typically immutable once training begins; switching to a more compressed architecture requires retraining. In this paper we present a new training strategy, Triangular Dropout, that allows effective compression without retraining. It enables ordered removal of parameters by the user after training, providing an explicit trade-off between performance and computational efficiency. We demonstrate the construction and utility of the approach through two examples. First, we formulate Triangular Dropout for autoencoders, creating models with configurable compression after training. Second, we apply Triangular Dropout to retrain the fully connected top layer of VGG19 on ImageNet. In both cases, we find only minimal degradation in the performance of the pruned network, even for dramatic reductions in its number of parameters.
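The abstract describes ordered removal of units after training, which suggests a nested masking scheme during training. Below is a minimal PyTorch sketch of one way such a layer could look, assuming units are masked from a randomly sampled width onward so that earlier units carry more information; the class name, the uniform sampling of the width, and the per-batch masking are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
from typing import Optional


class TriangularDropoutLinear(nn.Module):
    """Linear layer whose effective output width can be chosen after training.

    Sketch only: assumes a nested-mask training scheme inferred from the
    abstract, not the paper's actual implementation.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.out_features = out_features

    def forward(self, x: torch.Tensor, width: Optional[int] = None) -> torch.Tensor:
        h = self.linear(x)
        if self.training and width is None:
            # Sample a random active width k in [1, out_features] for this batch
            # (assumption; the paper may sample per-example or use another scheme).
            k = int(torch.randint(1, self.out_features + 1, (1,)))
        else:
            # At evaluation time the user picks the width explicitly, trading
            # performance for compression without retraining.
            k = width if width is not None else self.out_features
        # Keep the first k units, zero out the rest.
        mask = (torch.arange(self.out_features, device=h.device) < k).to(h.dtype)
        return h * mask


# Usage: train with random widths, then evaluate at any chosen width,
# e.g. layer(x, width=16) for a heavily compressed representation.
```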