ICML 2017 Schedule

( events) Timezone: America/Los_Angeles

Poster

Tue Aug 08 01:30 AM -- 05:00 AM (PDT) @ Gallery #11

Learned Optimizers that Scale and Generalize

Olga Wichrowska · Niru Maheswaranathan · Matthew Hoffman · Sergio Gómez Colmenarejo · Misha Denil · Nando de Freitas · Jascha Sohl-Dickstein

[ PDF] [

Summary/Notes]

Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead. We achieve this by introducing a novel hierarchical RNN architecture, with minimal per-parameter overhead, augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a meta-training ensemble of small, diverse, optimization tasks capturing common properties of loss landscapes. The optimizer learns to outperform RMSProp/ADAM on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural networks, despite seeing no neural networks in its meta-training set. Finally, it generalizes to train Inception V3 and ResNet V2 architectures on the ImageNet dataset for thousands of steps, optimization problems that are of a vastly different scale than those it was trained on.