Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to support arbitrary optimizers in the base level of meta learning programs, while reducing the computational burden by avoiding explicit computation of second-order gradient information and by exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA achieves up to a 1.7x/4.8x increase in throughput and a 2.0x/3.8x decrease in memory consumption on single-/multi-GPU setups, respectively, compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.
Author Information
Sang Keun Choe (Carnegie Mellon University)
Sanket Vaibhav Mehta (Carnegie Mellon University)
Hwijeen Ahn (Carnegie Mellon University)
Willie Neiswanger (Stanford University)
Pengtao Xie (UC San Diego)
Emma Strubell (Carnegie Mellon University)
Eric Xing (Petuum Inc. and CMU)
More from the Same Authors
-
2021 : Synthetic Benchmarks for Scientific Research in Explainable Machine Learning »
· Yang Liu · Colin White · Willie Neiswanger -
2021 : Towards Principled Disentanglement for Domain Generalization »
Hanlin Zhang · Yi-Fan Zhang · Weiyang Liu · Adrian Weller · Bernhard Schölkopf · Eric Xing -
2023 : Adapting to Gradual Distribution Shifts with Continual Weight Averaging »
Jared Fernandez · Saujas Vaduguru · Sanket Vaibhav Mehta · Yonatan Bisk · Emma Strubell -
2023 : Counterfactual Generation with Identifiability Guarantees »
Hanqi Yan · Lingjing Kong · Lin Gui · Yuejie Chi · Eric Xing · Yulan He · Kun Zhang -
2023 : Identification of Nonlinear Latent Hierarchical Causal Models »
Lingjing Kong · Biwei Huang · Feng Xie · Eric Xing · Yuejie Chi · Kun Zhang -
2023 : The Framework Tax: Disparities Between Inference Efficiency in Research and Deployment »
Jared Fernandez · Jacob Kahn · Clara Na · Yonatan Bisk · Emma Strubell -
2023 : Dissecting Efficient Architectures for Wake-Word Detection »
Cody Berger · Juncheng Li · Yiyuan Li · Aaron Berger · Dmitri Berger · Karthik Ganesan · Emma Strubell · Florian Metze -
2023 : Conditional Diffusion Replay for Continual Learning in Medical Settings »
Yewon Byun · Saurabh Garg · Sanket Vaibhav Mehta · Praveer Singh · Jayashree Kalpathy-cramer · Bryan Wilder · Zachary Lipton -
2023 : Prompt-based Generative Replay: A Text-to-Image Approach for Continual Learning in Medical Settings »
Yewon Byun · Saurabh Garg · Sanket Vaibhav Mehta · Jayashree Kalpathy-Cramer · Praveer Singh · Bryan Wilder · Zachary Lipton -
2023 Workshop: Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators »
Felix Petersen · Marco Cuturi · Mathias Niepert · Hilde Kuehne · Michael Kagan · Willie Neiswanger · Stefano Ermon -
2023 Poster: Improving Bi-level Optimization Based Methods with Inspiration from Humans' Classroom Study Techniques »
Pengtao Xie -
2023 Poster: Fair and Accurate Decision Making through Group-Aware Learning »
Ramtin Hosseini · Li Zhang · Bhanu Garg · Pengtao Xie -
2023 Poster: Learning Compiler Pass Orders using Coreset and Normalized Value Prediction »
Youwei Liang · Kevin Stone · Ali Shameli · Chris Cummins · Mostafa Elhoushi · Jiadong Guo · Benoit Steiner · Xiaomeng Yang · Pengtao Xie · Hugh Leather · Yuandong Tian -
2022 Workshop: The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward »
Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying WEI · Saining Xie · Eric Xing · Chelsea Finn -
2022 : Opening Remarks »
Willie Neiswanger · Mojmir Mutny · Ilija Bogunovic -
2022 Workshop: Adaptive Experimental Design and Active Learning in the Real World »
Mojmir Mutny · Willie Neiswanger · Ilija Bogunovic · Stefano Ermon · Yisong Yue · Andreas Krause -
2022 Poster: Graph Neural Architecture Search Under Distribution Shifts »
Yijian Qin · Xin Wang · Ziwei Zhang · Pengtao Xie · Wenwu Zhu -
2022 Spotlight: Graph Neural Architecture Search Under Distribution Shifts »
Yijian Qin · Xin Wang · Ziwei Zhang · Pengtao Xie · Wenwu Zhu -
2022 Poster: SDQ: Stochastic Differentiable Quantization with Mixed Precision »
Xijie Huang · Zhiqiang Shen · Shichao Li · Zechun Liu · Hu Xianghong · Jeffry Wicaksana · Eric Xing · Kwang-Ting Cheng -
2022 Poster: A General Recipe for Likelihood-free Bayesian Optimization »
Jiaming Song · Lantao Yu · Willie Neiswanger · Stefano Ermon -
2022 Oral: A General Recipe for Likelihood-free Bayesian Optimization »
Jiaming Song · Lantao Yu · Willie Neiswanger · Stefano Ermon -
2022 Spotlight: SDQ: Stochastic Differentiable Quantization with Mixed Precision »
Xijie Huang · Zhiqiang Shen · Shichao Li · Zechun Liu · Hu Xianghong · Jeffry Wicaksana · Eric Xing · Kwang-Ting Cheng -
2022 Poster: Modular Conformal Calibration »
Charles Marx · Shengjia Zhao · Willie Neiswanger · Stefano Ermon -
2022 Spotlight: Modular Conformal Calibration »
Charles Marx · Shengjia Zhao · Willie Neiswanger · Stefano Ermon -
2021 Workshop: Self-Supervised Learning for Reasoning and Perception »
Pengtao Xie · Shanghang Zhang · Ishan Misra · Pulkit Agrawal · Katerina Fragkiadaki · Ruisi Zhang · Tassilo Klein · Asli Celikyilmaz · Mihaela van der Schaar · Eric Xing -
2021 : Oral3 »
Sanket Vaibhav Mehta -
2021 : Invited Talk: Eric P. Xing. A Data-Centric View for Composable Natural Language Processing. »
Eric Xing -
2021 Workshop: Interpretable Machine Learning in Healthcare »
Yuyin Zhou · Xiaoxiao Li · Vicky Yao · Pengtao Xie · DOU QI · Nicha Dvornek · Julia Schnabel · Judy Wawira · Yifan Peng · Ronald Summers · Alan Karthikesalingam · Lei Xing · Eric Xing -
2019 Workshop: Adaptive and Multitask Learning: Algorithms & Systems »
Maruan Al-Shedivat · Anthony Platanios · Otilia Stretcu · Jacob Andreas · Ameet Talwalkar · Rich Caruana · Tom Mitchell · Eric Xing -
2019 Workshop: Learning and Reasoning with Graph-Structured Representations »
Ethan Fetaya · Zhiting Hu · Thomas Kipf · Yujia Li · Xiaodan Liang · Renjie Liao · Raquel Urtasun · Hao Wang · Max Welling · Eric Xing · Richard Zemel -
2019 Poster: Theoretically Principled Trade-off between Robustness and Accuracy »
Hongyang Zhang · Yaodong Yu · Jiantao Jiao · Eric Xing · Laurent El Ghaoui · Michael Jordan -
2019 Oral: Theoretically Principled Trade-off between Robustness and Accuracy »
Hongyang Zhang · Yaodong Yu · Jiantao Jiao · Eric Xing · Laurent El Ghaoui · Michael Jordan -
2018 Poster: Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis »
Pengtao Xie · Wei Wu · Yichen Zhu · Eric Xing -
2018 Poster: Transformation Autoregressive Networks »
Junier Oliva · Kumar Avinava Dubey · Manzil Zaheer · Barnabás Póczos · Ruslan Salakhutdinov · Eric Xing · Jeff Schneider -
2018 Oral: Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis »
Pengtao Xie · Wei Wu · Yichen Zhu · Eric Xing -
2018 Oral: Transformation Autoregressive Networks »
Junier Oliva · Kumar Avinava Dubey · Manzil Zaheer · Barnabás Póczos · Ruslan Salakhutdinov · Eric Xing · Jeff Schneider -
2018 Poster: Nonoverlap-Promoting Variable Selection »
Pengtao Xie · Hongbao Zhang · Yichen Zhu · Eric Xing -
2018 Poster: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2018 Poster: Gated Path Planning Networks »
Lisa Lee · Emilio Parisotto · Devendra Singh Chaplot · Eric Xing · Ruslan Salakhutdinov -
2018 Oral: Gated Path Planning Networks »
Lisa Lee · Emilio Parisotto · Devendra Singh Chaplot · Eric Xing · Ruslan Salakhutdinov -
2018 Oral: Nonoverlap-Promoting Variable Selection »
Pengtao Xie · Hongbao Zhang · Yichen Zhu · Eric Xing -
2018 Oral: DiCE: The Infinitely Differentiable Monte Carlo Estimator »
Jakob Foerster · Gregory Farquhar · Maruan Al-Shedivat · Tim Rocktäschel · Eric Xing · Shimon Whiteson -
2017 Poster: Toward Controlled Generation of Text »
Zhiting Hu · Zichao Yang · Xiaodan Liang · Ruslan Salakhutdinov · Eric Xing -
2017 Talk: Toward Controlled Generation of Text »
Zhiting Hu · Zichao Yang · Xiaodan Liang · Ruslan Salakhutdinov · Eric Xing -
2017 Poster: Uncorrelation and Evenness: a New Diversity-Promoting Regularizer »
Pengtao Xie · Aarti Singh · Eric Xing -
2017 Poster: Learning Latent Space Models with Angular Constraints »
Pengtao Xie · Yuntian Deng · Yi Zhou · Abhimanu Kumar · Yaoliang Yu · James Zou · Eric Xing -
2017 Talk: Learning Latent Space Models with Angular Constraints »
Pengtao Xie · Yuntian Deng · Yi Zhou · Abhimanu Kumar · Yaoliang Yu · James Zou · Eric Xing -
2017 Talk: Uncorrelation and Evenness: a New Diversity-Promoting Regularizer »
Pengtao Xie · Aarti Singh · Eric Xing -
2017 Poster: Post-Inference Prior Swapping »
Willie Neiswanger · Eric Xing -
2017 Talk: Post-Inference Prior Swapping »
Willie Neiswanger · Eric Xing