The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward
Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying Wei · Saining Xie · Eric Xing · Chelsea Finn
Hall F
Sat 23 Jul, 5:50 a.m. PDT
The past five years have seen rapid progress in large-scale pre-trained models across a variety of domains, including computer vision, natural language processing, robotics, and bioinformatics. Leveraging huge numbers of parameters, large-scale pre-trained models can encode rich knowledge from labeled and/or unlabeled examples. Supervised and self-supervised pre-training have been the two most representative paradigms, and pre-trained models have demonstrated large benefits on a wide spectrum of downstream tasks. Other pre-training paradigms also exist, e.g., meta-learning for few-shot learning, in which models are pre-trained to adapt quickly to new tasks. Still, many challenges and new opportunities lie ahead. This workshop has two foci: (1) Which pre-training methods transfer across different applications and domains, which do not, and why? (2) In what settings should we expect pre-training to be effective, compared to learning from scratch?
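To make the second question concrete, the sketch below pre-trains a small encoder with a self-supervised masked-reconstruction objective on unlabeled data, then fine-tunes it on a downstream classification task, alongside an identical model trained from scratch. This is a minimal illustration on synthetic data, not any presenter's method; the dimensions, masking rate, and helper names (`make_encoder`, `pretrain`, `finetune`) are arbitrary choices for the example.

```python
# Minimal sketch: self-supervised pre-training vs. learning from scratch.
# Synthetic data throughout; all sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
DIM, HIDDEN, N = 32, 64, 512

def make_encoder():
    return nn.Sequential(nn.Linear(DIM, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, HIDDEN))

def pretrain(encoder, x, epochs=50):
    """Self-supervised pre-training: reconstruct randomly masked inputs."""
    decoder = nn.Linear(HIDDEN, DIM)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(epochs):
        mask = (torch.rand_like(x) > 0.25).float()  # zero out ~25% of features
        loss = nn.functional.mse_loss(decoder(encoder(x * mask)), x)
        opt.zero_grad(); loss.backward(); opt.step()

def finetune(encoder, x, y, epochs=50):
    """Supervised fine-tuning of the encoder plus a fresh linear head."""
    head = nn.Linear(HIDDEN, 2)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.cross_entropy(head(encoder(x)), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return head

# A large unlabeled pool for pre-training; a small labeled downstream set.
x_unlabeled = torch.randn(N, DIM)
x_train, y_train = torch.randn(64, DIM), torch.randint(0, 2, (64,))

pretrained = make_encoder()
pretrain(pretrained, x_unlabeled)   # paradigm 1: pre-train, then adapt
finetune(pretrained, x_train, y_train)

scratch = make_encoder()            # paradigm 2: learn from scratch
finetune(scratch, x_train, y_train)
```

Comparing the downstream accuracy of the two encoders as the labeled set shrinks is one simple way to probe when pre-training helps and when training from scratch suffices.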
Schedule
Sat 5:50 a.m. - 6:00 a.m. | Introduction and Opening Remarks
Sat 6:00 a.m. - 6:30 a.m. | Neural Scaling of Deep Chemical Models (Invited Talk) | Connor Coley · Nathan C. Frey
Sat 6:30 a.m. - 7:00 a.m. | Chinchillas, Flamingos, and Gatos: Few-Shot Learning through Pre-training (Invited Talk) | Oriol Vinyals
Sat 7:00 a.m. - 7:15 a.m. | Multimodal Masked Autoencoders Learn Transferable Representations (Oral) | Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel
Sat 7:15 a.m. - 7:45 a.m. | How Neural Networks See, Learn and Forget (Invited Talk) | Maithra Raghu
Sat 7:45 a.m. - 8:15 a.m. | Program Synthesis, Program Semantics, and Large Language Models (Invited Talk) | Charles Sutton
Sat 8:15 a.m. - 9:15 a.m. | Panel Discussion
Sat 10:30 a.m. - 11:00 a.m. | Exploring the Limits of Large Scale Pre-training (Invited Talk) | Hanie Sedghi
Sat 11:00 a.m. - 11:15 a.m. | Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior (Oral) | Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson
Sat 11:15 a.m. - 11:45 a.m. | Simplifying and Simplifying Self-Supervised Visual Representation Pre-Training (Invited Talk) | Xinlei Chen
Sat 11:45 a.m. - 12:00 p.m. | Plex: Towards Reliability using Pretrained Large Model Extensions (Oral) | Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · Jie Ren · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J. Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani
Sat 12:00 p.m. - 1:30 p.m. | Poster Session
Sat 1:30 p.m. - 2:00 p.m. | Unified and Efficient Multimodal Pretraining across Vision and Language (Invited Talk) | Mohit Bansal
Sat 2:00 p.m. - 2:30 p.m. | Benefits and Challenges of Pre-training for Environmental Monitoring (Invited Talk) | Sara Beery
- | Efficient Task Adaptation by Mixing Discovered Skills (Poster) | Eunseok Yang · Jungsub Rhim · Taesup Kim
- | Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments (Poster) | Pietro Maldini · Mirco Mutti · Riccardo De Santi · Marcello Restelli
- | On the Importance of Hyperparameters and Data Augmentation for Self-Supervised Learning (Poster) | Diane Wagner · Fabio Ferreira · Danny Stoll · Robin Tibor Schirrmeister · Samuel Gabriel Müller · Frank Hutter
- | Learning Large-scale Universal User Representation with Sparse Mixture of Experts (Poster) | Caigao Jiang · Siqiao Xue · James Zhang · Lingyue Liu · Zhibo Zhu · Hongyan Hao
- | Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? (Poster) | Nenad Tomasev · Ioana Bica · Brian McWilliams · Lars Buesing · Razvan Pascanu · Charles Blundell · Jovana Mitrovic
- | How robust are pre-trained models to distribution shift? (Poster) | Yuge Shi · Imant Daunhawer · Julia Vogt · Phil Torr · Amartya Sanyal
- | Multimodal Masked Autoencoders Learn Transferable Representations (Poster) | Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel
- | Is Self-Supervised Contrastive Learning More Robust Than Supervised Learning? (Poster) | Yuanyi Zhong · Haoran Tang · Junkun Chen · Jian Peng · Yu-Xiong Wang
- | Leader-based Pre-training Framework for Cooperative Multi-Agent Reinforcement Learning (Poster) | Wenqi Chen · Xin Zeng · Amber Li
- | Pixel-level Correspondence for Self-Supervised Learning from Video (Poster) | Yash Sharma · Yi Zhu · Chris Russell · Thomas Brox
- | Pre-Training on a Data Diet: Identifying Sufficient Examples for Early Training (Poster) | Mansheej Paul · Brett Larsen · Surya Ganguli · Jonathan Frankle · Gintare Karolina Dziugaite
- | Enhancing Multi-hop Connectivity for Graph Convolutional Networks (Poster) | Songtao Liu · Shixiong Jing · Tong Zhao · Zengfeng Huang · Dinghao Wu
- | Investigating Why Contrastive Learning Benefits Robustness against Label Noise (Poster) | Yihao Xue · Kyle Whitecross · Baharan Mirzasoleiman
- | Pretraining a Neural Network before Knowing Its Architecture (Poster) | Boris Knyazev
- | Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming (Poster) | Hanlin Zhang · Ziyang Li · Jiani Huang · Mayur Naik · Eric Xing
- | Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior (Poster) | Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson
- | How well do contrastively trained models transfer? (Poster) | M. Moein Shariatnia · Rahim Entezari · Mitchell Wortsman · Olga Saukh · Ludwig Schmidt
- | Vote for Nearest Neighbors Meta-Pruning of Self-Supervised Networks (Poster) | Haiyan Zhao · Tianyi Zhou · Guodong Long · Jing Jiang · Chengqi Zhang
- | On Combining Global and Localized Self-Supervised Models of Speech (Poster) | Sri Harsha Dumpala · Chandramouli Shama Sastry · Rudolf Uher · Sageev Oore
- | Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning (Poster) | Weixin Liang · Yuhui Zhang · Yongchan Kwon · Serena Yeung · James Zou
- | Robustness to Adversarial Gradients: A Glimpse Into the Loss Landscape of Contrastive Pre-training (Poster) | Philip Fradkin · Lazar Atanackovic · Michael Zhang
- | Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models (Poster) | Eric Mitchell · Peter Henderson · Christopher Manning · Dan Jurafsky · Chelsea Finn
- | Flaky Performances when Pre-Training on Relational Databases with a Plan for Future Characterization Efforts (Poster) | Shengchao Liu · David Vazquez · Jian Tang · Pierre-André Noël
- | Training strategies with unlabeled and few labeled examples under 1-pixel attack by combining supervised and self-supervised learning (Poster) | Gabriel Biscaro Cavallari · Moacir Ponti
- | Plex: Towards Reliability using Pretrained Large Model Extensions (Poster) | Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · Jie Ren · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J. Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani
- | Contrastive Learning Can Find An Optimal Basis For Approximately Invariant Functions (Poster) | Daniel D. Johnson · Ayoub El Hanchi · Chris Maddison
- | Memorization in NLP Fine-tuning Methods (Poster) | FatemehSadat Mireshghallah · Archit Uniyal · Tianhao Wang · David Evans · Taylor Berg-Kirkpatrick
- | Feed-Forward Source-Free Latent Domain Adaptation via Cross-Attention (Poster) | Ondrej Bohdal · Da Li · Xu Hu · Timothy Hospedales
- | On the Subspace Structure of Gradient-Based Meta-Learning (Poster) | Gustaf Tegnér · Alfredo Reichlin · Hang Yin · Mårten Björkman · Danica Kragic
- | Reinforcement Learning Assisted Layer-wise Fine-Tuning for Transfer Learning (Poster) | Tanvir Mahmud · Natalia Frumkin · Diana Marculescu
- | Improved Generalization Bounds for Transfer Learning via Neural Collapse (Poster) | Tomer Galanti · Andras Gyorgy · Marcus Hutter
- | Predicting Human Similarity Judgments Using Large Language Models (Poster) | Raja Marjieh · Ilia Sucholutsky · Theodore R Sumers · Nori Jacoby · Thomas Griffiths
- | Federated Learning from Pre-Trained Models: A Contrastive Learning Approach (Poster) | Yue Tan · Guodong Long · Jie Ma · Lu Liu · Tianyi Zhou · Jing Jiang
- | Similarity of Pre-trained and Fine-tuned Representations (Poster) | Thomas Goerttler · Klaus Obermayer
- | Hyper-Representation for Pre-Training and Transfer Learning (Poster) | Konstantin Schürholt · Boris Knyazev · Xavier Giro-i-Nieto · Damian Borth
- | What Do We Maximize In Self-Supervised Learning? (Poster) | Ravid Shwartz-Ziv · Randall Balestriero · Yann LeCun
- | ECLIP: Efficient Contrastive Language-Image Pretraining via Ensemble Confidence Learning and Masked Language Modeling (Poster) | Jue Wang · Haofan Wang · Weijia Wu · Jincan Deng · Yu Lu · Xiaofeng Guo · Debing Zhang
- | Boosting Monolingual Sentence Representation with Large-scale Parallel Translation Datasets (Poster) | Jue Wang · Haofan Wang · Xing Wu · Chaochen Gao · Debing Zhang
- | Knowledge Distillation for Efficient Sequences of Training Runs (Poster) | Xingyu Liu · Alexander Leonardi · Lu Yu · Christopher Gilmer-Hill · Matthew Leavitt · Jonathan Frankle
- | Energy-Inspired Self-Supervised Pretraining for Vision Models (Poster) | Ze Wang · Jiang Wang · Zicheng Liu · Qiang Qiu
- | On the Connection between Pre-training Data Diversity and Robustness (Poster) | Vivek Ramanujan · Thao Nguyen · Ludwig Schmidt · Ali Farhadi
- | Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning (Poster) | Zhecheng Yuan · Zhengrong Xue · Bo Yuan · Xueqian Wang · Yi Wu · Yang Gao · Huazhe Xu
- | Self-Supervised Time Series Representation Learning with Temporal-Instance Similarity Distillation (Poster) | Ainaz Hajimoradlou · Leila Pishdad · Frederick Tung · Maryna Karpusha
- | Protein Representation Learning by Geometric Structure Pretraining (Poster) | Zuobai Zhang · Minghao Xu · Arian Jamasb · Vijil Chenthamarakshan · Aurelie Lozano · Payel Das · Jian Tang
- | Manifold Characteristics That Predict Downstream Task Performance (Poster) | Ruan van der Merwe · Gregory Newman · Etienne Barnard
- | PARS-Push: Personalized, Asynchronous and Robust Decentralized Optimization (Poster) | Mohammad Taha Toghani · Soomin Lee · Cesar Uribe
- | Evaluating Self-Supervised Learned Molecular Graphs (Poster) | Hanchen Wang · Shengchao Liu · Jean Kaddour · Qi Liu · Jian Tang · Matt Kusner · Joan Lasenby
- | PSP-HDRI+: A Synthetic Dataset Generator for Pre-Training of Human-Centric Computer Vision Models (Poster) | Salehe Erfanian Ebadi · Saurav Dhakad · Sanjay Vishwakarma · Chunpu Wang · You-Cyuan Jhang · Maciek Chociej · Adam Crespi · Alex Thaman · Sujoy Ganguly
- | Generative Self-training Improves Pre-training for Visual Dialog (Poster) | Gi-Cheon Kang · Sungdong Kim · Jin-Hwa Kim · Donghyun Kwak · Byoung-Tak Zhang
- | The Trade-off between Label Efficiency and Universality of Representations from Contrastive Learning (Poster) | Zhenmei Shi · Jiefeng Chen · Kunyang Li · Jayaram Raghuram · Xi Wu · Yingyu Liang · Somesh Jha
- | Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision (Poster) | Yufeng Cui · Lichen Zhao · Feng Liang · Yangguang Li · Jing Shao
- | LAVA: Language Audio Vision Alignment for Pre-Training Transformers on Video Data (Poster) | Sumanth Gurram · David Chan · Andy Fang · John Canny