Workshop
The First Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward
Huaxiu Yao · Hugo Larochelle · Percy Liang · Colin Raffel · Jian Tang · Ying WEI · Saining Xie · Eric Xing · Chelsea Finn
Hall F
Sat 23 Jul, 5:50 a.m. PDT
The past five years have seen rapid progress in large-scale pre-trained models across a variety of domains, such as computer vision, natural language processing, robotics, and bioinformatics. Leveraging huge numbers of parameters, large-scale pre-trained models are capable of encoding rich knowledge from labeled and/or unlabeled examples. Supervised and self-supervised pre-training have been the two most representative paradigms, through which pre-trained models have demonstrated large benefits on a wide spectrum of downstream tasks. Other pre-training paradigms also exist, e.g., meta-learning for few-shot learning, where pre-trained models are trained to quickly adapt to new tasks. However, many challenges and new opportunities remain ahead for pre-training. In this workshop, we propose two foci: (1) Which pre-training methods transfer across different applications/domains, which ones don't, and why? (2) In what settings should we expect pre-training to be effective, compared to learning from scratch?
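As a minimal illustration of focus (2), the following PyTorch sketch contrasts fine-tuning a pre-trained model against training the identical architecture from scratch. It is purely illustrative and not drawn from any workshop paper; it assumes torchvision >= 0.13 for the weights API, and the 10-class downstream task is hypothetical.

import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

NUM_CLASSES = 10  # hypothetical downstream task

# (a) Pre-trained: start from weights learned on a large source corpus
# (here ImageNet) and swap in a fresh task head for fine-tuning.
pretrained = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
pretrained.fc = nn.Linear(pretrained.fc.in_features, NUM_CLASSES)

# (b) From scratch: the same architecture, randomly initialized.
scratch = resnet50(weights=None)
scratch.fc = nn.Linear(scratch.fc.in_features, NUM_CLASSES)

# A common low-data recipe: freeze the pre-trained backbone and train
# only the new head; with abundant downstream data, unfreezing (or
# training from scratch) may do as well or better.
for name, param in pretrained.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

Whether setting (a) actually beats setting (b), and under what data budgets, is exactly the trade-off that question (2) asks about.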
Schedule
Sat 5:50 a.m. - 6:00 a.m. | Introduction and Opening Remarks
Sat 6:00 a.m. - 6:30 a.m. | Neural Scaling of Deep Chemical Models (Invited Talk) | Connor Coley · Nathan C. Frey
Sat 6:30 a.m. - 7:00 a.m. | Chinchillas, Flamingos, and Gatos: Few-Shot Learning through Pre-training (Invited Talk) | Oriol Vinyals
Sat 7:00 a.m. - 7:15 a.m. | Multimodal Masked Autoencoders Learn Transferable Representations (Oral) | Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel
Sat 7:15 a.m. - 7:45 a.m. | How Neural Networks See, Learn and Forget (Invited Talk) | Maithra Raghu
Sat 7:45 a.m. - 8:15 a.m. | Program Synthesis, Program Semantics, and Large Language Models (Invited Talk) | Charles Sutton
Sat 8:15 a.m. - 9:15 a.m. | Panel Discussion
Sat 10:30 a.m. - 11:00 a.m. | Exploring the Limits of Large Scale Pre-training (Invited Talk) | Hanie Sedghi
Sat 11:00 a.m. - 11:15 a.m. | Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior (Oral) | Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson
Sat 11:15 a.m. - 11:45 a.m. | Simplifying Self-Supervised Visual Representation Pre-Training (Invited Talk) | Xinlei Chen
Sat 11:45 a.m. - 12:00 p.m. | Plex: Towards Reliability using Pretrained Large Model Extensions (Oral) | Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · JIE REN · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J. Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani
Sat 12:00 p.m. - 1:30 p.m. | Poster Session
Sat 1:30 p.m. - 2:00 p.m. | Unified and Efficient Multimodal Pretraining across Vision and Language (Invited Talk) | Mohit Bansal
Sat 2:00 p.m. - 2:30 p.m. | Benefits and Challenges of Pre-training for Environmental Monitoring (Invited Talk) | Sara Beery
- | Efficient Task Adaptation by Mixing Discovered Skills (Poster) | Eunseok Yang · JUNGSUB RHIM · Taesup Kim
- | Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments (Poster) | Pietro Maldini · Mirco Mutti · Riccardo De Santi · Marcello Restelli
- | On the Importance of Hyperparameters and Data Augmentation for Self-Supervised Learning (Poster) | Diane Wagner · Fabio Ferreira · Danny Stoll · Robin Tibor Schirrmeister · Samuel Gabriel Müller · Frank Hutter
- | Learning Large-scale Universal User Representation with Sparse Mixture of Experts (Poster) | Caigao Jiang · Siqiao Xue · James Zhang · Lingyue Liu · Zhibo Zhu · Hongyan Hao
- | Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? (Poster) | Nenad Tomasev · Ioana Bica · Brian McWilliams · Lars Buesing · Razvan Pascanu · Charles Blundell · Jovana Mitrovic
- | How robust are pre-trained models to distribution shift? (Poster) | Yuge Shi · Imant Daunhawer · Julia Vogt · Phil Torr · Amartya Sanyal
- | Multimodal Masked Autoencoders Learn Transferable Representations (Poster) | Xinyang Geng · Hao Liu · Lisa Lee · Dale Schuurmans · Sergey Levine · Pieter Abbeel
- | Is Self-Supervised Contrastive Learning More Robust Than Supervised Learning? (Poster) | Yuanyi Zhong · Haoran Tang · Junkun Chen · Jian Peng · Yu-Xiong Wang
- | Leader-based Pre-training Framework for Cooperative Multi-Agent Reinforcement Learning (Poster) | Wenqi Chen · Xin Zeng · Amber Li
- | Pixel-level Correspondence for Self-Supervised Learning from Video (Poster) | Yash Sharma · Yi Zhu · Chris Russell · Thomas Brox
- | Pre-Training on a Data Diet: Identifying Sufficient Examples for Early Training (Poster) | Mansheej Paul · Brett Larsen · Surya Ganguli · Jonathan Frankle · Gintare Karolina Dziugaite
- | Enhancing Multi-hop Connectivity for Graph Convolutional Networks (Poster) | Songtao Liu · Shixiong Jing · Tong Zhao · Zengfeng Huang · Dinghao Wu
- | Investigating Why Contrastive Learning Benefits Robustness against Label Noise (Poster) | Yihao Xue · Kyle Whitecross · Baharan Mirzasoleiman
- | Pretraining a Neural Network before Knowing Its Architecture (Poster) | Boris Knyazev
- | Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming (Poster) | Hanlin Zhang · Ziyang Li · Jiani Huang · Mayur Naik · Eric Xing
- | Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Prior (Poster) | Ravid Shwartz-Ziv · Micah Goldblum · Hossein Souri · Sanyam Kapoor · Chen Zhu · Yann LeCun · Andrew Wilson
- | How well do contrastively trained models transfer? (Poster) | M. Moein Shariatnia · Rahim Entezari · Mitchell Wortsman · Olga Saukh · Ludwig Schmidt
- | Vote for Nearest Neighbors Meta-Pruning of Self-Supervised Networks (Poster) | Haiyan Zhao · Tianyi Zhou · Guodong Long · Jing Jiang · Chengqi Zhang
- | On Combining Global and Localized Self-Supervised Models of Speech (Poster) | Sri Harsha Dumpala · Chandramouli Shama Sastry · Rudolf Uher · Sageev Oore
- | Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning (Poster) | Weixin Liang · Yuhui Zhang · Yongchan Kwon · Serena Yeung · James Zou
- | Robustness to Adversarial Gradients: A Glimpse Into the Loss Landscape of Contrastive Pre-training (Poster) | Philip Fradkin · Lazar Atanackovic · Michael Zhang
- | Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models (Poster) | Eric Mitchell · Peter Henderson · Christopher Manning · Dan Jurafsky · Chelsea Finn
- | Flaky Performances when Pre-Training on Relational Databases with a Plan for Future Characterization Efforts (Poster) | Shengchao Liu · David Vazquez · Jian Tang · Pierre-André Noël
- | Training strategies with unlabeled and few labeled examples under 1-pixel attack by combining supervised and self-supervised learning (Poster) | Gabriel Biscaro Cavallari · Moacir Ponti
- | Plex: Towards Reliability using Pretrained Large Model Extensions (Poster) | Dustin Tran · Andreas Kirsch · Balaji Lakshminarayanan · Huiyi Hu · Du Phan · D. Sculley · Jasper Snoek · Jeremiah Liu · JIE REN · Joost van Amersfoort · Kehang Han · Estefany Kelly Buchanan · Kevin Murphy · Mark Collier · Michael Dusenberry · Neil Band · Nithum Thain · Rodolphe Jenatton · Tim G. J. Rudner · Yarin Gal · Zachary Nado · Zelda Mariet · Zi Wang · Zoubin Ghahramani
- | Contrastive Learning Can Find An Optimal Basis For Approximately Invariant Functions (Poster) | Daniel D. Johnson · Ayoub El Hanchi · Chris Maddison
- | Memorization in NLP Fine-tuning Methods (Poster) | FatemehSadat Mireshghallah · Archit Uniyal · Tianhao Wang · David Evans · Taylor Berg-Kirkpatrick
- | Feed-Forward Source-Free Latent Domain Adaptation via Cross-Attention (Poster) | Ondrej Bohdal · Da Li · Xu Hu · Timothy Hospedales
- | On the Subspace Structure of Gradient-Based Meta-Learning (Poster) | Gustaf Tegnér · Alfredo Reichlin · Hang Yin · Mårten Björkman · Danica Kragic
- | Reinforcement Learning Assisted Layer-wise Fine-Tuning for Transfer Learning (Poster) | Tanvir Mahmud · Natalia Frumkin · Diana Marculescu
- | Improved Generalization Bounds for Transfer Learning via Neural Collapse (Poster) | Tomer Galanti · Andras Gyorgy · Marcus Hutter
- | Predicting Human Similarity Judgments Using Large Language Models (Poster) | Raja Marjieh · Ilia Sucholutsky · Theodore R. Sumers · Nori Jacoby · Thomas Griffiths
- | Federated Learning from Pre-Trained Models: A Contrastive Learning Approach (Poster) | Yue Tan · Guodong Long · Jie Ma · LU LIU · Tianyi Zhou · Jing Jiang
- | Similarity of Pre-trained and Fine-tuned Representations (Poster) | Thomas Goerttler · Klaus Obermayer
- | Hyper-Representation for Pre-Training and Transfer Learning (Poster) | Konstantin Schürholt · Boris Knyazev · Xavier Giro-i-Nieto · Damian Borth
- | What Do We Maximize In Self-Supervised Learning? (Poster) | Ravid Shwartz-Ziv · Randall Balestriero · Yann LeCun
- | ECLIP: Efficient Contrastive Language-Image Pretraining via Ensemble Confidence Learning and Masked Language Modeling (Poster) | Jue Wang · Haofan Wang · Weijia Wu · Jincan Deng · Yu Lu · Xiaofeng Guo · Debing Zhang
- | Boosting Monolingual Sentence Representation with Large-scale Parallel Translation Datasets (Poster) | Jue Wang · Haofan Wang · Xing Wu · Chaochen Gao · Debing Zhang
- | Knowledge Distillation for Efficient Sequences of Training Runs (Poster) | Xingyu Liu · Alexander Leonardi · Lu Yu · Christopher Gilmer-Hill · Matthew Leavitt · Jonathan Frankle
- | Energy-Inspired Self-Supervised Pretraining for Vision Models (Poster) | Ze Wang · Jiang Wang · Zicheng Liu · Qiang Qiu
- | On the Connection between Pre-training Data Diversity and Robustness (Poster) | Vivek Ramanujan · Thao Nguyen · Ludwig Schmidt · Ali Farhadi
- | Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning (Poster) | Zhecheng Yuan · Zhengrong Xue · Bo Yuan · Xueqian Wang · Yi Wu · Yang Gao · Huazhe Xu
- | Self-Supervised Time Series Representation Learning with Temporal-Instance Similarity Distillation (Poster) | Ainaz Hajimoradlou · Leila Pishdad · Frederick Tung · Maryna Karpusha
- | Protein Representation Learning by Geometric Structure Pretraining (Poster) | Zuobai Zhang · Minghao Xu · Arian Jamasb · Vijil Chenthamarakshan · Aurelie Lozano · Payel Das · Jian Tang
- | Manifold Characteristics That Predict Downstream Task Performance (Poster) | Ruan van der Merwe · Gregory Newman · Etienne Barnard
- | PARS-Push: Personalized, Asynchronous and Robust Decentralized Optimization (Poster) | Mohammad Taha Toghani · Soomin Lee · Cesar Uribe
- | Evaluating Self-Supervised Learned Molecular Graphs (Poster) | Hanchen Wang · Shengchao Liu · Jean Kaddour · Qi Liu · Jian Tang · Matt Kusner · Joan Lasenby
- | PSP-HDRI+: A Synthetic Dataset Generator for Pre-Training of Human-Centric Computer Vision Models (Poster) | Salehe Erfanian Ebadi · Saurav Dhakad · Sanjay Vishwakarma · Chunpu Wang · You-Cyuan Jhang · Maciek Chociej · Adam Crespi · Alex Thaman · Sujoy Ganguly
- | Generative Self-training Improves Pre-training for Visual Dialog (Poster) | Gi-Cheon Kang · Sungdong Kim · Jin-Hwa Kim · Donghyun Kwak · Byoung-Tak Zhang
- | The Trade-off between Label Efficiency and Universality of Representations from Contrastive Learning (Poster) | Zhenmei Shi · Jiefeng Chen · Kunyang Li · Jayaram Raghuram · Xi Wu · Yingyu Liang · Somesh Jha
- | Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision (Poster) | Yufeng Cui · Lichen Zhao · Feng Liang · Yangguang Li · Jing Shao
- | LAVA: Language Audio Vision Alignment for Pre-Training Transformers on Video Data (Poster) | Sumanth Gurram · David Chan · Andy Fang · John Canny