Building scalable models to learn from diverse, multimodal data remains an open challenge. For vision-language data, the dominant approaches are based on contrastive learning objectives that train a separate encoder for each modality. While effective, contrastive learning approaches introduce sampling bias depending on the data augmentations used, which can degrade performance on downstream tasks. Moreover, these methods are limited to paired image-text data, and cannot leverage widely available unpaired data. In this paper, we investigate whether a large multimodal model trained purely via masked token prediction, without using modality-specific encoders or contrastive learning, can learn transferable representations for downstream tasks. We propose a simple and scalable network architecture, the Multimodal Masked Autoencoder (M3AE), which learns a unified encoder for both vision and language data via masked token prediction. We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks. We demonstrate the scalability of M3AE with larger model size and training time, and its flexibility to train on both paired image-text data as well as unpaired data.
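The core idea in the abstract — concatenating image-patch tokens and text tokens into a single sequence, masking a random subset, and training a unified encoder to predict the masked tokens — can be illustrated with a minimal input-preparation sketch. This is an assumption-laden illustration, not the authors' code: the function name `prepare_masked_sequence`, the modality tag, and the 0.75 mask ratio are hypothetical choices for the example.

```python
import numpy as np

def prepare_masked_sequence(num_patches, num_text_tokens, mask_ratio, seed=0):
    """Sketch of M3AE-style masking: build one unified token sequence from
    image patches and text tokens, then mask a random subset of it."""
    rng = np.random.default_rng(seed)
    total = num_patches + num_text_tokens
    # Modality tag per position: 0 = image-patch token, 1 = text token.
    modality = np.concatenate(
        [np.zeros(num_patches, dtype=int), np.ones(num_text_tokens, dtype=int)]
    )
    # Mask a fixed fraction of positions uniformly at random, across modalities.
    num_masked = int(round(mask_ratio * total))
    masked_idx = rng.choice(total, size=num_masked, replace=False)
    mask = np.zeros(total, dtype=bool)
    mask[masked_idx] = True
    # The unified encoder sees only the visible tokens; the decoder is trained
    # to reconstruct the masked ones.
    visible_idx = np.flatnonzero(~mask)
    return modality, mask, visible_idx

# Example: 196 image patches (a 14x14 grid) plus 64 text tokens, 75% masked.
modality, mask, visible = prepare_masked_sequence(196, 64, mask_ratio=0.75)
print(len(visible))  # 65 of 260 tokens remain visible to the encoder
```

Because masking is applied to the unified sequence rather than per modality, the same pipeline works when the text portion is empty — which is one way an approach like this can consume unpaired image-only data.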
Author Information
Xinyang Geng (UC Berkeley)
Hao Liu (UC Berkeley)
Lisa Lee (Google Brain)
Dale Schuurmans (University of Alberta)
Sergey Levine (UC Berkeley)
Pieter Abbeel (UC Berkeley & Covariant)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 : Multimodal Masked Autoencoders Learn Transferable Representations »
Sat. Jul 23rd 02:00 -- 02:15 PM
More from the Same Authors
-
2021 : Decision Transformer: Reinforcement Learning via Sequence Modeling »
Lili Chen · Kevin Lu · Aravind Rajeswaran · Kimin Lee · Aditya Grover · Michael Laskin · Pieter Abbeel · Aravind Srinivas · Igor Mordatch -
2021 : Reinforcement Learning as One Big Sequence Modeling Problem »
Michael Janner · Qiyang Li · Sergey Levine -
2021 : Data-Efficient Exploration with Self Play for Atari »
Michael Laskin · Catherine Cang · Ryan Rudes · Pieter Abbeel -
2021 : Intrinsic Control of Variational Beliefs in Dynamic Partially-Observed Visual Environments »
Nicholas Rhinehart · Jenny Wang · Glen Berseth · John Co-Reyes · Danijar Hafner · Chelsea Finn · Sergey Levine -
2021 : Explore and Control with Adversarial Surprise »
Arnaud Fickinger · Natasha Jaques · Samyak Parajuli · Michael Chang · Nicholas Rhinehart · Glen Berseth · Stuart Russell · Sergey Levine -
2021 : Hierarchical Few-Shot Imitation with Skill Transition Models »
Kourosh Hakhamaneshi · Ruihan Zhao · Albert Zhan · Pieter Abbeel · Michael Laskin -
2021 : Explaining Reinforcement Learning Policies through Counterfactual Trajectories »
Julius Frost · Olivia Watkins · Eric Weiner · Pieter Abbeel · Trevor Darrell · Bryan Plummer · Kate Saenko -
2022 : Effective Offline RL Needs Going Beyond Pessimism: Representations and Distributional Shift »
Xinyang Geng · Kevin Li · Abhishek Gupta · Aviral Kumar · Sergey Levine -
2022 : DASCO: Dual-Generator Adversarial Support Constrained Offline Reinforcement Learning »
Quan Vuong · Aviral Kumar · Sergey Levine · Yevgen Chebotar -
2022 : Distributionally Adaptive Meta Reinforcement Learning »
Anurag Ajay · Dibya Ghosh · Sergey Levine · Pulkit Agrawal · Abhishek Gupta -
2022 : You Only Live Once: Single-Life Reinforcement Learning via Learned Reward Shaping »
Annie Chen · Archit Sharma · Sergey Levine · Chelsea Finn -
2022 Poster: Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization »
Brandon Trabucco · Xinyang Geng · Aviral Kumar · Sergey Levine -
2022 Poster: Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks »
Litian Liang · Yaosheng Xu · Stephen McAleer · Dailin Hu · Alexander Ihler · Pieter Abbeel · Roy Fox -
2022 Poster: Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents »
Wenlong Huang · Pieter Abbeel · Deepak Pathak · Igor Mordatch -
2022 Spotlight: Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks »
Litian Liang · Yaosheng Xu · Stephen McAleer · Dailin Hu · Alexander Ihler · Pieter Abbeel · Roy Fox -
2022 Spotlight: Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents »
Wenlong Huang · Pieter Abbeel · Deepak Pathak · Igor Mordatch -
2022 Spotlight: Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization »
Brandon Trabucco · Xinyang Geng · Aviral Kumar · Sergey Levine -
2022 Poster: Reinforcement Learning with Action-Free Pre-Training from Videos »
Younggyo Seo · Kimin Lee · Stephen James · Pieter Abbeel -
2022 Spotlight: Reinforcement Learning with Action-Free Pre-Training from Videos »
Younggyo Seo · Kimin Lee · Stephen James · Pieter Abbeel -
2021 : Panel Discussion »
Rosemary Nan Ke · Danijar Hafner · Pieter Abbeel · Chelsea Finn -
2021 : Invited Talk by Pieter Abbeel »
Pieter Abbeel -
2021 Poster: Decoupling Representation Learning from Reinforcement Learning »
Adam Stooke · Kimin Lee · Pieter Abbeel · Michael Laskin -
2021 Spotlight: Decoupling Representation Learning from Reinforcement Learning »
Adam Stooke · Kimin Lee · Pieter Abbeel · Michael Laskin -
2021 Poster: Leveraging Non-uniformity in First-order Non-convex Optimization »
Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2021 Spotlight: Leveraging Non-uniformity in First-order Non-convex Optimization »
Jincheng Mei · Yue Gao · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2021 Poster: APS: Active Pretraining with Successor Features »
Hao Liu · Pieter Abbeel -
2021 Poster: Characterizing the Gap Between Actor-Critic and Policy Gradient »
Junfeng Wen · Saurabh Kumar · Ramki Gummadi · Dale Schuurmans -
2021 Poster: SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning »
Kimin Lee · Michael Laskin · Aravind Srinivas · Pieter Abbeel -
2021 Spotlight: Characterizing the Gap Between Actor-Critic and Policy Gradient »
Junfeng Wen · Saurabh Kumar · Ramki Gummadi · Dale Schuurmans -
2021 Spotlight: SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning »
Kimin Lee · Michael Laskin · Aravind Srinivas · Pieter Abbeel -
2021 Oral: APS: Active Pretraining with Successor Features »
Hao Liu · Pieter Abbeel -
2021 Poster: PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training »
Kimin Lee · Laura Smith · Pieter Abbeel -
2021 Oral: PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training »
Kimin Lee · Laura Smith · Pieter Abbeel -
2021 Poster: Unsupervised Learning of Visual 3D Keypoints for Control »
Boyuan Chen · Pieter Abbeel · Deepak Pathak -
2021 Poster: State Entropy Maximization with Random Encoders for Efficient Exploration »
Younggyo Seo · Lili Chen · Jinwoo Shin · Honglak Lee · Pieter Abbeel · Kimin Lee -
2021 Poster: MSA Transformer »
Roshan Rao · Jason Liu · Robert Verkuil · Joshua Meier · John Canny · Pieter Abbeel · Tom Sercu · Alexander Rives -
2021 Spotlight: MSA Transformer »
Roshan Rao · Jason Liu · Robert Verkuil · Joshua Meier · John Canny · Pieter Abbeel · Tom Sercu · Alexander Rives -
2021 Spotlight: State Entropy Maximization with Random Encoders for Efficient Exploration »
Younggyo Seo · Lili Chen · Jinwoo Shin · Honglak Lee · Pieter Abbeel · Kimin Lee -
2021 Spotlight: Unsupervised Learning of Visual 3D Keypoints for Control »
Boyuan Chen · Pieter Abbeel · Deepak Pathak -
2021 : Part 2: Unsupervised Pre-Training in RL »
Pieter Abbeel -
2021 Tutorial: Unsupervised Learning for Reinforcement Learning »
Aravind Srinivas · Pieter Abbeel -
2020 Poster: On the Global Convergence Rates of Softmax Policy Gradient Methods »
Jincheng Mei · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans -
2020 Poster: CURL: Contrastive Unsupervised Representations for Reinforcement Learning »
Michael Laskin · Aravind Srinivas · Pieter Abbeel -
2020 Poster: Domain Aggregation Networks for Multi-Source Domain Adaptation »
Junfeng Wen · Russell Greiner · Dale Schuurmans -
2020 Poster: Batch Stationary Distribution Estimation »
Junfeng Wen · Bo Dai · Lihong Li · Dale Schuurmans -
2020 Poster: Hallucinative Topological Memory for Zero-Shot Visual Planning »
Kara Liu · Thanard Kurutach · Christine Tung · Pieter Abbeel · Aviv Tamar -
2020 Poster: Planning to Explore via Self-Supervised World Models »
Ramanan Sekar · Oleh Rybkin · Kostas Daniilidis · Pieter Abbeel · Danijar Hafner · Deepak Pathak -
2020 Poster: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods »
Adam Stooke · Joshua Achiam · Pieter Abbeel -
2020 Poster: Variable Skipping for Autoregressive Range Density Estimation »
Eric Liang · Zongheng Yang · Ion Stoica · Pieter Abbeel · Yan Duan · Peter Chen -
2020 Poster: Hierarchically Decoupled Imitation For Morphological Transfer »
Donald Hejna · Lerrel Pinto · Pieter Abbeel -
2019 Workshop: Workshop on Self-Supervised Learning »
Aaron van den Oord · Yusuf Aytar · Carl Doersch · Carl Vondrick · Alec Radford · Pierre Sermanet · Amir Zamir · Pieter Abbeel -
2019 Poster: Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables »
Friso Kingma · Pieter Abbeel · Jonathan Ho -
2019 Poster: On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference »
Rohin Shah · Noah Gundotra · Pieter Abbeel · Anca Dragan -
2019 Oral: On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference »
Rohin Shah · Noah Gundotra · Pieter Abbeel · Anca Dragan -
2019 Oral: Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables »
Friso Kingma · Pieter Abbeel · Jonathan Ho -
2019 Poster: Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules »
Daniel Ho · Eric Liang · Peter Chen · Ion Stoica · Pieter Abbeel -
2019 Poster: Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design »
Jonathan Ho · Peter Chen · Aravind Srinivas · Rocky Duan · Pieter Abbeel -
2019 Poster: Taming MAML: Efficient unbiased meta-reinforcement learning »
Hao Liu · Richard Socher · Caiming Xiong -
2019 Poster: SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning »
Marvin Zhang · Sharad Vikram · Laura Smith · Pieter Abbeel · Matthew Johnson · Sergey Levine -
2019 Oral: Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design »
Jonathan Ho · Peter Chen · Aravind Srinivas · Rocky Duan · Pieter Abbeel -
2019 Oral: Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules »
Daniel Ho · Eric Liang · Peter Chen · Ion Stoica · Pieter Abbeel -
2019 Oral: Taming MAML: Efficient unbiased meta-reinforcement learning »
Hao Liu · Richard Socher · Caiming Xiong -
2019 Oral: SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning »
Marvin Zhang · Sharad Vikram · Laura Smith · Pieter Abbeel · Matthew Johnson · Sergey Levine -
2018 Poster: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor »
Tuomas Haarnoja · Aurick Zhou · Pieter Abbeel · Sergey Levine -
2018 Poster: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Poster: PixelSNAIL: An Improved Autoregressive Generative Model »
Xi Chen · Nikhil Mishra · Mostafa Rohaninejad · Pieter Abbeel -
2018 Oral: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor »
Tuomas Haarnoja · Aurick Zhou · Pieter Abbeel · Sergey Levine -
2018 Oral: PixelSNAIL: An Improved Autoregressive Generative Model »
Xi Chen · Nikhil Mishra · Mostafa Rohaninejad · Pieter Abbeel -
2018 Oral: Smoothed Action Value Functions for Learning Gaussian Policies »
Ofir Nachum · Mohammad Norouzi · George Tucker · Dale Schuurmans -
2018 Poster: Automatic Goal Generation for Reinforcement Learning Agents »
Carlos Florensa · David Held · Xinyang Geng · Pieter Abbeel -
2018 Poster: Latent Space Policies for Hierarchical Reinforcement Learning »
Tuomas Haarnoja · Kristian Hartikainen · Pieter Abbeel · Sergey Levine -
2018 Poster: Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings »
John Co-Reyes · Yu Xuan Liu · Abhishek Gupta · Benjamin Eysenbach · Pieter Abbeel · Sergey Levine -
2018 Poster: Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control »
Aravind Srinivas · Allan Jabri · Pieter Abbeel · Sergey Levine · Chelsea Finn -
2018 Oral: Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control »
Aravind Srinivas · Allan Jabri · Pieter Abbeel · Sergey Levine · Chelsea Finn -
2018 Oral: Automatic Goal Generation for Reinforcement Learning Agents »
Carlos Florensa · David Held · Xinyang Geng · Pieter Abbeel -
2018 Oral: Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings »
John Co-Reyes · Yu Xuan Liu · Abhishek Gupta · Benjamin Eysenbach · Pieter Abbeel · Sergey Levine -
2018 Oral: Latent Space Policies for Hierarchical Reinforcement Learning »
Tuomas Haarnoja · Kristian Hartikainen · Pieter Abbeel · Sergey Levine -
2017 Poster: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks »
Chelsea Finn · Pieter Abbeel · Sergey Levine -
2017 Poster: Prediction and Control with Temporal Segment Models »
Nikhil Mishra · Pieter Abbeel · Igor Mordatch -
2017 Poster: Reinforcement Learning with Deep Energy-Based Policies »
Tuomas Haarnoja · Haoran Tang · Pieter Abbeel · Sergey Levine -
2017 Poster: Constrained Policy Optimization »
Joshua Achiam · David Held · Aviv Tamar · Pieter Abbeel -
2017 Talk: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks »
Chelsea Finn · Pieter Abbeel · Sergey Levine -
2017 Talk: Prediction and Control with Temporal Segment Models »
Nikhil Mishra · Pieter Abbeel · Igor Mordatch -
2017 Talk: Reinforcement Learning with Deep Energy-Based Policies »
Tuomas Haarnoja · Haoran Tang · Pieter Abbeel · Sergey Levine -
2017 Talk: Constrained Policy Optimization »
Joshua Achiam · David Held · Aviv Tamar · Pieter Abbeel