In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area. Code: https://github.com/gemcollector/learning-from-scratch.
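The abstract describes the LfS baseline only as data augmentation plus a shallow ConvNet. The sketch below illustrates both ingredients under explicit assumptions. The augmentation is assumed to be a random shift (replicate-pad, then random crop, as in DrQ-style agents). The encoder is assumed to be four unpadded 3x3 convolutions with 32 channels, the first with stride 2. The helper names (`random_shift`, `conv_out`, `shallow_convnet_feature_dim`) are hypothetical, and this is not the paper's implementation. See the linked repository for the actual augmentation and architecture settings.

```python
import random
from typing import List

# Hypothetical single-channel image type: H rows of W pixel values.
Image = List[List[float]]


def random_shift(img: Image, pad: int, rng: random.Random) -> Image:
    """Randomly translate `img` by up to `pad` pixels along each axis.

    Equivalent to replicate-padding the image by `pad` pixels on every side
    and cropping a random H x W window, so the output keeps the input shape.
    (Assumed augmentation; not confirmed by the abstract.)
    """
    h, w = len(img), len(img[0])

    def px(r: int, c: int) -> float:
        # Clamping the index reproduces edge-replicate padding.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    # Top-left corner of the crop inside the padded image, in [0, 2 * pad].
    dy = rng.randint(0, 2 * pad)
    dx = rng.randint(0, 2 * pad)
    # Padded coordinate (r + dy) maps back to original coordinate (r + dy - pad).
    return [[px(r + dy - pad, c + dx - pad) for c in range(w)] for r in range(h)]


def conv_out(size: int, kernel: int, stride: int) -> int:
    """Spatial size after an unpadded ("valid") convolution."""
    return (size - kernel) // stride + 1


def shallow_convnet_feature_dim(size: int = 84, channels: int = 32) -> int:
    """Flattened feature size of a hypothetical 4-layer 3x3 ConvNet.

    Assumes the first layer uses stride 2 and the remaining three use
    stride 1, each with `channels` output channels and no padding.
    """
    for stride in (2, 1, 1, 1):
        size = conv_out(size, kernel=3, stride=stride)
    return channels * size * size


if __name__ == "__main__":
    rng = random.Random(0)
    frame = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    augmented = random_shift(frame, pad=1, rng=rng)
    print(augmented)
    # Spatial sizes 84 -> 41 -> 39 -> 37 -> 35, so 32 * 35 * 35 features.
    print(shallow_convnet_feature_dim())  # prints 39200
```

Because the shift preserves the frame size, the same encoder can consume augmented and unaugmented frames without modification. A practical version would operate on batched image tensors in a deep-learning framework rather than on nested lists.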
Author Information
Nicklas Hansen (University of California, San Diego)
Zhecheng Yuan (Tsinghua University)
Yanjie Ze (Shanghai Jiao Tong University)
Tongzhou Mu (University of California, San Diego)
Aravind Rajeswaran (Meta AI (FAIR))
Hao Su (University of California, San Diego)
Huazhe Xu (Tsinghua University)
Xiaolong Wang (University of California, San Diego)

Our group has broad interests in Computer Vision, Machine Learning, and Robotics. We focus on learning 3D and dynamics representations from videos and physical robotic interaction data, and we explore supervision signals drawn from the data itself, from language, and from common-sense knowledge. We use these representations to facilitate robot skill learning, with the goal of enabling robots to generalize and interact effectively with a wide range of objects and environments in the real physical world. Please see our individual research topics: Self-Supervised Learning, Video Understanding, Common Sense Reasoning, RL and Robotics, 3D Interaction, and Dexterous Hand.
More from the Same Authors
2021: Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation
Nicklas Hansen · Hao Su · Xiaolong Wang
2021: Visual Adversarial Imitation Learning using Variational Models
Rafael Rafailov · Tianhe (Kevin) Yu · Aravind Rajeswaran · Chelsea Finn
2021: Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen · Kevin Lu · Aravind Rajeswaran · Kimin Lee · Aditya Grover · Michael Laskin · Pieter Abbeel · Aravind Srinivas · Igor Mordatch
2021: Disentangled Attention as Intrinsic Regularization for Bimanual Multi-Object Manipulation
Minghao Zhang · Pingcheng Jian · Yi Wu · Harry (Huazhe) Xu · Xiaolong Wang
2021: Learning Vision-Guided Quadrupedal Locomotion with Cross-Modal Transformers
Ruihan Yang · Minghao Zhang · Nicklas Hansen · Harry (Huazhe) Xu · Xiaolong Wang
2022: Policy Architectures for Compositional Generalization in Control
Allan Zhou · Vikash Kumar · Chelsea Finn · Aravind Rajeswaran
2023 Poster: Learning Dense Correspondences between Photos and Sketches
Xuanchen Lu · Xiaolong Wang · Judith E. Fan
2023 Poster: Masked Trajectory Models for Prediction, Representation, and Control
Philipp Wu · Arjun Majumdar · Kevin Stone · Yixin Lin · Igor Mordatch · Pieter Abbeel · Aravind Rajeswaran
2023 Poster: Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization
Stone Tao · Xiaochen Li · Tongzhou Mu · Zhiao Huang · Yuzhe Qin · Hao Su
2023 Poster: Reparameterized Policy Learning for Multimodal Trajectory Optimization
Zhiao Huang · Litian Liang · Zhan Ling · Xuanlin Li · Chuang Gan · Hao Su
2023 Poster: MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses
Yang Fu · Ishan Misra · Xiaolong Wang
2023 Oral: Reparameterized Policy Learning for Multimodal Trajectory Optimization
Zhiao Huang · Litian Liang · Zhan Ling · Xuanlin Li · Chuang Gan · Hao Su
2022 Poster: Temporal Difference Learning for Model Predictive Control
Nicklas Hansen · Hao Su · Xiaolong Wang
2022 Poster: The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
Simone Parisi · Aravind Rajeswaran · Senthil Purushwalkam · Abhinav Gupta
2022 Spotlight: Temporal Difference Learning for Model Predictive Control
Nicklas Hansen · Hao Su · Xiaolong Wang
2022 Oral: The Unsurprising Effectiveness of Pre-Trained Vision Models for Control
Simone Parisi · Aravind Rajeswaran · Senthil Purushwalkam · Abhinav Gupta
2022 Poster: Translating Robot Skills: Learning Unsupervised Skill Correspondences Across Robots
Tanmay Shankar · Yixin Lin · Aravind Rajeswaran · Vikash Kumar · Stuart Anderson · Jean Oh
2022 Poster: Improving Policy Optimization with Generalist-Specialist Learning
Zhiwei Jia · Xuanlin Li · Zhan Ling · Shuang Liu · Yiran Wu · Hao Su
2022 Spotlight: Translating Robot Skills: Learning Unsupervised Skill Correspondences Across Robots
Tanmay Shankar · Yixin Lin · Aravind Rajeswaran · Vikash Kumar · Stuart Anderson · Jean Oh
2022 Spotlight: Improving Policy Optimization with Generalist-Specialist Learning
Zhiwei Jia · Xuanlin Li · Zhan Ling · Shuang Liu · Yiran Wu · Hao Su
2021 Poster: Compositional Video Synthesis with Action Graphs
Amir Bar · Roi Herzig · Xiaolong Wang · Anna Rohrbach · Gal Chechik · Trevor Darrell · Amir Globerson
2021 Spotlight: Compositional Video Synthesis with Action Graphs
Amir Bar · Roi Herzig · Xiaolong Wang · Anna Rohrbach · Gal Chechik · Trevor Darrell · Amir Globerson
2020 Poster: A Game Theoretic Framework for Model Based Reinforcement Learning
Aravind Rajeswaran · Igor Mordatch · Vikash Kumar
2020 Poster: Information-Theoretic Local Minima Characterization and Regularization
Zhiwei Jia · Hao Su
2020 Poster: Deep Isometric Learning for Visual Recognition
Haozhi Qi · Chong You · Xiaolong Wang · Yi Ma · Jitendra Malik
2019: Welcome and Introduction
Aravind Rajeswaran
2019 Workshop: Generative Modeling and Model-Based Reasoning for Robotics and AI
Aravind Rajeswaran · Emanuel Todorov · Igor Mordatch · William Agnew · Amy Zhang · Joelle Pineau · Michael Chang · Dumitru Erhan · Sergey Levine · Kimberly Stachenfeld · Marvin Zhang
2019 Poster: Online Meta-Learning
Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine
2019 Oral: Online Meta-Learning
Chelsea Finn · Aravind Rajeswaran · Sham Kakade · Sergey Levine