
On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
Nicklas Hansen · Zhecheng Yuan · Yanjie Ze · Tongzhou Mu · Aravind Rajeswaran · Hao Su · Huazhe Xu · Xiaolong Wang

Tue Jul 25 05:00 PM -- 06:30 PM (PDT) @ Exhibit Hall 1 #711

In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area. Code: https://github.com/gemcollector/learning-from-scratch.
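The LfS baseline described above pairs a shallow ConvNet encoder with image augmentation. As a rough illustration of the kind of augmentation commonly used in learning-from-scratch pipelines (a DrQ-style pad-and-crop random shift), here is a minimal stdlib-only sketch; the function name, padding amount, and list-of-lists image representation are illustrative assumptions, not the paper's exact implementation:

```python
import random

def random_shift(img, pad=4):
    """Pad-and-crop random shift augmentation (illustrative sketch,
    not the paper's exact implementation).

    `img` is an H x W image given as a list of lists of pixel values.
    The image is zero-padded by `pad` pixels on every side, then an
    H x W window is cropped at a random offset, shifting the content
    by up to +/- `pad` pixels in each direction.
    """
    h, w = len(img), len(img[0])
    # Zero-pad the image by `pad` pixels on every side.
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = img[i][j]
    # Crop a random H x W window from the padded image.
    dy = random.randint(0, 2 * pad)
    dx = random.randint(0, 2 * pad)
    return [row[dx:dx + w] for row in padded[dy:dy + h]]
```

In practice such augmentation is applied per frame (or consistently across stacked frames) before the observation is passed to the ConvNet encoder; the output always has the same spatial size as the input.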

Author Information

Nicklas Hansen (University of California, San Diego)
Zhecheng Yuan (Tsinghua University)
Yanjie Ze (Shanghai Jiao Tong University)
Tongzhou Mu (University of California, San Diego)
Aravind Rajeswaran (Meta AI (FAIR))
Hao Su (University of California, San Diego)
Huazhe Xu (Tsinghua University)
Xiaolong Wang (UC San Diego)
Xiaolong Wang

Our group has broad interests spanning Computer Vision, Machine Learning, and Robotics. We focus on learning 3D and dynamics representations from videos and from physical robot interaction data. We explore supervision signals drawn from the data itself, from language, and from common-sense knowledge, and we leverage these representations to facilitate the learning of robot skills, with the goal of enabling robots to interact effectively with a wide range of objects and environments in the real physical world. Please check out our individual research topics: Self-Supervised Learning, Video Understanding, Common Sense Reasoning, RL and Robotics, 3D Interaction, and Dexterous Hand.
