Disentangled Attention as Intrinsic Regularization for Bimanual Multi-Object Manipulation
Minghao Zhang · Pingcheng Jian · Yi Wu · Harry (Huazhe) Xu · Xiaolong Wang

We address the problem of solving complex bimanual robot manipulation tasks on multiple objects with sparse rewards. Such complex tasks can be decomposed into sub-tasks that different robots can accomplish concurrently or sequentially for better efficiency. While previous reinforcement learning approaches primarily focus on modeling the compositionality of sub-tasks, two fundamental issues are largely ignored, particularly when learning cooperative strategies for two robots: (i) domination, i.e., one robot may try to solve a task by itself and leave the other idle; (ii) conflict, i.e., one robot can easily intrude into the other's workspace when the two execute different sub-tasks simultaneously. To tackle these two issues, we propose a novel technique called disentangled attention, which provides an intrinsic regularization that encourages the two robots to focus on separate sub-tasks and objects. We evaluate our method on four bimanual manipulation tasks. Experimental results show that the proposed intrinsic regularization successfully avoids domination and reduces conflicts between the policies, which leads to significantly more effective cooperative strategies than all the baselines.

Author Information

Minghao Zhang (Tsinghua University)
Pingcheng Jian (Tsinghua University)
Yi Wu (UC Berkeley)
Harry (Huazhe) Xu (UC Berkeley)

I am a second-year PhD student at UC Berkeley working on RL and vision under Prof. Trevor Darrell. I also actively collaborate with Prof. Sergey Levine and Prof. Tengyu Ma.

Xiaolong Wang (UCSD)

Our group has broad interests across Computer Vision, Machine Learning and Robotics. Our focus is on learning 3D and dynamics representations from videos and physical robotic interaction data. We explore various sources of supervision: the data itself, language, and common sense knowledge. We leverage these comprehensive representations to facilitate the learning of robot skills, with the goal of enabling robots to interact effectively with a wide range of objects and environments in the real physical world. Please check out our individual research topics: Self-Supervised Learning, Video Understanding, Common Sense Reasoning, RL and Robotics, 3D Interaction, and Dexterous Hand.
