Zero-Shot Reward Specification via Grounded Natural Language
Parsa Mahmoudieh · Deepak Pathak · Trevor Darrell

Wed Jul 20 03:30 PM -- 05:30 PM (PDT) @ Hall E #838

Reward signals in reinforcement learning are expensive to obtain in many tasks and often require direct access to state. The usual alternatives, demonstrations or goal images, can be labor-intensive to collect. Goal text descriptions are a low-effort way of communicating the desired task. So far, however, goal-text-conditioned policies have been trained with reward signals that have access to state, or with labelled expert demonstrations. We devise a model that leverages CLIP to provide a reward signal from raw pixels alone, and use it to learn a set of simulated robotic manipulation tasks. We distill the policies learned with this reward signal on several tasks into a single goal-text-conditioned policy.
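As a rough illustration of a pixel-only, text-specified reward, the sketch below scores an observation by the cosine similarity between an image embedding and a goal-text embedding. This is an assumption made for illustration, not the model described in the abstract: the function names are hypothetical, and the short toy vectors stand in for embeddings that a CLIP image encoder and text encoder would produce.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector carries no signal
    return dot / (norm_a * norm_b)


def text_goal_reward(image_embedding: Sequence[float],
                     goal_text_embedding: Sequence[float]) -> float:
    """Hypothetical reward: similarity of the current frame to the goal text.

    Both arguments are assumed to live in a shared image-text embedding
    space (as CLIP provides); no simulator state is used.
    """
    return cosine_similarity(image_embedding, goal_text_embedding)


# Toy stand-ins for encoder outputs (not real CLIP embeddings).
goal = [1.0, 0.0, 0.0]
frame_far = [0.0, 1.0, 0.0]
frame_near = [0.9, 0.1, 0.0]

reward_far = text_goal_reward(frame_far, goal)
reward_near = text_goal_reward(frame_near, goal)
```

In this toy setup a frame whose embedding points closer to the goal text receives the higher reward, which is the property a reinforcement learning agent would exploit.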

Author Information

Parsa Mahmoudieh (UC Berkeley)
Deepak Pathak (Carnegie Mellon University)
Trevor Darrell (University of California at Berkeley)
