Poster
Learning Calibratable Policies using Programmatic Style-Consistency
Eric Zhan · Albert Tseng · Yisong Yue · Adith Swaminathan · Matthew Hausknecht

Wed Jul 15 10:00 AM -- 10:45 AM & Wed Jul 15 09:00 PM -- 09:45 PM (PDT)
We study the problem of controllable generation of long-term sequential behaviors, where the goal is to calibrate to multiple behavior styles simultaneously. In contrast to the well-studied areas of controllable generation of images, text, and speech, two questions pose significant challenges when generating long-term behaviors: how should we specify the factors of variation to control, and how can we ensure that the generated behavior faithfully demonstrates combinatorially many styles? We leverage programmatic labeling functions to specify controllable styles, and derive a formal notion of style-consistency as a learning objective, which can then be solved using conventional policy learning approaches. We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that existing approaches that do not explicitly enforce style-consistency fail to generate diverse behaviors, whereas our learned policies can be calibrated for up to $4^5 = 1024$ distinct style combinations.
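The two ideas in the abstract, specifying a style with a programmatic labeling function and checking whether a policy's generated trajectories actually receive the requested label, can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the speed-based labeling function `speed_label` and its thresholds, the toy policies `toy_calibrated_policy` and `toy_uncalibrated_policy`, and `style_consistency` as a simple Monte Carlo estimate of the fraction of generated trajectories whose label matches the target. It is not the authors' implementation or training procedure.

```python
import itertools
import numpy as np

# Hypothetical labeling function (an assumption, not from the paper): bucket the
# average per-step speed of a 2-D trajectory into 4 classes using 3 thresholds.
SPEED_THRESHOLDS = (0.5, 1.0, 1.5)

def speed_label(traj: np.ndarray) -> int:
    """traj has shape (T, 2); returns a style class in {0, 1, 2, 3}."""
    step_lengths = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    return int(np.searchsorted(SPEED_THRESHOLDS, step_lengths.mean()))

def toy_calibrated_policy(y: int, rng: np.random.Generator, T: int = 20) -> np.ndarray:
    """Hypothetical policy that conditions on the target style y via its step size."""
    step = (0.25, 0.75, 1.25, 1.75)[y]
    angles = rng.uniform(0.0, 2 * np.pi, size=T)
    moves = step * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    # Start at the origin and accumulate the random-direction moves.
    return np.vstack([np.zeros((1, 2)), np.cumsum(moves, axis=0)])

def toy_uncalibrated_policy(y: int, rng: np.random.Generator, T: int = 20) -> np.ndarray:
    """Hypothetical policy that ignores y, so it always produces style class 1."""
    return toy_calibrated_policy(1, rng, T)

def style_consistency(policy, label_fn, target_labels, rng) -> float:
    """Monte Carlo estimate of P[label_fn(tau) == y] with tau ~ policy(. | y)."""
    hits = [label_fn(policy(y, rng)) == y for y in target_labels]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
targets = [0, 1, 2, 3] * 25  # each target style requested equally often
print(style_consistency(toy_calibrated_policy, speed_label, targets, rng))    # 1.0
print(style_consistency(toy_uncalibrated_policy, speed_label, targets, rng))  # 0.25

# With 5 such labeling functions of 4 classes each, there are 4**5 style combinations.
print(len(list(itertools.product(range(4), repeat=5))))  # 1024
```

The uncalibrated policy only matches the target when it happens to be class 1, which mirrors the abstract's point that models not explicitly enforcing style-consistency can fail to produce the requested diversity.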

Author Information

Eric Zhan (California Institute of Technology)
Albert Tseng (Caltech)
Yisong Yue (Caltech)

Yisong Yue is a Professor of Computing and Mathematical Sciences at Caltech and (on sabbatical) a Principal Scientist at Latitude AI. His research interests span both fundamental and applied pursuits, from novel learning-theoretic frameworks all the way to deep learning deployed in autonomous driving on public roads. His work has been recognized with multiple paper awards and nominations, including in robotics, computer vision, sports analytics, machine learning for health, and information retrieval. At Latitude AI, he is working on machine learning approaches to motion planning for autonomous driving.

Adith Swaminathan (Microsoft Research)
Matthew Hausknecht (Microsoft Research)
