
A distributional view on multi-objective policy optimization
Abbas Abdolmaleki · Sandy Huang · Leonard Hasenclever · Michael Neunert · Francis Song · Martina Zambelli · Murilo Martins · Nicolas Heess · Raia Hadsell · Martin Riedmiller

Tue Jul 14 12:00 PM -- 12:45 PM & Wed Jul 15 01:00 AM -- 01:45 AM (PDT)

Many real-world problems require trading off multiple competing objectives. However, these objectives are often in different units and/or scales, which can make it challenging for practitioners to express numerical preferences over objectives in their native units. In this paper we propose a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way. We propose to learn an action distribution for each objective, and we use supervised learning to fit a parametric policy to a combination of these distributions. We demonstrate the effectiveness of our approach on challenging high-dimensional real and simulated robotics tasks, and show that setting different preferences in our framework allows us to trace out the space of nondominated solutions.
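The abstract describes a three-step idea: learn an improved action distribution per objective, combine those distributions, then fit a parametric policy to the combination with supervised learning. The sketch below illustrates this for a single state with a discrete set of sampled actions. It is a minimal reading of the abstract, not the paper's actual algorithm: the exponential reweighting form, the per-objective temperatures `eta` (standing in for the scale-invariant preferences), and the normalized-product combination rule are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K objectives, N sampled actions at one state.
K, N = 2, 5
q_values = rng.normal(size=(K, N))   # per-objective action values Q_k(s, a_i)
prior = np.full(N, 1.0 / N)          # current policy probabilities pi(a_i | s)

# Assumed per-objective temperatures standing in for preferences; each
# objective's values are scaled by its own temperature, which is one way
# to make the preference setting insensitive to each objective's units.
eta = np.array([1.0, 1.0])

# Step 1: one improved action distribution per objective,
# q_k(a | s) proportional to pi(a | s) * exp(Q_k(s, a) / eta_k).
improved = prior * np.exp(q_values / eta[:, None])
improved /= improved.sum(axis=1, keepdims=True)

# Step 2: combine the per-objective distributions. A normalized product is
# one plausible choice; the abstract does not specify the combination rule.
combined = np.prod(improved, axis=0)
combined /= combined.sum()

# Step 3: supervised fit of a parametric (softmax) policy to the combined
# distribution by gradient descent on the cross-entropy
# -sum_a combined(a) * log pi_theta(a | s).
logits = np.zeros(N)
for _ in range(500):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    logits -= 0.5 * (probs - combined)   # gradient of the cross-entropy

fitted = np.exp(logits - logits.max())
fitted /= fitted.sum()
```

Lowering one objective's temperature sharpens its distribution and pulls the combined target (and hence the fitted policy) toward that objective, which is how sweeping preferences could trace out different trade-offs between objectives.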

Author Information

Abbas Abdolmaleki (DeepMind)
Sandy Huang (DeepMind)
Leonard Hasenclever (DeepMind)
Michael Neunert (Google DeepMind)
Francis Song (DeepMind)
Martina Zambelli (DeepMind)
Murilo Martins (DeepMind)
Nicolas Heess (DeepMind)
Raia Hadsell (DeepMind)

Raia Hadsell, a senior research scientist at DeepMind, has worked on deep learning and robotics problems for over 10 years. Her early research developed the notion of manifold learning using Siamese networks, which has been used extensively for invariant feature learning. After completing a PhD with Yann LeCun, which featured a self-supervised deep learning vision system for a mobile robot, her research continued at Carnegie Mellon’s Robotics Institute and SRI International, and in early 2014 she joined DeepMind in London to study artificial general intelligence. Her current research focuses on the challenge of continual learning for AI agents and robotic systems. While deep RL algorithms are capable of attaining superhuman performance on single tasks, they cannot transfer that performance to additional tasks, especially if experienced sequentially. She has proposed neural approaches such as policy distillation, progressive nets, and elastic weight consolidation to solve the problem of catastrophic forgetting and improve transfer learning.

Martin Riedmiller (DeepMind)
