Timezone: »
We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space. We show that by utilizing the dual formulation of the WD, we can learn score functions over policy behaviors that can in turn be used to lead policy optimization towards (or away from) (un)desired behaviors. Combined with smoothed WDs, the dual formulation allows us to devise efficient algorithms that take stochastic gradient descent steps through WD regularizers. We incorporate these regularizers into two novel on-policy algorithms, Behavior-Guided Policy Gradient and Behavior-Guided Evolution Strategies, which we demonstrate can outperform existing methods in a variety of challenging environments. We also provide an open source demo.
Author Information
Aldo Pacchiano (UC Berkeley)
Jack Parker-Holder (University of Oxford)
Yunhao Tang (Columbia University)
Krzysztof Choromanski (Google)
Anna Choromanska (NYU Tandon School of Engineering)
Michael Jordan (UC Berkeley)
More from the Same Authors
-
2021 : Marginalized Operators for Off-Policy Reinforcement Learning »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 : Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity »
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li -
2021 : Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection »
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2021 : Estimating Optimal Policy Value in Linear Contextual Bandits beyond Gaussianity »
Jonathan Lee · Weihao Kong · Aldo Pacchiano · Vidya Muthukumar · Emma Brunskill -
2021 : Meta Learning MDPs with linear transition models »
Robert Müller · Aldo Pacchiano · Jack Parker-Holder -
2021 : On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2022 : Representation Learning as Finding Necessary and Sufficient Causes »
Yixin Wang · Michael Jordan -
2022 : Robust Calibration with Multi-domain Temperature Scaling »
Yaodong Yu · Stephen Bates · Yi Ma · Michael Jordan -
2022 : Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations »
Cong Lu · Philip Ball · Tim G. J Rudner · Jack Parker-Holder · Michael A Osborne · Yee-Whye Teh -
2023 : Experiment Planning with Function Approximation »
Aldo Pacchiano · Jonathan Lee · Emma Brunskill -
2023 : Anytime Model Selection in Linear Bandits »
Parnian Kassraie · Aldo Pacchiano · Nicolas Emmenegger · Andreas Krause -
2023 : Undo Maps: A Tool for Adapting Policies to Perceptual Distortions »
Abhi Gupta · Ted Moskovitz · David Alvarez-Melis · Aldo Pacchiano -
2023 : In-Context Decision-Making from Supervised Pretraining »
Jonathan Lee · Annie Xie · Aldo Pacchiano · Yash Chandak · Chelsea Finn · Ofir Nachum · Emma Brunskill -
2023 : SCAFF-PD: Communication Efficient Fair and Robust Federated Learning »
Yaodong Yu · Sai Praneeth Karimireddy · Yi Ma · Michael Jordan -
2023 : Federated Conformal Predictors for Distributed Uncertainty Quantification »
Charles Lu · Yaodong Yu · Sai Praneeth Karimireddy · Michael Jordan · Ramesh Raskar -
2023 : Evaluating and Incentivizing Diverse Data Contributions in Collaborative Learning »
Baihe Huang · Sai Praneeth Karimireddy · Michael Jordan -
2023 : Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons »
Banghua Zhu · Michael Jordan · Jiantao Jiao -
2023 : Experiment Planning with Function Approximation »
Aldo Pacchiano · Jonathan Lee · Emma Brunskill -
2023 : Anytime Model Selection in Linear Bandits »
Parnian Kassraie · Aldo Pacchiano · Nicolas Emmenegger · Andreas Krause -
2023 Poster: Online Learning in Stackelberg Games with an Omniscient Follower »
Geng Zhao · Banghua Zhu · Jiantao Jiao · Michael Jordan -
2023 Poster: Understanding Self-Predictive Learning for Reinforcement Learning »
Yunhao Tang · Zhaohan Guo · Pierre Richemond · Bernardo Avila Pires · Yash Chandak · Remi Munos · Mark Rowland · Mohammad Gheshlaghi Azar · Charline Le Lan · Clare Lyle · Andras Gyorgy · Shantanu Thakoor · Will Dabney · Bilal Piot · Daniele Calandriello · Michal Valko -
2023 Poster: Leveraging Offline Data in Online Reinforcement Learning »
Andrew Wagenmaker · Aldo Pacchiano -
2023 Poster: Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition »
Yash Chandak · Shantanu Thakoor · Zhaohan Guo · Yunhao Tang · Remi Munos · Will Dabney · Diana Borsa -
2023 Poster: Towards a better understanding of representation dynamics under TD-learning »
Yunhao Tang · Remi Munos -
2023 Poster: Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization »
Chris Junchi Li · Huizhuo Yuan · Gauthier Gidel · Quanquan Gu · Michael Jordan -
2023 Poster: Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons »
Banghua Zhu · Michael Jordan · Jiantao Jiao -
2023 Poster: Fast Rates for Maximum Entropy Exploration »
Daniil Tiapkin · Denis Belomestny · Daniele Calandriello · Eric Moulines · Remi Munos · Alexey Naumov · Pierre Perrault · Yunhao Tang · Michal Valko · Pierre Menard -
2023 Oral: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2023 Poster: The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation »
Mark Rowland · Yunhao Tang · Clare Lyle · Remi Munos · Marc Bellemare · Will Dabney -
2023 Poster: Quantile Credit Assignment »
Thomas Mesnard · Wenqi Chen · Alaa Saade · Yunhao Tang · Mark Rowland · Theophane Weber · Clare Lyle · Audrunas Gruslys · Michal Valko · Will Dabney · Georg Ostrovski · Eric Moulines · Remi Munos -
2023 Poster: DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm »
Yunhao Tang · Tadashi Kozuno · Mark Rowland · Anna Harutyunyan · Remi Munos · Bernardo Avila Pires · Michal Valko -
2023 Poster: The Edge of Orthogonality: A Simple View of What Makes BYOL Tick »
Pierre Richemond · Allison Tam · Yunhao Tang · Florian Strub · Bilal Piot · Feilx Hill -
2023 Poster: Federated Conformal Predictors for Distributed Uncertainty Quantification »
Charles Lu · Yaodong Yu · Sai Praneeth Karimireddy · Michael Jordan · Ramesh Raskar -
2023 Poster: VA-learning as a more efficient alternative to Q-learning »
Yunhao Tang · Remi Munos · Mark Rowland · Michal Valko -
2023 Poster: Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice »
Toshinori Kitamura · Tadashi Kozuno · Yunhao Tang · Nino Vieillard · Michal Valko · Wenhao Yang · Jincheng Mei · Pierre Menard · Mohammad Gheshlaghi Azar · Remi Munos · Olivier Pietquin · Matthieu Geist · Csaba Szepesvari · Wataru Kumagai · Yutaka Matsuo -
2022 : Michael I. Jordan: Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control »
Michael Jordan -
2022 Poster: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2022 Poster: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre Menard -
2022 Poster: No-Regret Learning in Partially-Informed Auctions »
Wenshuo Guo · Michael Jordan · Ellen Vitercik -
2022 Oral: From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses »
Daniil Tiapkin · Denis Belomestny · Eric Moulines · Alexey Naumov · Sergey Samsonov · Yunhao Tang · Michal Valko · Pierre Menard -
2022 Spotlight: Evolving Curricula with Regret-Based Environment Design »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2022 Spotlight: No-Regret Learning in Partially-Informed Auctions »
Wenshuo Guo · Michael Jordan · Ellen Vitercik -
2022 Poster: Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning »
Yunhao Tang -
2022 Spotlight: Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning »
Yunhao Tang -
2022 Poster: Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging »
Anastasios Angelopoulos · Amit Pal Kohli · Stephen Bates · Michael Jordan · Jitendra Malik · Thayer Alshaabi · Srigokul Upadhyayula · Yaniv Romano -
2022 Poster: Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback »
Tianyi Lin · Aldo Pacchiano · Yaodong Yu · Michael Jordan -
2022 Poster: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy »
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Spotlight: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy »
ZHIHAN LIU · Lu Miao · Zhaoran Wang · Michael Jordan · Zhuoran Yang -
2022 Spotlight: Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging »
Anastasios Angelopoulos · Amit Pal Kohli · Stephen Bates · Michael Jordan · Jitendra Malik · Thayer Alshaabi · Srigokul Upadhyayula · Yaniv Romano -
2022 Spotlight: Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback »
Tianyi Lin · Aldo Pacchiano · Yaodong Yu · Michael Jordan -
2021 : On the Theory of Reinforcement Learning with Once-per-Episode Feedback »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2021 Poster: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity »
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li -
2021 Poster: Provable Meta-Learning of Linear Representations »
Nilesh Tripuraneni · Chi Jin · Michael Jordan -
2021 Poster: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Poster: Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data »
Esther Rolf · Theodora Worledge · Benjamin Recht · Michael Jordan -
2021 Poster: Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism »
Brijen Thananjeyan · Kirthevasan Kandasamy · Ion Stoica · Michael Jordan · Ken Goldberg · Joseph E Gonzalez -
2021 Spotlight: Provable Meta-Learning of Linear Representations »
Nilesh Tripuraneni · Chi Jin · Michael Jordan -
2021 Oral: Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism »
Brijen Thananjeyan · Kirthevasan Kandasamy · Ion Stoica · Michael Jordan · Ken Goldberg · Joseph E Gonzalez -
2021 Spotlight: Dynamic Balancing for Model Selection in Bandits and RL »
Ashok Cutkosky · Christoph Dann · Abhimanyu Das · Claudio Gentile · Aldo Pacchiano · Manish Purohit -
2021 Spotlight: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity »
Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li -
2021 Spotlight: Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data »
Esther Rolf · Theodora Worledge · Benjamin Recht · Michael Jordan -
2021 Poster: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Poster: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Poster: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment »
Philip Ball · Cong Lu · Jack Parker-Holder · Stephen Roberts -
2021 Spotlight: Taylor Expansion of Discount Factors »
Yunhao Tang · Mark Rowland · Remi Munos · Michal Valko -
2021 Spotlight: Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning »
Tadashi Kozuno · Yunhao Tang · Mark Rowland · Remi Munos · Steven Kapturowski · Will Dabney · Michal Valko · David Abel -
2021 Spotlight: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment »
Philip Ball · Cong Lu · Jack Parker-Holder · Stephen Roberts -
2020 : Panel Discussion »
Neil Lawrence · Mihaela van der Schaar · Alex Smola · Valerio Perrone · Jack Parker-Holder · Zhengying Liu -
2020 : Contributed Talk 1: Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits »
Jack Parker-Holder · Vu Nguyen · Stephen Roberts -
2020 : Discussion Panel »
Krzysztof Dembczynski · Prateek Jain · Alina Beygelzimer · Inderjit Dhillon · Anna Choromanska · Maryam Majzoubi · Yashoteja Prabhu · John Langford -
2020 : Spotlight talk 2 - Ridge Riding: Finding diverse solutions by following eigenvectors of the Hessian »
Jack Parker-Holder -
2020 Workshop: Workshop on eXtreme Classification: Theory and Applications »
Anna Choromanska · John Langford · Maryam Majzoubi · Yashoteja Prabhu -
2020 Poster: Monte-Carlo Tree Search as Regularized Policy Optimization »
Jean-Bastien Grill · Florent Altché · Yunhao Tang · Thomas Hubert · Michal Valko · Ioannis Antonoglou · Remi Munos -
2020 Poster: On Thompson Sampling with Langevin Algorithms »
Eric Mazumdar · Aldo Pacchiano · Yian Ma · Michael Jordan · Peter Bartlett -
2020 Poster: Accelerated Message Passing for Entropy-Regularized MAP Inference »
Jonathan Lee · Aldo Pacchiano · Peter Bartlett · Michael Jordan -
2020 Poster: Stochastic Flows and Geometric Optimization on the Orthogonal Group »
Krzysztof Choromanski · David Cheikhi · Jared Quincy Davis · Valerii Likhosherstov · Achille Nazaret · Achraf Bahamou · Xingyou Song · Mrugank Akarte · Jack Parker-Holder · Jacob Bergquist · Yuan Gao · Aldo Pacchiano · Tamas Sarlos · Adrian Weller · Vikas Sindhwani -
2020 Poster: On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems »
Tianyi Lin · Chi Jin · Michael Jordan -
2020 Poster: Continuous-time Lower Bounds for Gradient-based Algorithms »
Michael Muehlebach · Michael Jordan -
2020 Poster: Stochastic Gradient and Langevin Processes »
Xiang Cheng · Dong Yin · Peter Bartlett · Michael Jordan -
2020 Poster: Reinforcement Learning for Integer Programming: Learning to Cut »
Yunhao Tang · Shipra Agrawal · Yuri Faenza -
2020 Poster: Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games »
Tianyi Lin · Zhengyuan Zhou · Panayotis Mertikopoulos · Michael Jordan -
2020 Poster: Ready Policy One: World Building Through Active Learning »
Philip Ball · Jack Parker-Holder · Aldo Pacchiano · Krzysztof Choromanski · Stephen Roberts -
2020 Poster: Taylor Expansion Policy Optimization »
Yunhao Tang · Michal Valko · Remi Munos -
2020 Poster: What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization? »
Chi Jin · Praneeth Netrapalli · Michael Jordan -
2019 : poster session I »
Nicholas Rhinehart · Yunhao Tang · Vinay Prabhu · Dian Ang Yap · Alexander Wang · Marc Finzi · Manoj Kumar · You Lu · Abhishek Kumar · Qi Lei · Michael Przystupa · Nicola De Cao · Polina Kirichenko · Pavel Izmailov · Andrew Wilson · Jakob Kruse · Diego Mesquita · Mario Lezcano Casado · Thomas Müller · Keir Simmons · Andrei Atanov -
2019 Workshop: Workshop on AI for autonomous driving »
Anna Choromanska · Larry Jackel · Li Erran Li · Juan Carlos Niebles · Adrien Gaidon · Wei-Lun Chao · Ingmar Posner · Wei-Lun (Harry) Chao -
2019 Poster: Bridging Theory and Algorithm for Domain Adaptation »
Yuchen Zhang · Tianle Liu · Mingsheng Long · Michael Jordan -
2019 Oral: Bridging Theory and Algorithm for Domain Adaptation »
Yuchen Zhang · Tianle Liu · Mingsheng Long · Michael Jordan -
2019 Poster: Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers »
Hong Liu · Mingsheng Long · Jianmin Wang · Michael Jordan -
2019 Poster: Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation »
Kaichao You · Ximei Wang · Mingsheng Long · Michael Jordan -
2019 Poster: A Dynamical Systems Perspective on Nesterov Acceleration »
Michael Muehlebach · Michael Jordan -
2019 Poster: Theoretically Principled Trade-off between Robustness and Accuracy »
Hongyang Zhang · Yaodong Yu · Jiantao Jiao · Eric Xing · Laurent El Ghaoui · Michael Jordan -
2019 Oral: A Dynamical Systems Perspective on Nesterov Acceleration »
Michael Muehlebach · Michael Jordan -
2019 Oral: Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation »
Kaichao You · Ximei Wang · Mingsheng Long · Michael Jordan -
2019 Oral: Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers »
Hong Liu · Mingsheng Long · Jianmin Wang · Michael Jordan -
2019 Oral: Theoretically Principled Trade-off between Robustness and Accuracy »
Hongyang Zhang · Yaodong Yu · Jiantao Jiao · Eric Xing · Laurent El Ghaoui · Michael Jordan -
2019 Poster: On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms »
Tianyi Lin · Nhat Ho · Michael Jordan -
2019 Poster: Online learning with kernel losses »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett -
2019 Poster: Rao-Blackwellized Stochastic Gradients for Discrete Distributions »
Runjing Liu · Jeffrey Regier · Nilesh Tripuraneni · Michael Jordan · Jon McAuliffe -
2019 Oral: Rao-Blackwellized Stochastic Gradients for Discrete Distributions »
Runjing Liu · Jeffrey Regier · Nilesh Tripuraneni · Michael Jordan · Jon McAuliffe -
2019 Oral: Online learning with kernel losses »
Niladri Chatterji · Aldo Pacchiano · Peter Bartlett -
2019 Oral: On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms »
Tianyi Lin · Nhat Ho · Michael Jordan -
2018 Poster: On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo »
Niladri Chatterji · Nicolas Flammarion · Yian Ma · Peter Bartlett · Michael Jordan -
2018 Poster: RLlib: Abstractions for Distributed Reinforcement Learning »
Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph E Gonzalez · Michael Jordan · Ion Stoica -
2018 Oral: On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo »
Niladri Chatterji · Nicolas Flammarion · Yian Ma · Peter Bartlett · Michael Jordan -
2018 Oral: RLlib: Abstractions for Distributed Reinforcement Learning »
Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph E Gonzalez · Michael Jordan · Ion Stoica -
2018 Poster: SAFFRON: an Adaptive Algorithm for Online Control of the False Discovery Rate »
Aaditya Ramdas · Tijana Zrnic · Martin Wainwright · Michael Jordan -
2018 Oral: SAFFRON: an Adaptive Algorithm for Online Control of the False Discovery Rate »
Aaditya Ramdas · Tijana Zrnic · Martin Wainwright · Michael Jordan -
2018 Poster: Learning to Explain: An Information-Theoretic Perspective on Model Interpretation »
Jianbo Chen · Le Song · Martin Wainwright · Michael Jordan -
2018 Oral: Learning to Explain: An Information-Theoretic Perspective on Model Interpretation »
Jianbo Chen · Le Song · Martin Wainwright · Michael Jordan -
2017 Poster: How to Escape Saddle Points Efficiently »
Chi Jin · Rong Ge · Praneeth Netrapalli · Sham Kakade · Michael Jordan -
2017 Talk: How to Escape Saddle Points Efficiently »
Chi Jin · Rong Ge · Praneeth Netrapalli · Sham Kakade · Michael Jordan -
2017 Poster: Deep Transfer Learning with Joint Adaptation Networks »
Mingsheng Long · Han Zhu · Jianmin Wang · Michael Jordan -
2017 Poster: Breaking Locality Accelerates Block Gauss-Seidel »
Stephen Tu · Shivaram Venkataraman · Ashia Wilson · Alex Gittens · Michael Jordan · Benjamin Recht -
2017 Talk: Deep Transfer Learning with Joint Adaptation Networks »
Mingsheng Long · Han Zhu · Jianmin Wang · Michael Jordan -
2017 Talk: Breaking Locality Accelerates Block Gauss-Seidel »
Stephen Tu · Shivaram Venkataraman · Ashia Wilson · Alex Gittens · Michael Jordan · Benjamin Recht