Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function. Popular approaches like CEM and CMA-ES greedily focus on promising regions of the search space and may get trapped in local maxima. DOO and VOOT balance exploration and exploitation, but use space partitioning strategies that are independent of the reward function being optimized. The more recent LaMCTS empirically learns to partition the search space in a reward-sensitive manner for black-box optimization. In this paper, we develop a novel formal regret analysis of when and why such an adaptive region partitioning scheme works. We also propose a new path planning method, PlaLaM, which improves the function value estimation within each sub-region and uses a latent representation of the search space. Empirically, PlaLaM outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima, and shows benefits when plugged into model-based RL with planning components such as PETS. These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 245% and in molecular design by up to 0.4 on properties measured on a 0-1 scale. Code is available at https://github.com/yangkevin2/plalam.
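As a rough illustration of the local-maximum failure mode the abstract attributes to greedy methods, here is a minimal, generic CEM sketch in Python with NumPy. The toy reward, the names `reward` and `cem`, and all hyperparameters are assumptions made for this example; it is not PlaLaM and is not taken from the authors' code.

```python
import numpy as np


def reward(x):
    """Toy 1-D reward (hypothetical): a local maximum near x = -2 and a
    higher, global maximum near x = 3."""
    return np.exp(-(x + 2.0) ** 2) + 2.0 * np.exp(-(x - 3.0) ** 2)


def cem(reward_fn, mean, std, n_samples=64, n_elite=8, n_iters=30, seed=0):
    """Generic cross-entropy method over a 1-D Gaussian search distribution."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        # Sample candidates from the current search distribution.
        samples = rng.normal(mean, std, size=n_samples)
        scores = reward_fn(samples)
        # Keep the highest-scoring candidates and refit the Gaussian to them.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean = elite.mean()
        std = elite.std() + 1e-6  # small floor keeps the scale positive
    return mean


if __name__ == "__main__":
    # Initialized near the local maximum, the search tends to stay there.
    print("start at -2:", cem(reward, mean=-2.0, std=0.5))
    # Initialized near the global maximum, it converges to the better peak.
    print("start at  3:", cem(reward, mean=3.0, std=0.5))
```

Because each iteration refits the sampling distribution to the current elite set only, a narrow initialization around the lower peak gives the search almost no chance of sampling the higher one. This is the kind of behavior that reward-sensitive partitioning with explicit exploration is meant to mitigate.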
Author Information
Kevin Yang (UC Berkeley)
Tianjun Zhang (UC Berkeley)
Chris Cummins (Facebook AI Research)
Brandon Cui (Facebook AI Research)
Benoit Steiner (FAIR)
Linnan Wang (Brown)
Joseph E Gonzalez (UC Berkeley)
Dan Klein (UC Berkeley)
Yuandong Tian (Facebook AI Research)
More from the Same Authors
2021: K-level Reasoning for Zero-Shot Coordination in Hanabi
Brandon Cui
2023 Poster: The Wisdom of Hindsight Makes Language Models Better Instruction Followers
Tianjun Zhang · Fangchen Liu · Justin Wong · Pieter Abbeel · Joseph E Gonzalez
2023 Poster: Poisoning Language Models During Instruction Tuning
Alexander Wan · Eric Wallace · Sheng Shen · Dan Klein
2022 Poster: Making Linear MDPs Practical via Contrastive Representation Learning
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai
2022 Poster: GACT: Activation Compressed Training for Generic Network Architectures
Xiaoxuan Liu · Lianmin Zheng · Dequan Wang · Yukuo Cen · Weize Chen · Xu Han · Jianfei Chen · Zhiyuan Liu · Jie Tang · Joseph Gonzalez · Michael Mahoney · Alvin Cheung
2022 Poster: Flashlight: Enabling Innovation in Tools for Machine Learning
Jacob Kahn · Vineel Pratap · Tatiana Likhomanenko · Qiantong Xu · Awni Hannun · Jeff Cai · Paden Tomasello · Ann Lee · Edouard Grave · Gilad Avidov · Benoit Steiner · Vitaliy Liptchinsky · Gabriel Synnaeve · Ronan Collobert
2022 Poster: POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
Shishir G. Patil · Paras Jain · Prabal Dutta · Ion Stoica · Joseph E Gonzalez
2022 Spotlight: Making Linear MDPs Practical via Contrastive Representation Learning
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai
2022 Spotlight: POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
Shishir G. Patil · Paras Jain · Prabal Dutta · Ion Stoica · Joseph E Gonzalez
2022 Spotlight: Flashlight: Enabling Innovation in Tools for Machine Learning
Jacob Kahn · Vineel Pratap · Tatiana Likhomanenko · Qiantong Xu · Awni Hannun · Jeff Cai · Paden Tomasello · Ann Lee · Edouard Grave · Gilad Avidov · Benoit Steiner · Vitaliy Liptchinsky · Gabriel Synnaeve · Ronan Collobert
2022 Spotlight: GACT: Activation Compressed Training for Generic Network Architectures
Xiaoxuan Liu · Lianmin Zheng · Dequan Wang · Yukuo Cen · Weize Chen · Xu Han · Jianfei Chen · Zhiyuan Liu · Jie Tang · Joseph Gonzalez · Michael Mahoney · Alvin Cheung
2022 Poster: Denoised MDPs: Learning World Models Better Than the World Itself
Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian
2022 Poster: Neurotoxin: Durable Backdoors in Federated Learning
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez
2022 Spotlight: Denoised MDPs: Learning World Models Better Than the World Itself
Tongzhou Wang · Simon Du · Antonio Torralba · Phillip Isola · Amy Zhang · Yuandong Tian
2022 Spotlight: Neurotoxin: Durable Backdoors in Federated Learning
Zhengming Zhang · Ashwinee Panda · Linyue Song · Yaoqing Yang · Michael Mahoney · Prateek Mittal · Kannan Ramchandran · Joseph E Gonzalez
2022 Poster: Describing Differences between Text Distributions with Natural Language
Ruiqi Zhong · Charlie Snell · Dan Klein · Jacob Steinhardt
2022 Spotlight: Describing Differences between Text Distributions with Natural Language
Ruiqi Zhong · Charlie Snell · Dan Klein · Jacob Steinhardt
2021: RL + Operations Research Panel
Jim Dai · Fei Fang · Shie Mannor · Yuandong Tian · Zhiwei (Tony) Qin · Zongqing Lu
2021 Poster: Learn-to-Share: A Hardware-friendly Transfer Learning Framework Exploiting Computation and Parameter Sharing
Cheng Fu · Hanxian Huang · Xinyun Chen · Yuandong Tian · Jishen Zhao
2021 Poster: Calibrate Before Use: Improving Few-shot Performance of Language Models
Tony Z. Zhao · Eric Wallace · Shi Feng · Dan Klein · Sameer Singh
2021 Oral: Learn-to-Share: A Hardware-friendly Transfer Learning Framework Exploiting Computation and Parameter Sharing
Cheng Fu · Hanxian Huang · Xinyun Chen · Yuandong Tian · Jishen Zhao
2021 Oral: Calibrate Before Use: Improving Few-shot Performance of Language Models
Tony Z. Zhao · Eric Wallace · Shi Feng · Dan Klein · Sameer Singh
2021 Poster: Understanding self-supervised learning dynamics without contrastive pairs
Yuandong Tian · Xinlei Chen · Surya Ganguli
2021 Poster: Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism
Brijen Thananjeyan · Kirthevasan Kandasamy · Ion Stoica · Michael Jordan · Ken Goldberg · Joseph E Gonzalez
2021 Oral: Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism
Brijen Thananjeyan · Kirthevasan Kandasamy · Ion Stoica · Michael Jordan · Ken Goldberg · Joseph E Gonzalez
2021 Oral: Understanding self-supervised learning dynamics without contrastive pairs
Yuandong Tian · Xinlei Chen · Surya Ganguli
2021 Poster: ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations
Chris Cummins · Zacharias Fisches · Tal Ben-Nun · Torsten Hoefler · Michael O'Boyle · Hugh Leather
2021 Poster: Off-Belief Learning
Hengyuan Hu · Adam Lerer · Brandon Cui · Luis Pineda · Noam Brown · Jakob Foerster
2021 Spotlight: Off-Belief Learning
Hengyuan Hu · Adam Lerer · Brandon Cui · Luis Pineda · Noam Brown · Jakob Foerster
2021 Spotlight: ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations
Chris Cummins · Zacharias Fisches · Tal Ben-Nun · Torsten Hoefler · Michael O'Boyle · Hugh Leather
2021 Poster: Few-Shot Neural Architecture Search
Yiyang Zhao · Linnan Wang · Yuandong Tian · Rodrigo Fonseca · Tian Guo
2021 Poster: Trajectory Diversity for Zero-Shot Coordination
Andrei Lupu · Brandon Cui · Hengyuan Hu · Jakob Foerster
2021 Poster: ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Jianfei Chen · Lianmin Zheng · Zhewei Yao · Dequan Wang · Ion Stoica · Michael Mahoney · Joseph E Gonzalez
2021 Oral: ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
Jianfei Chen · Lianmin Zheng · Zhewei Yao · Dequan Wang · Ion Stoica · Michael Mahoney · Joseph E Gonzalez
2021 Oral: Few-Shot Neural Architecture Search
Yiyang Zhao · Linnan Wang · Yuandong Tian · Rodrigo Fonseca · Tian Guo
2021 Spotlight: Trajectory Diversity for Zero-Shot Coordination
Andrei Lupu · Brandon Cui · Hengyuan Hu · Jakob Foerster
2020 Poster: Frustratingly Simple Few-Shot Object Detection
Xin Wang · Thomas Huang · Joseph E Gonzalez · Trevor Darrell · Fisher Yu
2020 Poster: Improving Molecular Design by Stochastic Iterative Target Augmentation
Kevin Yang · Wengong Jin · Kyle Swanson · Regina Barzilay · Tommi Jaakkola
2020 Poster: Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li · Eric Wallace · Sheng Shen · Kevin Lin · Kurt Keutzer · Dan Klein · Joseph Gonzalez
2020 Poster: Student Specialization in Deep Rectified Networks With Finite Width and Input Dimension
Yuandong Tian
2020 Poster: FetchSGD: Communication-Efficient Federated Learning with Sketching
Daniel Rothchild · Ashwinee Panda · Enayat Ullah · Nikita Ivkin · Ion Stoica · Vladimir Braverman · Joseph E Gonzalez · Raman Arora
2019 Poster: ELF OpenGo: an analysis and open reimplementation of AlphaZero
Yuandong Tian · Jerry Ma · Qucheng Gong · Shubho Sengupta · Zhuoyuan Chen · James Pinkerton · Larry Zitnick
2019 Oral: ELF OpenGo: an analysis and open reimplementation of AlphaZero
Yuandong Tian · Jerry Ma · Qucheng Gong · Shubho Sengupta · Zhuoyuan Chen · James Pinkerton · Larry Zitnick
2018 Poster: RLlib: Abstractions for Distributed Reinforcement Learning
Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph E Gonzalez · Michael Jordan · Ion Stoica
2018 Oral: RLlib: Abstractions for Distributed Reinforcement Learning
Eric Liang · Richard Liaw · Robert Nishihara · Philipp Moritz · Roy Fox · Ken Goldberg · Joseph E Gonzalez · Michael Jordan · Ion Stoica
2018 Poster: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos
2018 Oral: Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
Simon Du · Jason Lee · Yuandong Tian · Aarti Singh · Barnabás Póczos
2017 Poster: Modular Multitask Reinforcement Learning with Policy Sketches
Jacob Andreas · Dan Klein · Sergey Levine
2017 Poster: An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis
Yuandong Tian
2017 Talk: An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis
Yuandong Tian
2017 Talk: Modular Multitask Reinforcement Learning with Policy Sketches
Jacob Andreas · Dan Klein · Sergey Levine