Natural agents can effectively learn from multiple data sources that differ in size, quality, and types of measurements. We study this heterogeneity in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting. Here, an agent has access to two sets of trajectories: labelled trajectories containing state, action, and reward triplets at every timestep, and unlabelled trajectories that contain only state and reward information. For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, and then applies any offline RL algorithm to the combined true- and proxy-labelled trajectories. Empirically, we find this simple pipeline to be highly successful: on several D4RL benchmarks (Fu et al., 2020), certain offline RL algorithms can match the performance of variants trained on a fully labelled dataset even when only 10% of the trajectories are labelled and those labelled trajectories are highly suboptimal. To strengthen our understanding, we perform a large-scale controlled empirical study investigating the interplay between data-centric properties of the labelled and unlabelled datasets and algorithmic design choices (e.g., the choice of inverse dynamics model or offline RL algorithm) to identify general trends and best practices for training RL agents on semi-supervised offline datasets.
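A minimal sketch of the proxy-labelling pipeline described in the abstract, assuming a simple dictionary-of-arrays trajectory layout and hypothetical helper names (`InverseDynamics`, `fit_inverse_dynamics`, `run_offline_rl`); this is an illustration of the general idea under those assumptions, not the authors' released implementation.

```python
import numpy as np
import torch
import torch.nn as nn

# Sketch of the meta-algorithmic pipeline: (1) fit an inverse dynamics model on the
# labelled trajectories, (2) use it to proxy-label the action-free trajectories,
# (3) hand the merged dataset to any offline RL algorithm. All names and the data
# layout below are illustrative assumptions.

class InverseDynamics(nn.Module):
    """Predicts the action taken between consecutive states (s_t, s_{t+1})."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))


def fit_inverse_dynamics(model, labelled, epochs=50, lr=1e-3):
    """Supervised regression of actions from (s_t, s_{t+1}) pairs in the labelled data."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    s, s_next, a = (torch.as_tensor(labelled[k], dtype=torch.float32)
                    for k in ("states", "next_states", "actions"))
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(s, s_next), a)
        loss.backward()
        opt.step()
    return model


def proxy_label(model, unlabelled):
    """Fill in missing actions of action-free trajectories with model predictions."""
    with torch.no_grad():
        s = torch.as_tensor(unlabelled["states"], dtype=torch.float32)
        s_next = torch.as_tensor(unlabelled["next_states"], dtype=torch.float32)
        return dict(unlabelled, actions=model(s, s_next).numpy())


def semi_supervised_offline_rl(labelled, unlabelled, state_dim, action_dim, run_offline_rl):
    """Train on the union of true- and proxy-labelled trajectories.
    `run_offline_rl` is a placeholder for any offline RL algorithm (e.g., TD3+BC, IQL)."""
    idm = fit_inverse_dynamics(InverseDynamics(state_dim, action_dim), labelled)
    pseudo = proxy_label(idm, unlabelled)
    merged = {k: np.concatenate([labelled[k], pseudo[k]]) for k in labelled}
    return run_offline_rl(merged)
```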
Author Information
Qinqing Zheng (FAIR)
Mikael Henaff (Meta)
Brandon Amos (Meta)
Aditya Grover (UCLA)
More from the Same Authors
- 2021 : Neural Fixed-Point Acceleration for Convex Optimization »
  Shobha Venkataraman · Brandon Amos
- 2022 : BARACK: Partially Supervised Group Robustness With Guarantees »
  Nimit Sohoni · Maziar Sanjabi · Nicolas Ballas · Aditya Grover · Shaoliang Nie · Hamed Firooz · Christopher Re
- 2023 : Neural Optimal Transport with Lagrangian Costs »
  Aram-Alexandre Pooladian · Carles Domingo i Enrich · Ricky T. Q. Chen · Brandon Amos
- 2023 : Decision Stacks: Flexible Reinforcement Learning via Modular Generative Models »
  Siyan Zhao · Aditya Grover
- 2023 : ClimaX: A Foundation Model for Weather and Climate »
  Tung Nguyen · Johannes Brandstetter · Ashish Kapoor · Jayesh K. Gupta · Aditya Grover
- 2023 : Koopman Constrained Policy Optimization: A Koopman operator theoretic method for differentiable optimal control in robotics »
  Matthew Retchin · Brandon Amos · Steven Brunton · Shuran Song
- 2023 : TaskMet: Task-Driven Metric Learning for Model Learning »
  Dishank Bansal · Ricky T. Q. Chen · Mustafa Mukadam · Brandon Amos
- 2023 : Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information »
  Arman Zharmagambetov · Brandon Amos · Aaron Ferber · Taoan Huang · Bistra Dilkina · Yuandong Tian
- 2023 : Leaving Reality to Imagination: Robust Classification via Generated Datasets »
  Hritik Bansal · Aditya Grover
- 2023 : On optimal control and machine learning »
  Brandon Amos
- 2023 Poster: Meta Optimal Transport »
  Brandon Amos · Giulia Luise · samuel cohen · Ievgen Redko
- 2023 Poster: Generative Pretraining for Black-Box Optimization »
  Satvik Mehul Mashkaria · Siddarth Krishnamoorthy · Aditya Grover
- 2023 Oral: A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs »
  Mikael Henaff · Minqi Jiang · Roberta Raileanu
- 2023 Poster: A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs »
  Mikael Henaff · Minqi Jiang · Roberta Raileanu
- 2023 Poster: Diffusion Models for Black-Box Optimization »
  Siddarth Krishnamoorthy · Satvik Mehul Mashkaria · Aditya Grover
- 2023 Poster: ClimaX: A foundation model for weather and climate »
  Tung Nguyen · Johannes Brandstetter · Ashish Kapoor · Jayesh K. Gupta · Aditya Grover
- 2023 Poster: Multisample Flow Matching: Straightening Flows with Minibatch Couplings »
  Aram-Alexandre Pooladian · Heli Ben-Hamu · Carles Domingo i Enrich · Brandon Amos · Yaron Lipman · Ricky T. Q. Chen
- 2022 : Differentiable optimization for control and reinforcement learning »
  Brandon Amos
- 2022 Poster: Online Decision Transformer »
  Qinqing Zheng · Amy Zhang · Aditya Grover
- 2022 Oral: Online Decision Transformer »
  Qinqing Zheng · Amy Zhang · Aditya Grover
- 2022 Poster: Matching Normalizing Flows and Probability Paths on Manifolds »
  Heli Ben-Hamu · samuel cohen · Joey Bose · Brandon Amos · Maximilian Nickel · Aditya Grover · Ricky T. Q. Chen · Yaron Lipman
- 2022 Spotlight: Matching Normalizing Flows and Probability Paths on Manifolds »
  Heli Ben-Hamu · samuel cohen · Joey Bose · Brandon Amos · Maximilian Nickel · Aditya Grover · Ricky T. Q. Chen · Yaron Lipman
- 2021 Poster: Near-Optimal Confidence Sequences for Bounded Random Variables »
  Arun Kuchibhotla · Qinqing Zheng
- 2021 Spotlight: Near-Optimal Confidence Sequences for Bounded Random Variables »
  Arun Kuchibhotla · Qinqing Zheng
- 2021 Poster: CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints »
  Anselm Paulus · Michal Rolinek · Vit Musil · Brandon Amos · Georg Martius
- 2021 Spotlight: CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints »
  Anselm Paulus · Michal Rolinek · Vit Musil · Brandon Amos · Georg Martius
- 2021 Poster: Riemannian Convex Potential Maps »
  samuel cohen · Brandon Amos · Yaron Lipman
- 2021 Spotlight: Riemannian Convex Potential Maps »
  samuel cohen · Brandon Amos · Yaron Lipman
- 2020 Poster: Sharp Composition Bounds for Gaussian Differential Privacy via Edgeworth Expansion »
  Qinqing Zheng · Jinshuo Dong · Qi Long · Weijie Su
- 2020 Poster: The Differentiable Cross-Entropy Method »
  Brandon Amos · Denis Yarats