Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori. We show how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and a component that transforms these representations into actual actions. These representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken. We provide an algorithm to both learn and use action representations and provide conditions for its convergence. The efficacy of the proposed method is demonstrated on large-scale real-world problems.
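The decomposition described above can be sketched in a few lines: an internal policy maps a state to a point in a low-dimensional embedding space, and a separate transformation maps that embedding to one of the discrete actions. The linear-Gaussian internal policy and the nearest-neighbor transformation below are illustrative assumptions, not the paper's exact parameterization; all names (`internal_policy`, `to_action`, `action_embeddings`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, embed_dim, state_dim = 1000, 2, 4

# Assumed learned action embeddings: one low-dimensional vector per action.
action_embeddings = rng.normal(size=(n_actions, embed_dim))

# Internal policy component: acts in the embedding space, not over the
# large discrete action set (here a simple linear-Gaussian sketch).
W = rng.normal(scale=0.1, size=(state_dim, embed_dim))

def internal_policy(state):
    mean = state @ W
    return mean + rng.normal(scale=0.1, size=embed_dim)

# Transformation component: turns a sampled embedding into an actual
# action, e.g. by nearest neighbor among the action embeddings.
def to_action(e):
    dists = np.linalg.norm(action_embeddings - e, axis=1)
    return int(np.argmin(dists))

state = rng.normal(size=state_dim)
e = internal_policy(state)   # point in the 2-D embedding space
a = to_action(e)             # discrete action index in [0, n_actions)
```

Because the internal policy only needs to output a 2-dimensional embedding, experience with one action informs the agent about nearby actions in embedding space, which is the source of the generalization claimed in the abstract.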
Author Information
Yash Chandak (University of Massachusetts Amherst)
Georgios Theocharous (Adobe Research)
James Kostas (UMass Amherst)
Scott Jordan (University of Massachusetts Amherst)
Philip Thomas (University of Massachusetts Amherst)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Oral: Learning Action Representations for Reinforcement Learning »
  Tue. Jun 11th 11:20 -- 11:25 PM, Room 104
More from the Same Authors
- 2021: RL + Recommender Systems Panel »
  Alekh Agarwal · Ed Chi · Maria Dimakopoulou · Georgios Theocharous · Minmin Chen · Lihong Li
- 2021 Spotlight: Towards Practical Mean Bounds for Small Samples »
  My Phan · Philip Thomas · Erik Learned-Miller
- 2021 Poster: Towards Practical Mean Bounds for Small Samples »
  My Phan · Philip Thomas · Erik Learned-Miller
- 2021 Poster: Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods »
  Chris Nota · Philip Thomas · Bruno C. da Silva
- 2021 Spotlight: Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods »
  Chris Nota · Philip Thomas · Bruno C. da Silva
- 2021 Poster: High Confidence Generalization for Reinforcement Learning »
  James Kostas · Yash Chandak · Scott Jordan · Georgios Theocharous · Philip Thomas
- 2021 Spotlight: High Confidence Generalization for Reinforcement Learning »
  James Kostas · Yash Chandak · Scott Jordan · Georgios Theocharous · Philip Thomas
- 2020 Poster: Asynchronous Coagent Networks »
  James Kostas · Chris Nota · Philip Thomas
- 2020 Poster: Evaluating the Performance of Reinforcement Learning Algorithms »
  Scott Jordan · Yash Chandak · Daniel Cohen · Mengxue Zhang · Philip Thomas
- 2020 Poster: Optimizing for the Future in Non-Stationary MDPs »
  Yash Chandak · Georgios Theocharous · Shiv Shankar · Martha White · Sridhar Mahadevan · Philip Thomas
- 2019 Poster: Concentration Inequalities for Conditional Value at Risk »
  Philip Thomas · Erik Learned-Miller
- 2019 Oral: Concentration Inequalities for Conditional Value at Risk »
  Philip Thomas · Erik Learned-Miller
- 2018 Poster: Decoupling Gradient-Like Learning Rules from Representations »
  Philip Thomas · Christoph Dann · Emma Brunskill
- 2018 Oral: Decoupling Gradient-Like Learning Rules from Representations »
  Philip Thomas · Christoph Dann · Emma Brunskill