Agents that learn to select optimal actions represent a prominent focus of the sequential decision-making literature. In the face of a complex environment or constraints on time and resources, however, aiming to synthesize such an optimal policy can become infeasible. These scenarios give rise to an important trade-off between the information an agent must acquire to learn and the sub-optimality of the resulting policy. While an agent designer has a preference for how this trade-off is resolved, existing approaches further require that the designer translate these preferences into a fixed learning target for the agent. In this work, leveraging rate-distortion theory, we automate this process such that the designer need only express their preferences via a single hyperparameter and the agent is endowed with the ability to compute its own learning targets that best achieve the desired trade-off. We establish a general bound on expected discounted regret for an agent that decides what to learn in this manner, and we present computational experiments that illustrate the expressiveness of designer preferences and even show improvements over Thompson sampling in identifying an optimal policy.
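To make the rate-distortion idea in the abstract concrete, the sketch below shows one standard way such a trade-off can be computed over finite spaces: a classical Blahut-Arimoto iteration for a fixed trade-off parameter. This is an illustrative sketch under assumed inputs (a prior `p_env` over candidate environments, a regret-like `distortion` matrix over candidate target policies, and a Lagrange multiplier `beta` playing the role of the single designer hyperparameter), not the paper's implementation.

```python
import numpy as np

def blahut_arimoto(p_env, distortion, beta, num_iters=200, tol=1e-9):
    """Compute a rate-distortion-style channel q(target | environment).

    p_env      : (E,) prior over candidate environments.
    distortion : (E, T) matrix; distortion[e, t] is the loss (e.g. an
                 expected-regret proxy) of adopting target policy t in
                 environment e.
    beta       : trade-off parameter; larger beta tolerates less distortion
                 at the cost of a higher-rate (more informative) target.
    """
    E, T = distortion.shape
    q_target = np.full(T, 1.0 / T)  # marginal distribution over targets
    for _ in range(num_iters):
        # Channel update: q(t | e) proportional to q(t) * exp(-beta * d(e, t))
        logits = np.log(q_target)[None, :] - beta * distortion
        channel = np.exp(logits - logits.max(axis=1, keepdims=True))
        channel /= channel.sum(axis=1, keepdims=True)
        # Marginal update: q(t) = sum_e p(e) * q(t | e)
        new_marginal = p_env @ channel
        if np.max(np.abs(new_marginal - q_target)) < tol:
            q_target = new_marginal
            break
        q_target = new_marginal
    return channel, q_target

# Toy usage with hypothetical numbers: 3 candidate environments, 4 targets.
rng = np.random.default_rng(0)
p_env = np.array([0.5, 0.3, 0.2])
distortion = rng.uniform(size=(3, 4))  # stand-in for expected-regret values
channel, marginal = blahut_arimoto(p_env, distortion, beta=5.0)
```

Sweeping `beta` traces out the trade-off the abstract describes: small values yield low-information, higher-regret targets, while large values recover targets close to the optimal policy.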
Author Information
Dilip Arumugam (Stanford University)
Benjamin Van Roy (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: Deciding What to Learn: A Rate-Distortion Approach
  Wed. Jul 21st, 02:35 -- 02:40 PM
More from the Same Authors
- 2021: Bad-Policy Density: A Measure of Reinforcement-Learning Hardness
  David Abel · Cameron Allen · Dilip Arumugam · D Ellis Hershkowitz · Michael L. Littman · Lawson Wong
- 2022: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning
  Dilip Arumugam · Benjamin Van Roy
- 2020 Poster: Flexible and Efficient Long-Range Planning Through Curious Exploration
  Aidan Curtis · Minjian Xin · Dilip Arumugam · Kevin Feigelis · Daniel Yamins
- 2018 Poster: Coordinated Exploration in Concurrent Reinforcement Learning
  Maria Dimakopoulou · Benjamin Van Roy
- 2018 Oral: Coordinated Exploration in Concurrent Reinforcement Learning
  Maria Dimakopoulou · Benjamin Van Roy
- 2017 Poster: Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
  Ian Osband · Benjamin Van Roy
- 2017 Talk: Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
  Ian Osband · Benjamin Van Roy