The quintessential model-based reinforcement-learning agent iteratively refines its estimates or prior beliefs about the true underlying model of the environment. Recent empirical successes in model-based reinforcement learning with function approximation, however, eschew the true model in favor of a surrogate that, while ignoring various facets of the environment, still facilitates effective planning over behaviors. Recently formalized as the value equivalence principle, this algorithmic technique is perhaps unavoidable, as real-world reinforcement learning demands consideration of a simple, computationally bounded agent interacting with an overwhelmingly complex environment whose underlying dynamics likely exceed the agent's capacity for representation. In this work, we consider the scenario where agent limitations may entirely preclude identifying an exactly value-equivalent model, immediately giving rise to a trade-off between identifying a model that is simple enough to learn and one that incurs only bounded sub-optimality. To address this problem, we introduce an algorithm that, using rate-distortion theory, iteratively computes an approximately value-equivalent, lossy compression of the environment that an agent may feasibly target in lieu of the true model. We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem. Crucially, our regret bound can be expressed in one of two possible forms, providing a performance guarantee for finding either the simplest model that achieves a desired sub-optimality gap or, alternatively, the best model given a limit on agent capacity.
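To make the rate-distortion step concrete, the sketch below shows one way a lossy compression of the agent's beliefs could be computed with the classical Blahut-Arimoto iteration, using a distortion that penalizes loss of value equivalence. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: the function name `blahut_arimoto`, the discrete posterior over candidate environments, the distortion matrix, and the trade-off parameter `beta` are all hypothetical choices made for exposition.

```python
import numpy as np

def blahut_arimoto(posterior, distortion, beta, num_iters=200, tol=1e-9):
    """Compute a rate-distortion-optimal channel p(surrogate | environment).

    Illustrative assumptions (not the paper's exact setup):
      - `posterior` is a discrete distribution over N candidate environments,
        e.g. the agent's current beliefs (shape [N]).
      - `distortion[i, j]` measures how badly value equivalence is violated
        when environment i is compressed to surrogate model j (shape [N, M]).
      - `beta` trades off rate (model simplicity) against distortion
        (sub-optimality); larger beta tolerates less distortion.
    """
    n, m = distortion.shape
    marginal = np.full(m, 1.0 / m)  # initial marginal over surrogate models
    for _ in range(num_iters):
        # Channel update: q(j | i) is proportional to marginal(j) * exp(-beta * d(i, j)).
        logits = np.log(marginal)[None, :] - beta * distortion
        q = np.exp(logits - logits.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
        # Marginal update: average the channel under the posterior over environments.
        new_marginal = posterior @ q
        if np.max(np.abs(new_marginal - marginal)) < tol:
            marginal = new_marginal
            break
        marginal = new_marginal
    return q, marginal

# Tiny usage example with a hypothetical posterior over 3 candidate
# environments and 2 simpler surrogate models.
posterior = np.array([0.5, 0.3, 0.2])
distortion = np.array([[0.0, 0.4],
                       [0.1, 0.2],
                       [0.5, 0.0]])
channel, marginal = blahut_arimoto(posterior, distortion, beta=5.0)
# Rate of the compression: mutual information between environment and surrogate.
rate = np.sum(posterior[:, None] * channel *
              (np.log(channel) - np.log(marginal)[None, :]))
print("channel:\n", channel)
print("rate (nats):", rate)
```

Under this framing, the mutual information between the environment and its compressed surrogate plays the role of the rate (a proxy for how difficult the surrogate is to learn), while the expected distortion caps the sub-optimality the agent is willing to incur, mirroring the two forms of the regret bound described above.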
Author Information
Dilip Arumugam (Stanford University)
Benjamin Van Roy (Stanford University)
More from the Same Authors
- 2021: Bad-Policy Density: A Measure of Reinforcement-Learning Hardness
  David Abel · Cameron Allen · Dilip Arumugam · D Ellis Hershkowitz · Michael L. Littman · Lawson Wong
- 2021 Poster: Deciding What to Learn: A Rate-Distortion Approach
  Dilip Arumugam · Benjamin Van Roy
- 2021 Spotlight: Deciding What to Learn: A Rate-Distortion Approach
  Dilip Arumugam · Benjamin Van Roy
- 2020 Poster: Flexible and Efficient Long-Range Planning Through Curious Exploration
  Aidan Curtis · Minjian Xin · Dilip Arumugam · Kevin Feigelis · Daniel Yamins
- 2018 Poster: Coordinated Exploration in Concurrent Reinforcement Learning
  Maria Dimakopoulou · Benjamin Van Roy
- 2018 Oral: Coordinated Exploration in Concurrent Reinforcement Learning
  Maria Dimakopoulou · Benjamin Van Roy
- 2017 Poster: Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
  Ian Osband · Benjamin Van Roy
- 2017 Talk: Why is Posterior Sampling Better than Optimism for Reinforcement Learning?
  Ian Osband · Benjamin Van Roy