Workshop

Decision Awareness in Reinforcement Learning

Evgenii Nikishin · Pierluca D'Oro · Doina Precup · Andre Barreto · Amir-massoud Farahmand · Pierre-Luc Bacon

Hall G

Abstract:

The goal of reinforcement learning (RL) is to maximize a reward signal by making optimal decisions. An RL system typically contains several moving components, possibly including a policy, a value function, and a model of the environment. By decision awareness we mean the notion that each component, and the combination of components, should be explicitly trained to help the agent increase the total reward it collects. To illustrate, consider a model-based method. For environments with rich observations (e.g., pixel-based), the world model is complex, and standard approaches would need a large number of samples and a high-capacity function approximator to learn a reasonable approximation of the dynamics. However, a decision-aware agent might recognize that modeling all the granular complexity of the environment is neither feasible nor necessary for learning an optimal policy, and instead focus on modeling the aspects that matter for decision making. Decision awareness goes beyond model learning. In actor-critic algorithms, a critic is trained to predict the expected return and is later used to aid policy learning. Is return prediction an optimal strategy for critic learning? And, in general, what is the best way to learn each component of an RL system? Our workshop aims to address these questions and to make the case that decision awareness might be key to solving grand challenges in RL, including exploration and sample efficiency. The workshop covers decision-aware RL algorithms, their implications, and real-world applications. We focus on decision-aware objectives, end-to-end procedures, and meta-learning techniques for training and discovering components in modular RL systems, as well as theoretical and empirical analyses of the interaction among the multiple modules used by RL algorithms.

Timezone: America/Los_Angeles

Schedule