Session: Reinforcement Learning 7
Machine Theory of Mind
Neil Rabinowitz · Frank Perbet · Francis Song · Chiyuan Zhang · S. M. Ali Eslami · Matthew Botvinick
Theory of mind (ToM) broadly refers to humans' ability to represent the mental states of others, including their desires, beliefs, and intentions. We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build such models of the agents it encounters. The ToMnet learns a strong prior model for agents’ future behaviour, and, using only a small number of behavioural observations, can bootstrap to richer predictions about agents’ characteristics and mental states. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep RL agents from varied populations, and that it passes classic ToM tasks such as the "Sally-Anne" test of recognising that others can hold false beliefs about the world.
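To make the meta-learning setup concrete, below is a minimal PyTorch sketch of a ToMnet-style predictor, following the character-net / mental-state-net / prediction-net decomposition used in the paper: past-episode trajectories are embedded into a character vector, the current partial episode into a mental-state vector, and both feed a head that predicts the observed agent's next action. All layer types, sizes, and the single action-logits head are illustrative assumptions, not the published architecture.

import torch
import torch.nn as nn

class ToMnetSketch(nn.Module):
    """Illustrative ToMnet-style predictor (sizes and heads are assumptions)."""
    def __init__(self, obs_dim=64, char_dim=8, mental_dim=8, n_actions=5):
        super().__init__()
        # Character net: embeds an agent's past-episode trajectories.
        self.char_net = nn.GRU(obs_dim, char_dim, batch_first=True)
        # Mental-state net: embeds the current (partial) episode, conditioned on character.
        self.mental_net = nn.GRU(obs_dim + char_dim, mental_dim, batch_first=True)
        # Prediction net: maps both embeddings to next-action logits.
        self.pred_net = nn.Sequential(
            nn.Linear(char_dim + mental_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, past_traj, current_traj):
        # past_traj: (batch, T_past, obs_dim); current_traj: (batch, T_cur, obs_dim)
        _, e_char = self.char_net(past_traj)
        e_char = e_char.squeeze(0)                      # (batch, char_dim)
        cur_in = torch.cat(
            [current_traj,
             e_char.unsqueeze(1).expand(-1, current_traj.size(1), -1)], dim=-1)
        _, e_mental = self.mental_net(cur_in)
        e_mental = e_mental.squeeze(0)                  # (batch, mental_dim)
        return self.pred_net(torch.cat([e_char, e_mental], dim=-1))  # action logits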
Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement
Andre Barreto · Diana Borsa · John Quan · Tom Schaul · David Silver · Matteo Hessel · Daniel J. Mankowitz · Augustin Zidek · Remi Munos
The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SF&GPI framework in two ways. One of the basic assumptions underlying the original formulation of SF&GPI is that rewards for all tasks of interest can be computed as linear combinations of a fixed set of features. We relax this constraint and show that the theoretical guarantees supporting the framework can be extended to any set of tasks that only differ in the reward function. Our second contribution is to show that one can use the reward functions themselves as features for future tasks, without any loss of expressiveness, thus removing the need to specify a set of features beforehand. This makes it possible to combine SF&GPI with deep learning in a more stable way. We empirically verify this claim on a complex 3D environment where observations are images from a first-person perspective. We show that the transfer promoted by SF&GPI leads to very good policies on unseen tasks almost instantaneously. We also describe how to learn policies specialised to the new tasks in a way that allows them to be added to the agent's set of skills, and thus be reused in the future.
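As a concrete illustration of the two ideas, here is a minimal Python sketch: with successor features, a policy π_i evaluated on a task with reward weights w has value Q_w^{π_i}(s, a) = ψ^{π_i}(s, a) · w, and GPI acts greedily with respect to the maximum of these values over the policy library. The shapes, random values, and the function name gpi_action are illustrative assumptions.

import numpy as np

def gpi_action(psi, w):
    """Generalised policy improvement over a library of successor features.

    psi: array (n_policies, n_actions, d) -- successor features psi^{pi_i}(s, a)
         evaluated at the current state s.
    w:   array (d,) -- reward weights describing the new task, so that
         Q_w^{pi_i}(s, a) = psi^{pi_i}(s, a) . w.
    Returns the GPI action: argmax_a max_i Q_w^{pi_i}(s, a).
    """
    q = psi @ w                       # (n_policies, n_actions)
    return int(q.max(axis=0).argmax())

# Toy example: 2 previously learned policies, 3 actions, 4 reward features.
rng = np.random.default_rng(0)
psi = rng.random((2, 3, 4))
w_new = rng.random(4)
print(gpi_action(psi, w_new))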
Been There, Done That: Meta-Learning with Episodic Recall
Samuel Ritter · Jane Wang · Zeb Kurth-Nelson · Siddhant Jayakumar · Charles Blundell · Razvan Pascanu · Matthew Botvinick
Meta-learning agents excel at rapidly learning new tasks from open-ended task distributions; yet, they forget what they learn about each task as soon as the next begins. When tasks reoccur – as they do in natural environments – meta-learning agents must explore again instead of immediately exploiting previously discovered solutions. We propose a formalism for generating open-ended yet repetitious environments, then develop a meta-learning architecture for solving these environments. This architecture melds the standard LSTM working memory with a differentiable neural episodic memory. We explore the capabilities of agents with this episodic LSTM in five meta-learning environments with reoccurring tasks, ranging from bandits to navigation and stochastic sequential decision problems.
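A minimal sketch of the episodic-memory side of such an architecture, assuming a store of LSTM cell states keyed by task-context vectors, with a stored state blended back into the working cell state when a similar context recurs. The cosine-similarity lookup and fixed scalar gate below are simplifying assumptions, not the paper's mechanism.

import numpy as np

class EpisodicCellMemory:
    """Simplified episodic store of LSTM cell states keyed by task context."""
    def __init__(self, key_dim, cell_dim, gate=0.5):
        self.keys = np.empty((0, key_dim))
        self.cells = np.empty((0, cell_dim))
        self.gate = gate  # fixed blending gate (an assumption; could be learned)

    def write(self, key, cell_state):
        # Store the current cell state under its task-context key.
        self.keys = np.vstack([self.keys, key[None]])
        self.cells = np.vstack([self.cells, cell_state[None]])

    def reinstate(self, key, cell_state):
        # Retrieve the stored cell state whose key is most similar to the
        # current context and blend it into the working cell state.
        if len(self.keys) == 0:
            return cell_state
        sims = self.keys @ key / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(key) + 1e-8)
        retrieved = self.cells[int(sims.argmax())]
        return (1 - self.gate) * cell_state + self.gate * retrieved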
Continual Reinforcement Learning with Complex Synapses
Christos Kaplanis · Murray Shanahan · Claudia Clopath
Unlike humans, who are capable of continual learning over their lifetimes, artificial neural networks have long been known to suffer from a phenomenon known as catastrophic forgetting, whereby new learning can lead to abrupt erasure of previously acquired knowledge. Whereas in a neural network the parameters are typically modelled as scalar values, an individual synapse in the brain comprises a complex network of interacting biochemical components that evolve at different timescales. In this paper, we show that by equipping tabular and deep reinforcement learning agents with a synaptic model that incorporates this biological complexity (Benna & Fusi, 2016), catastrophic forgetting can be mitigated at multiple timescales. In particular, we find that as well as enabling continual learning across sequential training of two simple tasks, it can also be used to overcome within-task forgetting by reducing the need for an experience replay database.
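The Benna & Fusi (2016) model referenced above replaces each scalar weight with a chain of hidden variables evolving at geometrically spaced timescales: the first variable is the visible weight and receives learning updates, while neighbouring variables relax toward each other. Below is a minimal sketch of that dynamic applied to a parameter vector; the chain depth, conductance and capacity constants, and the simple Euler step are chosen purely for illustration.

import numpy as np

class BennaFusiParams:
    """Parameter vector with Benna-Fusi-style multi-timescale dynamics (sketch).

    Each parameter is the first element u_1 of a chain u_1..u_m. Gradient
    updates hit u_1; between updates, neighbouring chain variables flow toward
    each other, with conductances halving and 'capacities' doubling down the
    chain, giving geometrically spaced timescales.
    """
    def __init__(self, n_params, depth=4, g12=0.05):
        self.u = np.zeros((depth, n_params))
        self.g = g12 / (2.0 ** np.arange(depth - 1))   # g_{k,k+1}
        self.c = 2.0 ** np.arange(depth)               # C_k

    @property
    def weights(self):
        return self.u[0]  # visible synaptic weights

    def step(self, grad, lr=0.1):
        # Gradient step on the visible variable u_1.
        self.u[0] -= lr * grad
        # One Euler step of the chain dynamics:
        # C_k du_k = g_{k-1,k}(u_{k-1} - u_k) + g_{k,k+1}(u_{k+1} - u_k)
        flow = self.g[:, None] * (self.u[:-1] - self.u[1:])  # flow from level k to k+1
        du = np.zeros_like(self.u)
        du[:-1] -= flow
        du[1:] += flow
        self.u += du / self.c[:, None]

# Usage: params = BennaFusiParams(n_params=10); params.step(np.ones(10))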