As ML is applied to more and more real-world tasks, we are moving toward a future in which it will play an increasingly dominant role in society. Addressing safety problems is therefore becoming an increasingly pressing issue. Broadly speaking, current safety research can be classified into three areas: specification, robustness, and assurance. Specification focuses on investigating and developing techniques to alleviate undesired behaviors that systems might exhibit because their objectives are only surrogates of the desired ones. This can happen, e.g., when training on a data set that contains historical biases or when trying to measure the progress of reinforcement learning agents in a real-world setting. Robustness deals with system failures in extrapolating to new data and in responding to adversarial inputs. Assurance is concerned with developing methods that enable us to understand systems that are opaque and black-box in nature, and to control them during operation. This tutorial will give an overview of these three areas, with a particular focus on specification, and more specifically on fairness and on the alignment of reinforcement learning agents. The goal is to stimulate discussion among researchers working on different areas of safety.
Silvia Chiappa (DeepMind)
Silvia Chiappa is a Research Scientist in Machine Learning at DeepMind. She holds a Diploma di Laurea in Mathematics and a PhD in Machine Learning. Before joining DeepMind, Silvia worked at the Empirical Inference Department of the Max-Planck Institute for Intelligent Systems, at the Machine Intelligence and Perception Group of Microsoft Research Cambridge, and at the Statistical Laboratory of the University of Cambridge. Her research interests center on Bayesian and causal reasoning, graphical models, variational inference, time-series models, and ML fairness and bias.
Jan Leike (DeepMind)
Until recently, Jan was a Senior Research Scientist at DeepMind, where he studied the agent alignment problem. He holds a PhD in computer science from the Australian National University, where he worked on theoretical reinforcement learning. Before joining DeepMind, he was a postdoctoral researcher at the University of Oxford. Jan’s research interests are in AI safety, reinforcement learning, and technical AI governance.