As ML is applied to more and more real-world tasks, we are moving toward a future in which it will play an increasingly dominant role in society, making safety an increasingly pressing issue. Broadly speaking, current safety research can be classified into three areas: specification, robustness, and assurance. Specification investigates and develops techniques to mitigate undesired behaviors that systems may exhibit when their objectives are only surrogates for the desired ones; this can happen, for example, when training on a data set containing historical biases or when trying to measure the progress of reinforcement learning agents in a real-world setting. Robustness addresses system failures in extrapolating to new data and in responding to adversarial inputs. Assurance is concerned with developing methods that enable us to understand systems that are opaque and black-box in nature, and to control them during operation. This tutorial will give an overview of these three areas, with a particular focus on specification, and more specifically on fairness and the alignment of reinforcement learning agents. The goal is to stimulate discussion among researchers working on different areas of safety.