ICML 2019 Tutorials

Silvia Chiappa · Jan Leike

[ Room 104 ]

As we are applying ML to more and more real-world tasks, we are moving toward a future in which ML will play an increasingly dominant role in society. Therefore, addressing safety problems is becoming an increasingly pressing issue. Broadly speaking, we can classify current safety research into three areas: specification, robustness, and assurance. Specification focuses on investigating and developing techniques to alleviate undesired behaviors that systems might exhibit due to objectives that are only surrogates of desired ones. This can happen, e.g., when training on a data set containing historical biases or when trying to measure the progress of reinforcement learning agents in a real-world setting. Robustness deals with addressing system failures in extrapolating to new data and in responding to adversarial inputs. Assurance is concerned with developing methods that enable us to understand systems that are opaque and black-box in nature, and to control them during operation. This tutorial will give an overview of these three areas with a particular focus on specification, and more specifically on fairness and alignment of reinforcement learning agents. The goal is to stimulate discussion among researchers working on different areas of safety.

Recent Advances in Population-Based Search for Deep Neural Networks: Quality Diversity, Indirect Encodings, and Open-Ended Algorithms

Jeff Clune · Joel Lehman · Kenneth Stanley

[ Hall A ]

Abstract

We will cover new, exciting, unconventional techniques for improving population-based search. These ideas are already enabling us to solve hard problems. They also hold great promise for further advancing machine learning, including deep neural networks. Major topics covered include (1) explicitly searching for behavioral diversity (in a low-dimensional space where diversity is inherently interesting, such as the behavior of robots, rather than in the true search space, such as the weights of the DNN that controls the robot), especially Quality Diversity algorithms, which have produced state-of-the-art results in robotics and solved a version of the hard-exploration RL challenge of Montezuma’s Revenge; (2) open-ended search, wherein algorithms continually create new and increasingly complex capabilities without bound, for example by simultaneously inventing new challenges and their solutions; and (3) indirect encoding (e.g. HyperNEAT/HyperNetworks), wherein one network encodes how to construct a larger neural network or learning system. The idea is motivated by biological development, wherein a search in the space of a few thousand genes enables the specification of a trillion-connection brain and its learning algorithm. We conclude with a discussion on current and future hybrids of traditional machine learning with these ideas, including how augmenting meta-learning with them offers an alternative …

A Primer on PAC-Bayesian Learning

Benjamin Guedj · John Shawe-Taylor

[ Grand Ballroom ]

Abstract

Over the past few years, the PAC-Bayesian approach has been applied to numerous settings, including classification, high-dimensional sparse regression, image denoising and reconstruction of large random matrices, recommendation systems and collaborative filtering, binary ranking, online ranking, transfer learning, multiview learning, and signal processing, to name but a few. The "PAC-Bayes" query on arXiv illustrates how PAC-Bayes is quickly re-emerging as a principled theory to efficiently address modern machine learning topics, such as learning with heavy-tailed and dependent data, or the generalisation abilities of deep neural networks. This tutorial aims at providing the ICML audience with a comprehensive overview of PAC-Bayes, starting from statistical learning theory (complexity terms analysis, generalisation and oracle bounds) and covering algorithmic developments (actual implementation of PAC-Bayesian algorithms), up to the most recent PAC-Bayesian analyses of the generalisation abilities of deep neural networks. We intend to address the largest audience, with an elementary background in probability theory and statistical learning, although all key concepts will be covered from scratch.

Never-Ending Learning

Tom Mitchell · Partha Talukdar

[ Hall B ]

Abstract

There exists a stark difference between today’s machine learning methods and the lifelong learning capabilities of humans. Humans learn many different functions and skills from diverse experiences gained over many years, following a staged curriculum in which they first learn easier and later more difficult tasks. They retain the learned knowledge and skills and use them in subsequent learning to make it easier or more effective. Furthermore, humans self-reflect on their evolving skills, choose new learning tasks over time, teach one another, learn new representations, read books, discuss competing hypotheses, and more. This tutorial will focus on the question of how to design machine learning agents with similar capabilities. The tutorial will include research on topics such as reinforcement learning and other agent learning architectures, transfer and multi-task learning, representation learning, amortized learning, learning by natural language instruction and demonstration, and learning from experimentation.

Active Learning: From Theory to Practice

Robert Nowak · Steve Hanneke

[ Hall B ]

Abstract

The field of Machine Learning has advanced considerably in recent years, but mostly in well-defined domains using huge amounts of human-labeled training data. Machines can recognize objects in images and translate text, but they must be trained with more images and text than a person can see in nearly a lifetime. Generating the necessary training data sets can require an enormous human effort. Active ML aims to address this issue by designing learning algorithms that automatically and adaptively select the most informative data for labeling so that human time is not wasted labeling irrelevant, redundant, or trivial examples. This tutorial will overview applications and provide an introduction to basic theory and algorithms for active machine learning. It will particularly focus on provably sound active learning algorithms and quantify the reduction of labeled training data required for learning.
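As a minimal sketch of the label savings that active learning can offer, consider learning a one-dimensional threshold classifier from a pool of unlabeled points, assuming noiseless labels. This is a standard textbook illustration, not material taken from the tutorial itself, and the names `active_threshold` and `label` are hypothetical. A passive learner would label every point in the pool, whereas binary search over the sorted pool needs only about log2(n) label queries.

```python
def active_threshold(points, oracle):
    """Locate the first positive point in a sorted pool by binary search.

    Assumes noiseless, monotone labels: oracle(x) is 0 below an unknown
    threshold and 1 at or above it (an assumption of this sketch).
    Returns (index_of_first_positive, number_of_label_queries).
    """
    lo, hi = 0, len(points)  # the first positive index lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1  # each oracle call costs one human label
        if oracle(points[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    return lo, queries


# Hypothetical pool of 1000 sorted points and a hidden threshold.
pool = [i / 1000 for i in range(1000)]
true_threshold = 0.3715
label = lambda x: int(x >= true_threshold)

first_positive, n_queries = active_threshold(pool, label)
print(first_positive, n_queries)
```

With a pool of 1000 points, labeling everything would cost 1000 labels, while the adaptive search here needs at most 10. Real active learning algorithms must also cope with label noise and richer hypothesis classes, which this sketch ignores.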

Meta-Learning: from Few-Shot Learning to Rapid Reinforcement Learning

Chelsea Finn · Sergey Levine

[ Hall A ]

Abstract

Tl;dr: We will provide a unified perspective of how a variety of meta-learning algorithms enable learning from small datasets, an overview of applications where meta-learning can and cannot be easily applied, and a discussion of the outstanding challenges and frontiers of this sub-field. Abstract: In recent years, high-capacity models, such as deep neural networks, have enabled very powerful machine learning techniques in domains where data is plentiful. However, domains where data is scarce have proven challenging for such methods because high-capacity function approximators critically rely on large datasets for generalization. This can pose a major challenge for domains ranging from supervised medical image processing to reinforcement learning where real-world data collection (e.g., for robots) poses a major logistical challenge. Meta-learning or few-shot learning offers a potential solution to this problem: by learning to learn across data from many previous tasks, few-shot meta-learning algorithms can discover the structure among tasks to enable fast learning of new tasks. The objective of this tutorial is to provide a unified perspective of meta-learning: teaching the audience about modern approaches, describing the conceptual and theoretical principles surrounding these techniques, presenting where these methods have been applied previously, and discussing the fundamental open problems and challenges …

Neural Approaches to Conversational AI

Michel Galley · Jianfeng Gao

[ Grand Ballroom ]

Abstract

Developing an intelligent dialogue system that not only emulates human conversation, but can also answer questions on topics ranging from the latest news about a movie star to Einstein's theory of relativity, and fulfill complex tasks such as travel planning, has been one of the longest running goals in AI. The goal has remained elusive until recently. We are now observing promising results both in academia and industry, as large amounts of conversational data become available for training, and the breakthroughs in deep learning (DL) and reinforcement learning (RL) are applied to conversational AI. In this tutorial, we start with a brief introduction to the recent progress on DL and RL that is related to conversational AI. Then, we describe in detail the state-of-the-art neural approaches developed for three types of dialogue systems, or bots. The first is a question answering (QA) bot. Equipped with rich knowledge drawn from various data sources including Web documents and pre-compiled knowledge graphs (KGs), the QA bot can provide concise direct answers to user queries. The second is a task-oriented dialogue system that can help users accomplish tasks ranging from meeting scheduling to vacation planning. The third is a social chatbot which can converse …

Causal Inference and Stable Learning

Tong Zhang · Peng Cui

[ Room 104 ]

Abstract

Predicting future outcome values based on their observed features using a model estimated on a training data set is a common machine learning problem. Many learning algorithms have been proposed and shown to be successful when the test data and training data come from the same distribution. However, the best-performing models for a given distribution of training data typically exploit subtle statistical relationships among features, making them potentially more prone to prediction error when applied to test data whose distribution differs from that of the training data. How to develop learning models that are stable and robust to shifts in data is of paramount importance for both academic research and real applications. Causal inference, which refers to the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect, is a powerful statistical modeling tool for explanatory and stable learning. In this tutorial, we focus on causal inference and stable learning, aiming to explore causal knowledge from observational data to improve the interpretability and stability of machine learning algorithms. First, we will give an introduction to causal inference and introduce some recent data-driven approaches to estimate causal effects from observational data, especially in high …

Algorithm configuration: learning in the space of algorithm designs

Kevin Leyton-Brown · Frank Hutter

[ Grand Ballroom ]

Abstract

This tutorial surveys work at a new frontier of machine learning, where each point in the hypothesis space corresponds to an algorithm, such as a combinatorial optimization problem solver. Much of this work falls under the umbrella of so-called algorithm configuration; it also draws on methods from bandits, Bayesian optimization, reinforcement learning, and more. The tutorial will begin by explaining the area, describing some recent success stories, and giving a broad overview of related work from across the machine learning community and beyond. Then, we will focus on the algorithm configuration problem and how to solve it based on extensions of Bayesian optimization and bandits. We will also survey a wide range of other methods based on stochastic local search, algorithm portfolios, and more. Throughout, we will emphasize big picture ideas, motivational case studies, and core methodological innovations. We will conclude by surveying important open problems and exciting initial results from the broader community that offer potential ways forward.

A Tutorial on Attention in Deep Learning

Alex Smola · Aston Zhang

[ Hall A ]

Abstract

Attention is a key mechanism to enable nonparametric models in deep learning. Quite arguably it is the basis of most recent progress in deep learning models. Beyond its introduction in neural machine translation, it can be traced back to neuroscience. It was arguably introduced via the gating or forgetting mechanism of LSTMs. Over the past 5 years attention has been key to advancing the state of the art in areas as diverse as natural language processing, computer vision, speech recognition, image synthesis, solving traveling salesman problems, and reinforcement learning. This tutorial offers a coherent overview of various types of attention; efficient implementations in Jupyter notebooks, which give the audience hands-on experience in replicating and applying attention mechanisms; and a textbook (www.d2l.ai) to allow the audience to dive more deeply into the underlying theory.
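To make the idea concrete, here is a minimal sketch of one common form of attention, scaled dot-product attention for a single query, written with only the Python standard library. The choice of this variant and the names `softmax` and `attention` are assumptions made for illustration; the abstract does not say which formulations the tutorial's notebooks use. The output is a weighted average of the stored values, with weights set by how well the query matches each key, which is one way to read the "nonparametric" view above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query: list of d floats; keys: n lists of d floats;
    values: n lists of floats (all of the same length).
    Returns (output_vector, attention_weights).
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    # Output is the attention-weighted average of the value vectors.
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(dim_v)]
    return output, weights


# Hypothetical toy data: three key/value pairs and one query.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
output, weights = attention([1.0, 0.0], keys, values)
print(weights, output)
```

In this toy example the first and third keys match the query equally well, so they receive equal weight, and the second key receives less.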

Active Hypothesis Testing: An Information Theoretic (re)View

Tara Javidi

[ Hall B ]

Abstract

This tutorial revisits the problem of active hypothesis testing: a classical problem in statistics in which a decision maker is responsible for actively and dynamically collecting data/samples so as to enhance the information about an underlying phenomenon of interest while accounting for the cost of communication, sensing, or data collection. The decision maker must rely on the current information state to constantly (re-)evaluate the trade-off between the precision and the cost of various actions. This tutorial explores an often overlooked connection between active hypothesis testing and feedback information theory. This connection, we argue, has significant implications for the next generation of information acquisition and machine learning algorithms where data is collected actively and/or by cooperative yet local agents.

In the first part of the talk, we discuss the history of active hypothesis testing (and experiment design) in statistics and the seminal contributions by Blackwell, Chernoff, DeGroot, and Stein. In the second part of the talk, we discuss the information theoretic notions of acquisition rate and reliability (and their fundamental trade-off) as well as Extrinsic Jensen-Shannon divergence. We also discuss a class of algorithms based on posterior matching, a capacity-achieving feedback scheme for channel coding. We will illustrate the utility of …