Tutorials
Arno Solin

Many ML tasks share practical goals and theoretical foundations with signal processing (consider, e.g., spectral and kernel methods, differential equation systems, sequential sampling techniques, and control theory). Signal processing methods are an integral part of many sub-fields in ML, with links to, for example, reinforcement learning, Hamiltonian Monte Carlo, Gaussian process (GP) models, Bayesian optimization, and neural ODEs/SDEs.

This tutorial aims to cover aspects of machine learning that link to both discrete-time and continuous-time signal processing methods. Special focus is put on introducing stochastic differential equations (SDEs), state space models, and recursive estimation (Bayesian filtering and smoothing) for Gaussian process models. The goals are to (i) teach basic principles of direct links between signal processing and machine learning, (ii) provide an intuitive hands-on understanding of what stochastic differential equations are all about, and (iii) show how these methods have real benefits in speeding up learning, improving inference, and model building, with illustrative and practical application examples. This is to show how ML can leverage existing theory to improve and accelerate research, and to provide a unifying overview to the ICML community members working at the intersection of these methods.
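As a rough illustration of the link between SDEs, state space models, and Gaussian processes described above, the sketch below runs a scalar Kalman filter for an Ornstein-Uhlenbeck process, whose stationary covariance is the exponential (Matérn-1/2) kernel. It is not taken from the tutorial; the function name `ou_kalman_filter` and the parameters `ell`, `sigma2`, and `noise_var` are assumptions chosen for this example.

```python
import math

def ou_kalman_filter(t, y, ell=1.0, sigma2=1.0, noise_var=0.1):
    """Kalman filter for a GP with kernel sigma2 * exp(-|t - t'| / ell).

    t: increasing observation times, y: noisy observations (same length).
    Returns the filtered means and variances at each observation time.
    All names and defaults are illustrative assumptions.
    """
    m, P = 0.0, sigma2          # stationary prior at the first time point
    means, variances = [], []
    prev_t = t[0]
    for tk, yk in zip(t, y):
        # Prediction: exact discretization of dx = -(1/ell) x dt + noise
        a = math.exp(-(tk - prev_t) / ell)
        m = a * m
        P = a * a * P + sigma2 * (1.0 - a * a)
        # Update with the observation y_k = x(t_k) + Gaussian noise
        S = P + noise_var       # innovation variance
        K = P / S               # Kalman gain
        m = m + K * (yk - m)
        P = (1.0 - K) * P
        means.append(m)
        variances.append(P)
        prev_t = tk
    return means, variances

means, variances = ou_kalman_filter([0.0, 1.0], [1.0, 2.0], noise_var=0.5)
```

Each step costs constant time, so the filter processes n observations in O(n) operations, whereas naive GP regression costs O(n^3). The filtered mean and variance at the final time point coincide with the GP posterior there given all the data.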

Mihaela van der Schaar

Medicine stands apart from other areas where machine learning (ML) can be applied. Where we have seen advances in other fields driven by lots of data, it is the complexity of medicine, not the volume of data, that makes the challenge so hard. But at the same time this makes medicine the most exciting area for anyone who is really interested in exploring the boundaries of ML, because we are given real-world problems to formalize and solve. And the solutions are ones that are societally important, and they potentially impact us all (just think COVID-19!).

ML has of course already achieved very impressive results in numerous areas. Standout examples include computer vision and image recognition, game playing, and teaching robots. AI empowered by ML is so good at mastering these things because they are easily-stated problems where the solutions are well-defined and easily verifiable. “Easily-stated problems” have a clear challenge to solve and clear rules to play by; “well-defined solutions” fall into an easily recognizable class of answers; while a “verifiable solution” is one that we as humans can understand in terms of judging whether the model has succeeded or not. Unfortunately, in medicine the problems are not well-posed, …

S. M. Ali Eslami · Irina Higgins · Danilo J. Rezende

The field of representation learning without labels, also known as unsupervised or self-supervised learning, is seeing significant progress. New techniques have been put forward that approach or even exceed the performance of fully supervised techniques in large-scale and competitive benchmarks such as image classification, while also showing improvements in label-efficiency by multiple orders of magnitude. Representation learning without labels is therefore finally starting to address some of the major challenges in modern deep learning. To continue making progress, however, it is important to systematically understand the nature of the learnt representations and the learning objectives that give rise to them.

In this tutorial we will:
- Provide a unifying overview of the state of the art in representation learning without labels
- Contextualise these methods through a number of theoretical lenses, including generative modelling, manifold learning and causality
- Argue for the importance of careful and systematic evaluation of representations and provide an overview of the pros and cons of current evaluation methods

Francesco Orabona · Ashok Cutkosky

Classical stochastic optimization results typically assume known values for various properties of the data (e.g. Lipschitz constants, distance to an optimal point, smoothness or strong-convexity constants). Unfortunately, in practice these values are unknown, necessitating a long trial-and-error procedure to find the best parameters. To address this issue, in recent years a number of parameter-free algorithms have been developed for online optimization and for online learning. Parameter-free algorithms make no assumptions about the properties of the data and yet converge just as fast as the optimally tuned algorithm. This is an exciting line of work that has now reached enough maturity to be taught to general audiences. Yet these algorithms have not received a proper introduction to the machine learning community, and only a handful of people fully understand them. This tutorial aims at bridging this gap, presenting practice and theory for using and designing parameter-free algorithms. We will present the latest advancements in this field, including practical applications.
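To make the idea of a parameter-free algorithm concrete, here is a minimal sketch of one well-known scheme from this literature, coin betting with the Krichevsky-Trofimov bettor, applied to a one-dimensional problem. It is an illustration and not necessarily the variant presented in the tutorial; the function name `kt_coin_betting`, the `gradient_fn` callback, and the toy objective |w - 3| are assumptions made for this example. The sketch assumes subgradients bounded in [-1, 1] and has no learning rate to tune.

```python
def kt_coin_betting(gradient_fn, n_rounds, initial_wealth=1.0):
    """1-D coin-betting optimizer with the Krichevsky-Trofimov bettor.

    gradient_fn(w) must return a (sub)gradient in [-1, 1] (assumption).
    Returns the average iterate, which is the point carrying the guarantee.
    """
    wealth = initial_wealth   # money available to bet
    sum_coins = 0.0           # running sum of past outcomes c_i = -g_i
    avg_w = 0.0               # running average of the iterates
    for t in range(1, n_rounds + 1):
        beta = sum_coins / t              # signed betting fraction
        w = beta * wealth                 # current iterate is the bet
        g = max(-1.0, min(1.0, gradient_fn(w)))
        coin = -g                         # coin outcome
        wealth += coin * w                # win or lose the bet
        sum_coins += coin
        avg_w += (w - avg_w) / t
    return avg_w

# Toy usage: minimize f(w) = |w - 3| using only its subgradient sign(w - 3).
w_bar = kt_coin_betting(lambda w: (w > 3.0) - (w < 3.0), n_rounds=10000)
```

The only free quantity is the initial wealth, which enters the known regret bounds for this scheme through a logarithmic term, so the result is far less sensitive to it than gradient descent is to its step size.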

Elias Bareinboim

Causal inference provides a set of tools and principles that allows one to combine data and causal invariances about the environment to reason about questions of a counterfactual nature -- i.e., what would have happened had reality been different, even when no data about this unrealized reality is available. Reinforcement Learning is concerned with efficiently finding a policy that optimizes a specific function (e.g., reward, regret) in interactive and uncertain environments. These two disciplines have evolved independently and with virtually no interaction between them. In fact, they operate over different aspects of the same building block, i.e., counterfactual relations, which makes them umbilically tied.

In this tutorial, we introduce a unified treatment putting these two disciplines under the same conceptual and theoretical umbrella. We show that a number of natural and pervasive classes of learning problems emerge when this connection is fully established, which cannot be seen individually from either discipline. In particular, we'll discuss generalized policy learning (a combination of online, off-policy, and do-calculus learning), when and where to intervene, counterfactual decision-making (and free-will, autonomy, Human-AI collaboration), policy generalizability, causal imitation learning, among others. This new understanding leads to a broader view of what counterfactual learning is and suggests the …

Ilias Diakonikolas

One of the major recent advances in theoretical machine learning is the development of efficient learning algorithms for various high-dimensional statistical models. The Achilles heel of these algorithms is the assumption that the samples are precisely generated from the model. This assumption is crucial for the performance of these algorithms: even a very small fraction of outliers can completely compromise the algorithms' behavior.

Recent results in theoretical computer science have led to the development of the first computationally efficient robust estimators for a range of high-dimensional models. The goal of this tutorial is to introduce the machine learning community to the core insights and techniques in this area of algorithmic robust statistics, and discuss new directions and opportunities for future work.
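The fragility to outliers described above can be seen even in one dimension. The sketch below contaminates a standard Gaussian sample with a small fraction of identical outliers and compares the sample mean with the median. The median is a classical robust estimator used here purely for illustration; it is not one of the high-dimensional algorithms the tutorial covers, and the helper `contaminated_sample` and its parameters are assumptions made for this example.

```python
import random
import statistics

def contaminated_sample(n, outlier_fraction, outlier_value, seed=0):
    """Draw n points: mostly N(0, 1), plus a fraction of identical outliers."""
    rng = random.Random(seed)
    n_out = int(round(n * outlier_fraction))
    clean = [rng.gauss(0.0, 1.0) for _ in range(n - n_out)]
    return clean + [outlier_value] * n_out

# 5% of the points are replaced by the value 1000 (true mean is 0).
sample = contaminated_sample(n=1000, outlier_fraction=0.05, outlier_value=1000.0)

mean_estimate = statistics.mean(sample)      # dragged far from 0 by the outliers
median_estimate = statistics.median(sample)  # stays close to 0
```

Moving the outliers further away shifts the sample mean without limit, while the median moves only as far as a nearby quantile of the clean data. Obtaining comparable guarantees efficiently in high dimensions is the harder problem the tutorial addresses.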

Elaine Nsoesie

The tutorial will focus on digital epidemiology – the study of the patterns of disease and health, and the factors that influence these patterns using digital technology and data. We will discuss the use of digital data and machine learning for studying and improving health in different populations.

The tutorial has four parts:

Part 1: Introduction to Digital Epidemiology.

Part 2: The use of digital data and technology to study infectious disease outbreaks, chronic diseases and other conditions.

Part 3: The use of digital data and tools during the COVID-19 pandemic.

Part 4: Ethics, privacy and representation.

Hamed Hassani · Amin Karbasi

This tutorial will cover recent advancements in discrete optimization methods prevalent in large-scale machine learning problems. Traditionally, machine learning has been harnessing convex optimization to design fast algorithms with provable guarantees for a broad range of applications. In recent years, however, there has been a surge of interest in applications that involve discrete optimization. For discrete domains, the analog of convexity is considered to be submodularity, and the evolving theory of submodular optimization has been a catalyst for progress in extraordinarily varied application areas including active learning and experimental design, vision, sparse reconstruction, graph inference, video analysis, clustering, document summarization, object detection, information retrieval, network inference, interpreting neural networks, and discrete adversarial attacks.
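As a small illustration of submodular optimization, the sketch below applies the classical greedy algorithm to maximum coverage, a standard example of a monotone submodular objective. For such objectives under a cardinality constraint, greedy selection is known to achieve a (1 - 1/e) approximation. The function name `greedy_max_coverage` and the toy data are assumptions made for this example, not material from the tutorial.

```python
def greedy_max_coverage(sets, k):
    """Pick at most k named sets that greedily maximize the number of covered items.

    sets: dict mapping a name to a Python set of items.
    Returns the chosen names (in selection order) and the covered items.
    """
    covered = set()
    chosen = []
    for _ in range(min(k, len(sets))):
        best, best_gain = None, 0
        for name, items in sets.items():
            if name in chosen:
                continue
            gain = len(items - covered)   # marginal gain of adding this set
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:                  # nothing adds new items
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

candidates = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 6}}
chosen, covered = greedy_max_coverage(candidates, k=2)
```

The marginal gain of any set can only shrink as `covered` grows, which is the diminishing-returns property that defines submodularity and underlies the greedy guarantee.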

As applications and techniques of submodular optimization mature, a fundamental gap between theory and application emerges. In the past decade, paradigms such as large-scale learning, distributed systems, and sequential decision making have enabled a quantum leap in the performance of learning methodologies. Incorporating these paradigms in discrete problems has led to fundamentally new frameworks for submodular optimization. The goal of this tutorial is to cover rigorous and scalable foundations for discrete optimization in complex, dynamic environments, addressing challenges of scalability and uncertainty, and facilitating distributed and sequential …

Igor Mordatch · Jessica Hamrick

This tutorial presents a broad overview of the field of model-based reinforcement learning (MBRL), with a particular emphasis on deep methods. MBRL methods utilize a model of the environment to make decisions—as opposed to treating the environment as a black box—and present unique opportunities and challenges beyond model-free RL. We discuss methods for learning transition and reward models, ways in which those models can effectively be used to make better decisions, and the relationship between planning and learning. We also highlight ways that models of the world can be leveraged beyond the typical RL setting, and what insights might be drawn from human cognition when designing future MBRL systems.

Andrew Wilson

Bayesian inference is especially compelling for deep neural networks. The key distinguishing property of a Bayesian approach is marginalization instead of optimization. Neural networks are typically underspecified by the data, and can represent many different but high-performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for accuracy and calibration.
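The contrast between marginalization and optimization can be written in standard notation as follows. The symbols are assumptions introduced for this illustration: x is an input, y an output, w the network parameters, and \mathcal{D} the training data. The left-hand side is the Bayesian model average over the posterior; the right-hand side is the usual plug-in prediction at a single optimized parameter setting.

```latex
\[
p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw
\qquad \text{versus} \qquad
p(y \mid x, \hat{w}), \quad \hat{w} = \arg\max_{w}\, p(w \mid \mathcal{D})
\]
```

When the posterior p(w \mid \mathcal{D}) is sharply peaked the two predictions nearly coincide; they differ most when many distinct settings of w explain the data well, which is the underspecified regime described above.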

The tutorial has four parts:

Part 1: Introduction to Bayesian modelling and overview (Foundations, overview, Bayesian model averaging in deep learning, epistemic uncertainty, examples)

Part 2: The function-space view (Gaussian processes, infinite neural networks, training a neural network is kernel learning, Bayesian non-parametric deep learning)

Part 3: Practical methods for Bayesian deep learning (Loss landscapes, functional diversity in mode connectivity, SWAG, epistemic uncertainty, calibration, subspace inference, K-FAC Laplace, MC Dropout, stochastic MCMC, Bayes by Backprop, deep ensembles)

Part 4: Bayesian model construction and generalization (Deep ensembles, MultiSWAG, tempering, prior-specification, posterior contraction, re-thinking generalization, double descent, width-depth trade-offs, more!)