Tutorials

Medicine stands apart from other areas where machine learning (ML) can be applied. Where we have seen advances in other fields driven by lots of data, it is the complexity of medicine, not the volume of data, that makes the challenge so hard. But at the same time this makes medicine the most exciting area for anyone who is really interested in exploring the boundaries of ML, because we are given real-world problems to formalize and solve. And the solutions are ones that are societally important, and they potentially impact us all (just think COVID-19!).

ML has of course already achieved very impressive results in numerous areas. Standout examples include computer vision and image recognition, game playing, and teaching robots. AI empowered by ML is so good at mastering these things because they are easily stated problems where the solutions are well-defined and easily verifiable. “Easily stated problems” have a clear challenge to solve and clear rules to play by; “well-defined solutions” fall into an easily recognizable class of answers; while a “verifiable solution” is one that we as humans can understand in terms of judging whether the model has succeeded or not. Unfortunately, in medicine the problems are not well-posed, …

Many ML tasks share practical goals and theoretical foundations with signal processing (consider, e.g., spectral and kernel methods, differential equation systems, sequential sampling techniques, and control theory). Signal processing methods are an integral part of many sub-fields in ML, with links to, for example, reinforcement learning, Hamiltonian Monte Carlo, Gaussian process (GP) models, Bayesian optimization, and neural ODEs/SDEs.

This tutorial aims to cover aspects of machine learning that link to both discrete-time and continuous-time signal processing methods. Special focus is put on introducing stochastic differential equations (SDEs), state space models, and recursive estimation (Bayesian filtering and smoothing) for Gaussian process models. The goals are to (i) teach basic principles of direct links between signal processing and machine learning, (ii) provide an intuitive hands-on understanding of what stochastic differential equations are all about, and (iii) show how these methods have real benefits in speeding up learning, improving inference, and model building, with illustrative and practical application examples. This is to show how ML can leverage existing theory to improve and accelerate research, and to provide a unifying overview to the ICML community members working at the intersection of these methods.
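As a rough illustration of the link between SDEs, state space models, and Bayesian filtering, the sketch below treats an Ornstein-Uhlenbeck process (whose stationary covariance is the exponential GP kernel) as a scalar state space model and runs a Kalman filter on noisy observations. It is not taken from the tutorial material; the function names, the parameters `lengthscale`, `variance`, and `noise_var`, and all numerical values are assumptions chosen for the example.

```python
import numpy as np

def simulate_ou(n_steps, dt, lengthscale, variance, rng):
    """Sample an Ornstein-Uhlenbeck path using its exact discretisation."""
    a = np.exp(-dt / lengthscale)      # one-step transition coefficient
    q = variance * (1.0 - a**2)        # process noise variance over one step
    x = np.empty(n_steps)
    x[0] = rng.normal(0.0, np.sqrt(variance))  # draw from the stationary law
    for k in range(1, n_steps):
        x[k] = a * x[k - 1] + rng.normal(0.0, np.sqrt(q))
    return x

def kalman_filter(y, dt, lengthscale, variance, noise_var):
    """Scalar Kalman filter for the OU state observed in Gaussian noise."""
    a = np.exp(-dt / lengthscale)
    q = variance * (1.0 - a**2)
    m, p = 0.0, variance               # stationary prior mean and variance
    means = np.empty(len(y))
    variances = np.empty(len(y))
    for k, yk in enumerate(y):
        if k > 0:
            m, p = a * m, a**2 * p + q     # prediction step
        gain = p / (p + noise_var)         # Kalman gain
        m = m + gain * (yk - m)            # update step
        p = (1.0 - gain) * p
        means[k], variances[k] = m, p
    return means, variances

# Hypothetical settings, chosen only for illustration.
rng = np.random.default_rng(0)
x = simulate_ou(n_steps=200, dt=0.1, lengthscale=1.0, variance=1.0, rng=rng)
y = x + rng.normal(0.0, 0.5, size=x.shape)   # noise std 0.5, variance 0.25
means, variances = kalman_filter(y, dt=0.1, lengthscale=1.0,
                                 variance=1.0, noise_var=0.25)
```

The filter costs time linear in the number of observations, whereas naive GP regression on the same data would require solving a dense linear system; this is one sense in which the state space view can speed up inference.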

The field of representation learning without labels, also known as unsupervised or self-supervised learning, is seeing significant progress. New techniques have been put forward that approach or even exceed the performance of fully supervised techniques in large-scale and competitive benchmarks such as image classification, while also showing improvements in label-efficiency by multiple orders of magnitude. Representation learning without labels is therefore finally starting to address some of the major challenges in modern deep learning. To continue making progress, however, it is important to systematically understand the nature of the learnt representations and the learning objectives that give rise to them.

In this tutorial we will:

- Provide a unifying overview of the state of the art in representation learning without labels;
- Contextualise these methods through a number of theoretical lenses, including generative modelling, manifold learning and causality;
- Argue for the importance of careful and systematic evaluation of representations and provide an overview of the pros and cons of current evaluation methods.

The tutorial will focus on digital epidemiology – the study of the patterns of disease and health, and the factors that influence these patterns using digital technology and data. We will discuss the use of digital data and machine learning for studying and improving health in different populations.

The tutorial has four parts:

Part 1: Introduction to Digital Epidemiology.

Part 2: The use of digital data and technology to study infectious disease outbreaks, chronic diseases and other conditions.

Part 3: The use of digital data and tools during the COVID-19 pandemic.

Part 4: Ethics, privacy and representation.

Causal inference provides a set of tools and principles that allows one to combine data and causal invariances about the environment to reason with questions of counterfactual nature -- i.e., what would have happened had reality been different, even when no data about this unrealized reality is available. Reinforcement Learning is concerned with efficiently finding a policy that optimizes a specific function (e.g., reward, regret) in interactive and uncertain environments. These two disciplines have evolved independently and with virtually no interaction between them. In fact, they operate over different aspects of the same building block, i.e., counterfactual relations, which makes them umbilically tied.

In this tutorial, we introduce a unified treatment putting these two disciplines under the same conceptual and theoretical umbrella. We show that a number of natural and pervasive classes of learning problems emerge when this connection is fully established, which cannot be seen individually from either discipline. In particular, we'll discuss generalized policy learning (a combination of online, off-policy, and do-calculus learning), when and where to intervene, counterfactual decision-making (and free-will, autonomy, Human-AI collaboration), policy generalizability, causal imitation learning, among others. This new understanding leads to a broader view of what counterfactual learning is and suggests the …

One of the major recent advances in theoretical machine learning is the development of efficient learning algorithms for various high-dimensional statistical models. The Achilles heel of these algorithms is the assumption that the samples are precisely generated from the model. This assumption is crucial for the performance of these algorithms: even a very small fraction of outliers can completely compromise the algorithms' behavior.
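To make the fragility concrete, here is a minimal one-dimensional sketch, assuming a standard normal model and a hypothetical `contaminate` helper that overwrites 1% of the samples with a large value. It only shows how a non-robust estimator (the sample mean) breaks while a classical robust one (the median) does not; it does not address the high-dimensional setting, where computationally efficient robust estimation is the hard part.

```python
import numpy as np

def contaminate(samples, fraction, outlier_value):
    """Return a copy with the given fraction of samples replaced by an outlier."""
    corrupted = samples.copy()
    n_outliers = int(round(fraction * len(samples)))
    corrupted[:n_outliers] = outlier_value
    return corrupted

# Hypothetical setup: true mean 0, 1% of points replaced by 1e6.
rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=10_000)
corrupted = contaminate(clean, fraction=0.01, outlier_value=1e6)

mean_error = abs(np.mean(corrupted))      # dominated by the outliers
median_error = abs(np.median(corrupted))  # barely moved by the outliers
```

With these settings the sample mean is pulled to roughly 1% of the outlier value, while the median stays close to the true mean of zero.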

Recent results in theoretical computer science have led to the development of the first computationally efficient robust estimators for a range of high-dimensional models. The goal of this tutorial is to introduce the machine learning community to the core insights and techniques in this area of algorithmic robust statistics, and discuss new directions and opportunities for future work.


Bayesian inference is especially compelling for deep neural networks. The key distinguishing property of a Bayesian approach is marginalization instead of optimization. Neural networks are typically underspecified by the data, and can represent many different but high-performing models corresponding to different settings of parameters, which is exactly when marginalization will make the biggest difference for accuracy and calibration.
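The contrast between marginalization and optimization can be written out as follows. The notation is an assumption introduced here for illustration: $w$ denotes the network parameters, $\mathcal{D}$ the training data, and $(x, y)$ a test input and output. The first expression is the Bayesian model average; the second is the usual plug-in prediction from a single optimized parameter setting.

```latex
p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw
\qquad \text{versus} \qquad
p(y \mid x, \hat{w}), \quad \hat{w} = \arg\max_{w}\, p(w \mid \mathcal{D})
```

When many different settings of $w$ explain the data well, the posterior $p(w \mid \mathcal{D})$ spreads its mass over them and the two predictions can differ substantially.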

The tutorial has four parts:

Part 1: Introduction to Bayesian modelling and overview (Foundations, overview, Bayesian model averaging in deep learning, epistemic uncertainty, examples)

Part 2: The function-space view (Gaussian processes, infinite neural networks, training a neural network is kernel learning, Bayesian non-parametric deep learning)

Part 3: Practical methods for Bayesian deep learning (Loss landscapes, functional diversity in mode connectivity, SWAG, epistemic uncertainty, calibration, subspace inference, K-FAC Laplace, MC Dropout, stochastic MCMC, Bayes by Backprop, deep ensembles)

Part 4: Bayesian model construction and generalization (Deep ensembles, MultiSWAG, tempering, prior-specification, posterior contraction, re-thinking generalization, double descent, width-depth trade-offs, more!)

This tutorial will cover recent advancements in discrete optimization methods prevalent in large-scale machine learning problems. Traditionally, machine learning has been harnessing convex optimization to design fast algorithms with provable guarantees for a broad range of applications. In recent years, however, there has been a surge of interest in applications that involve discrete optimization. For discrete domains, the analog of convexity is considered to be submodularity, and the evolving theory of submodular optimization has been a catalyst for progress in extraordinarily varied application areas including active learning and experimental design, vision, sparse reconstruction, graph inference, video analysis, clustering, document summarization, object detection, information retrieval, network inference, interpreting neural networks, and discrete adversarial attacks.
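Submodularity is a diminishing-returns property: adding an element to a larger set helps no more than adding it to a smaller one. A minimal sketch of the classical greedy algorithm on a set-coverage objective (a standard monotone submodular function) is given below. The function names and the toy `sets` data are assumptions made for this example, not part of the tutorial.

```python
def coverage(selected, sets):
    """Number of distinct elements covered by the chosen sets."""
    covered = set()
    for i in selected:
        covered |= sets[i]
    return len(covered)

def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(min(k, len(sets))):
        base = coverage(selected, sets)
        best_gain, best_idx = 0, None
        for i in range(len(sets)):
            if i in selected:
                continue
            gain = coverage(selected + [i], sets) - base
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:      # no remaining set covers anything new
            break
        selected.append(best_idx)
    return selected

# Toy instance, chosen only for illustration.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen = greedy_max_coverage(sets, k=2)

# Diminishing returns: set 1 adds less once set 2 is already selected.
gain_alone = coverage([1], sets) - coverage([], sets)
gain_after = coverage([2, 1], sets) - coverage([2], sets)
```

For monotone submodular objectives under a cardinality constraint, this greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimal value, which is one reason submodularity plays a role for discrete problems similar to that of convexity for continuous ones.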

As applications and techniques of submodular optimization mature, a fundamental gap between theory and application emerges. In the past decade, paradigms such as large-scale learning, distributed systems, and sequential decision making have enabled a quantum leap in the performance of learning methodologies. Incorporating these paradigms in discrete problems has led to fundamentally new frameworks for submodular optimization. The goal of this tutorial is to cover rigorous and scalable foundations for discrete optimization in complex, dynamic environments, addressing challenges of scalability and uncertainty, and facilitating distributed and sequential …