


Tutorials
Tutorial
Kamyar Azizzadenesheli

[ Hall A8 ]

Abstract

This tutorial will introduce neural operators, an extension of neural networks designed to learn mappings between infinite-dimensional function spaces. We'll cover the theoretical foundations, including their formulation and universal approximation capabilities. Emphasizing their discretization-invariance, we'll explore how neural operators tackle problems in partial differential equations (PDEs) and scientific computing tasks. This session is ideal for machine learning experts looking to leverage neural operators for advanced scientific and engineering applications.
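The discretization-invariance mentioned above can be illustrated with a minimal sketch of a single spectral-convolution layer, the building block of Fourier neural operators. The function `spectral_conv_1d` and its shapes are illustrative assumptions, not the tutorial's code: because the learned parameters live in Fourier space on a fixed number of modes, the same weights apply to the input function sampled at any resolution.

```python
import numpy as np

def spectral_conv_1d(u, weights):
    """One spectral-convolution pass (hypothetical minimal sketch).

    u:       function samples on a uniform grid, shape (n,)
    weights: complex weights for the lowest k Fourier modes, shape (k,)
    """
    k = weights.shape[0]
    u_hat = np.fft.rfft(u)                      # to Fourier space
    out_hat = np.zeros_like(u_hat)
    out_hat[:k] = u_hat[:k] * weights           # act only on retained low modes
    return np.fft.irfft(out_hat, n=u.shape[0])  # back to physical space

# The same weights act on any discretization of the same input function:
rng = np.random.default_rng(0)
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)
coarse = spectral_conv_1d(np.sin(np.linspace(0, 2*np.pi, 64, endpoint=False)), w)
fine   = spectral_conv_1d(np.sin(np.linspace(0, 2*np.pi, 256, endpoint=False)), w)
```

Applying the same eight-mode weights to 64-point and 256-point discretizations of one function, and getting outputs at the corresponding resolutions, is what discretization-invariance refers to.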

Tutorial
Nir Rosenfeld

[ Straus 1-3 ]

Abstract

Machine learning is increasingly being used for tasks that require making predictions about humans. But humans are not your conventional input: they have goals, beliefs, and aspirations, and take action to promote their own interests. Given that standard learning methods are not designed to handle inputs that 'behave', a natural question is: how should we design learning systems when we know they will be deployed and used in social settings? This tutorial introduces strategic machine learning – a new and emerging subfield of machine learning that aims to develop a disciplined framework for learning under strategic user behavior. The working hypothesis of strategic ML is simple: users want things, and act to achieve them. Surprisingly, this basic truism is difficult to address within the conventional learning framework. The key challenge is that how users behave often depends on the learned decision rule itself; thus, strategic learning seeks to devise methods which are able to anticipate and accommodate such responsive behavior. Towards this, strategic ML offers a formalism for reasoning about strategic responses, for designing appropriate learning objectives, and for developing practical tools for learning in strategic environments. The tutorial will survey recent and ongoing work in this new domain, present …
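The working hypothesis above ("users want things, and act to achieve them") can be made concrete with a toy best-response model. This is entirely illustrative; the names and the linear cost model are assumptions, not the tutorial's formalism. A user moves their feature just past a threshold classifier whenever the gain from acceptance exceeds the cost of moving:

```python
def best_response(x, threshold, cost_per_unit, gain=1.0):
    """A user's strategic response to a threshold classifier (toy sketch).

    The user receives `gain` for being classified positively (x >= threshold)
    and pays `cost_per_unit` for each unit of feature change.
    """
    if x >= threshold:
        return x                      # already accepted: no reason to move
    move_cost = cost_per_unit * (threshold - x)
    return threshold if move_cost <= gain else x

# Changing the threshold changes who responds -- behavior depends on the rule:
users = [0.2, 0.5, 0.8]
responses_low  = [best_response(x, 0.6, cost_per_unit=2.0) for x in users]
responses_high = [best_response(x, 0.9, cost_per_unit=2.0) for x in users]
```

Note that the set of users who manipulate their features depends on the threshold itself, which is exactly why the learned rule cannot be chosen independently of the behavior it induces.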

Tutorial
Aleksander Madry · Andrew Ilyas · Logan Engstrom · Sung Min (Sam) Park · Kristian Georgiev

[ Hall A1 ]

Abstract

Data attribution is the study of the relationship between data and ML predictions. In downstream applications, data attribution methods can help interpret and compare models; curate datasets; and assess learning algorithm stability.

This tutorial surveys the field of data attribution, with a focus on what we call "predictive data attribution." We first motivate this notion within a broad, purpose-based taxonomy of data attribution. Next, we highlight how one can view predictive data attribution through the lens of a classic statistical problem that we call "weighted refitting." We discuss why classical methods for solving the weighted refitting problem struggle when directly applied to large-scale machine learning settings (and thus cannot directly solve problems in modern contexts). With these shortcomings in mind, we overview recent progress on performing predictive data attribution for modern ML models. Finally, we conclude by discussing applications, current and future, of data attribution.
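One concrete instance of weighted refitting is leave-one-out refitting, where the i-th example's weight is set to zero. The sketch below is illustrative (plain least squares, not the tutorial's methods): it scores each training example by how much removing it changes a test prediction.

```python
import numpy as np

def loo_attribution(X, y, x_test):
    """Leave-one-out attribution for least-squares regression (toy sketch).

    Score of example i = change in the test prediction when the model is
    refit with example i removed -- weighted refitting with weight i zeroed.
    """
    def fit_predict(Xs, ys):
        w, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        return x_test @ w

    full = fit_predict(X, y)
    scores = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        scores.append(full - fit_predict(X[mask], y[mask]))
    return np.array(scores)

X = np.array([[1.0], [2.0], [3.0], [10.0]])
y = np.array([1.0, 2.0, 3.0, 0.0])          # last point breaks the trend
scores = loo_attribution(X, y, np.array([2.0]))
```

Refitting the model once per example is exactly what becomes intractable at modern scale, which motivates the approximate methods the tutorial surveys.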

Tutorial
Margaux Zaffran · Aymeric Dieuleveut

[ Straus 1-3 ]

Abstract
Tutorial
Siddharth Joshi · Baharan Mirzasoleiman

[ Hall A8 ]

Abstract

Over the last decade, machine learning models have achieved remarkable success by learning from large amounts of data. This is best exemplified by the recent rise of foundation models that are trained on billions of examples. Training on massive data, however, depends on exceptionally large and expensive computational resources, and incurs substantial financial and environmental costs due to significant energy consumption. To reduce these costs, there has been a recent surge of interest in data-efficient learning techniques that train machine learning models on smaller subsets of carefully chosen training examples. The field is, however, filled with many heuristics that seem contradictory at times, and is increasingly diverse and difficult for newcomers to grasp. The goal of this tutorial is to provide a unifying perspective by discussing recent theoretically rigorous approaches for data-efficient machine learning. We will discuss rigorous techniques for data-efficient supervised learning and self-supervised contrastive pre-training. Then, we will focus on foundation models and discuss data selection for (pre-)training large vision-language models, such as CLIP. We will conclude by discussing challenges and providing guidelines for data-efficient training of large language models (LLMs).
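As a taste of what selecting a smaller subset of carefully chosen examples can mean, here is one classic geometric baseline, greedy k-center selection. This is a hedged sketch of a standard heuristic, not one of the tutorial's own techniques:

```python
import numpy as np

def k_center_greedy(features, k):
    """Greedy k-center coreset selection (one classic baseline, sketched).

    Repeatedly picks the point farthest from the current subset, so the
    selected examples cover the feature space with little redundancy.
    """
    selected = [0]                              # arbitrary starting point
    dists = np.linalg.norm(features - features[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))             # farthest point from subset
        selected.append(nxt)
        dists = np.minimum(dists,
                           np.linalg.norm(features - features[nxt], axis=1))
    return selected

rng = np.random.default_rng(0)
pts = np.concatenate([rng.normal(0, 0.1, (50, 2)),   # two tight clusters
                      rng.normal(5, 0.1, (50, 2))])
subset = k_center_greedy(pts, 2)
```

On two tight clusters, selecting just two centers picks one point per cluster, illustrating how a tiny subset can still cover the data.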

Tutorial
Minjia Zhang · Yu Cheng · Tianlong Chen

[ Hall A1 ]

Abstract

Recently, Large Language Models (LLMs) have shown remarkable generalization capabilities across a wide range of tasks, yielding notable successes. The scale of these models is a pivotal determinant of LLM performance. However, increasing model size significantly amplifies the costs of both pre-training and fine-tuning, while constraining inference speed. Consequently, there has been a surge of interest in novel techniques for model scaling. Among these, the sparse Mixture-of-Experts (MoE) has garnered considerable attention for its ability to speed up pre-training and inference, especially compared to dense models with equivalent parameter counts. This tutorial offers a comprehensive overview of MoE in the context of LLMs. We begin by revisiting existing research on MoE and the critical challenges in this domain. We then explore the relationship between MoE and LLMs, covering sparse scaling of pre-trained models and the conversion of existing dense models into sparse MoE counterparts. We also discuss the broader advantages MoE confers beyond efficiency. Overall, this tutorial traces the evolution of MoE within the landscape of LLMs, underscoring its pivotal role in the era of LLMs.
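The efficiency argument for sparse MoE can be seen in a minimal sketch of top-k gating. This is purely illustrative; production MoE layers add load balancing, expert parallelism, and batched routing:

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Sparse Mixture-of-Experts forward pass (minimal sketch).

    Only the `top_k` experts with the highest gate scores run per token,
    so per-token compute stays roughly constant as the expert count grows.
    """
    logits = x @ gate_w                          # (n_experts,) gate scores
    top = np.argsort(logits)[-top_k:]            # indices of chosen experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                         # softmax over chosen experts
    return sum(p * (x @ expert_ws[i]) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
expert_ws = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, expert_ws, top_k=2)
```

With `top_k=2` out of 8 experts, the parameter count grows with the number of experts while the per-token compute (two expert matmuls plus the gate) does not, which is the trade-off behind MoE's scaling appeal.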

Tutorial
Subbarao Kambhampati

[ Lehar 1-4 ]

Abstract

Large Language Models (LLMs, or n-gram models on steroids), originally trained to generate text by repeatedly predicting the next word given a window of previous words, have captured the attention of the AI community (and the world). Part of the reason for this is their ability to produce meaningful completions for prompts relating to almost any area of human intellectual endeavor. This sheer versatility has also led to claims that these predictive text-completion systems may be capable of abstract reasoning and planning. In this tutorial we take a critical look at the ability of LLMs to help in planning tasks, either in autonomous or assistive modes. The tutorial will both point out the fundamental limitations of LLMs in generating plans (especially those that normally require resolving subgoal interactions with combinatorial search), and show constructive "LLM-Modulo" uses as technologies complementary to sound planners, plan verifiers, simulators, unit testers, etc. In addition to presenting our own work in this area, we provide a critical survey of many related efforts.

Materials Link: https://yochan-lab.github.io/tutorial/LLMs-Planning/
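The "LLM-Modulo" pattern described above amounts to a generate-test loop in which soundness comes from an external verifier, not the generator. The skeleton below is a hypothetical sketch; the function names and the toy verifier are assumptions, not the authors' implementation:

```python
def llm_modulo_plan(propose, verify, max_iters=10):
    """Generate-test loop in the spirit of LLM-Modulo (hypothetical sketch).

    `propose(feedback)` stands in for an LLM producing candidate plans;
    `verify(plan)` is a sound external checker returning (ok, feedback).
    Any plan returned is guaranteed correct by the verifier alone.
    """
    feedback = None
    for _ in range(max_iters):
        plan = propose(feedback)
        ok, feedback = verify(plan)
        if ok:
            return plan
    return None          # no verified plan within the iteration budget

# Toy instance: "plans" are integers; the verifier demands a multiple of 7.
candidates = iter([3, 12, 14, 20])
plan = llm_modulo_plan(
    lambda fb: next(candidates),
    lambda p: (p % 7 == 0, f"{p} is not a multiple of 7"),
)
```

The guarantee rests entirely on `verify`; the proposer may be arbitrarily unreliable, which is the division of labor the tutorial argues for.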

Tutorial
Zeyuan Allen-Zhu

[ Hall A1 ]

Abstract

We divide "intelligence" into multiple dimensions (like language structures, knowledge, reasoning, etc.). For each dimension, we create synthetic data for LLM pretraining to understand the theory and push the capabilities of LLMs to the extreme.

Unlike benchmarking, by controlling the synthetic data, we aim to discover universal laws of all LLMs, not just a specific version like GPT/Llama. By tweaking hyperparameters such as data amount, type, difficulty, and format, we determine factors affecting LLM performance and suggest improvements.

Unlike black-box training, we develop advanced probing techniques to examine the inner workings of LLMs and understand their hidden mental processes. This helps us gain a deeper understanding of how these AI models function and moves us closer to creating more powerful and transparent AI systems.

This talk will cover language structures (Part 1), reasoning (Part 2), and knowledge (Part 3). These sections explain why and how language models succeed or fail on certain AI tasks and provide practical suggestions for necessary changes to (1) model architecture, (2) data preparation, and (3) the training process to move us closer to AGI.
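The controlled-synthetic-data methodology described above can be illustrated with a toy generator whose difficulty is a single tunable knob. This is a hypothetical example for intuition only, not one of the talk's actual datasets:

```python
import random

def synth_example(difficulty, rng):
    """Generate one synthetic arithmetic example of controlled difficulty.

    `difficulty` = number of chained additions; sweeping it while holding
    the data format fixed isolates one factor affecting model performance.
    """
    terms = [rng.randint(0, 9) for _ in range(difficulty + 1)]
    question = " + ".join(map(str, terms))
    return f"{question} = {sum(terms)}"

rng = random.Random(0)
easy = [synth_example(1, rng) for _ in range(3)]
hard = [synth_example(4, rng) for _ in range(3)]
```

Because every example is generated from known parameters, one can vary amount, type, difficulty, and format independently, which is what separates this methodology from benchmarking on fixed natural corpora.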

Tutorial
Adrián Arnaiz-Rodríguez · Ameya Velingker

[ Hall A8 ]

Abstract
Tutorial
Miroslav Dudik · Robert Schapire

[ Straus 1-3 ]

Abstract

Optimization is centrally important to machine learning, including optimization of convex functions. A particular challenge arises, however, when the function being minimized, even if convex, has no finite minimizer, and so can only be minimized by a sequence heading to infinity. This certainly occurs in practice, for instance when minimizing objectives based on log loss (or cross entropy). Analyzing statistical properties and proving the convergence of algorithms in such cases is considerably more difficult.

This tutorial presents a new theory for studying minimizers of convex functions at infinity, introducing astral space, an extension of Euclidean space that includes points at infinity and has many favorable properties. In extending convex analysis, astral space provides a mathematical foundation for the study of optimization algorithms when minimizers exist only at infinity. We will look at how some of the most important topics studied in convex analysis extend to astral space. We will also look at applications of particular relevance to machine learning, such as the analysis of descent methods.
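A one-dimensional instance of the phenomenon described above (a standard textbook example, not taken from the tutorial) is the logistic loss on a single correctly separable example:

```latex
L(w) \;=\; \log\!\left(1 + e^{-w}\right), \qquad \inf_{w \in \mathbb{R}} L(w) \;=\; 0 ,
```

yet no finite $w$ attains the infimum, since $L(w) > 0$ for all $w$ and $L(w) \to 0$ only as $w \to +\infty$. Astral space adjoins such points at infinity to $\mathbb{R}^n$ so that the minimizer exists within the extended space.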

Tutorial
Lintang Sutawika · Hailey Schoelkopf

[ Lehar 1-4 ]

Abstract

The field of machine learning relies on benchmarking and evaluation datasets to accurately track progress and assess the efficacy of new models and methodologies. For this reason, good evaluation practices and accurate reporting are crucial. However, language models (LMs) not only inherit the challenges previously faced in benchmarking, but also introduce a slew of novel considerations that can make proper comparison across models difficult, misleading, or near-impossible. In this tutorial, we aim to bring attendees up to speed on the state of LM evaluation, and highlight current challenges in evaluating language model performance by discussing the fundamental methods commonly used to evaluate progress in language model research. We will then discuss how these common pitfalls can be addressed and what considerations should be taken to enhance future work, especially as we seek to evaluate ever more complex properties of LMs.
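One of the novel considerations alluded to above is how answer choices are scored. The sketch below shows multiple-choice evaluation by per-choice log-likelihood, where a single length-normalization flag flips the selected answer; the model call is a stand-in, and all names here are assumptions rather than any particular harness's API:

```python
def pick_choice(loglikelihood, context, choices, normalize=True):
    """Multiple-choice scoring by continuation likelihood (sketched).

    `loglikelihood(context, continuation)` stands in for a model call
    returning log P(continuation | context). Whether scores are length-
    normalized is exactly the kind of underreported detail that changes
    measured accuracy.
    """
    def score(c):
        ll = loglikelihood(context, c)
        return ll / len(c) if normalize else ll   # per-character normalization
    return max(range(len(choices)), key=lambda i: score(choices[i]))

# Toy "model": raw likelihood favors short continuations regardless of content.
fake_ll = lambda ctx, cont: -2.0 * len(cont) + (5.0 if cont == "the correct long answer" else 0.0)
choices = ["no", "the correct long answer"]
raw  = pick_choice(fake_ll, "Q:", choices, normalize=False)
norm = pick_choice(fake_ll, "Q:", choices, normalize=True)
```

The two calls disagree on the same model and the same prompt, illustrating why reporting "accuracy" without specifying the scoring convention can make cross-model comparisons misleading.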