


Tutorials
Tutorial
Kamyar Azizzadenesheli

[ Hall A8 ]

Abstract

This tutorial will introduce neural operators, an extension of neural networks designed to learn mappings between infinite-dimensional function spaces. We'll cover the theoretical foundations, including their formulation and universal approximation capabilities. Emphasizing their discretization-invariance, we'll explore how neural operators tackle problems in partial differential equations (PDEs) and scientific computing tasks. This session is ideal for machine learning experts looking to leverage neural operators for advanced scientific and engineering applications.
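The discretization-invariance mentioned above can be illustrated with a minimal sketch of a single spectral-convolution layer, the building block of Fourier neural operators. The function `spectral_conv_1d` and its shapes are illustrative assumptions, not the tutorial's code: because the learned parameters live in Fourier space on a fixed number of modes, the same weights apply to the input function sampled at any resolution.

```python
import numpy as np

def spectral_conv_1d(u, weights):
    """One spectral-convolution pass (hypothetical minimal sketch).

    u:       function samples on a uniform grid, shape (n,)
    weights: complex weights for the lowest k Fourier modes, shape (k,)
    """
    k = weights.shape[0]
    u_hat = np.fft.rfft(u)                      # to Fourier space
    out_hat = np.zeros_like(u_hat)
    out_hat[:k] = u_hat[:k] * weights           # act only on retained low modes
    return np.fft.irfft(out_hat, n=u.shape[0])  # back to physical space

# The same weights act on any discretization of the same input function:
rng = np.random.default_rng(0)
w = rng.standard_normal(8) + 1j * rng.standard_normal(8)
coarse = spectral_conv_1d(np.sin(np.linspace(0, 2*np.pi, 64, endpoint=False)), w)
fine   = spectral_conv_1d(np.sin(np.linspace(0, 2*np.pi, 256, endpoint=False)), w)
```

Applying the same eight-mode weights to 64-point and 256-point discretizations of one function, and getting outputs at the corresponding resolutions, is what discretization-invariance refers to.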

Tutorial
Nir Rosenfeld

[ Straus 1-3 ]

Abstract

Machine learning is increasingly being used for tasks that require making predictions about humans. But humans are not your conventional input: they have goals, beliefs, and aspirations, and take action to promote their own interests. Given that standard learning methods are not designed to handle inputs that 'behave', a natural question is: how should we design learning systems when we know they will be deployed and used in social settings? This tutorial introduces strategic machine learning – a new and emerging subfield of machine learning that aims to develop a disciplined framework for learning under strategic user behavior. The working hypothesis of strategic ML is simple: users want things, and act to achieve them. Surprisingly, this basic truism is difficult to address within the conventional learning framework. The key challenge is that how users behave often depends on the learned decision rule itself; thus, strategic learning seeks to devise methods which are able to anticipate and accommodate such responsive behavior. Towards this, strategic ML offers a formalism for reasoning about strategic responses, for designing appropriate learning objectives, and for developing practical tools for learning in strategic environments. The tutorial will survey recent and ongoing work in this new domain, present …
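The working hypothesis above ("users want things, and act to achieve them") can be made concrete with a toy best-response model. This is entirely illustrative; the names and the linear cost model are assumptions, not the tutorial's formalism. A user moves their feature just past a threshold classifier whenever the gain from acceptance exceeds the cost of moving:

```python
def best_response(x, threshold, cost_per_unit, gain=1.0):
    """A user's strategic response to a threshold classifier (toy sketch).

    The user receives `gain` for being classified positively (x >= threshold)
    and pays `cost_per_unit` for each unit of feature change.
    """
    if x >= threshold:
        return x                      # already accepted: no reason to move
    move_cost = cost_per_unit * (threshold - x)
    return threshold if move_cost <= gain else x

# Changing the threshold changes who responds -- behavior depends on the rule:
users = [0.2, 0.5, 0.8]
responses_low  = [best_response(x, 0.6, cost_per_unit=2.0) for x in users]
responses_high = [best_response(x, 0.9, cost_per_unit=2.0) for x in users]
```

Note that the set of users who manipulate their features depends on the threshold itself, which is exactly why the learned rule cannot be chosen independently of the behavior it induces.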

Tutorial
Aleksander Madry · Andrew Ilyas · Logan Engstrom · Sung Min (Sam) Park · Kristian Georgiev

[ Hall A1 ]

Abstract

Data attribution is the study of the relationship between data and ML predictions. In downstream applications, data attribution methods can help interpret and compare models; curate datasets; and assess learning algorithm stability.

This tutorial surveys the field of data attribution, with a focus on what we call "predictive data attribution." We first motivate this notion within a broad, purpose-based taxonomy of data attribution. Next, we highlight how one can view predictive data attribution through the lens of a classic statistical problem that we call "weighted refitting." We discuss why classical methods for solving the weighted refitting problem struggle when directly applied to large-scale machine learning settings (and thus cannot directly solve problems in modern contexts). With these shortcomings in mind, we overview recent progress on performing predictive data attribution for modern ML models. Finally, we conclude by discussing applications, current and future, of data attribution.
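One concrete instance of weighted refitting is leave-one-out refitting, where the i-th example's weight is set to zero. The sketch below is illustrative (plain least squares, not the tutorial's methods): it scores each training example by how much removing it changes a test prediction.

```python
import numpy as np

def loo_attribution(X, y, x_test):
    """Leave-one-out attribution for least-squares regression (toy sketch).

    Score of example i = change in the test prediction when the model is
    refit with example i removed -- weighted refitting with weight i zeroed.
    """
    def fit_predict(Xs, ys):
        w, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        return x_test @ w

    full = fit_predict(X, y)
    scores = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        scores.append(full - fit_predict(X[mask], y[mask]))
    return np.array(scores)

X = np.array([[1.0], [2.0], [3.0], [10.0]])
y = np.array([1.0, 2.0, 3.0, 0.0])          # last point breaks the trend
scores = loo_attribution(X, y, np.array([2.0]))
```

Refitting the model once per example is exactly what becomes intractable at modern scale, which motivates the approximate methods the tutorial surveys.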

Tutorial
Margaux Zaffran · Aymeric Dieuleveut

[ Straus 1-3 ]

Abstract
Tutorial
Siddharth Joshi · Baharan Mirzasoleiman

[ Hall A8 ]

Abstract

Over the last decade, machine learning models have achieved remarkable success by learning from large amounts of data. This is best exemplified by the recent rise of foundation models that are trained on billions of examples. Training on massive data, however, depends on exceptionally large and expensive computational resources, and incurs substantial financial and environmental costs due to significant energy consumption. To reduce these costs, there has been a recent surge of interest in data-efficient learning techniques that train machine learning models on smaller subsets of carefully chosen training examples. The field is, however, filled with many heuristics that seem contradictory at times, and is increasingly diverse and difficult for newcomers to grasp. The goal of this tutorial is to provide a unifying perspective by discussing recent theoretically rigorous approaches for data-efficient machine learning. We will discuss rigorous techniques for data-efficient supervised learning and self-supervised contrastive pre-training. Then, we will focus on foundation models and discuss data selection for (pre-)training large vision-language models, such as CLIP. We will conclude by discussing challenges and providing guidelines for data-efficient training of large language models (LLMs).
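As a taste of what selecting a smaller subset of carefully chosen examples can mean, here is one classic geometric baseline, greedy k-center selection. This is a hedged sketch of a standard heuristic, not one of the tutorial's own techniques:

```python
import numpy as np

def k_center_greedy(features, k):
    """Greedy k-center coreset selection (one classic baseline, sketched).

    Repeatedly picks the point farthest from the current subset, so the
    selected examples cover the feature space with little redundancy.
    """
    selected = [0]                              # arbitrary starting point
    dists = np.linalg.norm(features - features[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))             # farthest point from subset
        selected.append(nxt)
        dists = np.minimum(dists,
                           np.linalg.norm(features - features[nxt], axis=1))
    return selected

rng = np.random.default_rng(0)
pts = np.concatenate([rng.normal(0, 0.1, (50, 2)),   # two tight clusters
                      rng.normal(5, 0.1, (50, 2))])
subset = k_center_greedy(pts, 2)
```

On two tight clusters, selecting just two centers picks one point per cluster, illustrating how a tiny subset can still cover the data.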

Tutorial
Minjia Zhang · Yu Cheng · Tianlong Chen

[ Hall A1 ]

Abstract

Recently, Large Language Models (LLMs) have shown remarkable generalization capabilities across a wide range of tasks, yielding notable successes. The scale of these models is a pivotal determinant of LLM performance. However, increasing model size significantly amplifies the costs of both pre-training and fine-tuning, while constraining inference speed. Consequently, there has been a surge of interest in novel techniques for model scaling. Among these, the sparse Mixture-of-Experts (MoE) has garnered considerable attention for its ability to speed up pre-training and inference, especially compared to dense models with equivalent parameter counts. This tutorial offers a comprehensive overview of MoE in the context of LLMs. We begin by revisiting existing research on MoE and the critical challenges in this domain. We then explore the relationship between MoE and LLMs, covering sparse scaling of pre-trained models and the conversion of existing dense models into sparse MoE counterparts. We also discuss the broader advantages MoE confers beyond efficiency. Overall, this tutorial traces the evolution of MoE within the landscape of LLMs, underscoring its pivotal role in the era of LLMs.
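The efficiency argument for sparse MoE can be seen in a minimal sketch of top-k gating. This is purely illustrative; production MoE layers add load balancing, expert parallelism, and batched routing:

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Sparse Mixture-of-Experts forward pass (minimal sketch).

    Only the `top_k` experts with the highest gate scores run per token,
    so per-token compute stays roughly constant as the expert count grows.
    """
    logits = x @ gate_w                          # (n_experts,) gate scores
    top = np.argsort(logits)[-top_k:]            # indices of chosen experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                         # softmax over chosen experts
    return sum(p * (x @ expert_ws[i]) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
expert_ws = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, expert_ws, top_k=2)
```

With `top_k=2` out of 8 experts, the parameter count grows with the number of experts while the per-token compute (two expert matmuls plus the gate) does not, which is the trade-off behind MoE's scaling appeal.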

Tutorial
Subbarao Kambhampati

[ Lehar 1-4 ]

Abstract

Large Language Models (LLMs, or n-gram models on steroids), originally trained to generate text by repeatedly predicting the next word given a window of previous words, have captured the attention of the AI community (and the world). Part of the reason for this is their ability to produce meaningful completions for prompts relating to almost any area of human intellectual endeavor. This sheer versatility has also led to claims that these predictive text-completion systems may be capable of abstract reasoning and planning. In this tutorial we take a critical look at the ability of LLMs to help in planning tasks, either in autonomous or assistive modes. The tutorial will both point out the fundamental limitations of LLMs in generating plans (especially those that normally require resolving subgoal interactions with combinatorial search), and show constructive "LLM-Modulo" uses as technologies complementary to sound planners, plan verifiers, simulators, unit testers, etc. In addition to presenting our own work in this area, we provide a critical survey of many related efforts.

Materials Link: https://yochan-lab.github.io/tutorial/LLMs-Planning/
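The "LLM-Modulo" pattern described above amounts to a generate-test loop in which soundness comes from an external verifier, not the generator. The skeleton below is a hypothetical sketch; the function names and the toy verifier are assumptions, not the authors' implementation:

```python
def llm_modulo_plan(propose, verify, max_iters=10):
    """Generate-test loop in the spirit of LLM-Modulo (hypothetical sketch).

    `propose(feedback)` stands in for an LLM producing candidate plans;
    `verify(plan)` is a sound external checker returning (ok, feedback).
    Any plan returned is guaranteed correct by the verifier alone.
    """
    feedback = None
    for _ in range(max_iters):
        plan = propose(feedback)
        ok, feedback = verify(plan)
        if ok:
            return plan
    return None          # no verified plan within the iteration budget

# Toy instance: "plans" are integers; the verifier demands a multiple of 7.
candidates = iter([3, 12, 14, 20])
plan = llm_modulo_plan(
    lambda fb: next(candidates),
    lambda p: (p % 7 == 0, f"{p} is not a multiple of 7"),
)
```

The guarantee rests entirely on `verify`; the proposer may be arbitrarily unreliable, which is the division of labor the tutorial argues for.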

Tutorial
Zeyuan Allen-Zhu

[ Hall A1 ]

Abstract

We divide "intelligence" into multiple dimensions (like language structures, knowledge, reasoning, etc.). For each dimension, we create synthetic data for LLM pretraining to understand the theory and push the capabilities of LLMs to the extreme.

Unlike benchmarking, by controlling the synthetic data, we aim to discover universal laws of all LLMs, not just a specific version like GPT/Llama. By tweaking hyperparameters such as data amount, type, difficulty, and format, we determine factors affecting LLM performance and suggest improvements.

Unlike black-box training, we develop advanced probing techniques to examine the inner workings of LLMs and understand their hidden mental processes. This helps us gain a deeper understanding of how these AI models function and moves us closer to creating more powerful and transparent AI systems.

This talk will cover language structures (Part 1), reasoning (Part 2), and knowledge (Part 3). These sections explain why and how language models succeed or fail on certain AI tasks and provide practical suggestions for necessary changes to (1) model architecture, (2) data preparation, and (3) the training process to move us closer to AGI.
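The controlled-synthetic-data methodology described above can be illustrated with a toy generator whose difficulty is a single tunable knob. This is a hypothetical example for intuition only, not one of the talk's actual datasets:

```python
import random

def synth_example(difficulty, rng):
    """Generate one synthetic arithmetic example of controlled difficulty.

    `difficulty` = number of chained additions; sweeping it while holding
    the data format fixed isolates one factor affecting model performance.
    """
    terms = [rng.randint(0, 9) for _ in range(difficulty + 1)]
    question = " + ".join(map(str, terms))
    return f"{question} = {sum(terms)}"

rng = random.Random(0)
easy = [synth_example(1, rng) for _ in range(3)]
hard = [synth_example(4, rng) for _ in range(3)]
```

Because every example is generated from known parameters, one can vary amount, type, difficulty, and format independently, which is what separates this methodology from benchmarking on fixed natural corpora.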

Tutorial
Adrián Arnaiz-Rodríguez · Ameya Velingker

[ Hall A8 ]

Abstract
Tutorial
Miroslav Dudik · Robert Schapire

[ Straus 1-3 ]

Abstract

Optimization is centrally important to machine learning, including optimization of convex functions. A particular challenge arises, however, when the function being minimized, even if convex, has no finite minimizer, and so can only be minimized by a sequence heading to infinity. This certainly occurs in practice, for instance when minimizing objectives based on log loss (or cross entropy). Analyzing statistical properties and proving the convergence of algorithms in such cases is considerably more difficult.

This tutorial presents a new theory for studying minimizers of convex functions at infinity, introducing astral space, an extension of Euclidean space that includes points at infinity and has many favorable properties. In extending convex analysis, astral space provides a mathematical foundation for the study of optimization algorithms when minimizers exist only at infinity. We will look at how some of the most important topics studied in convex analysis extend to astral space. We will also look at applications of particular relevance to machine learning, such as the analysis of descent methods.
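A one-dimensional instance of the phenomenon described above (a standard textbook example, not taken from the tutorial) is the logistic loss on a single correctly separable example:

```latex
L(w) \;=\; \log\!\left(1 + e^{-w}\right), \qquad \inf_{w \in \mathbb{R}} L(w) \;=\; 0 ,
```

yet no finite $w$ attains the infimum, since $L(w) > 0$ for all $w$ and $L(w) \to 0$ only as $w \to +\infty$. Astral space adjoins such points at infinity to $\mathbb{R}^n$ so that the minimizer exists within the extended space.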

Tutorial
Lintang Sutawika · Hailey Schoelkopf

[ Lehar 1-4 ]

Abstract

The field of machine learning relies on benchmarking and evaluation datasets to accurately track progress and assess the efficacy of new models and methodologies. For this reason, good evaluation practices and accurate reporting are crucial. However, language models (LMs) not only inherit the challenges previously faced in benchmarking, but also introduce a slew of novel considerations that can make proper comparison across models difficult, misleading, or near-impossible. In this tutorial, we aim to bring attendees up to speed on the state of LM evaluation, and highlight current challenges in evaluating language model performance by discussing the fundamental methods commonly used to evaluate progress in language model research. We will then discuss how these common pitfalls can be addressed and what considerations should be taken to enhance future work, especially as we seek to evaluate ever more complex properties of LMs.
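One of the novel considerations alluded to above is how answer choices are scored. The sketch below shows multiple-choice evaluation by per-choice log-likelihood, where a single length-normalization flag flips the selected answer; the model call is a stand-in, and all names here are assumptions rather than any particular harness's API:

```python
def pick_choice(loglikelihood, context, choices, normalize=True):
    """Multiple-choice scoring by continuation likelihood (sketched).

    `loglikelihood(context, continuation)` stands in for a model call
    returning log P(continuation | context). Whether scores are length-
    normalized is exactly the kind of underreported detail that changes
    measured accuracy.
    """
    def score(c):
        ll = loglikelihood(context, c)
        return ll / len(c) if normalize else ll   # per-character normalization
    return max(range(len(choices)), key=lambda i: score(choices[i]))

# Toy "model": raw likelihood favors short continuations regardless of content.
fake_ll = lambda ctx, cont: -2.0 * len(cont) + (5.0 if cont == "the correct long answer" else 0.0)
choices = ["no", "the correct long answer"]
raw  = pick_choice(fake_ll, "Q:", choices, normalize=False)
norm = pick_choice(fake_ll, "Q:", choices, normalize=True)
```

The two calls disagree on the same model and the same prompt, illustrating why reporting "accuracy" without specifying the scoring convention can make cross-model comparisons misleading.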