ICML 2026 Tutorials

Tutorial

Diffusion and Flow-Matching: From Memorization to Generalization & Beyond

Mathurin Massias ⋅ Quentin Bertrand

Jul 6, 9:00 AM - 11:30 AM HALL D1

Tutorial

Probabilistic Numerics — Computation is Machine Learning

Philipp Hennig ⋅ Marvin Pförtner ⋅ Tim Weiland

Jul 6, 9:00 AM - 11:30 AM HALL D2

Machine learning is the process of estimating latent representations or variables from *finite data*. If the data is insufficient, this inference process leaves a finite *estimation error*. Probabilistic (Bayesian) machine learning attempts to capture this empirical uncertainty in a probability distribution.

But what actually happens inside a Learning Machine, the computational side of ML, is invariably the solution of a *numerical problem*: *optimisation* for deep learning, solving *differential equations* for diffusion, flow matching, and scientific simulation, or even just (large-scale, approximate) numerical *linear algebra*. These numerical tasks have no analytic solution within reach. Computational resources are finite too, so the computation leaves a finite *computational error*. **Probabilistic numerical methods attempt to capture this computational uncertainty in a probability distribution.**
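
As a minimal illustration of this idea (our own sketch, not necessarily the worked example used in the tutorial), consider integrating f over [0, 1] from a handful of evaluations. Placing a Brownian-motion prior on f, with kernel k(s, t) = min(s, t) (which pins f(0) = 0), turns quadrature into Gaussian inference: the posterior over the integral has a mean, which for this prior coincides with the trapezoidal rule, and a variance that quantifies the remaining computational error. The function name `wiener_bayesian_quadrature` is hypothetical.

```python
import numpy as np


def wiener_bayesian_quadrature(nodes, values):
    """Posterior mean and variance of I = integral of f over [0, 1].

    Assumes a Brownian-motion prior on f, i.e. kernel k(s, t) = min(s, t),
    which pins f(0) = 0. Nodes must be distinct and lie in (0, 1].
    """
    x = np.asarray(nodes, dtype=float)
    f = np.asarray(values, dtype=float)
    K = np.minimum.outer(x, x)      # Gram matrix k(x_i, x_j) = min(x_i, x_j)
    z = x - 0.5 * x**2              # kernel mean: integral of min(t, x_i) over t in [0, 1]
    w = np.linalg.solve(K, z)       # quadrature weights w = K^{-1} z
    mean = float(w @ f)             # posterior mean of I
    var = 1.0 / 3.0 - float(z @ w)  # prior variance of I is 1/3, minus what the data explains
    return mean, var


nodes = np.array([0.25, 0.5, 0.75, 1.0])
mean, var = wiener_bayesian_quadrature(nodes, nodes**2)  # true integral of x^2 is 1/3
print(mean, var)
```

For f(x) = x² on these four nodes, the posterior mean equals the trapezoidal estimate 0.34375 and the variance is 1/192, so the true value 1/3 lies well within one posterior standard deviation (about 0.072). Adding nodes shrinks the variance, which is one concrete way computational effort trades off against computational uncertainty.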

By matching the mathematical modelling language of the empirical and the computational side of machine learning in this way, probabilistic numerical methods open new opportunities for computational savings, and new functionality in the ML stack: Computational and data uncertainty can be controlled in relation to each other, and information from data can flow "backwards" through a computation to solve inverse problems. A growing research community within ML is developing this toolchain, typically by building on established, highly efficient, classic numerical methods.

The tutorial is split into three parts. We will start with a simple worked example to establish key concepts and patterns. The second part will generalise these insights into a design pattern across a large class of numerical tasks. Finally, a hands-on code demo will show how probabilistic numerical methods work in practice.

Tutorial

Unifying Attention and Diffusion with Kan Extension Transformers: Structured Deep Learning with Diagrammatic Backpropagation

Sridhar Mahadevan

Jul 6, 9:00 AM - 11:30 AM HALL C

Modern foundation models are powerful, but their representations, training dynamics, and agentic workflows remain difficult to audit, compose, and trust. This tutorial presents a categorical and geometric framework for trustworthy foundation-model systems. The major scientific components of the tutorial include:

- **Diagrammatic Backpropagation** (DB), which generalizes deep learning to include a curvature loss function over categorical diagrams

- **Infinitesimal Causality** (IC), which generalizes the chain rule in calculus to functors in tangent categories

- **Kan Extension Transformers** (KET), which define a structured computation substrate, unifying attention and diffusion, and providing a universal machine learning framework for mapping finite experience into infinite futures

- **Universal Decision Learning** (UDL), which is a rigorous categorical framework for building foundries, or building blocks of foundation models

- **Lie-algebra-based neural adapters** (ALLORA), which show how to compose LoRA adapters by detecting non-commutativity using Lie brackets

- **Agentic skill optimization using Lie Algebroids** (LASKO), which formalizes optimization over tangent Markdown categories

- **Odyssey**: a demonstration system for automatic foundry construction.
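
For square matrices, the Lie bracket mentioned in the ALLORA item is the commutator [X, Y] = XY − YX, which vanishes exactly when X and Y commute. The sketch below computes it for two LoRA-style low-rank updates ΔW = BA of a square d × d weight. It assumes square weights and uses a hypothetical normalised score `relative_noncommutativity`; it is not the ALLORA algorithm, only the basic quantity such a test would inspect.

```python
import numpy as np


def lora_update(B, A):
    """Low-rank weight update Delta W = B @ A (B: d x r, A: r x d)."""
    return B @ A


def lie_bracket(X, Y):
    """Matrix commutator [X, Y] = XY - YX; zero iff X and Y commute."""
    return X @ Y - Y @ X


def relative_noncommutativity(X, Y, eps=1e-12):
    """Hypothetical score: ||[X, Y]||_F / (||X||_F ||Y||_F)."""
    return np.linalg.norm(lie_bracket(X, Y)) / (np.linalg.norm(X) * np.linalg.norm(Y) + eps)


rng = np.random.default_rng(0)
d, r = 16, 2
dW1 = lora_update(rng.normal(size=(d, r)), rng.normal(size=(r, d)))
dW2 = lora_update(rng.normal(size=(d, r)), rng.normal(size=(r, d)))
print(relative_noncommutativity(dW1, dW2))      # nonzero for generic random updates
print(relative_noncommutativity(dW1, 2 * dW1))  # ~0: an update commutes with its own rescaling
```

One reason the bracket is the right object: if two updates are composed as linear maps, (I + ΔW1)(I + ΔW2) − (I + ΔW2)(I + ΔW1) = [ΔW1, ΔW2], so the bracket measures exactly how much the order of composition matters.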

The tutorial is designed as a conceptual 2.5-hour overview. Technical details are deferred to associated arXiv papers and the *Categories for AGI* book. Participants will leave with a solid understanding of a powerful categorical and geometric design language for foundation-model systems that learn locally, transfer cautiously, expose obstructions, and glue global conclusions only when the evidence permits.

Tutorial

Unlearning Data at Scale

Vinith Suriyakumar ⋅ Gautam Kamath ⋅ Ashia Wilson

Jul 6, 9:00 AM - 11:30 AM AUDITORIUM

As powerful generative models like large language models and diffusion models become widespread, they raise growing concerns about privacy, copyright, and safety, especially since they are trained on large amounts of web data that may include sensitive or harmful information. Machine unlearning, which aims to remove the influence of specific data or suppress unwanted model outputs, offers a promising solution to these issues. This tutorial introduces the motivations behind unlearning and explains how removing the influence of specific examples is formally defined and measured. We then cover core algorithmic techniques, from provable methods in simple settings to practical approaches for large-scale models. The tutorial concludes with a discussion of current limitations and future directions.
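
In the simplest settings, unlearning can be exact: the model after deletion is identical to one retrained from scratch without the deleted examples. Ridge regression is a standard illustration (our choice, not necessarily one the tutorial uses) because its solution depends on the data only through the additive statistics XᵀX and Xᵀy, so deleting examples amounts to subtracting their contribution. The helper names below are hypothetical.

```python
import numpy as np


def ridge_stats(X, y, lam):
    """Sufficient statistics of ridge regression: (X^T X + lam I, X^T y)."""
    return X.T @ X + lam * np.eye(X.shape[1]), X.T @ y


def ridge_solve(stats):
    A, b = stats
    return np.linalg.solve(A, b)


def unlearn(stats, X_forget, y_forget):
    """Exactly remove rows from the model by subtracting their contribution."""
    A, b = stats
    return A - X_forget.T @ X_forget, b - X_forget.T @ y_forget


rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
forget = np.arange(10)                         # indices of the examples to delete
keep = np.setdiff1d(np.arange(200), forget)

stats = ridge_stats(X, y, lam=1.0)
w_unlearned = ridge_solve(unlearn(stats, X[forget], y[forget]))
w_retrained = ridge_solve(ridge_stats(X[keep], y[keep], lam=1.0))
print(np.allclose(w_unlearned, w_retrained))
```

Large generative models have no such closed-form statistics, which is why approximate unlearning methods, and careful ways of measuring how much influence remains, become necessary at scale.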

Tutorial

Proving Theorems with Lean and Machine Learning

Rémy Degenne ⋅ Wenda Li

Jul 6, 9:00 AM - 11:30 AM HALL B2

AI agents can now write mathematics, including proofs of theorems relevant to Machine Learning, but we can’t trust them yet. Subtle errors might be hidden deep in the reasoning steps, and checking the proofs manually takes a lot of time and expertise.
The Lean theorem prover provides a way to write formal, machine-checkable proofs, giving us high confidence in their correctness. AI systems have managed to reach gold medal level at the International Mathematical Olympiad while producing Lean-checked proofs. Could we get them to write research-level, verified mathematics?

In this tutorial, we introduce Lean and its mathematical library Mathlib, and show how they can be used to write trusted proofs, in particular machine learning theory proofs. We then show how machine learning can help with theorem proving, and present recent advances in AI-assisted formalization.
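
To give a flavour of what machine-checked proofs look like, here is a small Lean 4 snippet written against Mathlib (our own illustrative example, not taken from the tutorial):

```lean
import Mathlib

-- Term-mode proof: reuse an existing Mathlib lemma directly.
example (x : ℝ) : 0 ≤ x ^ 2 := sq_nonneg x

-- Tactic-mode proof: `nlinarith` closes the goal, given the hint (a - b)^2 ≥ 0.
example (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  nlinarith [sq_nonneg (a - b)]
```

The first proof cites a library lemma; the second asks an automated tactic to find the argument. In both cases Lean's kernel checks every step, which is what makes such proofs trustworthy even when an AI system wrote them.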

Tutorial

New Techniques for Sequence Prediction: Spectral Filtering and Preconditioning

Elad Hazan ⋅ Annie Marsden

Jul 6, 1:30 PM - 4:00 PM HALL D2

Tutorial

Calibration, Decisions, and Collaboration in Learning

Aaron Roth ⋅ Natalie Collina ⋅ Ira Globus-Harris

Jul 6, 1:30 PM - 4:00 PM AUDITORIUM

In this tutorial we will learn about a powerful framework to make probabilistic predictions in ways that "look like real probabilities" in all of the ways that matter for downstream applications. We'll see how to do this efficiently even in difficult, adversarial environments, and then focus on two concrete applications.
First we'll see how to make predictions that are "trustworthy" for downstream decision makers. Many downstream decision makers, each with different objectives and actions, will be able to act optimally as if our predictions were correct, and get strong guarantees about their performance. Next, we'll see how to make predictions that allow for efficient collaboration between two differently informed parties, like an AI and a human user, who can't easily share their observations, while still obtaining the complementary benefits of their individual knowledge. We'll end with a quick survey of many other applications of this technique.
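
A common, if coarse, way to check whether binary predictions "look like real probabilities" is binned calibration: among the cases where the forecaster says p, the outcome should occur about a fraction p of the time. The sketch below (our illustration; the framework in the tutorial is richer than this single statistic) computes a binned expected calibration error and shows how a downstream decision maker best-responds by treating a prediction as the true probability. The function names and the umbrella example are hypothetical.

```python
import numpy as np


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binned calibration error: weighted average of |empirical frequency - mean forecast|."""
    p = np.asarray(probs, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)  # equal-width bins on [0, 1]
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            ece += in_bin.mean() * abs(y[in_bin].mean() - p[in_bin].mean())
    return ece


def best_response(prob, payoffs):
    """Pick the action with the highest expected payoff, treating prob as P(outcome = 1).

    payoffs maps action -> (payoff if outcome is 0, payoff if outcome is 1).
    """
    return max(payoffs, key=lambda a: (1 - prob) * payoffs[a][0] + prob * payoffs[a][1])


print(expected_calibration_error([0.2] * 5, [1, 0, 0, 0, 0]))  # calibrated bin
print(expected_calibration_error([0.9] * 4, [1, 0, 0, 0]))     # overconfident bin
# Outcome 1 means "rain"; carrying an umbrella pays off only if it rains.
print(best_response(0.7, {"umbrella": (-1.0, 1.0), "no umbrella": (1.0, -2.0)}))
```

If the forecasts are calibrated, the decision maker's expected payoff computed from the forecast matches the payoff it actually realises on average, which is the sense in which acting "as if the predictions were correct" is safe.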

Tutorial

Adaptive Reasoning in LLMs: From Post-Training to Test-Time Learning (partially remote)

Akhil Arora ⋅ Vishrav Chaudhary

Jul 6, 1:30 PM - 4:00 PM HALL C

Large language models are increasingly deployed in settings that require extended, multi-step reasoning, where a single forward pass is often insufficient. In response, recent research has explored a range of mechanisms that adapt model behavior beyond pretraining, including post-training refinement, test-time training, and agentic inference strategies. While these approaches have shown promise, they are often studied in isolation, making it difficult to understand their shared structure, trade-offs, and failure modes.

This tutorial presents a unified view of these methods through the lens of adaptive reasoning control loops. We examine how LLMs iteratively generate reasoning traces, receive structured feedback, and adapt their behavior during inference, and how post-training, test-time learning, and agentic systems can all be understood as different instantiations of this loop. Through conceptual explanations and illustrative case studies, accompanied by optional executable examples, participants will gain a principled understanding of when adaptive control improves reasoning, where it breaks down, and what this implies for evaluation, robustness, and deployment.
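
The generate, feedback, adapt loop can be written down generically. In the sketch below (a hypothetical interface, with a toy stand-in for the LLM so it runs without any model), `generate` proposes an answer given the history of earlier attempts and their feedback, `critique` returns structured feedback, and the loop stops on acceptance or when the step budget runs out. In this toy, adaptation happens only through the history passed back in; post-training or test-time training would instead update parameters.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feedback:
    accepted: bool
    message: str


History = List[Tuple[int, Feedback]]


def adaptive_loop(
    generate: Callable[[History], int],
    critique: Callable[[int], Feedback],
    max_steps: int,
) -> Tuple[Optional[int], History]:
    """Generate -> critique -> adapt, until accepted or the budget is spent."""
    history: History = []
    for _ in range(max_steps):
        attempt = generate(history)   # propose, conditioned on past feedback
        feedback = critique(attempt)  # structured feedback on the attempt
        history.append((attempt, feedback))
        if feedback.accepted:
            return attempt, history
    return None, history


def make_toy_generator(lo: int, hi: int) -> Callable[[History], int]:
    """Toy 'model' that adapts by narrowing its search interval from feedback."""
    def generate(history: History) -> int:
        low, high = lo, hi
        for attempt, fb in history:
            if fb.message == "too low":
                low = max(low, attempt + 1)
            elif fb.message == "too high":
                high = min(high, attempt - 1)
        return (low + high) // 2
    return generate


def make_toy_critic(target: int) -> Callable[[int], Feedback]:
    """Toy verifier that says whether an attempt is correct, too low, or too high."""
    def critique(attempt: int) -> Feedback:
        if attempt == target:
            return Feedback(True, "correct")
        return Feedback(False, "too low" if attempt < target else "too high")
    return critique


answer, history = adaptive_loop(make_toy_generator(0, 100), make_toy_critic(37), max_steps=7)
print(answer, len(history))
```

Shrinking `max_steps` to 1 makes the loop return `None`: a simple instance of adaptive control breaking down when the feedback budget is too small.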

Tutorial

Evaluating and Training LLMs for Math Copilots and Theorem Proving

Simon Frieder ⋅ Philip Vonderlind

Jul 6, 1:30 PM - 4:00 PM HALL B2

Tutorial

Is numerical optimization theory irrelevant to machine learning practice in 2026?

Mark Schmidt

Jul 6, 1:30 PM - 4:00 PM HALL D1

We are seeing more numerical optimization theory papers published than ever before. These papers often make unrealistic assumptions or propose algorithms that never get adopted. So is all this optimization theory largely useless?

In this tutorial I show how some surprisingly simple optimization ideas can explain a wide variety of the implementation choices we make when training modern deep learning models. Some of these ideas might have let us skip some generations of grad-student descent, or have led to state-of-the-art tricks in modern architectures. On the other hand, I will highlight how some important practical ideas are not explained by optimization theory and where we can go from here.

Here is a list of keywords to get you (and your LLM sidekick) interested in attending: Adam and `*A*d*a*m*`, Muon and its friends/enemies, critical-ish batch size, the RMSnorm and skip connection love affair, dead ReLUs and living SwiGLU, Schedule-Free and WSD and muP and `max_grad_norm = 1.0`, variance reduction and `shuffle=True`, and maybe edge-of-stability/catapults/feature-learning. I may also tell you why your second-order stochastic optimization method did not work.
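
Two items on that list, Adam and `max_grad_norm = 1.0`, are easy to state precisely. The sketch below implements the textbook Adam update with bias correction, preceded by clipping of the global gradient norm, in plain NumPy. It illustrates the update rules rather than reproducing any particular library's code; the 1e-6 in the clipping denominator follows a common convention but is an assumption here.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    total = float(np.sqrt(sum(np.sum(g * g) for g in grads)))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads], total


class Adam:
    """Textbook Adam: EMAs of gradients (m) and squared gradients (v), bias-corrected."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0
        self.m = None
        self.v = None

    def step(self, params, grads):
        if self.m is None:
            self.m = [np.zeros_like(p) for p in params]
            self.v = [np.zeros_like(p) for p in params]
        self.t += 1
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(p - self.lr * m_hat / (np.sqrt(v_hat) + self.eps))
        return new_params


# Minimise f(x) = ||x - c||^2 with clipped Adam.
c = np.array([1.0, -2.0, 3.0])
params, opt = [np.zeros(3)], Adam(lr=0.01)
for _ in range(3000):
    grads, _ = clip_by_global_norm([2 * (params[0] - c)], max_norm=1.0)
    params = opt.step(params, grads)
print(params[0])
```

On the first step the bias-corrected ratio m̂/√v̂ equals g/|g| per coordinate (up to `eps`), so Adam moves every coordinate by roughly `lr` regardless of gradient scale; this sign-like behaviour is one simple way to see why Adam is insensitive to the overall magnitude of the gradients.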