ICML 2026 Saturday 07/11


Timezone: Asia/Seoul


Registration Desk

Registration Desk

7:30 AM - 12:00 PM

Workshop

New Frontiers in Game-Theoretic Learning

Nicolò Cesa-Bianchi ⋅ Tatjana Chavdarova ⋅ Michael Jordan ⋅ Celestine Mendler-Dünner ⋅ Rene Vidal ⋅ Emmanouil-Vasileios Vlatakis-Gkaragkounis

8:00 AM - 5:00 PM

As artificial intelligence systems are increasingly deployed in high-impact, mixed-motive ecosystems, we are witnessing a paradigm shift from monolithic reasoning to strategic agency. However, a critical "translation gap" exists between classical foundational theory and modern AI practice. Classical game theory and mechanism design focus on long-run behavior and static equilibria under explicit specifications. Modern learning agents, by contrast, follow non-stationary learning dynamics in unknown environments, where traditional equilibria may be computationally intractable or dynamically irrelevant. Furthermore, while Large Language Models (LLMs) excel at parsing rich context, they often exhibit brittle strategic planning and exploitable biases when interacting in multi-agent settings. The NExT-Game workshop aims to bridge this gap by uniting the algorithmic game theory and machine learning communities. We seek to explore two frontiers: (i) theoretical frontiers, reimagining classical abstractions for high-dimensional, non-convex learning landscapes and characterizing principal-agent dynamics among boundedly rational, regret-minimizing learners; and (ii) applied frontiers, using gamification and self-play as cognitive scaffolding to curb LLM hallucinations, and addressing the systemic risks of "algorithmic monoculture". By fostering dialogue between theoreticians and practitioners, this workshop will chart concrete research directions that couple strategic stability with realistic multi-agent learning dynamics, ultimately informing robust and incentive-compatible AI policy as it emerges.

Workshop

3rd AI for Math Workshop: Toward Self-Evolving Scientific Agents

Haocheng Wang ⋅ Kun Xiang ⋅ Zhizhen Qin ⋅ Zhijiang Guo

8:00 AM - 5:00 PM

Mathematics has long served as a foundation for scientific discovery and a benchmark for reasoning systems. Recent advances in LLMs and formal methods have enabled AI agents to achieve IMO-level performance in theorem proving and to demonstrate strong capabilities in end-to-end natural-language mathematical reasoning. Against this backdrop, our workshop explores the next generation of automated research agents capable of reasoning across mathematics and broader scientific domains. We aim to investigate how these agents can self-evolve to advance scientific knowledge. We invite diverse participants from academia and industry to discuss the following areas:

- Formal theorem proving: How can LLM theorem provers move beyond Olympiad problems to support real-world mathematics research and teaching, and self-evolve to propose and solve innovative conjectures?
- Precise autoformalization: How can we close the gap between formal and informal mathematical reasoning? How can natural-language mathematics be reliably translated into formal languages? How do we verify that the resulting formal statements faithfully preserve the original mathematical intent?
- Automated mathematics in natural language: How can frontier mathematical reasoning performance be achieved with a pure natural-language pipeline, covering data, generation, and verification?
- Scientific problem solving: How can advances in mathematical reasoning, as a foundation, benefit and transfer to broader scientific fields such as theoretical computer science and physics?
- Multimodal reasoning: How do current reasoning systems use visual information? How can we develop methods to tackle multimodal mathematical and scientific reasoning problems?

Extending the scope further, we also welcome research on the following topics:

- Verification and measurement: How can we verify the correctness and measure the faithfulness of AI-generated scientific solutions?
- Human-AI collaboration: What are effective methods for scientific human-AI collaboration?
- Scientific agents in related areas: systems science, causality, finance, bioinformatics, etc.

Our workshop also includes three challenges:

- Track 1: Semantic Alignment Evaluation for Autoformalization
- Track 2: Theoretical Computer Science Proving in Lean
- Track 3: Visual Grounded Physics Problem Solving

Workshop

Forecasting as a New Frontier of Intelligence

Haifeng Xu ⋅ Jibang Wu ⋅ Anri Gu

8:00 AM - 5:00 PM

Forecasting has a rich literature in machine learning (ML), ranging from classical time-series analysis to significant recent interest from both ML theory (e.g., forecasting and calibration) and applied ML research (e.g., benchmarking and advancing the forecasting capabilities of foundation models). Despite these deep roots and recent trends in ML research, to the best of our knowledge, a workshop dedicated to forecasting is still missing. Led by a diverse team of seasoned organizers and featuring a compelling lineup of confirmed invited speakers, this workshop will provide a platform for interdisciplinary dialogue on AI forecasting, bringing together researchers from varied perspectives (theoretical vs. applied) and application domains (tech, finance, policy making, etc.).

Workshop

Efficient Multimodal Question Answering

Jordan L Boyd-Graber ⋅ Martin Fajčík ⋅ Chen Zhao ⋅ Ikuya Yamada ⋅ George Boateng

8:00 AM - 5:00 PM

Efficient multimodal question answering is becoming increasingly important as large language models expand into real-world settings where users rely on systems that must operate under constraints of latency, cost, connectivity, and device resources. This workshop brings together researchers from machine learning, NLP, and information retrieval to explore methods for answering questions over text, images, tables, and audio while balancing accuracy with computational efficiency. Building on the success of the NeurIPS 2020 EfficientQA competition, we highlight retrieval-augmented and hybrid generative–extractive approaches, multimodal reasoning under resource limits, and evaluation frameworks that incorporate human oversight. The workshop will feature invited talks, a shared task on efficient multimodal QA, poster sessions, and an exciting live human–computer question answering event designed to engage both participants and spectators. Our goal is to advance practical, trustworthy QA systems that remain deployable across diverse domains and global contexts.

Workshop

AdaptFM: Resource-Adaptive Foundation Model Inference

Stefanos Laskaridis ⋅ Deepak Gupta ⋅ Samuel Horvath ⋅ Zechun Liu ⋅ Christina Giannoula ⋅ Arnav Chavan ⋅ Sankalp Dayal

8:00 AM - 5:00 PM

Foundation models (FMs) have achieved remarkable capabilities across language, vision, and multimodal tasks. However, their inference typically follows a rigid, one-size-fits-all paradigm in which every input, regardless of complexity, passes through the same fixed architecture at identical computational cost. This inflexibility creates a fundamental mismatch between the diverse resource budgets encountered in real-world deployments and the static nature of model inference. Adaptation can take many forms: compressing models to meet deployment budgets, designing flexible architectures that support multiple configurations from a single trained model, or making dynamic runtime decisions based on input complexity or resource availability. The central question we explore is: How can foundation model inference flexibly adapt to any resource budget, whether constrained by memory, compute, latency, energy, or cost, while maximizing output quality? This challenge spans algorithms, architectures, and systems. We aim to bring together researchers from the ML, systems, and hardware communities to advance techniques that move beyond rigid inference toward flexible, resource-aware foundation models. We welcome contributions in the following areas:

Workshop

Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

Aryan Deshwal ⋅ Haruka Kiyohara ⋅ Nghia Hoang ⋅ Tang Thanh Nguyen ⋅ Willie Neiswanger ⋅ Syrine Belakaria ⋅ Jana Doppa

8:00 AM - 5:00 PM

Join us for an insightful workshop on decision-making from offline datasets to online adaptation in various settings including black-box optimization, contextual bandits, RL, and synergies between them. Dive into their use in application domains including natural sciences (e.g., materials and drug discovery), engineering (e.g., chip design), healthcare, education, recommender systems, agriculture, and more.

Workshop

The Second Workshop on the Impact of Memorization on Trustworthy Foundation Models

Dominik Hintersdorf ⋅ Adam Dziedzic ⋅ Franziska Boenisch ⋅ Niloofar Mireshghallah

8:00 AM - 5:00 PM

Foundation models underpin many critical applications, such as healthcare, public safety, and education. Ensuring their trustworthiness is, therefore, more important than ever. However, recent research has revealed that foundation models are prone to memorizing details or even entire samples from their training data. This can lead to privacy violations, intellectual property infringement, and societal harm when sensitive information is leaked. While unintended memorization threatens the integrity of models, a certain degree of it is essential for solving novel and complex tasks, highlighting the importance of balancing performance against data leakage. Currently, isolated solutions are being developed across various research fields and data modalities, often without integration or coordination. This fragmentation can lead to duplicated efforts despite shared goals, and the lack of interaction and exchange between research fields hinders progress in understanding and mitigating undesired memorization. In this workshop, we explore the causes and consequences of memorization from both theoretical and practical perspectives. We aim to connect insights from different research fields, including data privacy, law, ethics, and security in machine learning, to assess the impact of memorization on models and society and to explore innovative methods for mitigating the associated risks. By bringing together researchers and practitioners from diverse fields, we seek to bridge the gap between research and real-world applications, fostering the development of trustworthy foundation models that benefit society without compromising sensitive data, intellectual property, or individual privacy.

Workshop

Philosophy Meets Machine Learning: What Counts as Trustworthy?

Junhyung Park ⋅ Fanny Yang ⋅ Bernhard Schölkopf ⋅ Konstantin Genin ⋅ Thomas Icard ⋅ Vincent Fortuin ⋅ Jaesik Choi

8:00 AM - 5:00 PM

Philosophers have long thought deeply about many concepts that are used colloquially in the machine learning (ML) community such as epistemology, counterfactuals, explainability, reliability, uncertainty and causality. As ML systems are now embedded in high-stakes decisions across science, industry, and public life, it is urgent that when ML researchers claim properties such as "explainability", "reliability", "intelligence" or "cognition", these claims are made with awareness of what practitioners, policymakers, and affected users mean by those terms. In particular, we argue that the ML community needs to take a step back and review whether the mathematical objectives used in optimisation and evaluation procedures truly take into account how philosophers have analysed them—analyses that explicitly aim to connect notions like explanation, evidence, and uncertainty to human understanding, justification, and use. Philosophers of science and psychologists are more actively engaged than ever in such questions; however, their interaction with ML researchers remains sparse and fragmented. The goal of the proposed workshop is to facilitate a lively dialogue between the two otherwise largely separate communities, to promote more principled and grounded advances in ML and artificial intelligence.

Workshop

3rd Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences

Pengtao Xie ⋅ James Zou ⋅ Le Song ⋅ Ruishan Liu ⋅ Eran Segal ⋅ Wei Wang ⋅ Marinka Zitnik ⋅ Han Guo

8:00 AM - 5:00 PM

Recent advances in foundation models and large language models (LLMs) have revolutionized life sciences by enabling AI-driven insights into complex biological systems. However, most existing models focus on single-modal data, limiting their ability to capture the inherently multi-modal nature of biological processes. This workshop will explore the development and application of multi-modal foundation models and LLMs that integrate diverse biological data types, such as protein sequences, structures, genomic and transcriptomic data, and metabolomics. By bringing together researchers from AI, computational biology, and biomedical sciences, the workshop will address challenges in modality fusion, cross-modal representation learning, scalable pretraining, and interpretability. Discussions will focus on novel architectures, self-supervised learning methods, and real-world applications in drug discovery, precision medicine, and multi-omics data analysis. Through invited talks, poster sessions, contributed presentations, and panel discussions, this workshop aims to advance multi-modal foundation models and LLMs for biological discovery and foster interdisciplinary collaborations that push the boundaries of AI in life sciences. We successfully organized the first and second editions of this workshop at ICML 2025 and NeurIPS 2025, which attracted around 200 paper submissions and several hundred attendees in total.

Workshop

ICML 2026 Hypothesis Testing Workshop

Feng Liu ⋅ Danica J Sutherland ⋅ Lester Mackey ⋅ Xiuyuan Cheng ⋅ Shayak Sen ⋅ Nathaniel Xu

8:00 AM - 5:00 PM

Hypothesis testing, while much-maligned, remains a key component of scientific practice. Machine learning has contributed to the development of testing methodology, with many key advances coming from the ML community, from widely used nonparametric tests to recent work on e-values. Machine learning practice can also benefit from hypothesis testing techniques, whether for checking or ensuring model reliability and robustness, or for detecting subgroup shifts in medical applications. This workshop will explore advances both in testing methodology and in its impact across ML.

Workshop

The 2nd Workshop on Connecting Low-rank Representations in AI: From Practice to Theory

Antonio Vergari ⋅ Grigorios Chrysos ⋅ Chao Li ⋅ Evrim Acar

8:00 AM - 5:00 PM

Structured low-rank representations constitute a unifying foundation across modern machine learning, powering advancements in domains as diverse as Large Language Models, probabilistic circuits, and quantum simulation. Despite sharing a common mathematical core—structured computational graphs—scientific progress is currently impeded by fragmented terminologies and isolated research silos. This workshop aims to bridge these communities by providing a centralized platform for cross-disciplinary synthesis. By harmonizing disparate theoretical frameworks and aligning vocabularies, we seek to accelerate breakthroughs in high-dimensional scaling, interpretability, and efficient computation.

Workshop

Workshop on Human-AI Co-Creativity: Advances, Opportunities, and Challenges

Adish Singla ⋅ Abhilasha Ravichander ⋅ Liwei Jiang ⋅ Alexander Spangher ⋅ Alice Oh ⋅ Jiho Jin ⋅ Jun Seong Kim ⋅ Changyoon Lee

8:00 AM - 5:00 PM

This workshop will bring together researchers and practitioners interested in topics of generative AI, creativity, and human-AI co-creation. On the one hand, we will explore opportunities in how recent advances in generative AI can support people in open-ended creative tasks. On the other hand, we will identify unique challenges in integrating generative AI into creative workflows, including design fixation, idea homogeneity, and issues of authorship. By fostering collaboration between different communities and stakeholders, we aim to facilitate the development of next-generation technologies that enhance human-AI co-creativity.

Workshop

AI4Physics: An ICML 2026 Workshop on AI for Physics

John Sous ⋅ Arman Cohan ⋅ Yilun Zhao

8:00 AM - 5:00 PM

AI is rapidly reshaping physics research, but progress is often fragmented across subfields and stages of the scientific workflow. This workshop, AI4Physics at ICML 2026, will bring together machine learning researchers and physicists to develop and evaluate trustworthy AI methods that support physics discovery end to end—from physics-centric reasoning with LLMs and tool-using agents, to high-fidelity generative and surrogate simulators, to inverse problems and uncertainty-aware inference under systematics, and finally to data scarcity, dataset-building, and closed-loop experimental design and control. Spanning high-energy physics, astrophysics and cosmology, condensed matter, plasma and fusion, and quantum science, the workshop will highlight shared structure and bottlenecks across domains, consolidate best practices for physical consistency and robust evaluation, and catalyze cross-disciplinary collaboration through invited talks, posters, and contributed presentations.

Workshop

Pluralistic Alignment Workshop

JinYeong Bak ⋅ Yohan Jo ⋅ Ruyuan Wan ⋅ Liwei Jiang ⋅ Maarten Sap ⋅ Dongyeop Kang ⋅ Taylor Sorensen ⋅ Kshitish Ghate ⋅ Amy Zhang

8:00 AM - 5:00 PM

Aligning AI systems with human preferences and societal values has become a critical challenge as these technologies grow more powerful and pervasive. However, current AI alignment methods have proven insufficient for capturing the full spectrum of complex—and often conflicting—real-world values held across diverse populations. This workshop addresses this gap by examining how to integrate diverse perspectives, values, and expertise into pluralistic AI alignment frameworks. We will explore novel approaches to multi-objective alignment, drawing inspiration from established governance mechanisms and consensus-building practices to navigate the value conflicts inherent in pluralistic societies. The workshop will cover technical innovations in preference elicitation and dataset collection, algorithm development for multi-stakeholder optimization, and the design of human-AI interaction workflows that authentically reflect pluralistic values across diverse communities. By convening researchers, practitioners, and domain experts from AI safety, political philosophy, social science, and human-computer interaction, this workshop aims to foster interdisciplinary collaboration that advances both the theoretical foundations and practical implementation of pluralistic AI alignment.

Workshop

Planning in The Era of Language Models (LM4Plan)

Michael Katz ⋅ Augusto B. Corrêa ⋅ Nir Lipovetzky ⋅ Sarath Sreedharan ⋅ Katharina Stein ⋅ Luckeciano Melo ⋅ Elliot Gestrin

8:00 AM - 5:00 PM

Language Models (LMs) are a disruptive force, changing how research is done in many subareas of AI. Planning is one of the last bastions still standing. This workshop focuses on questions at the intersection of these areas. Specific questions we would like to understand better include: what LMs can contribute to planning, how LMs can or should be used, what the pitfalls of using LMs are, and what guarantees can be obtained. This will be the third edition of the LM4Plan workshop, which started at AAAI 2025 and had its second edition at ICAPS 2025. The workshop series website is at https://llmforplanning.github.io/

Workshop

The Second Workshop on Agents in the Wild: Safety, Security, and Beyond

Chenguang Wang ⋅ Xinyun Chen ⋅ Wenbo Guo ⋅ Yizhou Sun ⋅ Kyle Montgomery ⋅ Yiyou Sun ⋅ Jianhong Tu ⋅ Zhun Wang

8:00 AM - 5:00 PM

The year 2025 was widely recognized as the year of the agent, marked by advances in AI agents that can perceive, reason, and act in complex real-world environments. For example, OpenAI's Operator can interact with a browser to take actions on the web and complete tasks such as booking a trip. Unlike standalone LLMs, agentic systems introduce fundamentally different safety and security challenges, such as the risk of irreversible real-world consequences. The first workshop on Agents in the Wild at ICLR 2026 aimed to address these foundational concerns. However, the situation has only grown more urgent. For example, recent agents like OpenClaw now enable agent-only communities where AI agents interact with minimal human oversight, amplifying existing vulnerabilities while introducing novel challenges in new real-world settings such as multi-agent coordination. Building on the success of the first workshop, which received 235 submissions and anticipated 800 attendees, we propose a second iteration to tackle both the escalating foundational challenges and these emerging risks. Through invited talks, contributed papers, and structured discussions, the workshop seeks to formalize open research problems and establish a comprehensive, interdisciplinary research agenda for building safe, secure, and reliable agentic systems deployed in the wild.

Workshop

Structured Data for Health

Hyewon Jeong ⋅ Maxwell Xu ⋅ Wanting Mao ⋅ Dongxia Wu ⋅ Juncheng Liu ⋅ Girish Narayanswamy ⋅ Sana Tonekaboni ⋅ Patrick Langer ⋅ James Rehg

8:00 AM - 5:00 PM

Structured data is the backbone of modern healthcare, encompassing tabular Electronic Health Records (EHRs), high-frequency time-series biosignals, and complex disease networks. Despite the critical need for holistic patient modeling, research across these modalities remains largely siloed, often overlooking the multimodal nature of real-world clinical decision-making. The "Structured Data for Health" workshop aims to bridge this gap by establishing a unified forum for the convergence of tabular, time-series, and graph-based health data research. We focus on addressing shared technical challenges—such as data heterogeneity, sparsity, and distribution shifts—while leveraging emerging capabilities in Large Language Models (LLMs) for data structuring and reasoning. Featuring a globally diverse lineup of speakers from leading academic and industry institutions, this workshop will cover the full spectrum of structured health AI, from foundational representation learning and multimodal fusion to trustworthy, real-world clinical deployment.

Workshop

2nd Workshop on Compositional Learning: Safety, Interpretability, and Agents

Giacomo Camposampiero ⋅ Pietro Barbiero

8:00 AM - 5:00 PM

Compositionality, defined as the ability to construct and reason about complex concepts from reusable components, is a hallmark of human cognition and the key to robust generalization. Despite the astonishing progress of modern AI systems, it remains an open question whether they truly capture and leverage the compositional nature of many real-world domains. The workshop will explore this pressing challenge across multiple critical dimensions. We will invite contributions focusing on the theoretical foundations of compositionality, its central role in the age of foundation models and agents, and its impact on achieving robustness and systematic out-of-domain generalization. Through interdisciplinary dialogue, we aim to catalyze new research directions that push the boundaries of compositional learning in advanced AI systems.

Workshop

AI for Science: AI Scientists -- Tools, Co-authors, or Founders?

Lixue Cheng ⋅ Marinka Zitnik ⋅ Max Welling ⋅ Mengdi Wang ⋅ Soojung Yang ⋅ Sungsoo Ahn ⋅ Yixuan Wang ⋅ Mia Rosenfeld

8:00 AM - 5:00 PM

We have crossed an inflection point: AI has moved from passive tool to active agent that closes the loop on hypothesis generation, experimental design, and execution. Nations and corporations are investing at unprecedented scale, e.g., the U.S. Genesis Mission mobilizing 17 national laboratories, and a recent Nature study confirms that AI-augmented research is accelerating in adoption. The AI Scientist is no longer a vision; it is here. The question is no longer whether AI Scientists will reshape science, but how—and in particular, where AI sits on the spectrum from tool to co-author to founder. This distinction carries concrete consequences for authorship, credit, funding, and ethical oversight, yet these roles already coexist across domains without shared definitions to distinguish them. As a tool, AlphaFold predicts protein structures that biologists interpret and experimentally validate; GNoME screens hundreds of thousands of candidate crystals for thermodynamic stability while materials scientists choose which to synthesize. In each case, scientists retain full authority; the model accelerates search but does not set the agenda. As a co-author, AI autonomously executes substantial research steps within human-defined problem spaces: Coscientist uses large language models to plan chemical syntheses and drive robotic execution, CuspAI generates synthesizable materials candidates up to 10× faster, AlphaProof solves Olympiad problems at gold-medal level, and A-Lab combines target selection with robotic synthesis to realize novel compounds in a 17-day closed-loop campaign. 
At the far end, AI approaches founder: FutureHouse’s Kosmos identified and pursued questions without human guidance, Sakana’s AI Scientist autonomously generates ideas, designs experiments, and writes papers, and Lila Sciences has built “AI Science Factories”: autonomous labs integrating generative AI with robotics that generate hypotheses, execute experiments, and iterate across biology, chemistry, and materials science. These examples span a wide spectrum of autonomy, yet all fall under the umbrella of “AI Scientists.” Without shared definitions and meaningful benchmarks, we cannot separate marketing from milestones. Our workshop aims to fill this gap by bringing together ML researchers, domain scientists, experimentalists, policymakers, and industry practitioners to define clearer taxonomies, propose evaluation standards, and initiate governance dialogue for AI-driven discovery. Workshop attendees will leave with: (1) a shared vocabulary and taxonomy for discussing AI Scientist systems across domains; (2) concrete evaluation criteria for assessing whether AI contributions constitute tool use, co-authorship, or independent discovery; (3) draft principles for attribution, accountability, and governance that can inform institutional policies; and (4) connections across the AI and domain science communities to advance responsible development.

Workshop

[Cancelled] Game Theory in Nature: From Optimality to Equilibrium

Matthieu Geist ⋅ Mathieu Lauriere

8:00 AM - 5:00 PM

This workshop explores nature as a massive distributed learning system to bridge the gap between biological adaptation and multi-agent machine learning. While modern AI often relies on centralized optimization and global objectives, natural systems such as microbial colonies and animal societies attain stability and collective intelligence through local strategic interactions. As the machine learning community moves toward decentralized environments and uses large-scale models for ecological data, understanding the tension between individual goals and population-level stability becomes critical. By investigating core themes such as evolutionary stability, environmental feedback, and emergent communication, this workshop aims to identify biological mechanisms that can inform the design of more efficient and scalable artificial systems. We bring together researchers in game theory, animal behavior, and machine learning to address the challenges of collective decision-making and systemic robustness in the face of the current explosion in ecological sensor data.

Workshop

Foundation Models for Structured Data

Nick Erickson ⋅ Xiyuan Zhang ⋅ Mononito Goswami ⋅ Lennart Purucker ⋅ Boran Han ⋅ Maximilian Schambach ⋅ Arjun Ashok ⋅ Rajat Sen

8:00 AM - 5:00 PM

Structured data (tabular and time-series) underpins high-impact applications across finance, healthcare, enterprise decision-making, and climate modeling. Over the past two years, predictive foundation models tailored to structured data have emerged, enabling in-context learning and transfer across heterogeneous datasets and schemas, challenging the traditional “train per dataset” paradigm. Tabular and time-series foundation models share methodological similarities: pretraining on heterogeneous datasets, in-context learning, and transfer under schema and distribution shift. These similarities create natural synergies across the respective communities. Building on the inaugural Foundation Models for Structured Data workshop at ICML 2025, FMSD @ ICML 2026 will unify the tabular and time-series communities around shared challenges in data curation, scaling, evaluation (including contamination), and real-world deployment (latency, memory, monitoring).

Workshop

Statistical Frameworks for Uncertainty in Agentic Systems

Aymeric Dieuleveut ⋅ Mahmoud Hegazy ⋅ Tijana Zrnic ⋅ Aaditya Ramdas ⋅ Stephen Bates ⋅ Maxim Panov

8:00 AM - 5:00 PM

Long-horizon agentic workloads mark a shift into a regime where declarative systems adaptively build larger declarative systems. In this setting, uncertainty quantification must address not only output-level error but also the risks of adaptive resource allocation: budgeted routing to tools and subagents, safeguards against unnecessary spend, and principled stopping under continuous monitoring. As systems become more modular, we also need compositional guarantees that aggregate local uncertainties into end-to-end risk bounds. The aim of this workshop is to bring distribution-free statistical tools to bear on these agentic systems. We focus on three themes: (1) distribution-free validity layers for coverage, risk, and abstention under heterogeneity and distribution shift; (2) anytime-valid sequential inference for continuous monitoring, evidence aggregation, and principled stopping; and (3) uncertainty reporting for interactive components and inter-agent interaction. We invite contributions advancing statistical foundations for uncertainty in agentic systems, including theory, methodology, benchmarks, and case studies.