Timezone: America/Vancouver

Registration Desk: Registration West Sat 19 Jul 07:30 a.m.  


Workshop: Actionable Interpretability Sat 19 Jul 08:00 a.m.  

Tal Haklay · Hadas Orgad · Anja Reusch · Marius Mosbach · Sarah Wiegreffe · Ian Tenney · Mor Geva

Interpretability research has advanced considerably in uncovering the inner mechanisms of artificial intelligence (AI) systems and has become a crucial subfield within AI. However, translating interpretability findings into actionable improvements in model design, training, and deployment remains a challenge. As a result, such insights have rarely influenced real-world AI development. This workshop addresses a key yet underexplored question: How can interpretability research drive tangible advancements in AI systems? By fostering discussions on the practical applications of interpretability, we aim to bridge this gap and highlight work that moves beyond analysis to achieve concrete improvements in model alignment, robustness, and domain-specific performance. Through this workshop, we strive to refocus interpretability research on actionable impact rather than just analysis, ensuring its insights lead to meaningful advancements in AI.


Workshop: The Impact of Memorization on Trustworthy Foundation Models Sat 19 Jul 08:25 a.m.  

Franziska Boenisch · Adam Dziedzic · Lukas Struppek · Dominik Hintersdorf · Lingjuan Lyu · Niloofar Mireshghallah

Foundation models have come to underpin many critical applications, such as healthcare, public safety, and education. Ensuring their trustworthiness is, therefore, more important than ever. However, recent research has revealed that foundation models are prone to memorizing details or even entire samples from their training data. This issue can lead to privacy violations, intellectual property infringement, and societal harm when sensitive information is leaked. While unintended memorization risks the integrity of models, a certain degree of it is essential for solving novel and complex tasks, highlighting the importance of balancing performance with data leakage. Currently, isolated solutions are being developed across various research fields and data modalities, often without integration or coordination. This fragmentation can lead to duplicated efforts despite shared goals. The lack of interaction and exchange between research fields hinders progress in understanding and mitigating undesired memorization. In this workshop, we explore the causes and consequences of memorization from both theoretical and practical perspectives. We aim to connect insights from different research fields, including data privacy, ethics, and security in machine learning, to assess their impact on models and society and to explore innovative methods for mitigating associated risks. By bringing together researchers and practitioners from diverse fields, we seek to bridge the gap between research and real-world applications, fostering the development of trustworthy foundation models that benefit society without compromising sensitive data, intellectual property, or individual privacy.
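
To make the memorization risk concrete, below is a minimal sketch (ours, not the workshop's) of a common prefix-continuation probe: feed a model the first part of a suspected training sample and check whether greedy decoding reproduces the rest verbatim. The model name and test string are placeholders; extraction-style evaluations in the literature refine this idea with longer prefixes, approximate matching, and filtering.

```python
# Minimal prefix-continuation memorization probe (illustrative sketch).
# MODEL and the test string are placeholders, not tied to the workshop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def is_memorized(sample: str, prefix_frac: float = 0.5) -> bool:
    """Give the model the first half of a (suspected) training sample and
    check whether greedy decoding reproduces the second half exactly."""
    ids = tokenizer(sample, return_tensors="pt").input_ids[0]
    split = int(len(ids) * prefix_frac)
    prefix, target = ids[:split], ids[split:]
    with torch.no_grad():
        out = model.generate(
            prefix.unsqueeze(0),
            max_new_tokens=len(target),
            do_sample=False,  # greedy: memorized text should win deterministically
            pad_token_id=tokenizer.eos_token_id,
        )
    continuation = out[0, split:]
    # Different lengths (e.g., early EOS) simply count as "not memorized".
    return continuation.shape == target.shape and torch.equal(continuation, target)

print(is_memorized("The quick brown fox jumps over the lazy dog."))
```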


ICML 2025 Workshop on Collaborative and Federated Agentic Workflows (CFAgentic @ ICML'25) Sat 19 Jul 08:30 a.m.  

Alexander Erben · Gauri Joshi · Nicholas Lane · Huan Sun · Shiqiang Wang · Herbert Woisetschlaeger

This workshop aims to provide a platform for discussing the convergence of collaborative and federated learning with agentic workflows, an emerging class of AI systems capable of autonomously executing complex task sequences. We aim to facilitate an engaging discussion among scholars and practitioners by soliciting work addressing key challenges in precision, efficiency, and personalization; safety and security; and regulatory compliance in the development of collaborative and federated agentic workflows.


ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models Sat 19 Jul 08:30 a.m.  

Tri Dao · Daniel Y Fu · Max Ryabinin · Daniel Hesslow · Simran Arora · Songlin Yang · Dan Biderman · Beidi Chen · Azalia Mirhoseini · Percy Liang

As models increase in size and training budget, they not only systematically improve in upstream quality, but also exhibit novel emergent capabilities, unlocking new AI applications. These new capabilities have led to a paradigm shift: large foundation models have become predominant in natural language processing and are growing increasingly common in computer vision, audio processing and even robotics. This increase in scale raises proportionate difficulties for practitioners: foundation model training and inference lie at a unique interdisciplinary crossroad, combining open problems in algorithms, system design, and software engineering.

In response to these challenges, diverse research directions have spawned promising works: (1) training and inference either at large scale or in resource-constrained scenarios (e.g., with higher network latency and lower bandwidth, in a collaborative manner across a fleet of contributed devices, or with a single GPU); (2) large-scale distributed training approaches, such as 3D parallelism and sharding; and (3) deep system optimizations, with custom languages such as TVM and Triton. These novel interdisciplinary research directions directly shape and impact the trajectory of research across machine learning.
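
As a toy instance of the "custom languages" direction, here is a minimal Triton kernel, a vector addition rather than a production foundation-model kernel; real attention and GEMM kernels follow the same structure of a grid of program instances, each loading, computing, and storing one tile. This sketch assumes the triton package and a CUDA-capable GPU.

```python
# Minimal Triton vector-add kernel (illustrative sketch; requires a CUDA GPU).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                   # which tile this instance owns
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                   # guard the ragged last tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                # one program per 1024-element tile
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```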

Accordingly, these emerging lines of research are increasingly relevant to machine learning researchers. Indeed, researchers are key stakeholders: on the one hand, researchers may contribute algorithmic insights and novel methods to improve training and inference of large models (e.g., recent award-winning papers at ICML and NeurIPS); on the other hand, novel research findings may be best demonstrated at scale, which may require training models as efficiently as possible to make the best use of available resources.

The goal of this workshop is to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference. This is the third installment of the ES-FoMo workshop at ICML. This year, we are bringing further focus on two trends observed in 2024 and early 2025: (1) test-time compute, popularized by OpenAI o1 and DeepSeek-R1, and (2) the emergence of new modeling paradigms and modalities such as real-time video and decentralized training. We look forward to continuing to grow this community at ICML 2025.


Workshop on Computer Use Agents Sat 19 Jul 08:30 a.m.  

David Barber · Doina Precup · Andrei Nica · Roberta Raileanu · Harshil Shah · Boyuan Zheng · Shuyan Zhou

Computer use models are attracting significant interest in academia and industry due to their ability to perform complex tasks in non-deterministic environments. However, they are far from ready for unattended deployment, as evidenced by their performance on the OSWorld benchmark, where they achieve only a small fraction of human performance. The rapid evolution of these agents raises important questions regarding their accuracy, safe deployment, and potential impact on the future of work. The topics we would like to cover are:

- Learning Algorithms: which new architectures and learning techniques (e.g., memory mechanisms for extended tasks, exploration strategies) can enhance the intrinsic ability of computer use agents to acquire, represent, and refine knowledge?
- Orchestration: what novel frameworks or control methods (e.g., dynamic task planning, modular coordination, multi-agent systems) can efficiently manage and integrate multiple learning components to optimize overall agent performance?
- Interfaces: how should agents perceive and act within their environments (e.g., via APIs or UI interactions), and should we design unified systems or specialized agents for different modalities?
- Guardrails, safety & societal implications: what guardrails do we need to make computer use models safe for deployment "in the wild" while ensuring that they have a positive impact on society?
- Benchmarking & tools: how can we develop robust environments and evaluation metrics that capture the diversity of real-world settings? Do we need new tools or frameworks to make research on computer use more efficient and accessible?
- Human-agent interaction: how will future interactions evolve? Should we optimize agents for full autonomy or design them as personalized, human-centric collaborators?
- Broader applications: what are some practical applications for computer use agents across domains such as healthcare, scientific research, and software engineering and testing?
- Capability horizon: what breakthroughs or engineering challenges are required to enable agents orders of magnitude more capable than today, and what implications would such advances have?
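
For readers unfamiliar with how such agents are structured, the schematic below shows the basic observe-think-act loop they share. Everything here is a mock of our own: a real system would replace MockDesktop with a VM or browser driver and policy with a multimodal model call.

```python
# Schematic observe-think-act loop for a computer use agent (mock components).
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "done"
    payload: str = ""

class MockDesktop:
    """Stand-in environment: a to-do text box the agent must fill in."""
    def __init__(self):
        self.textbox = ""
    def observe(self) -> str:
        return f"textbox contains: {self.textbox!r}"
    def step(self, action: Action) -> None:
        if action.kind == "type":
            self.textbox += action.payload

def policy(observation: str, goal: str) -> Action:
    """Trivial scripted policy; a real agent would query an LLM/VLM here."""
    if goal in observation:
        return Action("done")
    return Action("type", goal)

env, goal = MockDesktop(), "buy milk"
for step in range(10):                  # cap episode length as a simple guardrail
    obs = env.observe()
    action = policy(obs, goal)
    if action.kind == "done":
        print(f"finished in {step} steps; {env.observe()}")
        break
    env.step(action)
```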


Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences Sat 19 Jul 08:30 a.m.  

Pengtao Xie · James Zou · Le Song · Aidong Zhang · Danielle Grotjahn · Linda Awdishu · Eran Segal · Wei Wang · Ruiyi Zhang

Recent advances in foundation models and large language models (LLMs) have revolutionized life sciences by enabling AI-driven insights into complex biological systems. However, most existing models focus on single-modal data, limiting their ability to capture the inherently multi-modal nature of biological processes. This workshop will explore the development and application of multi-modal foundation models and LLMs that integrate diverse biological data types, such as protein sequences, structures, genomic and transcriptomic data, and metabolomics. By bringing together researchers from machine learning, computational biology, and biomedical sciences, the workshop will address challenges in modality fusion, cross-modal representation learning, scalable pretraining, and interpretability. Discussions will focus on novel architectures, self-supervised learning methods, and real-world applications in drug discovery, precision medicine, and multi-omics data analysis. Through invited talks, poster sessions, contributed presentations, and panel discussions, this workshop aims to advance multi-modal foundation models and LLMs for biological discovery and foster interdisciplinary collaborations that push the boundaries of machine learning in life sciences.


Workshop: Exploration in AI Today (EXAIT) Sat 19 Jul 08:30 a.m.  

Parnian Kassraie · Andrew Wagenmaker · Bhavya · Carmelo Sferrazza · Lenart Treven · Amy X. Lu

How can we efficiently collect observations for optimization, control, and generalization? This is a key challenge in AI and is known as the exploration problem. Effective exploration has driven progress in areas such as robotics, recommender systems, and scheduled medical trials. However, as we address larger, more complex applications—such as drug discovery or language modeling—the exceptionally large search spaces render traditional exploration algorithms ineffective. As a result, recent breakthroughs in AI have come not from traditional exploration algorithms, but largely from training large foundation models on diverse corpora of pre-existing, curated datasets. Despite this, we have witnessed sparks showing that exploration, when done right, can compensate for data and computation—for example, in the training of DeepSeek-R1—suggesting that exploration can still play a key role in AI today.
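
For readers less familiar with the exploration problem, the following minimal UCB1 bandit loop (a textbook algorithm, not one proposed by the workshop) shows the core trade-off: an optimism bonus routes pulls toward under-observed arms until the best arm's estimate dominates.

```python
# Minimal UCB1 bandit loop illustrating the exploration/exploitation trade-off.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])    # hidden reward probabilities
n_arms, horizon = len(true_means), 2000
counts, sums = np.zeros(n_arms), np.zeros(n_arms)

for t in range(1, horizon + 1):
    if t <= n_arms:
        arm = t - 1                        # pull each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))          # optimism in the face of uncertainty
    reward = rng.random() < true_means[arm]
    counts[arm] += 1
    sums[arm] += reward

print("pulls per arm:", counts)            # most pulls should go to the best arm
```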

The Exploration in AI Today (EXAIT) Workshop at ICML 2025 will focus on addressing the evolving role of exploration in AI. We will dwell on the question: what is the place of exploration in today’s AI landscape and in which settings can exploration algorithms address current open challenges? In particular, we consider the potentially pivotal role that exploration might play in navigating complex and high-dimensional search spaces across real-world applications such as robotics, large language model alignment, and AI for science.


Workshop: Methods and Opportunities at Small Scale (MOSS) Sat 19 Jul 08:45 a.m.  

Bingbin Liu · Enric Boix-Adserà · Elisabetta Cornacchia · Surbhi Goel · Abhishek Panigrahi · Eran Malach · Cyril Zhang · Benjamin Edelman

The increasing computational demands of modern ML create a critical challenge: thorough experimentation becomes prohibitively expensive precisely when we most need to understand and steer model behavior. Small-scale experiments (<= 1 GPU) offer a powerful approach for systematic investigation, enabling both scientific understanding and practical advances. Recent work demonstrates the endless opportunities at this scale, including: diagnoses and mitigations of training pathologies; minimalistic replications of modern pipelines; elementary synthetic tasks that “stress test” architectures and motivate new designs; and discovery of intriguing phenomena. This workshop aims to highlight how methods and opportunities at small scale can unlock new insights and drive progress. The emphasis will be on advancing scientific understanding (and, optionally, its interplay with theory), without the need to improve state-of-the-art performance.
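
As a flavor of what a <= 1 GPU experiment can look like, here is a minimal sketch of our own (task choice and hyperparameters are arbitrary) that trains a tiny network on modular addition, a synthetic task commonly used to probe generalization at small scale.

```python
# Minimal single-GPU/CPU synthetic-task experiment: modular addition.
import torch
import torch.nn as nn

P = 97                                             # work modulo a small prime
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
train, test = perm[: len(perm) // 2], perm[len(perm) // 2 :]

model = nn.Sequential(
    nn.Embedding(P, 64),                           # embed each operand
    nn.Flatten(),                                  # concatenate the two embeddings
    nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, P),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(2001):                           # full-batch training
    opt.zero_grad()
    loss = loss_fn(model(pairs[train]), labels[train])
    loss.backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            acc = (model(pairs[test]).argmax(-1) == labels[test]).float().mean()
        print(f"step {step}: train loss {loss.item():.3f}, test acc {acc:.3f}")
```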


The 2nd Workshop on Reliable and Responsible Foundation Models Sat 19 Jul 08:50 a.m.  

Mohit Bansal · Xinyu Yang · Kate Donahue · Giulia Fanti · David Madras · Han Shao · Hongyi Wang · Steven Wu · Huaxiu Yao

Foundation models (FMs), with their emergent and reasoning abilities, are reshaping the future of scientific research and broader human society. However, as their intelligence approaches or surpasses that of humans, concerns arise regarding their responsible use in real-world applications, such as reliability, safety, transparency, and ethics. The workshop on reliable and responsible FMs delves into the urgent need to ensure that such models align with human values. The significance of this topic cannot be overstated, as the real-world implications of these models impact everything from daily information access to critical decision-making in fields like medicine and finance, especially for embodied FMs that directly interact with the physical world. Stakeholders, including developers, practitioners, and policymakers, care deeply about this because the reliable and responsible design, deployment, and oversight of these models dictate not only the success of AI solutions but also the preservation of societal norms, order, equity, and fairness. Some of the fundamental questions that this workshop aims to address are:

- Diagnosis: How can we identify and characterize unreliable and irresponsible behaviors in FMs? Topics include prompt sensitivity, lack of self-consistency (a minimal probe is sketched after this list), and hallucinations in generation.
- Evaluation: How should we assess the harmful capabilities of FMs and quantify their societal impact?
- Sources: How can we pinpoint and understand the known or emerging sources of FM unreliability? This involves examining training data, optimization objectives, and architectural design.
- Generalization: How can responsible and reliable properties be effectively adapted to increasingly advanced FMs, particularly as they incorporate new features such as more modalities or long chain-of-thought (CoT) reasoning?
- Governance: What principles or guidelines should inform the next generation of FMs to ensure they are reliable and responsible? How can real-time monitoring of these FMs be enabled?
- Guarantee: Can we establish theoretical frameworks that provide provable guarantees of FM reliability and responsibility?
- Practice: How can we leverage domain-specific knowledge to guide FMs towards improved reliability and responsibility across diverse areas, such as drug discovery, education, or clinical health?
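
As a concrete example of the Diagnosis theme, the sketch below probes self-consistency by sampling the same prompt several times and measuring majority agreement. The model, prompt, and answer parsing are placeholders, not the workshop's methodology.

```python
# Minimal self-consistency probe: sample N continuations, measure agreement.
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; an instruct/chat model is the realistic choice
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

prompt = "Q: What is 17 + 25? A:"
inputs = tok(prompt, return_tensors="pt")
outs = model.generate(
    **inputs,
    do_sample=True, temperature=0.8, max_new_tokens=8,
    num_return_sequences=10, pad_token_id=tok.eos_token_id,
)
# Strip the prompt tokens, keep only the sampled continuations.
answers = [tok.decode(o[inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()
           for o in outs]
top, freq = Counter(answers).most_common(1)[0]
print(f"majority answer {top!r} with agreement {freq / len(answers):.0%}")
```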


Workshop: Building Physically Plausible World Models Sat 19 Jul 08:50 a.m.  

Homanga Bharadhwaj · Boyuan Chen · Yilun Du · Hiroki Furuta · Ruiqi Gao · Hamidreza Kasaei · Sean Kirmani · Kuang-Huei Lee · Ruoshi Liu · Zeyi Liu · Li Fei-Fei · Carl Vondrick · Wenhao Yu

The goal of this workshop is to exchange ideas and establish communications among researchers working on building generalizable world models that describe how the physical world evolves in response to interacting agents (e.g., humans and robots). Large-scale datasets of videos, images, and text hold the key to learning generalizable world models that are visually plausible. However, distilling useful physical information from such diverse unstructured data is challenging and requires careful attention to data curation, developing scalable algorithms, and implementing suitable training curricula. On the other hand, physics-based priors can enable learning plausible scene dynamics, but they are difficult to scale to complex phenomena that lack efficient solvers or even governing dynamic equations. Developing general world models that can simulate complex real-world phenomena in a physically plausible fashion would unlock enormous opportunities in generative modeling and robotics and is of wide interest to the larger AI community; we believe this workshop comes at an ideal time given recent significant progress in both video modeling and physics-based simulation. This workshop aims to bring together researchers in machine learning, robotics, physics-based simulation, and computer vision broadly aspiring to build scalable world models by utilizing internet data, simulation, and beyond in myriad ways.


Workshop: DataWorld: Unifying data curation frameworks across domains Sat 19 Jul 08:55 a.m.  

Neha Hulkund · Sara Beery · Benjamin Feuer · Niv Cohen · Thao Nguyen · Ludwig Schmidt · Serena Yeung · Yuhui Zhang

Recently, data-centric research, which has historically taken a backseat to model-centric research, has assumed a central role in the machine learning community. Our workshop aims to explore data-centric methods and theory, with a particular emphasis on real-world data curation. By curation, we mean the set of actions taken by some curator(s) to transition from ideation to a complete dataset. Our topic is wide-ranging, with recent work studying everything from sourcing to benchmarks. One area that remains relatively underexplored is how data-centric methods can perform differently depending on the modality and domain of the data and the downstream application. Which lessons can be shared across domains and modalities, and which cannot? For example, a common part of the data pipeline involves data filtration. Filtration in domains like medical imaging and wildlife camera traps faces similar challenges, including long-tailed distributions and natural distribution shifts (between hospitals and camera locations, respectively). However, the two domains differ in the types of distribution shift encountered (covariate vs. label vs. subpopulation) and in dataset scale (there are generally more camera trap images than medical scans). Another example is that most successful filtration methods in the recent DataComp benchmark tend to disproportionately remove images with non-English captions; such methods not only degrade performance on non-English benchmarks but also fail to generalize to other domains and most real-world applications. Our workshop will invite novel research that seeks to unify seemingly disparate frameworks for data curation; where this is impossible, we hope that the necessary trade-offs and domain-specific challenges will be made clearer.
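
To ground the filtration example, here is a minimal similarity-score filter in the spirit of DataComp's CLIP-score baselines. It assumes image and text embeddings have already been computed (random stand-ins below) and simply keeps the highest-scoring fraction of pairs.

```python
# Minimal similarity-score data filtration sketch over precomputed embeddings.
import torch

def filter_by_similarity(img_emb: torch.Tensor,
                         txt_emb: torch.Tensor,
                         keep_frac: float = 0.3) -> torch.Tensor:
    """Return indices of the image-text pairs to keep."""
    img = torch.nn.functional.normalize(img_emb, dim=-1)
    txt = torch.nn.functional.normalize(txt_emb, dim=-1)
    scores = (img * txt).sum(-1)                 # cosine similarity per pair
    k = max(1, int(keep_frac * len(scores)))
    return scores.topk(k).indices                # highest-scoring pairs survive

# Toy usage with random stand-ins for real CLIP embeddings.
img_emb, txt_emb = torch.randn(1000, 512), torch.randn(1000, 512)
kept = filter_by_similarity(img_emb, txt_emb, keep_frac=0.3)
print(f"kept {len(kept)} of {len(img_emb)} pairs")
```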


Workshop: DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats) Sat 19 Jul 08:55 a.m.  

Khoa Doan · Franziska Boenisch · Adam Dziedzic · Aniruddha Saha · Viet Anh Nguyen · Zhenting Wang · Bo Li · Heather Zheng

Generative models have become extremely powerful and are now integral to various aspects of daily life, from creative arts to customer service. Given their increasing interaction with people, ensuring their trustworthiness is crucial. This workshop centers on the idea that the safety and reliability of generative models are deeply connected to the nature and treatment of their training data. We aim to explore the hypothesis that building reliable and trustworthy artificial intelligence (AI) systems based on generative models must start with high-quality and responsibly managed data. The workshop will focus on several key areas where training data impacts the trustworthiness of generative models. Among others, we will address: 1) privacy concerns, highlighting how improper inclusion and handling of sensitive information in the training data can lead to significant privacy violations; 2) safety risks, like backdoors and data poisoning, that threaten robust generation; and 3) the impact of biases in generative models' training data, which can cause models to perpetuate or even amplify societal biases, resulting in unfair outcomes. Through expert talks, panel discussions, and interactive sessions, participants will delve into these issues and explore strategies for developing safe, trustworthy, and reliable generative models. This workshop aims to foster collaboration and drive forward research to ensure that generative models, as they become more embedded in our lives, do so in a trustworthy and beneficial manner.


AI Heard That! ICML 2025 Workshop on Machine Learning for Audio Sat 19 Jul 09:00 a.m.  

Alice Baird · Sander Dieleman · Chris Donahue · Brian Kulis · David Liu · Rachel Manzelli · Shrikanth Narayanan

The Machine Learning for Audio workshop at ICML 2025 will cover a broad range of tasks and challenges involving audio data. These include, but are not limited to: methods of speech modeling, environmental sound generation or other forms of ambient sound, novel generative models, music generation in the form of raw audio, text-to-speech methods, denoising of speech and music, data augmentation, classification of acoustic events, transcription, source separation, and multimodal problems.


Workshop: TerraBytes: Towards global datasets and models for Earth Observation Sat 19 Jul 09:00 a.m.  

Nicolas Audebert · Hossein Azizpour · Valentin Barriere · Javiera Castillo Navarro · Mikolaj Czerkawski · Heng Fang · Alistair Francis · Valerio Marsocci · Andrea Nascetti · Ritu Yadav

Earth Observation presents unique challenges for machine learning due to its non-stationary data distribution, spatio-temporal biases, and multimodal nature. TerraBytes aims to address these challenges by fostering discussions at the intersection of data curation, machine learning, and remote sensing. The workshop focuses on (1) curating less biased, globally representative EO datasets, (2) developing adaptable ML models for EO applications, and (3) bridging the gap between data acquisition and ML communities. By promoting interdisciplinary collaboration, TerraBytes seeks to advance EO research and enable inclusive, fair, and impactful applications.


Workshop on Technical AI Governance Sat 19 Jul 09:00 a.m.  

Ben Bucknall · Lisa Soder · Carlos Mougan · Siddharth Swaroop · Fazl Barez · Anka Reuel · Michael A Osborne · Robert Trager

As the development and use of AI systems expands, policymakers increasingly recognize the need for targeted actions that promote beneficial outcomes while mitigating potential harms. Yet there is often a gap between these policy goals and the technical knowledge required for effective implementation, risking ineffective or actively harmful results (Reuel et al., 2024b). Technical AI governance—a nascent field focused on providing analyses and tools to guide policy decisions and enhance policy implementation—currently lacks sufficient venues for exchanging scholarly work. This workshop aims to provide such a venue, fostering interdisciplinary dialogue between machine learning researchers and policy experts by ensuring each submission is reviewed by both technical and policy specialists. Through this collaboration, we seek to accelerate the development of robust governance strategies that lead to safer, more equitable AI systems.


The Second Workshop on Long-Context Foundation Models Sat 19 Jul 09:10 a.m.  

Zexue He · Tianyu Gao · Amanda Bertsch · Howard Yen

Foundation models have become a cornerstone in the advancement of artificial intelligence, enabling applications across a wide range of domains. Many complex tasks today require processing and synthesizing information over thousands to millions of individual pieces of data, from text and images to audio and genomic sequences. Recent progress in long-context models has made it possible to handle such extensive inputs, but significant challenges remain, particularly in terms of computational efficiency, data quality and quantity, and evaluation. This workshop will convene researchers to explore these challenges and foster developments in long-context foundation models. Key topics include new modeling architectures, training approaches, efficiency techniques, and comprehensive evaluation methods. Additionally, in this edition, special attention will be given to long-context reasoning, multimodal learning, and applications in scientific fields such as genomics, climate science, etc. By tackling these critical challenges, we aim to push the boundaries of long-context modeling and shape its future directions.