Workshops
Programmatic Representations for Agent Learning
This workshop explores the use of programmatic representations—such as symbolic programs, code-based policies, and rule-based abstractions—to make agent learning frameworks more interpretable, generalizable, efficient, and scalable. Programs can explicitly encode policies, reward functions, task structures, and environment dynamics, providing human-understandable reasoning while reducing reliance on massive data-driven models. Programmatic representations also enable modularity and compositionality, allowing agents to efficiently reuse knowledge across tasks and adapt with minimal retraining. By bringing together the sequential decision-making community—including researchers in reinforcement learning, imitation learning, planning, search, and optimal control—with experts in program synthesis and code generation, this workshop aims to tackle the fundamental challenges of agent learning at scale and drive progress toward interpretable, generalizable, verifiable, robust, and safe autonomous systems across domains ranging from virtual agents to robotics.
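As a minimal illustration of the kind of representation discussed above, a policy can be written as an explicit, inspectable program rather than a learned black box. This is only a toy sketch; the state fields, thresholds, and action names are hypothetical:

```python
# Toy code-based policy: an explicit, human-readable program mapping
# states to actions. All field names and thresholds are illustrative.

def thermostat_policy(state):
    """Rule-based policy over a simple state dict with keys
    'temperature' and 'setpoint'; returns a discrete action."""
    if state["temperature"] < state["setpoint"] - 1.0:
        return "heat"
    if state["temperature"] > state["setpoint"] + 1.0:
        return "cool"
    return "idle"

print(thermostat_policy({"temperature": 17.0, "setpoint": 20.0}))  # heat
```

Because the policy is a program, its behavior can be inspected, verified, and edited directly, which is precisely the interpretability and modularity benefit the workshop highlights.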
The 1st Workshop on Vector Databases
Vector databases (Vector DBs) are a foundational and critical application layer for injecting information into large language models (LLMs). Although different companies have proposed various vector databases, no academic workshop has previously existed to discuss these systems comprehensively. This workshop aims to foster discussions on vector databases from various perspectives, ranging from mathematical theories to implementation-level optimizations. Topics covered in the workshop include retrieval-augmented generation (RAG), algorithms and data structures for approximate nearest neighbor search (ANN), data management systems for handling vector data, query languages, and embedding models. Furthermore, the workshop will also function as a platform for companies and researchers working on vector databases to present technical details (white papers) and exchange ideas.
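The core operation behind the ANN topic mentioned above can be sketched with an exact baseline: a linear scan over embedding vectors ranked by cosine similarity. ANN indexes (e.g., graph- or clustering-based structures) approximate this result far faster at scale. The document IDs and vectors below are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, index, k=2):
    """Exact nearest-neighbor search by linear scan. ANN data structures
    trade a little recall for large speedups over this brute force."""
    scored = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Tiny illustrative "vector database": doc_id -> embedding.
index = {"doc1": [1.0, 0.0], "doc2": [0.9, 0.1], "doc3": [0.0, 1.0]}
print(top_k([1.0, 0.05], index, k=2))  # ['doc1', 'doc2']
```

In a RAG pipeline, the retrieved IDs would map back to text chunks that are injected into the LLM prompt.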
Tiny Titans: The next wave of On-Device Learning for Foundation Models (TTODLer-FM)
The rapid evolution of Deep Learning, propelled by transformer-based architectures and significant hardware advancements, has unlocked unprecedented capabilities across diverse domains, from biological sciences to autonomous systems. As foundation models continue to scale, they introduce new challenges in resource management, particularly in data centers, and in data availability, prompting us to broaden our exploration of leveraging distributed and on-device resources for training and inference. Small Language Models (SLMs) are emerging as a compelling alternative for generative AI, particularly at the edge, offering a sustainable balance between efficiency and user privacy. This workshop aims to bring together algorithms and systems experts to discuss the opportunities and challenges of on-device machine learning. We hope to explore to what extent SLMs can compete with or complement LLMs and identify methods to enhance their quality and efficiency. Addressing this shift requires innovation in algorithm and system co-design, underscoring the importance of interdisciplinary approaches for future applications.
2nd AI for Math Workshop @ ICML 2025
Mathematical reasoning stands as a pinnacle of human intelligence. The rapid advancements in artificial intelligence, particularly in large language models (LLMs), have opened new frontiers at the intersection of AI and mathematical reasoning. This workshop aims to explore the potential of AI in comprehending and advancing mathematical reasoning, with a focus on fostering collaboration between humans and machines to push the boundaries of mathematical discovery. The central theme revolves around the question: "How can we leverage and advance the mathematical reasoning abilities of machine learning models, and drive innovation across scientific and practical domains?" Our workshop will bring together researchers from diverse backgrounds, institutions, and disciplines to discuss the progress and future of AI technologies in mathematics. Specifically, we will delve into the following areas:
* Automated Theorem Proving: How can we build consistent theorem-proving systems? How can theorem-proving systems assist humans through human-computer interaction?
* Automated Theorem Generation: Can neural models generate new and practically meaningful theorems that have not yet been discovered? How can we utilize these newly generated theorems?
* Autoformalization and Verification: How can we improve the precision of translating natural language proofs into formal proofs, and vice versa?
* Problem Solving: How can we develop AI models to solve complex mathematical computational problems across various domains? How can AI models improve themselves during the learning process?
* Applications of AI in Mathematics: What are the practical applications of AI-driven mathematical reasoning in fields such as science, engineering, finance, and education?
The intended outcome is to identify new ideas, open problems, and interdisciplinary areas for future research related to mathematical reasoning.
To this end, we welcome papers on areas related, but not limited, to:
* Algorithms: How can we develop effective algorithms (e.g., reinforcement learning, self-improvement/evolution) to improve reasoning ability? What are the key principles for developing algorithms that minimize resource consumption (e.g., time, memory) while maintaining or improving reasoning performance?
* Data Generation: Can AI models generate questions that they cannot answer correctly? Can AI models achieve self-improvement through self-generated data?
* Tool Utilization: How can AI systems leverage existing tools, such as code and software, to solve practical mathematical problems more effectively?
* Limitation Analysis: What are the drawbacks or limitations of current models in mathematical reasoning (e.g., robustness, generalization, and reasoning boundaries)? How can these limitations be quantitatively analyzed?
2nd Generative AI for Biology Workshop
The 2024 Nobel Prize in Chemistry was awarded for AI-based protein structure prediction and protein design, highlighting the immense potential of AI in basic science and health research. In the meantime, generative AI models such as large language models (LLMs) and diffusion models are acquiring impressive capabilities in generating language, creating artwork, solving complex reasoning problems, writing computer programs, and more. To further facilitate the dialog between machine learning and biology, we propose to organize a workshop at ICML 2025, focusing on generative AI for biological discovery and therapeutic design. By fostering connections among preeminent researchers from both industry and academia, we aim to gain critical insights into the future of generative-AI-driven biology. Moreover, we hope to bridge the gap between machine learning and biological disciplines by focusing on three central themes that span both cutting-edge research and translational impact.
Assessing World Models: Methods and Metrics for Evaluating Understanding
Generative models across domains are capable of producing outputs that appear to mimic the real world. But have these systems actually understood the laws that govern the world? Researchers across subfields are attempting to answer this question: in natural language processing, researchers measure whether LLMs understand real-world mechanisms in order to measure how robust they are to new tasks; in video generation, researchers assess whether a model has understood the laws of physics in order to evaluate how realistic its videos are; in scientific domains, foundation models are being developed in order to uncover new theories about the world. Despite studying similar questions, these communities remain disparate. This workshop will explore the question: how can we formalize and evaluate whether generative models have understood the real world? While this question is important across communities, we don't have unified frameworks for defining and evaluating world models. This workshop will bring together these computer science communities along with non-computer-science scientists working on relevant applications. Our invited speakers include Jacob Andreas, Shiry Ginosar, Shirley Ho, Sendhil Mullainathan, and Martin Wattenberg, all of whom have confirmed that they will speak in person.
ICML 2025 Workshop on Computational Optimization of Buildings (CO-BUILD)
Tokenization Workshop (TokShop)
Tokenization defines how data are represented as input and output for many current machine learning systems, including language models. Tokenization has been shown to significantly affect the utility and effectiveness of these models (Mielke et al., 2021). This finding has stirred considerable interest in tokenization as a research direction in machine learning and its subfields, such as natural language processing, but currently, there is no venue specifically dedicated to it. Our initiative—TokShop (Tokenization Workshop)—aims to fill this gap and will focus on tokenization in a broad sense.
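To make concrete what "tokenization" means here, a toy sketch of the byte-pair-encoding (BPE) idea underlying many current tokenizers follows. The text, merge pairs, and helper names are illustrative, not any particular library's API:

```python
# Toy BPE-style tokenization: start from characters, then repeatedly
# merge a chosen adjacent pair into a single token.

def char_tokenize(text):
    """Base tokenization: one token per character."""
    return list(text)

def bpe_step(tokens, pair, merged):
    """Apply one BPE merge: replace each adjacent occurrence of `pair`
    with the single token `merged`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = char_tokenize("low lower")
tokens = bpe_step(tokens, ("l", "o"), "lo")
tokens = bpe_step(tokens, ("lo", "w"), "low")
print(tokens)  # ['low', ' ', 'low', 'e', 'r']
```

The choice and order of merges determines the vocabulary a model sees, which is exactly the design space the workshop targets.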
3rd Workshop on High-dimensional Learning Dynamics (HiLD)
Modern machine learning applications face the challenge of extracting insights from high-dimensional datasets. The 3rd High-dimensional Learning Dynamics (HiLD) Workshop focuses on predicting and analyzing the behavior of learning algorithms in regimes where both the number of samples and parameters are large. This workshop aims to advance research and foster collaboration in several key areas:
1. Developing tractable models and dynamical frameworks to explain phenomena observed in deep neural networks (DNNs) and foundation models;
2. Establishing mathematical frameworks for neural scaling laws as network width and depth approach infinity;
3. Identifying and characterizing relevant observable quantities in high-dimensional limits;
4. Understanding the provable effects of optimization algorithms, hyperparameters, and neural architectures on training and test dynamics.
The HiLD Workshop will unite experts from random matrix theory, optimization, high-dimensional statistics/probability, and statistical physics to share diverse perspectives on these challenges. By bringing together theorists and practitioners from machine learning with researchers from these adjacent fields, we aim to create new collaborations between communities that often do not interact. Through talks, poster sessions, and panel discussions, the workshop will explore the fundamental dynamics of learning algorithms in high-dimensional settings. This year's workshop theme is "Navigating Complexity: Feature Learning Dynamics at Scale."
2nd Workshop on Models of Human Feedback for AI Alignment (MoFA)
Our workshop brings together experts in machine learning, cognitive science, behavioral psychology, and economics to explore human-AI alignment by examining human (and AI) feedback mechanisms, their mathematical models, and practical implications. By fostering collaboration between technical and behavioral science communities, we aim to develop more realistic models of human feedback that can better inform the development of aligned AI systems.
1st Workshop on Foundation Models for Structured Data (FMSD)
Structured data foundation models are an emerging area of research undergoing rapid growth, yet they remain critically under-explored relative to image and text modalities. So far, the different structured data sub-communities have had little opportunity to come together and share insights about how to build foundation models for structured data. Yet strong synergies exist across modalities, since models share similar pre-training and in-context learning paradigms. Furthermore, models trained on one modality can also demonstrate promising predictive performance in another. This workshop brings together the tabular and time series communities to jointly discuss foundation models for structured data, enabling the communities to capitalize on their synergies. We aim for advancements in foundation models that unify structured data modalities, addressing challenges of scalability and generalization across real-world applications. This emerging field promises to transform how we approach structured data analysis and drive new opportunities across various domains.
Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures
The scaling of model parameters has unlocked the groundbreaking capabilities of foundation models. Likewise, in human society, scaling and collaboration across individuals, organizations, companies, and nations amplify collective intelligence to unprecedented levels, enabling remarkable achievements that would be impossible for individuals alone, such as space exploration. Could this principle of scaling (Kaplan et al., 2020) also apply to the growth in the number of agents? Multi-agent systems may offer a promising path forward. By progressively integrating more agents, multi-agent systems can activate diverse functionalities within these foundation model-powered generalist agents and coordinate a broader range of complementary functionalities. This synergy fosters improved problem-solving, adaptability, and decision-making capabilities. As the multi-agent system scales, it has huge potential to achieve enhanced capabilities and tackle increasingly complex tasks, offering a promising solution toward the ultimate goal of achieving artificial general intelligence (AGI).
Machine Unlearning for Generative AI
Generative AI models are trained on internet-scale datasets, yielding powerful capabilities but also introducing risks like copyright infringement, PII leakage, and harmful knowledge. Targeted removal or unlearning of sensitive data is challenging, as retraining on curated sets is computationally expensive, driving research into machine unlearning and model editing. Yet approaches like RLHF only suppress undesirable outputs, leaving underlying knowledge vulnerable to adversarial extraction. This raises urgent privacy, security, and legal concerns, especially under the EU’s GDPR “right to be forgotten”. Because neural networks encode information across millions of parameters, precise deletion without degrading performance is complex, and adversarial or whitebox attacks can recover ostensibly erased data. This workshop brings together experts in AI safety, privacy, and policy to advance robust, verifiable unlearning methods, standardized evaluation frameworks, and theoretical foundations. By achieving true erasure, we aim to ensure AI can ethically and legally forget sensitive data while preserving broader utility.
CODEML: Championing Open-source DEvelopment in Machine Learning
Open-source software (OSS) development is a cornerstone of modern machine learning research. However, issues such as the sustainability of long-term projects, software reliability, and proper academic acknowledgment of maintenance and contributions are often overlooked. This workshop aims to identify and discuss strategies for successful and sustainable open-source development in ML while also proposing solutions to these challenges. Additionally, the workshop will provide a platform to recognize the efforts of open-source contributors in the field. We will bring together machine learning researchers, engineers, industrial practitioners, and software development experts. The workshop will feature invited talks, panel discussions with experts, and workshop paper submissions from open-source contributors in machine learning.
2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)
Deep learning has advanced by scaling datasets, models, and training computation. At the same time, applications have broadened to many kinds of data (personal, scientific, …) and deployments (in clouds, on cars, …). Will these all be solved by more data, parameters, and training? Test-time updates are complementary, and can help on both foundation model servers and edge devices. This workshop examines train-time vs. test-time updates across scales through test-time adaptation, continual learning, in-context learning, and post-training model editing. The test begins now!
Scaling Up Intervention Models
Machine learning and AI have long been concerned with modeling how an agent can change the world around it. However, intervening in the physical world takes effort, leading to sparsity of evidence and corresponding gaps of credibility when an agent considers carrying out previously unseen actions. Making the most of sparse data within a combinatorial explosion of possible actions, dose levels, and waiting times requires careful thinking, akin to efforts for introducing more compositionality principles into machine learning (Andreas, 2019). The goal of this workshop is to bring together state-of-the-art ideas on how to predict the effects of novel interventions and distribution shifts by exploiting original ways of composing evidence from multiple data-generation regimes.
Machine Learning for Wireless Communication and Networks (ML4Wireless)
As wireless communication systems evolve to meet the demands of a hyper-connected world, artificial intelligence models are emerging as the driving force behind a new wave of technological innovation. This workshop will explore how state-of-the-art artificial intelligence and machine learning (ML) methods are poised to redefine the core of wireless networks providing solutions to old and new communication challenges. One of the central themes is semantic communication, where ML enables wireless networks to understand and transmit the meaning behind data, rather than the whole bitstream, drastically improving efficiency in bandwidth-constrained environments and presenting novel scenarios and possible applications that were not even conceivable a couple of years ago. Additionally, the rise of generative and language models for wireless communication is bringing new ways to compress and enhance signal transmissions, impacting several downstream applications such as autonomous driving, video streaming, and virtual reality. Concurrently with widening the range of applications, these models also bring novel challenges related to large models' computational demands or to the regenerated content's controllability and reliability. Central to bridging ML and wireless communication is the study of inverse problems, where generative models play a pivotal role in reconstructing lost or incomplete signals, and solving ill-posed tasks inherent in communication systems constrained by noisy and interference channels with limited bandwidth. The workshop aims also to explore key areas such as multimodal content compression, post-training quantization, efficient semantic feature extraction, and designing trustworthy models tailored for resource-constrained and noisy environments, in which foundational ML research finds crucial applications in communication scenarios.
Workshop Goals: This workshop aims to foster collaboration between ML researchers and wireless communication experts, encouraging cross-disciplinary innovation that will help shape the future of intelligent communication systems as well as more efficient and reliable AI models and techniques. Through a series of presentations, discussions, and interactive sessions, participants will explore both the theoretical foundations and practical applications of ML in wireless networks, with an eye toward addressing the most pressing challenges in this rapidly evolving field. On top of fostering collaborations and networking, we aim to boost research in machine learning and wireless communication topics by i) hosting discussions about the limitations of current wireless communication methods and how diverse ML models can empower communication systems by solving those challenges; ii) encouraging cross-collaboration between ML researchers and communication researchers; and iii) giving space to younger researchers and PhD students to present their work and to get in contact with experts in this area, which is usually arduous in main conference tracks.
Why This Workshop at ICML? We know that artificial intelligence and machine learning models are driving technological transformations across numerous applications, with a particularly significant impact on wireless communication, given our daily reliance on smartphones and the emergence of connected intelligent devices, ranging from autonomous cars to mobile humanoids. Nonetheless, few ML researchers are actively contributing to wireless communication communities and venues, leaving researchers from this field alone in developing AI-powered methods and systems. On the other hand, ML researchers working on topics potentially relevant to communication, such as compression, quantization, inverse problems, or reliability, sometimes lack real-world scenarios, datasets, or embedded systems to test their foundational research. We believe there is an unmet need to bridge the gap between these two research worlds. With this workshop, we aim to close this gap by fostering an active exchange and discussion between ML and communication researchers that can benefit both research communities and establish a starting point for future collaborations and connections between the two worlds.
Actionable Interpretability
Interpretability research has advanced considerably in uncovering the inner mechanisms of artificial intelligence (AI) systems and has become a crucial subfield within AI. However, translating interpretability findings into actionable improvements in model design, training, and deployment remains a challenge. As a result, such insights have rarely influenced real-world AI development. This workshop addresses a key yet underexplored question: How can interpretability research drive tangible advancements in AI systems? By fostering discussions on the practical applications of interpretability, we aim to bridge this gap and highlight work that moves beyond analysis to achieve concrete improvements in model alignment, robustness, and domain-specific performance. Through this workshop, we strive to refocus interpretability research on actionable impact rather than just analysis, ensuring its insights lead to meaningful advancements in AI.
The Impact of Memorization on Trustworthy Foundation Models
Foundation models have come to underpin many critical applications, such as healthcare, public safety, and education. Ensuring their trustworthiness is, therefore, more important than ever. However, recent research has revealed that foundation models are prone to memorizing details or even entire samples from their training data. This issue can lead to privacy violations, intellectual property infringement, and societal harm when sensitive information is leaked. While unintended memorization risks the integrity of models, a certain degree of it is essential for solving novel and complex tasks, highlighting the importance of balancing performance with data leakage. Currently, isolated solutions are being developed across various research fields and data modalities, often without integration or coordination. This fragmentation can lead to duplicated efforts despite shared goals. The lack of interaction and exchange between research fields hinders progress in understanding and mitigating undesired memorization. In this workshop, we explore the causes and consequences of memorization from both theoretical and practical perspectives. We aim to connect insights from different research fields, including data privacy, ethics, and security in machine learning, to assess their impact on models and society and to explore innovative methods for mitigating associated risks. By bringing together researchers and practitioners from diverse fields, we seek to bridge the gap between research and real-world applications, fostering the development of trustworthy foundation models that benefit society without compromising sensitive data, intellectual property, or individual privacy.
Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences
Recent advances in foundation models and large language models (LLMs) have revolutionized life sciences by enabling AI-driven insights into complex biological systems. However, most existing models focus on single-modal data, limiting their ability to capture the inherently multi-modal nature of biological processes. This workshop will explore the development and application of multi-modal foundation models and LLMs that integrate diverse biological data types, such as protein sequences, structures, genomic and transcriptomic data, and metabolomics. By bringing together researchers from machine learning, computational biology, and biomedical sciences, the workshop will address challenges in modality fusion, cross-modal representation learning, scalable pretraining, and interpretability. Discussions will focus on novel architectures, self-supervised learning methods, and real-world applications in drug discovery, precision medicine, and multi-omics data analysis. Through invited talks, poster sessions, contributed presentations, and panel discussions, this workshop aims to advance multi-modal foundation models and LLMs for biological discovery and foster interdisciplinary collaborations that push the boundaries of machine learning in life sciences.
ES-FoMo III: 3rd Workshop on Efficient Systems for Foundation Models
As models increase in size and training budget, they not only systematically improve in upstream quality, but also exhibit novel emergent capabilities, unlocking new AI applications. These new capabilities have led to a paradigm shift: large foundation models have become predominant in natural language processing and are growing increasingly common in computer vision, audio processing and even robotics. This increase in scale raises proportionate difficulties for practitioners: foundation model training and inference lie at a unique interdisciplinary crossroad, combining open problems in algorithms, system design, and software engineering.
In response to these challenges, diverse research directions have spawned promising works: (1) training and inference either at large scale or in resource-constrained scenarios (e.g., with higher network latency and lower bandwidth, in a collaborative manner across a fleet of contributed devices, or with a single GPU); (2) large-scale distributed training approaches, such as 3D parallelism and sharding; and (3) deep system optimizations, with custom languages such as TVM and Triton. These novel interdisciplinary research directions directly shape and impact the trajectory of research across machine learning.
Accordingly, these emerging lines of research are increasingly relevant to machine learning researchers. Indeed, researchers are key stakeholders: on the one hand, researchers may contribute algorithmic insights and novel methods to improving training and inference of large models (e.g., recent award-winning papers at ICML and NeurIPS); on the other hand, novel research findings may be best demonstrated at scale --- which may require training models as efficiently as possible to make the best use of available resources.
The goal of this workshop is to bring together interdisciplinary experts working on the emerging research questions and challenges associated with foundation model training and inference. This would be the third installment of the ES-FoMo workshop at ICML. This year, we are bringing further focus on two trends observed in 2024 and early 2025: (1) test-time compute, popularized by OpenAI o1 and DeepSeek r1, and (2) the emergence of new modeling paradigms and modalities such as real-time video and decentralized training. We look forward to continuing to grow this community at ICML 2025.
Exploration in AI Today (EXAIT)
How can we efficiently collect observations for optimization, control, and generalization? This is a key challenge in AI and is known as the exploration problem. Effective exploration has driven progress in areas such as robotics, recommender systems, and clinical trials. However, as we address larger, more complex applications—such as drug discovery or language modeling—the exceptionally large search spaces render traditional exploration algorithms ineffective. As a result, recent breakthroughs in AI have come not from traditional exploration algorithms, but largely from training large foundation models on diverse corpora of pre-existing, curated datasets. Despite this, we have witnessed sparks showing that exploration, when done right, can compensate for data and computation—for example, in the training of DeepSeek-R1—suggesting that exploration can still play a key role in AI today.
The Exploration in AI Today (EXAIT) Workshop at ICML 2025 will focus on addressing the evolving role of exploration in AI. We will dwell on the question: what is the place of exploration in today’s AI landscape and in which settings can exploration algorithms address current open challenges? In particular, we consider the potentially pivotal role that exploration might play in navigating complex and high-dimensional search spaces across real-world applications such as robotics, large language model alignment, and AI for science.
Workshop on Computer Use Agents
Computer use models are attracting significant interest in academia and industry due to their ability to perform complex tasks in non-deterministic environments. However, they are far from being ready for unattended deployment, as evidenced by their performance on the OSWorld benchmark, where they achieve only a small fraction of human performance. The rapid evolution of these agents raises important questions regarding their accuracy, safe deployment, and potential impact on the future of work. The topics we would like to cover are:
- Learning Algorithms: which new architectures and learning techniques (e.g., memory mechanisms for extended tasks, exploration strategies) can enhance the intrinsic ability of computer use agents to acquire, represent, and refine knowledge?
- Orchestration: what novel frameworks or control methods (e.g., dynamic task planning, modular coordination, multi-agent systems) can efficiently manage and integrate multiple learning components to optimize overall agent performance?
- Interfaces: how should agents perceive and act within their environments (e.g., via APIs or UI interactions), and should we design unified systems or specialized agents for different modalities?
- Guardrails, safety & societal implications: what guardrails do we need in order to make computer use models safe for deployment "in the wild" while ensuring that they have a positive impact on society?
- Benchmarking & tools: how can we develop robust environments and evaluation metrics that capture the diversity of real-world settings? Do we need new tools or frameworks to make research on computer use more efficient and accessible?
- Human-agent interaction: how will future interactions evolve? Should we optimize agents for full autonomy or design them as personalized, human-centric collaborators?
- Broader applications: what are some practical applications for computer use agents across domains such as healthcare, scientific research, and software engineering and testing?
- Capability horizon: what breakthroughs or engineering challenges are required to enable agents orders of magnitude more capable than today, and what implications would such advances have?
ICML 2025 Workshop on Collaborative and Federated Agentic Workflows (CFAgentic @ ICML'25)
This workshop aims to provide a platform for discussing the convergence of collaborative and federated learning with agentic workflows, an emerging class of AI systems capable of autonomously executing complex task sequences. We aim to facilitate an engaging discussion among scholars and practitioners by soliciting work addressing key challenges in precision, efficiency, and personalization; safety and security; and regulatory compliance in the development of collaborative and federated agentic workflows.
Methods and Opportunities at Small Scale (MOSS)
The increasing computational demands of modern ML create a critical challenge: thorough experimentation becomes prohibitively expensive precisely when we most need to understand and steer model behavior. Small-scale experiments (<= 1 GPU) offer a powerful approach for systematic investigation, enabling both scientific understanding and practical advances. Recent work demonstrates the endless opportunities at this scale, including: diagnoses and mitigations of training pathologies; minimalistic replications of modern pipelines; elementary synthetic tasks that “stress test” architectures and motivate new designs; and discovery of intriguing phenomena. This workshop aims to highlight how methods and opportunities at small scale can unlock new insights and drive progress. The emphasis will be on advancing scientific understanding (and, optionally, its interplay with theory), without the need to improve state-of-the-art performance.
Building Physically Plausible World Models
The goal of this workshop is to exchange ideas and establish communications among researchers working on building generalizable world models that describe how the physical world evolves in response to interacting agents (e.g., humans and robots). Large-scale datasets of videos, images, and text hold the key for learning generalizable world models that are visually plausible. However, distilling useful physical information from such diverse unstructured data is challenging and requires careful attention to data curation, developing scalable algorithms, and implementing suitable training curricula. On the other hand, physics-based priors can enable learning plausible scene dynamics, but they are difficult to scale to complex phenomena that lack efficient solvers or even governing dynamic equations. Developing general world models that can simulate complex real-world phenomena in a physically plausible fashion would unlock enormous opportunities in generative modeling and robotics and would be of wide interest to the larger AI community; we believe this workshop comes at an ideal time given recent significant progress in both video modeling and physics-based simulation. This workshop aims to bring together researchers in machine learning, robotics, physics-based simulation, and computer vision broadly aspiring to build scalable world models by utilizing internet data, simulation, and beyond in myriad ways.
The 2nd Workshop on Reliable and Responsible Foundation Models
Foundation models (FMs), with their emergent and reasoning abilities, are reshaping the future of scientific research and broader human society. However, as their intelligence approaches or surpasses that of humans, concerns arise regarding their responsible use in real-world applications, including reliability, safety, transparency, and ethics. The workshop on reliable and responsible FMs delves into the urgent need to ensure that such models align with human values. The significance of this topic cannot be overstated, as the real-world implications of these models impact everything from daily information access to critical decision-making in fields like medicine and finance, especially for embodied FMs that directly interact with the physical world. Stakeholders, including developers, practitioners, and policymakers, care deeply about this because the reliable and responsible design, deployment, and oversight of these models dictate not only the success of AI solutions but also the preservation of societal norms, order, equity, and fairness. Some of the fundamental questions that this workshop aims to address are:
* Diagnosis: How can we identify and characterize unreliable and irresponsible behaviors in FMs? Topics include prompt sensitivity, lack of self-consistency, and hallucinations in generation.
* Evaluation: How should we assess the harmful capabilities of FMs and quantify their societal impact?
* Sources: How can we pinpoint and understand the known or emerging sources of FM unreliability? This involves examining training data, optimization objectives, and architectural design.
* Generalization: How can responsible and reliable properties be effectively adapted to increasingly advanced FMs, particularly as they incorporate new features such as more modalities or long chain-of-thought (CoT) reasoning?
* Governance: What principles or guidelines should inform the next generation of FMs to ensure they are reliable and responsible? How can real-time monitoring of these FMs be enabled?
* Guarantee: Can we establish theoretical frameworks for provably reliable and responsible FMs?
* Practice: How can we leverage domain-specific knowledge to guide FMs toward improved reliability and responsibility across diverse areas, such as drug discovery, education, or clinical health?
DIG-BUGS: Data in Generative Models (The Bad, the Ugly, and the Greats)
Generative models have become extremely powerful and are now integral to various aspects of daily life, from creative arts to customer service. Given their increasing interaction with people, ensuring their trustworthiness is crucial. This workshop centers on the idea that the safety and reliability of generative models are deeply connected to the nature and treatment of their training data. We aim to explore the hypothesis that building reliable and trustworthy artificial intelligence (AI) systems based on generative models must start with high-quality and responsibly managed data. The workshop will focus on several key areas where training data impacts the trustworthiness of generative models. Among others, we will address 1) privacy concerns, highlighting how improper inclusion and handling of sensitive information in the training data can lead to significant privacy violations; 2) safety risks, like backdoors and data poisoning that threaten robust generations; and 3) the impact of biases in generative models' training data, which can cause models to perpetuate or even amplify societal biases, resulting in unfair outcomes. Through expert talks, panel discussions, and interactive sessions, participants will delve into these issues and explore strategies for developing safe, trustworthy, and reliable generative models. This workshop aims to foster collaboration and drive forward research to ensure that generative models, as they become more embedded in our lives, do so in a trustworthy and beneficial manner.
DataWorld: Unifying data curation frameworks across domains
Recently, data-centric research, which has historically taken a backseat to model-centric research, has assumed a central role in the machine learning community. Our workshop aims to explore data-centric methods and theory, with a particular emphasis on real-world data curation. By curation, we mean the set of actions taken by some curator(s) to transition from ideation to a complete dataset. Our topic is wide-ranging, with recent work studying everything from sourcing to benchmarks. One area that remains relatively underexplored is how data-centric methods can perform differently depending on the modality and domain of the data and the downstream application. Which lessons can be shared across domains and modalities, and which cannot? For example, a common part of the data pipeline involves data filtration. Filtration, in domains like medical imaging and wildlife camera traps, faces similar challenges, including long-tailed distributions and natural distribution shifts (between hospitals and camera locations, respectively). However, the two domains differ in the types of distribution shift encountered (covariate vs. label vs. subpopulation) and dataset scale (there are generally more camera trap images than medical scans). Another example is the fact that most successful filtration methods in the recent DataComp benchmark tend to disproportionately remove images with non-English captions. Such methods not only degrade performance on non-English benchmarks but also fail to generalize to other domains and most real-world applications. Our workshop will invite novel research which seeks to unify seemingly disparate frameworks for data curation; where this is impossible, we hope that the necessary trade-offs and domain-specific challenges will be made clearer.
TerraBytes: Towards global datasets and models for Earth Observation
Earth Observation (EO) presents unique challenges for machine learning due to its non-stationary data distribution, spatio-temporal biases, and multimodal nature. TerraBytes aims to address these challenges by fostering discussions at the intersection of data curation, machine learning, and remote sensing. The workshop focuses on (1) curating less biased, globally representative EO datasets, (2) developing adaptable ML models for EO applications, and (3) bridging the gap between data acquisition and ML communities. By promoting interdisciplinary collaboration, TerraBytes seeks to advance EO research and enable inclusive, fair, and impactful applications.
AI Heard That! ICML 2025 Workshop on Machine Learning for Audio
The Machine Learning for Audio workshop at ICML 2025 will cover a broad range of tasks and challenges involving audio data. These include, but are not limited to: methods of speech modeling, environmental sound generation or other forms of ambient sound, novel generative models, music generation in the form of raw audio, text-to-speech methods, denoising of speech and music, data augmentation, classification of acoustic events, transcription, source separation, and multimodal problems.
Workshop on Technical AI Governance
As the development and use of AI systems expands, policymakers increasingly recognize the need for targeted actions that promote beneficial outcomes while mitigating potential harms. Yet there is often a gap between these policy goals and the technical knowledge required for effective implementation, risking ineffective or actively harmful results (Reuel et al., 2024b). Technical AI governance—a nascent field focused on providing analyses and tools to guide policy decisions and enhance policy implementation—currently lacks sufficient venues for exchanging scholarly work. This workshop aims to provide such a venue, fostering interdisciplinary dialogue between machine learning researchers and policy experts by ensuring each submission is reviewed by both technical and policy specialists. Through this collaboration, we seek to accelerate the development of robust governance strategies that lead to safer, more equitable AI systems.
The Second Workshop on Long-Context Foundation Models
Foundation models have become a cornerstone in the advancement of artificial intelligence, enabling applications across a wide range of domains. Many complex tasks today require processing and synthesizing information over thousands to millions of individual pieces of data, from text and images to audio and genomic sequences. Recent progress in long-context models has made it possible to handle such extensive inputs, but significant challenges remain, particularly in terms of computational efficiency, data quality and quantity, and evaluation. This workshop will convene researchers to explore these challenges and foster developments in long-context foundation models. Key topics include new modeling architectures, training approaches, efficiency techniques, and comprehensive evaluation methods. Additionally, in this edition, special attention will be given to long-context reasoning, multimodal learning, and applications in scientific fields such as genomics and climate science. By tackling these critical challenges, we aim to push the boundaries of long-context modeling and shape its future directions.