This meeting room is for ICML delegates to relax and recharge in a comfortable environment.
New In ML
The New In ML workshop is an affinity workshop designed to empower early-career machine learning researchers through mentorship, practical guidance, and an inclusive forum for professional growth. It will feature keynote sessions, including a discussion on AI research principles (including AI safety), as well as targeted talks from leaders in academia and industry on topics such as research best practices, reproducibility, and the effective use of large language models for coding, writing, and reviewing. Participants are invited to submit their work through a dual-track call for papers: a main track for preliminary research and a reproducibility track aimed at validating existing findings. Through interactive sessions, personalized feedback from senior reviewers, and comprehensive career guidance, the workshop aims to cultivate a collaborative community that supports new researchers in becoming competitive, impactful contributors to the field of machine learning.
AI's Models of the World, and Ours
Many different threads in recent work on generative AI address the simultaneous challenge of evaluating an AI system's explicit behavior at one level and its implicit representations of the world at another. Such distinctions become crucial as we interact with powerful AI systems, where a mismatch between the system's model of the world and our model of the world can lead to measurable situations in which the system has inadvertently "set us up to fail" through our interaction with it. We explore these questions through the lens of generation, drawing on examples from game-playing, geographic navigation, and other complex tasks: When we train a model to win chess games, what happens when we pair it with a weaker partner who makes some of the moves? When we train a model to find shortest paths, what happens when it has to deal with unexpected detours? The picture we construct is further complicated by theoretical results indicating that successful generation can be achieved even by agents that are provably incapable of identifying the model they're generating from.
The talk will include joint work with Ashton Anderson, Karim Hamade, Reid McIlroy-Young, Siddhartha Sen, Justin Chen, Sendhil Mullainathan, Ashesh Rambachan, Keyon Vafa, and Fan Wei.
Science communication skills are often missing from academic programs, but knowing how to explain your research effectively will help you whether you are presenting it to your peers, interviewing for a job, or seeking funding for a project. This hands-on session will give you practical tips and exercises to craft a short, effective, and accessible overview of your work for a wide range of audiences and applications.
Generative AI's Collision with Copyright Law
The development of generative AI models has understandably caused considerable excitement among machine learning professionals. Far less attention has been paid to the copyright implications of using massive amounts of data publicly available on the Internet to train these models. Commercial developers in the U.S. have expressed confidence that the copyright doctrine of fair use would shield them from liability. In the EU, recently adopted text and data mining exceptions seemed to legalize generative AI training, and Israel and Japan have similar rules. But with more than forty copyright-related lawsuits pending against the largest generative AI developers in the U.S., a few more now in Canada, and the EU and UK aiming to require compliance with their laws, copyright looms large in the future of generative AI development. While it is seemingly impossible to create a global licensing regime that would cover all uses of all in-copyright works as training data, proposals to establish collective licensing regimes are under discussion in the EU, UK, and U.S. The machine learning community needs to understand enough about these copyright debates to participate meaningfully in shaping legal environments that will foster innovation in this field, support scientific research, create socially valuable tools, and treat works and their authors with respect.
Muslims in ML Social
This social aims to create a welcoming and inclusive space for Muslim researchers and students at ICML to connect, support one another, and build community. Everyone is welcome, regardless of background, identity, or beliefs.
The session will include:
- A brief welcome and introductions
- 1:1 and small group mentorship matching (covering topics like graduate school, industry, and academic careers)
- Informal networking over refreshments and open discussion
This event is designed to foster meaningful connections, provide career guidance, and offer a relaxed environment for reflection and support.
Multi-Agent Systems in Ambient Settings
1. From Lab to Life: Orchestrating Ambient Agents in the Real World
Rapael Kalandadze (AI Lab, Wandero)
> This talk explores the shift of multi-agent systems from controlled experiments to real-world deployment. We'll examine key challenges, effective strategies, and practical examples of building systems that truly work. This isn't science fiction anymore - it's large-scale system design in action.
2. Teaching Ambient Agents to Understand and Pursue Human Intent
Shirley Wu (Stanford, Microsoft Research)
> This talk explores how long-term alignment strategies can make ambient agent systems more helpful, efficient, and truly human-centered. Shirley Wu presents CollabLLM, a framework that trains agents to look beyond immediate replies by simulating multi-turn interactions and rewarding responses that advance conversations over time. The result: proactive agents that clarify intent, surface missing context, and collaborate more naturally in ambient, ongoing settings.
3. Safety Guarantees for Ambient Agents via Asking for Help
Benjamin Plaut (UC Berkeley, Stanford)
> Most reinforcement learning algorithms essentially rely on trial-and-error: they explore all possible behaviors and see what works well. However, this approach is problematic when some actions are "catastrophic", i.e., irreparable. Ambient computer-use agents have access to many irreparable actions, such as deleting crucial files or sending disastrous emails. We show that designing agents to ask for help in unfamiliar situations improves safety both theoretically and empirically. We believe this is a first step towards a scalable foundation for trustworthy always-on AI systems.
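To make the ask-for-help idea above concrete, here is a minimal, hypothetical sketch in Python. It is not the speaker's actual algorithm; the familiarity threshold and the count-based notion of "unfamiliar" are illustrative assumptions. The agent simply defers to a supervisor whenever it has too little experience with a proposed state-action pair, rather than risking an irreparable action.

```python
# Illustrative sketch only -- not the speaker's actual algorithm.
# Assumption: "unfamiliar" is approximated by a simple visit count.
from dataclasses import dataclass, field

@dataclass
class AskForHelpAgent:
    """Toy agent that defers to a supervisor outside familiar situations."""
    familiarity_threshold: int = 3          # hypothetical parameter
    visit_counts: dict = field(default_factory=dict)

    def act(self, state, candidate_action, supervisor):
        # Count how often this (state, action) pair has been seen before.
        seen = self.visit_counts.get((state, candidate_action), 0)
        if seen < self.familiarity_threshold:
            # Unfamiliar situation: ask for help instead of risking an
            # irreparable mistake (e.g. deleting a crucial file).
            action = supervisor(state, candidate_action)
        else:
            action = candidate_action
        self.visit_counts[(state, candidate_action)] = seen + 1
        return action

# Usage: the supervisor here is just a stand-in callable.
agent = AskForHelpAgent()
safe_action = agent.act("inbox", "send_email", supervisor=lambda s, a: "draft_only")
```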
4. WATCHing the Watchers: Real-Time Monitoring for Safer AI Agents
Drew Prinster (Johns Hopkins, Yale)
> This talk explores how adaptive monitoring systems can detect, interpret, and respond to failures in long-running AI agents. As agentic systems move from lab to deployment - often operating without constant human oversight - the need for robust, real-time monitoring becomes critical. Drew Prinster presents WATCH, a statistical monitoring framework that rapidly detects performance shifts, distinguishes harmless from dangerous changes, and pinpoints the cause of degradation. This approach enables safer, more reliable deployment of AI in dynamic, high-stakes environments like healthcare or large-scale interactive systems, where false alarms and undetected failures both carry serious consequences.
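For a flavor of the kind of real-time monitoring the abstract describes, the following is a minimal sketch, not the WATCH framework itself; the baseline rate, slack, and alarm threshold are hypothetical parameters. It is a simple CUSUM-style detector that flags a sustained drop in an agent's per-task success rate.

```python
# Illustrative sketch only -- not the WATCH framework.
# A CUSUM-style monitor over a stream of per-task success indicators.
def cusum_monitor(successes, baseline_rate=0.9, slack=0.05, threshold=3.0):
    """Return the index at which cumulative evidence of degradation
    exceeds `threshold`, or None if no shift is flagged.
    All parameters here are hypothetical, for illustration."""
    score = 0.0
    for i, ok in enumerate(successes):
        # Failures below the allowed slack accumulate evidence of a shift;
        # successes pull the score back toward zero (clamped at 0).
        score = max(0.0, score + (baseline_rate - slack) - float(ok))
        if score > threshold:
            return i  # first point at which degradation is flagged
    return None

# Example: an agent that performs well, then degrades halfway through.
stream = [1] * 50 + [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(cusum_monitor(stream))  # flags an alarm shortly after the shift
```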
Panel Discussion - “The Ambient Shift: Redefining Intelligence, Safety & Exploration in Multi-Agent Systems”
Panelists: Yiding Jiang (Carnegie Mellon, Google Research),
Clément Romac (Hugging Face, Inria), Jindong Wang (William & Mary, Microsoft Research)
> As AI agents move from isolated chats to always-on ambient systems, fundamental questions arise: How should these agents explore, generalize, and align with user goals in dynamic, high-dimensional environments? How can we trust them when they act autonomously and concurrently? What new infrastructures are needed to support this paradigm shift? And how can we train LLMs to be fully reliable in production environments?
This panel brings together leading researchers at the intersection of curiosity-driven learning, agent safety, and evaluation to reimagine agent intelligence in ambient settings, and to surface the key redefinitions shaping the future of multi-agent systems.
How to Break Into an Industry Research Lab
At top labs, researchers are expected not only to build world-class systems, but also to navigate complex org structures, advocate for themselves, secure resources, and align with fast-moving business priorities.
Yet many early-career researchers have never recruited, negotiated, or managed stakeholder politics. The result? A massive information asymmetry, where highly capable researchers struggle to advance simply because they don't have access to the unspoken rules.
This social aims to level the playing field.
The panelists will share hard-earned insights on how to break into industry, choose the right company and team, navigate interviews and negotiation, and set yourself up for long-term success.
We want attendees to walk away with less fear, more clarity, and an actionable playbook for launching a successful research career in industry!
🎁 Bonuses:
Everyone who attends will also receive:
- A 62-page Technical Interview Guide for AI Researchers with real interview questions from the OpenAI, Anthropic, and Microsoft interview loops
- The scripts used to negotiate $75K in additional compensation at every FAANG company
AI Security & Policy Social
How can we extract deeper insights from LLM evaluations?
Join experts from the UK AI Security Institute for an interactive discussion at ICML focused on improving how we analyse, interpret, and act on evaluation data for frontier AI systems. As large language models become more capable and influential, evaluations have become a cornerstone of scientific understanding, safety assessments, and deployment decisions. Yet current evaluation designs and methodologies are often poorly suited to answering the questions we care most about—such as uncovering latent capabilities, forecasting performance trajectories, and identifying dangerous failure modes.
This session will explore four key dimensions of evaluation methodology: developing tools for richer evaluation-data analysis; advancing statistical techniques for uncertainty and variability; building efficient evaluation pipelines that prioritise signal-rich tasks; and mapping evaluation results onto capability or risk thresholds. We’ll identify open research questions, promising methodological directions, and opportunities for collaboration to make evaluations more rigorous, interpretable, and decision-relevant.
Whether you are an eval designer yourself, train your own models, or work on risks related to safety and misuse, this session will help you think critically about the importance of evaluation insights to your own work.