Workshop on Computer Use Agents

David Barber · Doina Precup · Andrei Nica · Roberta Raileanu · Harshil Shah · Boyuan Zheng · Shuyan Zhou

Project Page [OpenReview]

Abstract

Computer use models are attracting significant interest in academia and industry due to their ability to perform complex tasks in non-deterministic environments. However, they are far from being ready for unattended deployment, as evidenced by their performance on the OSWorld benchmark, where they achieve only a small fraction of human performance. The rapid evolution of these agents raises important questions regarding their accuracy, safe deployment, and potential impact on the future of work. The topics we would like to cover are:

- Learning Algorithms: which new architectures and learning techniques (e.g. memory mechanisms for extended tasks, exploration strategies) can enhance the intrinsic ability of computer use agents to acquire, represent, and refine knowledge?
- Orchestration: what novel frameworks or control methods (e.g. dynamic task planning, modular coordination, multi-agent systems) can efficiently manage and integrate multiple learning components to optimize overall agent performance?
- Interfaces: how should agents perceive and act within their environments (e.g. via APIs or UI interactions), and should we design unified systems or specialized agents for different modalities?
- Guardrails, safety & societal implications: what guardrails do we need in order to make computer use models safe for deployment "in the wild" while ensuring that they have a positive impact on society?
- Benchmarking & tools: how can we develop robust environments and evaluation metrics that capture the diversity of real-world settings? Do we need new tools or frameworks to make research on computer use more efficient and accessible?
- Human-agent interaction: how will future interactions evolve? Should we optimize agents for full autonomy or design them as personalized, human-centric collaborators?
- Broader applications: what are some practical applications for computer use agents across domains such as healthcare, scientific research, and software engineering and testing?
- Capability horizon: what breakthroughs or engineering challenges are required to enable agents orders of magnitude more capable than today, and what implications would such advances have?

Schedule

Timezone: America/Los_Angeles

8:30 AM

Opening remarks

8:40 AM

Nouha Dziri - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety

9:05 AM

Zhiyong Wu - Large Scale Reinforcement Learning for General Computer Agents

9:30 AM

Spotlight "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows" - Rob Xiangru Tang

9:40 AM

Posters & Coffee break

10:30 AM

Qingyun Wu

10:55 AM

Yu Su - The Intelligence Feedback Loop: From Biological Inspiration to Augmented Cognition

11:20 AM

Ruslan Salakhutdinov - Scaling up Multimodal AI Agents

11:45 AM

Spotlight "Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search" - Sam Holt

1:00 PM

Sercan Arık

1:15 PM

Panel discussion - Ruslan Salakhutdinov, Alexandre Drouin, Qingyun Wu, Victor Zhong, Nouha Dziri, Yu Su

2:15 PM

Alexandre Drouin - Computer-use agents in the enterprise: progress and key challenges

2:40 PM

Spotlight "How to Train Your LLM Web Agent: A Statistical Diagnosis" - Massimo Caccia

2:50 PM

Spotlight "OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents" - Maksym Andriushchenko

3:00 PM

Posters & Coffee break

3:50 PM

Graham Neubig

4:05 PM

Victor Zhong - Building and Evaluating Generalist Agents

4:30 PM

Alane Suhr - Training Language-Conditioned Agents with Reinforcement Learning

4:55 PM

Closing remarks

5:05 PM

Poster & Social

Weathering the CUA Storm: Mapping Security Threats in the Rapid Rise of Computer Use Agents

Dan Jones · Martin Pouliot · Giorgio Severi · Joris de Gruyter · Gary Lopez Munoz · Santiago Zanella-Beguelin · Justin Song · Amanda Minnich · Pamela Cortez