Workshop

Workshop on Computer Use Agents

Andrei Nica · Doina Precup · David Barber · Roberta Raileanu · Boyuan Zheng · Shuyan Zhou

Computer use models are attracting significant interest in academia and industry due to their ability to perform complex tasks in non-deterministic environments. However, they are far from being ready for unattended deployment, as evidenced by their performance on the OSWorld benchmark where they achieve only a small fraction of human performance. The rapid evolution of these agents raises important questions regarding their accuracy, safe deployment, and potential impact on the future of work. The topics we would like to cover are:

  • Learning Algorithms --- which new architectures and learning techniques (e.g. memory mechanisms for extended tasks, exploration strategies) can enhance the intrinsic ability of computer use agents to acquire, represent, and refine knowledge?
  • Orchestration --- what novel frameworks or control methods (e.g. dynamic task planning, modular coordination, multi-agent systems) can efficiently manage and integrate multiple learning components to optimize overall agent performance?
  • Interfaces --- how should agents perceive and act within their environments (e.g., via APIs or UI interactions), and should we design unified systems or specialized agents for different modalities?
  • Guardrails, safety & societal implications --- what guardrails do we need in order to make computer use models safe for deployment "in the wild" while ensuring that they have a positive impact on society?
  • Benchmarking & tools --- how can we develop robust environments and evaluation metrics that capture the diversity of real-world settings? Do we need new tools or frameworks to make research on computer use more efficient and accessible?
  • Human-agent interaction --- how will future interactions evolve? Should we optimize agents for full autonomy or design them as personalized, human-centric collaborators?
  • Broader applications --- what are some practical applications for computer use agents across domains such as healthcare, scientific research, and software engineering and testing?
  • Capability horizon --- what breakthroughs or engineering challenges are required to enable agents orders of magnitude more capable than today, and what implications would such advances have?
