Spotlight Poster
Ad-Hoc Human-AI Coordination Challenge
Tin Dizdarevic · Ravi Hammond · Tobias Gessler · Anisoara Calinescu · Jonathan Cook · Matteo Gallici · Andrei Lupu · Jakob Foerster
West Exhibition Hall B2-B3 #W-715
Achieving seamless coordination between AI agents and humans is crucial for real-world applications, yet it remains a significant open challenge. Hanabi is a cooperative card game featuring imperfect information, constrained communication, theory of mind requirements, and coordinated action, making it an ideal testbed for human-AI coordination. However, its use for human-AI interaction research has been limited by the cost and difficulty of human evaluation. In this work, we introduce the Ad-Hoc Human-AI Coordination Challenge (AH2AC2) to overcome the constraints of costly and difficult-to-reproduce human evaluations. We train human proxy agents on a large-scale dataset of human gameplay; these proxies serve as robust, cheap, and reproducible human-like evaluation partners in AH2AC2. To encourage the development of data-efficient methods, we open-source a dataset of 3,079 games, deliberately limiting the amount of available human gameplay data. We present baseline results for both two- and three-player Hanabi settings. To ensure fair evaluation, we host the proxy agents through a controlled evaluation system rather than releasing them publicly. The code is available at https://github.com/FLAIROx/ah2ac2.
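As a rough illustration of working with the open-sourced human games, the sketch below loads a dataset file and summarizes game counts and scores. The file name and record schema here are assumptions for illustration, not the repository's documented format; consult the AH2AC2 repo for the actual data layout.

```python
import json
from statistics import mean

# Hypothetical sketch: "ah2ac2_games.json" and the per-record fields
# ("num_players", "score") are assumed, not taken from the actual repo.
with open("ah2ac2_games.json") as f:
    games = json.load(f)  # assumed: a list of per-game records

# Split the 3,079 released games by player count (the challenge covers
# both two- and three-player Hanabi).
two_player = [g for g in games if g.get("num_players") == 2]
three_player = [g for g in games if g.get("num_players") == 3]

print(f"{len(games)} games total "
      f"({len(two_player)} two-player, {len(three_player)} three-player)")

# Mean final score across games that record one (Hanabi scores range 0-25).
scores = [g["score"] for g in games if "score" in g]
if scores:
    print(f"mean score: {mean(scores):.2f}")
```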
Making AI that can smoothly work with humans is a major challenge, especially because testing whether an AI is a good human teammate is often costly, inconsistent, and hard to reproduce. To tackle this, we created the "Ad-Hoc Human-AI Coordination Challenge" (AH2AC2), using the cooperative card game Hanabi as a testing ground. We trained AI "stand-ins" (human proxies) on thousands of real human games to act like human players. Researchers can now test their own AI agents by having them play Hanabi with our proxies through a controlled online system, and we provide a small public dataset of human games to help them get started. This challenge offers a fair, affordable, and repeatable way to measure how well AI agents coordinate with human-like partners, aiming to speed up progress toward AI that can truly collaborate with people, and that can learn to do so without massive amounts of human data.