Poster in Workshop: Multi-modal Foundation Model meets Embodied AI (MFM-EAI)
MAP-THOR: Benchmarking Long-Horizon Multi-Agent Planning Frameworks in Partially Observable Environments
Siddharth Nagar Nayak · Adelmo Orozco · Marina Have · Vittal Thirumalai · Jackson Zhang · Darren Chen · Aditya Kapoor · Eric Robinson · Karthik Gopalakrishnan · brian ichter · James Harrison · Anuj Mahajan · Hamsa Balakrishnan
Evaluating embodied multi-agent planners requires robust and versatile benchmarks. We introduce MAP-THOR (Multi-Agent Planning in AI2-THOR), a benchmark designed to assess embodied multi-agent planning systems in realistic, partially observable environments within the AI2-THOR simulator. Existing benchmarks offer extensive environments for single-agent tasks but fail to capture the complexities of multi-agent interactions, non-stationarity, partial observability, and long-horizon planning. To address these gaps, MAP-THOR facilitates the development of frameworks that allocate tasks and enable coordination among multiple agents. MAP-THOR introduces a comprehensive suite of household tasks that demand collaboration and adaptation to dynamic environmental changes, mirroring real-world scenarios. Our benchmark includes detailed metrics for success rate, efficiency, and collaborative effectiveness, setting a new standard for evaluating multi-agent planning systems. Through rigorous experiments, we show that MAP-THOR offers a robust evaluation framework for language model (LM)-based multi-agent planning systems. Ultimately, we hope that MAP-THOR serves as a standard benchmark for identifying embodied multi-agent planning frameworks that systematically improve generalization in long-horizon, partially observable planning.