Poster in Workshop: Multi-modal Foundation Model meets Embodied AI (MFM-EAI)

BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks

Stephanie Milani ⋅ Anssi Kanervisto ⋅ Karolis Jucys ⋅ Sander Schulhoff ⋅ Brandon Houghton ⋅ Rohin Shah

Abstract

The MineRL BASALT competition has catalyzed advances in learning from human feedback through four hard-to-specify tasks in Minecraft, such as creating and photographing a waterfall. Building on two successful years of competitions, we introduce the BASALT Evaluation and Demonstrations Dataset (BEDD), a resource for algorithm development and performance assessment. BEDD contains 26 million image-action pairs from nearly 14,000 videos of human players completing the BASALT tasks. It also includes over 3,000 dense pairwise human evaluations of both human and algorithmic agents, complete with natural language justifications for the preference assessments. Collectively, these components are designed to support the development and evaluation of multi-modal AI systems in the context of Minecraft. The code and data are available at: https://github.com/minerllabs/basalt-benchmark.
