GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models
Abstract
Geometric shapes play important roles in both the physical world and human cognition. While multimodal large language models (MLLMs) have made significant advancements in visual understanding, their ability to recognize geometric shapes and their spatial relationships, which we term geometric perception, has not been explicitly and systematically explored. To address this gap, we introduce GePBench, a novel benchmark specifically designed to assess the geometric perception capabilities of MLLMs. Our extensive evaluations reveal that even current state-of-the-art MLLMs exhibit significant deficiencies in geometric perception tasks. Furthermore, we show that models trained with GePBench data demonstrate considerable improvements on a wide range of downstream tasks, highlighting the critical role of geometric perception in enabling advanced multimodal applications. Our code and datasets will be publicly available.