Op-CAD: Benchmarking and Investigating Operation-oriented CAD Generation
Abstract
Recent research has made growing efforts to leverage large language models (LLMs) for computer-aided design (CAD), a domain that demands advanced geometric and spatial reasoning across long operation sequences. However, existing studies remain limited in addressing complex modeling tasks that require step-by-step reasoning, primarily due to the scarcity of high-quality CAD datasets and the absence of fine-grained evaluation frameworks. To address these challenges, we introduce Op-CAD, the first large-scale, multi-modal dataset for operation-oriented CAD generation, encompassing four operation types and five modalities. We further introduce a novel CAD parsing module together with a geometry-guided hierarchical annotation pipeline, which decomposes modeling sequences into discrete operations and substantially improves the annotation accuracy of Vision-Language Models (VLMs). Building on our dataset, we redefine the CAD modeling task by decoupling geometric and spatial perspectives and introduce a novel metric, Chamfer/Fillet Intersection over Union (CF-IoU), to fill the gap in assessing chamfer and fillet operations. By comprehensively evaluating eight LLMs on Op-CAD, we establish a benchmark for current models on operation-oriented tasks. Finally, we investigate performance-enhancement strategies through fine-tuning on Op-CAD and propose Chain-of-Operation (COOP), a novel prompting strategy that emulates the reasoning of human engineers.