Bridging Tokens and Geometry: Token-wise 3D Supervision for CAD Generation
Abstract
Computer-Aided Design (CAD) generation is typically formulated as a sequence modeling task over parametric tokens. Recent studies introduce visual information either through additional visual inputs or by rendering the final generated program. However, these methods provide no intermediate visual feedback, which makes it hard to associate individual tokens with their geometric effects. In this work, we propose an Argument-induced 3D Point Loss (A3PL) that maps argument tokens to their corresponding 3D points, enabling dense token-wise geometric supervision. To reduce learning complexity and the number of invalid sequences, we further introduce a Grammar-constrained Operator (GCO) that leverages the structured nature of CAD programs to regulate sequence generation. We evaluate our approach on five CAD generation tasks with diverse input modalities, including text, Scalable Vector Graphics (SVG) sketches, point clouds, and CAD sequences. Our approach improves generation accuracy and program validity across these input modalities. The code and dataset are publicly available.
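
As an illustration of what regulating sequence generation with a grammar can look like, the sketch below masks grammar-violating tokens before selecting the next token. It is a generic logit-masking scheme and not necessarily how GCO is implemented. The vocabulary, the argument counts in `ARG_COUNT`, and the helper names `allowed_tokens` and `constrained_argmax` are all hypothetical, chosen only for this example.

```python
import math

# Hypothetical toy vocabulary (an assumption, not the paper's token set):
# each command token maps to the number of argument tokens it expects.
ARG_COUNT = {"LINE": 2, "ARC": 4, "CIRCLE": 3, "EXTRUDE": 1, "EOS": 0}
COMMANDS = list(ARG_COUNT)
ARGS = [f"ARG_{i}" for i in range(4)]  # stand-ins for quantized argument values
VOCAB = COMMANDS + ARGS


def allowed_tokens(prefix):
    """Return the set of tokens the toy grammar permits after `prefix`."""
    pending = 0  # arguments still owed to the most recent command
    for tok in prefix:
        if tok in ARG_COUNT:
            pending = ARG_COUNT[tok]
        else:
            pending -= 1
    return set(ARGS) if pending > 0 else set(COMMANDS)


def constrained_argmax(logits, prefix):
    """Mask grammar-violating tokens to -inf, then pick the best remaining one."""
    allowed = allowed_tokens(prefix)
    masked = [
        score if tok in allowed else -math.inf
        for tok, score in zip(VOCAB, logits)
    ]
    return VOCAB[masked.index(max(masked))]


# Scores in VOCAB order; the unconstrained argmax would be ARG_0.
example_logits = [0.1, 0.2, 0.0, 0.3, -1.0, 5.0, 0.0, 0.0, 0.0]
first_token = constrained_argmax(example_logits, [])
after_line = constrained_argmax(example_logits, ["LINE"])
```

With an empty prefix the toy grammar only admits command tokens, so the high-scoring argument token is excluded. Once `LINE` has been emitted, only argument tokens are admitted until its two arguments are filled.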