GIFT: Bootstrapping Image-to-CAD Program Synthesis via Geometric Feedback
Abstract
Mapping images to executable CAD programs is a central challenge in generative design, yet aligning visual inputs with symbolic code remains difficult. Existing approaches typically rely on brittle supervised fine-tuning or costly online reinforcement learning to overcome data limitations. In this work, we ask: how far can we push performance by leveraging test-time compute to bootstrap an augmented training set? We identify the primary bottleneck as the scarcity of diverse data aligning visual geometry with program syntax, rather than model capacity. To address this, we introduce Geometric Inference Feedback Tuning (GIFT), a framework that uses geometric feedback to generate high-quality data augmentations. GIFT systematically analyzes model failures via inference-time scaling, verifying geometric accuracy with a CAD kernel. GIFT bootstraps and curates an alignment dataset through two core mechanisms: Soft-Rejection Sampling (SRS), which captures diverse valid programs beyond ground-truth matching, and Failure-Driven Augmentation (FDA), which improves robustness by re-purposing rendered near-miss failures as synthetic training examples to cover hard negative geometries. By amortizing these insights into the model weights, GIFT matches the performance of extensive test-time scaling with an 80% reduction in inference compute. It outperforms strong baselines by 12% and remains competitive with complex multimodal systems, all without additional supervision or specialized architectures.
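The SRS and FDA mechanisms described above can be sketched as a single data-curation loop. The following is a minimal, illustrative sketch, not the paper's implementation: the `render` function is a toy stand-in for a CAD kernel (geometry reduced to a scalar), `sample_programs` stands in for inference-time scaling of the model, and the thresholds `eps_accept` and `eps_near` are hypothetical names for the accept/near-miss bands implied by the text.

```python
import random

def render(program):
    # Toy stand-in for a CAD kernel: "geometry" is just the parameter sum.
    return sum(program)

def sample_programs(rng, target, n=50):
    # Stand-in for inference-time scaling: draw n candidate programs.
    return [[rng.uniform(0, target) for _ in range(2)] for _ in range(n)]

def gift_bootstrap(target_geometry, eps_accept=0.5, eps_near=2.0, seed=0):
    """One round of SRS + FDA curation over sampled candidates (sketch)."""
    rng = random.Random(seed)
    srs_data, fda_data = [], []
    for prog in sample_programs(rng, target_geometry):
        err = abs(render(prog) - target_geometry)  # geometric feedback
        if err < eps_accept:
            # SRS: keep any geometrically valid program, even when its
            # parameters differ from the ground-truth program text.
            srs_data.append((target_geometry, prog))
        elif err < eps_near:
            # FDA: re-render a near-miss failure and pair it with its own
            # geometry, yielding a self-consistent hard training example.
            fda_data.append((render(prog), prog))
    return srs_data, fda_data
```

In this sketch, soft rejection widens the accepted set from exact program matches to anything the kernel verifies as geometrically correct, while failure-driven augmentation converts near-misses into correctly labeled pairs by relabeling them against their own rendering.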