Infinite-Precision Autoregressive Modeling for Vector Graphics and Layouts
Abstract
Transformer-based autoregressive models excel at data generation but are inherently constrained by their reliance on discretized tokens, which limits their ability to represent continuous values with high precision. We analyze the scalability limitations of existing discretization-based approaches for generating hybrid discrete-continuous sequences, particularly in high-precision domains such as logos, layouts, and semiconductor circuit designs, where precision loss can lead to visual artifacts, aesthetic degradation, and even functional failure. To address this challenge, we propose a unified framework that jointly models discrete and continuous values for variable-length sequences. Our method combines categorical prediction for discrete values with diffusion-based modeling for continuous values, and incorporates two key technical components: an end-of-sequence (EOS) logit adjustment mechanism that uses an MLP to dynamically adjust the EOS token logit based on sequence context, and a length regularization term integrated into the loss function. Additionally, we present ContLayNet, a large-scale benchmark comprising 334K high-precision semiconductor layout samples with specialized evaluation metrics that capture functional correctness, where precision errors significantly impact performance. Experiments on semiconductor layouts (ContLayNet), graphic layouts, and SVGs demonstrate that our approach achieves higher-fidelity hybrid vector representations than discretization-based and fixed-schema baselines, while scaling to high-precision generation across multiple domains.
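To make the two components named in the abstract concrete, the sketch below shows one plausible shape for the hybrid output head: categorical logits over discrete tokens whose EOS logit is offset by a small MLP on a pooled sequence-context vector, a continuous conditioning vector that would drive the diffusion sampler (stood in here by a linear head), and a simple quadratic length regularizer. All names, dimensions, and the regularizer form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class HybridHead:
    """Hypothetical hybrid prediction head (illustrative, not the paper's code)."""

    def __init__(self, d_model, vocab_size, eos_id):
        self.eos_id = eos_id
        # projection from hidden state to discrete-token logits
        self.W_tok = rng.normal(0, 0.02, (vocab_size, d_model))
        # two-layer MLP producing a scalar EOS-logit offset from sequence context
        self.W1 = rng.normal(0, 0.02, (32, d_model))
        self.W2 = rng.normal(0, 0.02, (1, 32))
        # stand-in for the diffusion conditioning head, e.g. (x, y) coordinates
        self.W_cont = rng.normal(0, 0.02, (2, d_model))

    def forward(self, h, context):
        logits = self.W_tok @ h
        # dynamic EOS adjustment: context-dependent offset added to the EOS logit
        offset = (self.W2 @ np.tanh(self.W1 @ context)).item()
        logits[self.eos_id] += offset
        probs = softmax(logits)
        cont_cond = self.W_cont @ h  # would condition the continuous diffusion model
        return probs, cont_cond

def length_regularizer(expected_len, target_len, lam=0.01):
    # One plausible form of a length regularization term (an assumption):
    # penalize the gap between the model's expected sequence length and the target.
    return lam * (expected_len - target_len) ** 2

head = HybridHead(d_model=16, vocab_size=8, eos_id=7)
h = rng.normal(size=16)    # hidden state at the current position
ctx = rng.normal(size=16)  # pooled sequence context
probs, cond = head.forward(h, ctx)
reg = length_regularizer(expected_len=12.4, target_len=10)
print(probs.shape, cond.shape, round(reg, 4))
```

At inference time, one would sample the next discrete token from `probs` and, for positions holding continuous values, run a diffusion sampler conditioned on `cond` instead of decoding a quantized coordinate token.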