UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization
Zhen Yang ⋅ Wenyi Hong ⋅ Mingde Xu ⋅ Xinyue Fan ⋅ Weihan Wang ⋅ Jiale Cheng ⋅ Xiaotao Gu ⋅ Jie Tang
Abstract
UI-to-code aims to translate UI screenshots into executable front-end code. Despite progress with vision-language models (VLMs), most existing methods formulate UI-to-code as a single-pass generation, which mismatches real-world UI development that is inherently iterative and feedback-driven. We reformulate UI-to-code as an interactive visual optimization problem, where code generation is embedded in a closed-loop process of execution, visual inspection, and iterative refinement driven by rendered visual feedback. To address the non-differentiability of visual objectives and the noise of absolute visual evaluators, we propose Relative Visual Policy Optimization (RVPO), a preference-based reinforcement learning method that optimizes relative visual rankings among rendered candidates under execution feedback. We instantiate this paradigm in UI2Code$^{\text{N}}$, an open-source 9B model trained via continual pre-training, supervised fine-tuning, and reinforcement learning. Experiments demonstrate state-of-the-art performance on UI drafting, UI polishing, and UI editing benchmarks, even outperforming larger models, with performance consistently improving through iterative visual optimization.
Successful Page Load