Bringing Code ALIVE: Optimizing Interactive Frontend Mini-Games via Automated Play and Reinforcement Learning at Scale
Abstract
The rapid evolution of Large Language Models (LLMs) has empowered even non-programmers to create visually appealing frontend mini-games from a single instruction. However, open-source models significantly lag behind their proprietary counterparts in this domain. The core bottleneck is the lack of an evaluation mechanism that balances reliability with scalability: existing methods either fail to verify dynamic interactivity or incur prohibitive computational costs. To bridge this gap, we introduce ALIVE (Aligning LLMs via Interactive Visual Execution), a high-throughput framework that leverages one-shot planning and DOM-based analysis to automatically evaluate generated games at scale. Extensive experiments demonstrate that ALIVE significantly outperforms static judge baselines at identifying functional flaws while remaining orders of magnitude more efficient than GUI agents. Functioning as a scalable `pre-flight' evaluation layer, it curates high-quality data for Supervised Fine-Tuning (SFT) and provides a consistent reward signal for Reinforcement Learning (RL). We leverage this pipeline to train ALIVE-Coder, a model that achieves superior performance in interactive frontend generation. To the best of our knowledge, our work offers the first scalable path to evaluating and optimizing interactive code, substantially advancing open-source capabilities.