Code2Video: A Code-centric Paradigm for Educational Video Creation
Abstract
While recent generative models can synthesize videos in pixel space, they often fail to produce educational videos with precise structure, accurate domain knowledge, and coherent transitions. We argue that this setting is better served by operating in a renderable environment explicitly controlled by code. We propose Code2Video, a code-centric agent framework that generates educational videos by writing executable Python programs. Code2Video comprises three agents: a Planner that converts lecture content into a temporal storyboard, a Coder that turns the storyboard into runnable code with scope-guided auto-fix, and a Critic that refines layout using a VLM guided by visual anchor prompting, i.e., mappings from target visual outcomes to code edits. For evaluation, we build MMMC, a benchmark of professionally produced, discipline-specific educational videos. We assess Code2Video with aesthetic scores (VLM-as-a-Judge), code efficiency, and TeachQuiz, an end-to-end metric that measures how well a VLM, after unlearning the topic, can recover that knowledge by watching the generated videos. Code2Video improves over direct code generation by 40% and produces videos comparable to human-crafted tutorials.
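The abstract outlines a three-agent pipeline (Planner, Coder, Critic). The listing below is a minimal sketch of how such a loop might be orchestrated; every name in it (call_llm, call_vlm, plan_storyboard, write_scene_code, run_with_autofix, critic_refine, the storyboard field format, and the MAX_FIX_ROUNDS budget) is an illustrative assumption, not the paper's actual interface.

import subprocess
import tempfile
from dataclasses import dataclass

MAX_FIX_ROUNDS = 3  # assumed repair budget for the scope-guided auto-fix

@dataclass
class Section:
    """One storyboard entry produced by the Planner (hypothetical schema)."""
    title: str
    narration: str
    visual_anchor: str  # target visual outcome the Critic checks against

def call_llm(prompt: str) -> str:
    """Placeholder for a text-model call (used by the Planner and Coder)."""
    raise NotImplementedError

def call_vlm(prompt: str, frame_path: str) -> str:
    """Placeholder for a vision-language-model call on a rendered frame (Critic)."""
    raise NotImplementedError

def plan_storyboard(lecture: str) -> list[Section]:
    """Planner: convert lecture content into an ordered, temporal storyboard."""
    raw = call_llm("Split this lecture into ordered sections, each with "
                   "narration and a target visual layout:\n" + lecture)
    # Assumes one "title|narration|anchor" line per section, for illustration only.
    return [Section(*line.split("|")) for line in raw.splitlines()]

def write_scene_code(section: Section) -> str:
    """Coder: turn one storyboard section into an executable Python program."""
    return call_llm("Write a runnable Python animation scene.\n"
                    f"Title: {section.title}\nNarration: {section.narration}")

def run_with_autofix(code: str) -> str:
    """Run the generated program; on failure, ask the Coder to repair only
    the failing scope and retry (scope-guided auto-fix)."""
    for _ in range(MAX_FIX_ROUNDS):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
        result = subprocess.run(["python", f.name], capture_output=True, text=True)
        if result.returncode == 0:
            return code
        code = call_llm("Fix only the failing part of this program.\n"
                        f"Error:\n{result.stderr}\nCode:\n{code}")
    raise RuntimeError("auto-fix budget exhausted")

def critic_refine(code: str, frame_path: str, section: Section) -> str:
    """Critic: compare a rendered frame with its visual anchor and map the
    observed layout problem to a concrete code edit."""
    edit = call_vlm(f"Target layout: {section.visual_anchor}. "
                    "Describe any layout problem and the code edit that fixes it.",
                    frame_path)
    return call_llm("Apply this edit to the program:\n" + edit + "\n\n" + code)

The point of the code-centric design, as the abstract argues, is that the video is produced by an executable, inspectable program rather than pixel-space sampling, so structural errors surface as runtime failures the Coder can repair and layout errors surface as frame-level discrepancies the Critic can translate back into code edits.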