FormAct: Agentic Source Editing for Rich-Format Document Generation
Abstract
Rich-format documents are essential for everyday operations yet costly to author, which motivates automating their generation. To this end, we present FormAct, an agentic system that generates professional rich-format documents from scratch. FormAct operates on an HTML source representation and refines it iteratively using two agents: an editing agent that invokes a suite of tools, including a syntax-aware source editor and a template retriever, and a review agent that critiques rendered pages to guide refinement. We also incorporate edit-triggered context compression, which keeps the working context bounded and multi-round editing efficient. To support development and evaluation, we introduce RichDocBench for end-to-end generation and RichDocFuzz for evaluating formatting-error recognition by reviewer agents. Through extensive automated evaluation and blind human-preference studies, we show that FormAct consistently outperforms strong baselines, including Codex-CLI, with particularly strong improvements in generating error-free, professional rich-format documents.
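
As a rough illustration of the edit-review loop and edit-triggered context compression described above, the following minimal sketch may help. It is not FormAct's implementation: the names `refine`, `toy_review`, and `toy_edit` are hypothetical, the reviewer inspects the HTML string rather than a rendered page, and the compression rule (keep only the latest source plus a one-line note after every edit) is an assumption chosen for brevity.

```python
from typing import Callable, List, Tuple


def toy_review(html: str) -> List[str]:
    """Hypothetical stand-in for a review agent: returns a list of issue codes."""
    issues = []
    if "<title>" not in html:
        issues.append("missing_title")
    if html.count("<p>") != html.count("</p>"):
        issues.append("unclosed_p")
    return issues


def toy_edit(html: str, issues: List[str]) -> str:
    """Hypothetical stand-in for an editing agent: fixes the first reported issue."""
    issue = issues[0]
    if issue == "missing_title":
        return html.replace("<head>", "<head><title>Untitled</title>", 1)
    if issue == "unclosed_p":
        return html.replace("</body>", "</p></body>", 1)
    return html


def refine(
    source: str,
    edit: Callable[[str, List[str]], str],
    review: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> Tuple[str, int, List[str]]:
    """Alternate review and edit until the reviewer reports no issues.

    Returns the final source, the number of edits applied, and the context.
    """
    context = [f"SOURCE:\n{source}"]
    for round_no in range(1, max_rounds + 1):
        issues = review(source)
        if not issues:
            return source, round_no - 1, context
        context.append(f"REVIEW: {issues}")
        source = edit(source, issues)
        # Assumed compression rule: every edit discards the accumulated
        # history and keeps only the updated source and a short note,
        # so the context never holds more than three entries.
        context = [f"SOURCE:\n{source}", f"NOTE: {round_no} edit round(s) applied"]
    return source, max_rounds, context


draft = "<html><head></head><body><p>Hello</body></html>"
final, rounds, context = refine(draft, toy_edit, toy_review)
```

In this sketch the context size is independent of the number of rounds, which is the property the compression step is meant to provide; a real system would presumably summarize the history rather than discard it.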