FIRE: Learning to Navigate and Act on Real-World Files via Stateful Reinforcement Learning
Abstract
Large language models still struggle to reliably answer questions grounded in real-world files such as spreadsheets and slides, where evidence is scattered across irregular layouts and heterogeneous formats. We address this by formalizing File Reasoning, a setting in which agents must interact directly with unprocessed files (XLSX, PDF, DOCX, PPTX) inside a persistent sandbox. To support this setting, we introduce a unified data pipeline and release a high-difficulty benchmark of over 400 verifiable tasks that preserve native file structure. We further propose a reinforcement learning framework grounded in stateful file execution, and train and release FIRE (File Interactive Reasoning Expert), a family of models that learns long-horizon planning from real execution feedback in the environment. Unlike stateless tool-use methods, our approach enables agents to recover from errors and adapt to structural ambiguities. Empirical results show that Qwen3-32B-FIRE achieves the strongest performance among open-source models under identical execution constraints.