Opt-Miner: Empowering Information-Seeking Agents with Tree-Guided Data Synthesis for Optimization Modeling
Abstract
Large Language Model (LLM) agents have shown significant potential for automated optimization modeling of mathematical problems. However, real-world problems remain challenging due to their knowledge-intensive nature. Existing methods, constrained by static parametric knowledge, often lack the domain expertise required to comprehend complex scenarios and apply appropriate mathematical techniques, leading to modeling errors. To address this challenge, we propose the Opt-Miner framework, in which the agent learns to identify missing knowledge, retrieve technical documents from the web, and ground its mathematical models accordingly, thereby improving modeling performance. The core of Opt-Miner is a novel tree-guided data synthesis pipeline coupled with a retrieval-based group relative policy optimization (R-GRPO) algorithm, designed to foster the agent's information-seeking capabilities. Specifically, we first formulate each problem as a tree, with its scenario contexts and mathematical techniques embedded in subtrees. We then apply subtree union, subtree transfer, and knowledge fogging to synthesize complex, multi-domain problems containing knowledge gaps, so that solving them requires active information seeking. On the synthesized data, we train the agent with R-GRPO via reinforcement learning. Experiments demonstrate that Opt-Miner-Qwen3-8B achieves performance comparable to that of 32B state-of-the-art specialized agents and commercial reasoning models.