Towards Professional-Grade Financial Agents: Benchmarking, Tooling, and Structured Reasoning
Abstract
Financial reasoning requires precise execution. While Large Language Model (LLM) agents have shown encouraging progress in financial reasoning, their effectiveness in realistic financial workflows is severely limited by the lack of holistic benchmarks and the fragility of unstructured reasoning. To evaluate these capabilities, we introduce ProFinR, the first Professional Finance Reasoning benchmark, comprising 528 expert-designed tasks that span four types of financial tasks. To solve these complex financial reasoning problems, we construct Financial Tool Universe, a library of 53 domain-specific tools organized into 13 categories. Building on this library, we introduce ProFinAgent, a structured agent framework based on a Directed Acyclic Graph (DAG) and Case-Based Memory (CBM). Unlike strictly sequential workflows, ProFinAgent coordinates tool execution through a DAG, which allows independent tool calls to run in parallel and reduces latency relative to serial pipelines. Furthermore, the CBM component refines decision-making over time by retrieving prior cases to mitigate reasoning failures. Experimental results show that ProFinAgent achieves a 49.81% performance gain over state-of-the-art baselines with a 47.1% reduction in inference latency.
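The abstract does not say how ProFinAgent schedules its DAG. The sketch below only illustrates why DAG-based coordination can cut latency compared with a serial pipeline. It assumes that each tool is a Python callable receiving its predecessors' outputs, and that a node is dispatched to a thread pool as soon as all of its dependencies have finished. `run_dag` and the example tools (`fetch_price`, `fetch_dividend`, `dividend_yield`) are hypothetical and are not part of Financial Tool Universe.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, Set


def run_dag(
    tools: Dict[str, Callable[[Dict[str, Any]], Any]],
    deps: Dict[str, Set[str]],
    max_workers: int = 4,
) -> Dict[str, Any]:
    """Hypothetical scheduler: run each tool node once its predecessors finish.

    `deps` maps a node to the set of nodes it depends on. prepare() raises
    graphlib.CycleError if the graph is not acyclic.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    results: Dict[str, Any] = {}
    pending = {}  # future -> node name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every node whose dependencies are all satisfied.
            for node in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(node, ())}
                pending[pool.submit(tools[node], inputs)] = node
            # Block until at least one running tool completes.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                node = pending.pop(fut)
                results[node] = fut.result()
                sorter.done(node)  # may unlock downstream nodes
    return results


# Hypothetical tools: two independent I/O-bound lookups feed one computation.
def fetch_price(_inputs: Dict[str, Any]) -> float:
    time.sleep(0.1)  # simulate a slow market-data call
    return 100.0


def fetch_dividend(_inputs: Dict[str, Any]) -> float:
    time.sleep(0.1)  # simulate a slow fundamentals call
    return 2.0


def dividend_yield(inputs: Dict[str, Any]) -> float:
    return inputs["dividend"] / inputs["price"]


TOOLS = {"price": fetch_price, "dividend": fetch_dividend,
         "dividend_yield": dividend_yield}
DEPS = {"dividend_yield": {"price", "dividend"}}

if __name__ == "__main__":
    start = time.perf_counter()
    out = run_dag(TOOLS, DEPS)
    print(out["dividend_yield"], f"{time.perf_counter() - start:.2f}s")
```

Because `price` and `dividend` have no edge between them, they run concurrently. Wall-clock time is therefore about one lookup (roughly 0.1 s) rather than the sum of both, as it would be in a serial pipeline. This illustrates the general mechanism only and says nothing about how the reported 47.1% latency reduction was obtained.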