MARS-SQL: A Multi-Agent Reinforcement Learning Framework For Text-To-SQL
Abstract
Large Language Models (LLMs) often struggle with the precise logic and schema alignment required for complex Text-to-SQL tasks. Current methods rely heavily on static prompting and therefore cannot adapt or self-correct through interaction with the environment. To bridge this gap, we propose MARS-SQL, a multi-agent architecture that uses interactive Reinforcement Learning (RL) to optimize SQL generation. Unlike monolithic approaches, our method decomposes the problem into three specialized roles: schema linking, query generation, and solution validation. Central to our approach is a generation agent trained with a multi-turn RL policy that operates within a ReAct-style loop. This agent learns to reason iteratively, execute intermediate SQL actions on a live database, and refine its strategy based on execution feedback. To ensure robustness, we introduce a validation mechanism that treats solution selection as a generative modeling task, identifying the best interaction trajectory from next-token prediction probabilities. Empirical evaluations show that coupling interactive learning with trajectory ranking is effective: MARS-SQL achieves state-of-the-art performance, with an execution accuracy of 77.84\% on the BIRD development set and 89.75\% on the Spider test set.
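As a rough illustration of the interaction pattern described above, the sketch below runs a multi-turn loop against an in-memory SQLite database and then picks one trajectory from several candidates. It is a toy under stated assumptions, not the MARS-SQL implementation: the names `Trajectory`, `rollout`, `select_best`, `toy_policy`, and `toy_score` are hypothetical, the trained RL policy is replaced by a hand-written stub, and the validator's next-token probability is replaced by a simple stand-in score.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Step = Tuple[str, str]  # (SQL action, observation returned by the database)
Policy = Callable[[str, List[Step]], Tuple[str, bool]]  # -> (sql, is_final)


@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)
    final_sql: str = ""


def execute(conn: sqlite3.Connection, sql: str) -> str:
    """Run one SQL action; return the rows or the error text as feedback."""
    try:
        return repr(conn.execute(sql).fetchall())
    except sqlite3.Error as exc:
        return f"ERROR: {exc}"


def rollout(conn: sqlite3.Connection, policy: Policy, question: str,
            max_turns: int = 5) -> Trajectory:
    """Alternate between proposing SQL and observing execution feedback."""
    traj = Trajectory()
    for _ in range(max_turns):
        sql, is_final = policy(question, traj.steps)
        traj.steps.append((sql, execute(conn, sql)))
        if is_final:
            traj.final_sql = sql
            break
    return traj


def select_best(candidates: List[Trajectory],
                score: Callable[[Trajectory], float]) -> Trajectory:
    """Return the highest-scoring trajectory (stand-in for the validator)."""
    return max(candidates, key=score)


def toy_policy(question: str, history: List[Step]) -> Tuple[str, bool]:
    """Hypothetical stub: guesses a wrong column first, then corrects it."""
    if not history:
        return "SELECT COUNT(*) FROM singer WHERE nation = 'France'", False
    return "SELECT COUNT(*) FROM singer WHERE country = 'France'", True


def toy_score(traj: Trajectory) -> float:
    """Assumed stand-in score: penalize trajectories that end in an error."""
    return 0.0 if traj.steps[-1][1].startswith("ERROR") else 1.0


def demo() -> Trajectory:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)",
                     [("A", "France"), ("B", "France"), ("C", "Japan")])
    return rollout(conn, toy_policy, "How many singers are from France?")
```

Calling `demo()` yields a two-step trajectory: the first action fails on the unknown column, and the second uses that feedback to issue a corrected query. In a real system the policy would be a trained model and the score would come from the validation agent; both are simplified here so that the loop structure stays visible.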