Graph-Link: Bridging the Semantic-Structural Gap in Text-to-SQL via Constrained Subgraph Induction
Abstract
Schema linking is the foundational perception layer of Text-to-SQL: it grounds a natural language query in the relevant schema elements. However, existing retrieval-based approaches suffer from a critical structural blindness: because they prioritize elements with high textual similarity, they inadvertently prune bridge tables that are semantically thin but topologically critical, severing the relational pathways that multi-hop joins require. To bridge this gap, we propose Graph-Link, a novel framework that reformulates schema linking from an independent retrieval task into a constrained subgraph induction problem. We argue that generating executable SQL requires a connected subgraph that satisfies both semantic relevance and structural constraints. Accordingly, Graph-Link employs a hierarchical schema graph to model the search space across multiple granularities, then applies a Steiner-tree-based optimization for subgraph induction that guarantees topological connectivity while maximizing the signal-to-noise ratio for downstream LLMs. Extensive experiments on BIRD and Spider 2.0 demonstrate that Graph-Link achieves state-of-the-art schema linking performance, improving recall and hit rates by up to 7.0\% over competitive baselines, and boosts downstream SQL generation accuracy on complex queries by 13.8\%.
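To make the bridge-table problem concrete, the following minimal sketch shows how a connectivity constraint can recover a table that similarity-based retrieval would drop. It is not Graph-Link's optimization, which the abstract does not specify in detail. Instead it uses a generic greedy shortest-path heuristic for the Steiner tree problem on an unweighted foreign-key graph. The toy schema, the function name `induce_connected_subgraph`, and the choice of terminals are all assumptions made for illustration.

```python
from collections import deque


def induce_connected_subgraph(edges, terminals):
    """Return a set of tables that connects every terminal table.

    Greedy shortest-path heuristic for the Steiner tree problem on an
    unweighted graph (illustrative only, not the paper's algorithm).
    """
    # Build an undirected adjacency map from foreign-key edges.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    terminals = list(dict.fromkeys(terminals))  # de-duplicate, keep order
    if not terminals:
        return set()

    tree = {terminals[0]}
    remaining = set(terminals[1:])

    while remaining:
        # Multi-source BFS from the current tree to the nearest terminal.
        parent = {node: None for node in tree}
        queue = deque(sorted(tree))
        hit = None
        while queue:
            node = queue.popleft()
            if node in remaining:
                hit = node
                break
            for nxt in sorted(adj.get(node, ())):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        if hit is None:
            raise ValueError("terminals are not connected in the schema graph")
        # Walk back to the tree, keeping every intermediate (bridge) table.
        while hit is not None:
            tree.add(hit)
            hit = parent[hit]
        remaining -= tree

    return tree


# Hypothetical schema: `enrollment` is a bridge table with little textual
# overlap with a question about students and courses.
FK_EDGES = [
    ("student", "enrollment"),
    ("enrollment", "course"),
    ("course", "department"),
    ("student", "advisor"),
]

# Suppose retrieval selected only the two semantically matching tables.
linked = induce_connected_subgraph(FK_EDGES, ["student", "course"])
print(sorted(linked))
```

In this toy case the bridge table `enrollment` is added because it lies on the only join path between the two retrieved tables, while `advisor` and `department` stay pruned. A weighted variant could fold semantic relevance scores into the edge or node costs.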