LawChain: A Resource-Efficient Blueprint for Legal Information Retrieval in Low-Resource Jurisdictions
Abstract
We present LawChain, a resource-efficient framework for legal information retrieval in low-resource jurisdictions, developed through a deployment on Sri Lankan legislative Acts. This setting reflects challenges common to public-sector ML in many low-resource jurisdictions, particularly in the Global South: noisy digitized documents, inconsistent legal text structure, limited evaluation benchmarks, and a vocabulary mismatch between lay queries and statutory language. To address these constraints, we build a layout-aware extraction pipeline and a modular retrieval architecture that combines BM25, semantic embeddings, localized query expansion, an agentic refinement loop, and a Neo4j knowledge graph. Evaluated with a Gemini-based LLM-as-a-judge framework, the combined system achieves a Precision@5 of 0.8345 and a Recall@5 of 0.9357. Beyond this deployment, LawChain offers a reusable design pattern for sustainable, locally grounded retrieval under data and compute constraints. We release the system architecture, a 22,812-node legal knowledge graph, and a human-refined dataset of more than 26,000 question-answer pairs to support future work on legal information access in low-resource settings.
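For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Precision@5 and Recall@5 are conventionally computed for a single query. It assumes binary relevance labels (here imagined as coming from a judge such as an LLM) and uses hypothetical document identifiers; the function names, the example data, and the choice to divide precision by k are illustrative assumptions, not LawChain's actual evaluation code.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved items that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Conventional definition: divide by k, even if fewer than k items were returned.
    return hits / k


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of all judged-relevant items that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant_ids:
        # Assumption: a query with no relevant items contributes 0.0.
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical example: section identifiers returned for one lay query.
ranked = ["s12", "s7", "s3", "s40", "s9"]
# Hypothetical set of sections judged relevant for that query.
relevant = {"s12", "s3", "s9", "s88"}

p5 = precision_at_k(ranked, relevant, k=5)
r5 = recall_at_k(ranked, relevant, k=5)
```

In this example three of the five returned sections are relevant and three of the four relevant sections are retrieved. System-level figures such as those in the abstract would typically be averages of these per-query values over an evaluation set, though the exact aggregation used here is not stated in the abstract.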