RLIE: Rule Generation with Logistic Regression, Iterative Refinement, and Evaluation for Large Language Models
Abstract
Large Language Models (LLMs) can propose natural-language rules, removing the need for the predefined predicate space that traditional rule learning requires. However, existing LLM-based methods often neglect global interactions among rules, and the potential of fine-grained rule importance scores to calibrate neuro-symbolic reasoning remains underexplored. To address this gap, we introduce RLIE, a framework that integrates LLMs with probabilistic modeling to learn weighted rule sets in four stages: (1) \textbf{R}ule generation: proposing and filtering candidate rules with LLMs; (2) \textbf{L}ogistic regression: learning sparse, calibrated weights for global rule selection; (3) \textbf{I}terative refinement: revising the rule set with error-driven hard examples; and (4) \textbf{E}valuation: validating the learned system under comparative inference paradigms. Across multiple real-world datasets and LLM backbones, our learned weighted rules achieve superior stability and accuracy, whereas rule-injection prompting yields mixed results and often degrades performance. These results suggest that LLMs excel at semantic rule discovery but are less reliable at controlled probabilistic aggregation. Our findings highlight both the promise and the limits of LLMs for inductive reasoning and motivate a principled integration with classical probabilistic rule combination for reliable neuro-symbolic reasoning.
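The weighting stage described in (2) can be illustrated with a minimal sketch: L1-regularized logistic regression over binary rule-satisfaction features, so that uninformative rules receive zero weight (global selection) while the sigmoid output gives a calibrated probability. The rule matrix, labels, and hyperparameters below are hypothetical, not taken from the paper.

```python
# Sketch of stage (2): sparse, calibrated weights over candidate rules.
# X[i, j] = 1 if candidate rule j fires on example i (synthetic data here;
# in RLIE these would come from LLM-proposed rules applied to real examples).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_examples, n_rules = 200, 10
X = rng.integers(0, 2, size=(n_examples, n_rules)).astype(float)

# Assume only the first three rules are genuinely predictive; the rest are noise.
logits = 2.0 * X[:, 0] + 1.5 * X[:, 1] - 2.0 * X[:, 2] - 0.5
y = (logits + rng.normal(scale=0.5, size=n_examples) > 0).astype(int)

# The L1 penalty drives weights of uninformative rules toward zero,
# performing global rule selection in one pass.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X, y)

selected = np.flatnonzero(np.abs(clf.coef_[0]) > 1e-6)
print("selected rule indices:", selected.tolist())
print("rule weights:", np.round(clf.coef_[0], 2))
# clf.predict_proba(X) then yields calibrated per-example probabilities.
```

Misclassified examples under this weighted model are natural candidates for the error-driven "hard examples" that drive the refinement loop in stage (3).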