GLARE: Scalable Neuro-Symbolic Reward Shaping for LLM Agents via Group-Level Automata
Abstract
Reinforcement Learning (RL) with Group Relative Policy Optimization (GRPO) shows great promise for enhancing LLM reasoning, but remains challenged by sparse and unstable rewards in long-horizon tasks. Existing reward-shaping approaches struggle to balance semantic expressiveness, reliability, and computational efficiency: heuristic rules lack flexibility, while LLM-as-a-Judge incurs high computational cost and suffers from inconsistent, misaligned scoring in long-context settings. To address these challenges, we introduce GLARE, a neuro-symbolic reward framework that decouples semantic abstraction from credit assignment. To leverage semantic understanding while preserving symbolic determinism, we first extract trajectory events and symbolize them into a discrete representation. These events are then translated into Linear Temporal Logic (LTL) formulas, which are compiled into deterministic automata that track the agent's progress via state transitions. This mechanism yields dense, consistent reward signals, avoiding unstable direct scoring while substantially reducing computational cost. Empirical results on ALFWorld show that GLARE outperforms GRPO by 12.1\% in success rate and improves over conventional LLM-based judges by 8.1\% while using only 15\% of their computational cost.
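To make the automaton-based shaping concrete, the sketch below shows one way a sequential LTL goal of the form $F(e_1 \wedge F(e_2 \wedge F(e_3)))$ could be realized as a deterministic automaton that rewards state transitions. This is a minimal illustration, not the paper's implementation: all names (\texttt{AutomatonReward}, the event strings, the reward magnitudes) are hypothetical.

\begin{verbatim}
# Illustrative sketch of automaton-based reward shaping.
# The DFA encodes a sequential goal: events must occur in order.
# All names and reward values are hypothetical, not from the paper.

class AutomatonReward:
    """Deterministic automaton that tracks progress through symbolized
    trajectory events and emits a dense reward on each transition."""

    def __init__(self, event_sequence, step_reward=0.1, goal_reward=1.0):
        self.events = list(event_sequence)  # ordered subgoal events
        self.state = 0                      # current automaton state
        self.step_reward = step_reward
        self.goal_reward = goal_reward

    def transition(self, event):
        """Consume one symbolized event; return the shaped reward."""
        if self.state < len(self.events) and event == self.events[self.state]:
            self.state += 1  # deterministic advance on the expected event
            if self.state == len(self.events):
                return self.goal_reward  # accepting state reached
            return self.step_reward      # intermediate progress
        return 0.0  # irrelevant or out-of-order events earn nothing


# Usage: score a trajectory for "heat an apple, put it on the table".
dfa = AutomatonReward(["take(apple)", "heat(apple)", "put(apple, table)"])
trajectory = ["goto(fridge)", "take(apple)", "heat(apple)",
              "put(apple, table)"]
rewards = [dfa.transition(e) for e in trajectory]
print(rewards)  # [0.0, 0.1, 0.1, 1.0] -- dense, deterministic signal
\end{verbatim}

Because the automaton is deterministic, identical trajectories always receive identical rewards, which is the consistency property the abstract contrasts with direct LLM scoring.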