HyPOLE: Hyperproperty-Guided Multi-Agent Reinforcement Learning under Partial Observation
Abstract
Formal specifications are a powerful tool for guiding the learning process and offer significant advantages over ad-hoc reward shaping: (1) mathematical rigor, (2) the expressiveness to specify objectives and constraints, and (3) the ability to define strategies for achieving objectives. However, these benefits remain largely unexplored in multi-agent reinforcement learning (MARL). This paper introduces HyPOLE, a novel framework for MARL under partial observability, in which learning is guided by the expressive power of hyperproperties and, in particular, the temporal logic HyperLTL. HyPOLE targets settings where agents operate under partial observability, modeled as partially observable Markov decision processes (POMDPs). We integrate centralized-training, decentralized-execution (CTDE) techniques with HyPOLE to synthesize decentralized policies, and our evaluation on the StarCraft II and Wildfire benchmarks demonstrates clear advantages over vanilla MARL baselines.