Closing the Sim-to-Real Gap in Non-Markovian Spreading Processes via GPU-Accelerated Distributional RL
Heman Shakeri
Abstract
Controlling spreading processes on networks, such as epidemics, information cascades, and product adoption, requires policies that perform on realistic stochastic dynamics, not just tractable approximations. Yet policies trained on standard simplifications (mean-field ODEs, Markovian dynamics) suffer severe performance degradation at deployment. We trace this sim-to-real gap to three theoretical pathologies: Optimism Bias, where deterministic approximations systematically underestimate variance via Jensen's inequality; Hub Blindness, where global state aggregation obscures the super-spreaders driving scale-free networks; and the Valley of Death, where mean-value critics fail to navigate the bimodal (extinction vs. viral) nature of cascade outcomes. We resolve these challenges through two synergistic contributions. First, the Stratified Mean-Field Observer partitions nodes by influence tier, preserving hub dynamics at $O(N)$ cost while producing fixed-dimensional observations that enable zero-shot transfer across network scales and topologies. Second, we demonstrate that Distributional RL via Truncated Quantile Critics is essential for risk-aware control of bimodal cascades. Trained on a GPU-accelerated simulator supporting non-Markovian renewal dynamics, our approach achieves a $59\times$ improvement over Markovian baselines and robust zero-shot transfer to real-world social networks (Facebook, Twitter, YouTube), effectively closing the simulation-to-reality gap.
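To make the idea of a stratified, fixed-dimensional observation concrete, the following is a minimal sketch and not the paper's implementation. It assumes that "influence tier" can be approximated by node degree rank, that tiers are equal-sized, and that node states are integer compartment labels (0 = susceptible, 1 = infected, 2 = recovered); the function names `assign_tiers` and `stratified_observation` are hypothetical. Tiers are assigned once per network, after which each observation is a single $O(N)$ pass.

```python
import numpy as np

def assign_tiers(degrees, n_tiers=3):
    """Assign each node to a tier by degree rank (assumption: degree ~ influence).

    Tier 0 holds the lowest-degree nodes; tier n_tiers - 1 holds the hubs.
    Done once per network, so the sort is not part of the per-step cost.
    """
    degrees = np.asarray(degrees)
    n = len(degrees)
    order = np.argsort(degrees, kind="stable")
    tiers = np.empty(n, dtype=int)
    tiers[order] = (np.arange(n) * n_tiers) // n
    return tiers

def stratified_observation(tiers, states, n_tiers=3, n_states=3):
    """Return per-tier compartment fractions as a flat vector.

    The output length is n_tiers * n_states regardless of network size,
    which is what makes the observation reusable across networks.
    """
    tiers = np.asarray(tiers)
    states = np.asarray(states)
    counts = np.zeros((n_tiers, n_states))
    # Accumulate one count per node into its (tier, state) cell: O(N).
    np.add.at(counts, (tiers, states), 1.0)
    tier_sizes = counts.sum(axis=1, keepdims=True)
    # Guard against empty tiers to avoid division by zero.
    return (counts / np.maximum(tier_sizes, 1.0)).ravel()

# Toy network: four low-degree nodes and two hubs, with both hubs infected.
degrees = [1, 1, 2, 2, 10, 50]
states = [0, 0, 0, 1, 1, 1]
tiers = assign_tiers(degrees)
obs = stratified_observation(tiers, states)
```

In this toy case the global infected fraction is 0.5, which hides the fact that the hub tier is fully infected; the per-tier vector keeps that information while its length stays independent of the number of nodes.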