Exploring Motif-based Heterogeneous Graph Learning for ReDoS Detection
Hong Huang ⋅ Chengyu Yao ⋅ Rongchen Li ⋅ Weihao Su ⋅ Chengyao Peng ⋅ Haiming Chen ⋅ Guiyi He
Abstract
Regular expressions (regexes) frequently exhibit super-linear worst-case behavior in regex engines, exposing software to Regex Denial-of-Service (ReDoS) attacks. Detecting such vulnerabilities is challenging, especially for extended features such as lookarounds and backreferences: existing static approaches are efficient but often lack support for extended features, whereas dynamic and hybrid approaches reduce false positives by executing regex matching on real engines, but incur high runtime overhead. To address this trade-off, we propose ReDoS-MotifGNN (RMGNN), a motif-based graph learning framework for ReDoS detection that leverages the low inference latency of graph neural networks (GNNs). RMGNN converts regexes into Heterogeneous Regex Graphs (HRGs) and encodes three ReDoS-related motifs into HRGs to incorporate domain priors, while preserving the syntactic structure and extended features of the input regex. Furthermore, it applies heterogeneous propagation with kernel-guided motif learning to capture multi-scale semantics, which are fused via residual cross-attention for robust prediction. Comprehensive evaluation on four real-world datasets (over 317k regexes) demonstrates that RMGNN outperforms six state-of-the-art baselines in F1-score and achieves an average 244$\times$ speedup over the top F1-performing baseline.
Successful Page Load