Poster

One Bug, Hundreds Behind: LLMs for Large-Scale Bug Discovery

Qiushi Wu ⋅ Yue Xiao ⋅ Dhilung Kirat ⋅ Kevin Eykholt ⋅ Jiyong Jang ⋅ Douglas Schales

Abstract

Recurring Pattern Bugs (RPBs) are bugs in which a single root cause recurs across multiple code segments. These bugs remain a persistent security threat even after individual instances are patched. Various static analyzers exist for finding specific bug patterns, but they require significant engineering effort and fail to generalize beyond their predefined templates, which prevents them from detecting RPBs. To tackle RPBs, we introduce BugStone, a hybrid framework that combines LLVM-based program analysis with Large Language Models to automate RPB detection. BugStone uses a single patched instance to synthesize abstract error patterns and then retrieves semantically similar bugs throughout the codebase. To evaluate BugStone, we create a ground-truth dataset by analyzing over 1.9K security bug reports, on which BugStone achieves 92.2% precision and 79.1% pairwise accuracy. We further validated BugStone through a large-scale real-world deployment. In the Linux kernel, BugStone identified over 22K potential issues; a manual audit of 400 samples confirmed 246 valid bugs, including invalid pointer dereferences, resource leaks, type errors, performance issues, and others. To evaluate the generalizability of BugStone, we also applied it to the top 100 Python projects, discovering multiple critical command injection vulnerabilities.
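To make the notion of a recurring pattern concrete, the sketch below shows one root cause (a dynamically built command string passed to a shell) appearing in two separate functions of a toy module. It is not BugStone's method: it is a hand-written, fixed-template scan over Python's `ast` module, which is exactly the kind of predefined template the abstract says does not generalize. The sample module, the function name `find_shell_injection_candidates`, and the matching rule are all assumptions made for illustration.

```python
import ast

# Hypothetical toy module: two functions share the same root cause,
# one uses a constant command, and one uses the safe list form.
SOURCE = '''
import subprocess

def archive(path):
    subprocess.run("tar czf out.tgz " + path, shell=True)

def checksum(path):
    subprocess.run(f"sha256sum {path}", shell=True)

def list_dir():
    subprocess.run("ls -l", shell=True)

def safe_archive(path):
    subprocess.run(["tar", "czf", "out.tgz", path])
'''


def find_shell_injection_candidates(source):
    """Return (function_name, line) pairs for calls that pass shell=True
    together with a first argument that is not a plain literal."""
    hits = []
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.Call) or not node.args:
                continue
            # Template part 1: the call carries the keyword shell=True.
            shell_true = any(
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
                for kw in node.keywords
            )
            # Template part 2: the command is built at run time
            # (concatenation, f-string, variable) rather than a literal.
            dynamic_cmd = not isinstance(node.args[0], ast.Constant)
            if shell_true and dynamic_cmd:
                hits.append((func.name, node.lineno))
    return hits


CANDIDATES = find_shell_injection_candidates(SOURCE)
for name, line in CANDIDATES:
    print(f"candidate in {name}() at line {line}")
```

The scan flags `archive` and `checksum` and skips the other two functions. A template this narrow would miss the same root cause expressed through a wrapper function or a different API, which is the gap the abstract attributes to template-based analyzers.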