Threat2Traffic: Multi-Agent Environment Synthesis for Malware Traffic Generation from Threat Intelligence
Abstract
Data-driven cybersecurity research is fundamentally constrained by the scarcity of labeled datasets. Acquiring authentic, large-scale malware traffic remains bottlenecked by outdated public datasets, manual construction that does not scale, and inflexible sandboxes that fail to satisfy the sample-specific dependencies malware requires before it exhibits malicious behavior. Threat intelligence documents these dependencies, and LLM agents offer a path to extract them for environment construction, but applying such agents directly faces two challenges: input-side ambiguity and output-side fragility. In this paper, we propose Threat2Traffic, a multi-agent framework that extracts sample-specific dependencies from threat intelligence, reconstructs tailored environments, and captures malware traffic. To address input-side ambiguity, it formulates dependency extraction as structured multi-agent deliberation over an evidence graph. To overcome output-side fragility, it incorporates invariant-guided synthesis with dual-layer validation under syntactic and semantic constraints. Evaluated on 1,077 samples across eight malware families, Threat2Traffic achieves 83.1% reproduction success, highlighting its effectiveness for scalable and realistic malware traffic generation. We release the core source code and traffic dataset at https://github.com/apos3637/Threat2Traffic