Spotlight in Workshop: ICML workshop on Machine Learning for Cybersecurity (ICML-ML4Cyber)
A High Fidelity Cybersecurity Dataset for Attack Modeling
Craig Laprade · Benjamin Bowman · H. Howie Huang
Recent high-profile cyber attacks have made it clear that the traditional signature-based defensive tools at the network perimeter are no longer sufficient for protecting the ever-expanding attack surface of enterprise-level computer networks. Advanced Persistent Threats (APTs) are gaining unauthorized access to networks and performing complex, multi-stage attack campaigns, often being detected only long after accomplishing their mission. Unfortunately, due to the sensitivity of enterprise network data, there is a lack of realistic, complete, enterprise-grade data available to the research community for building better algorithms and tools capable of modeling attacks and defending enterprise networks. Existing cybersecurity datasets often contain little to no attack data, or contain unrealistically simple attacks that are not representative of full APT-level attack campaigns. In this work we generate a compact yet realistic attack-focused dataset in a simulated enterprise computer network using tools and procedures common to both attackers and defenders. We orchestrate, document, and carry out an APT-level compromise of the domain that covers multiple tactics, techniques, and procedures across the full life-cycle of an attack. We perform full network-level monitoring typical of enterprise network defenders to capture a high-fidelity, complete representation of the attack. We evaluate our dataset against the existing datasets available to the community for breadth, completeness, and utility.