

Oral in Workshop: ICML workshop on Machine Learning for Cybersecurity (ICML-ML4Cyber)

CyberEnt: Extracting Domain Specific Entities from Cybersecurity Text

Casey Hanks · Michael Maiden · Priyanka Ranade · Tim Finin · Anupam Joshi


Abstract:

Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a large need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools, and create methods for continuous integration of new information extracted from text.
