Catch Me If You Can: Detecting Phishing Emails Through Generative-Adversarial Training
Abstract
Phishing emails remain one of the leading causes of cybersecurity breaches and often bypass modern Natural Language Processing (NLP)-based detectors. Traditional classifiers are highly vulnerable to adversarial text attacks, in which subtle word substitutions fool the detection system even though a human reader would still recognize the message as phishing. To address this challenge, we propose an adversarial training framework that employs multiple attacker models to generate adversarial phishing samples that exploit weaknesses in a discriminator model. In each training round, a BERT-based discriminator is retrained on both the original and the adversarial data to prevent overfitting to a single attack method. Adversarial samples that the discriminator misclassifies are added to the dataset, yielding an augmented corpus that progressively improves classifier generalization. Experimental results highlight the difficulty of defending against adaptive adversaries. This research underscores the importance of adversarial robustness in phishing detection and contributes towards the development of stronger defenses for real-world cybersecurity applications.
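The training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a word-count classifier stands in for the BERT-based discriminator, fixed substitution tables stand in for the attacker models, and the data, the names `ToyDiscriminator`, `make_attacker`, and `adversarial_training`, and the choice of three rounds are all hypothetical. Only the overall structure follows the text: several attackers, retraining on original plus adversarial data each round, and adding misclassified samples to the corpus.

```python
from collections import Counter

# Hypothetical toy corpus: (text, label), 1 = phishing, 0 = legitimate.
SEED = [
    ("verify your account password now", 1),
    ("urgent click this link to claim prize", 1),
    ("meeting notes attached for review", 0),
    ("lunch at noon tomorrow", 0),
]


class ToyDiscriminator:
    """Stand-in for the BERT-based discriminator (assumption): word counts per label."""

    def __init__(self):
        self.counts = {0: Counter(), 1: Counter()}

    def fit(self, data):
        # Retrain from scratch on the full (augmented) dataset.
        self.counts = {0: Counter(), 1: Counter()}
        for text, label in data:
            self.counts[label].update(text.split())

    def predict(self, text):
        words = text.split()
        phish = sum(self.counts[1][w] for w in words)
        legit = sum(self.counts[0][w] for w in words)
        return 1 if phish > legit else 0  # ties count as legitimate


def make_attacker(substitutions):
    """Stand-in for an attacker model (assumption): fixed word substitutions."""
    def attack(text):
        return " ".join(substitutions.get(w, w) for w in text.split())
    return attack


ATTACKERS = [
    make_attacker({"verify": "confirm", "your": "the", "account": "profile",
                   "password": "passcode", "now": "today"}),
    make_attacker({"urgent": "important", "click": "open", "link": "page",
                   "claim": "collect", "prize": "reward"}),
]


def adversarial_training(seed, attackers, rounds=3):
    dataset = list(seed)
    model = ToyDiscriminator()
    history = []  # number of newly misclassified samples per round
    for _ in range(rounds):
        model.fit(dataset)  # train on original + adversarial data
        fooled = []
        for text, label in seed:
            if label != 1:
                continue  # only phishing emails are perturbed
            for attack in attackers:
                candidate = (attack(text), 1)
                if candidate[0] == text:
                    continue  # this attacker changed nothing
                if candidate in dataset or candidate in fooled:
                    continue  # already part of the corpus
                if model.predict(candidate[0]) != 1:
                    fooled.append(candidate)  # discriminator was fooled
        dataset.extend(fooled)  # grow the augmented corpus
        history.append(len(fooled))
    model.fit(dataset)
    return model, dataset, history
```

Calling `adversarial_training(SEED, ATTACKERS)` returns the retrained model, the augmented corpus, and the per-round count of samples that fooled the discriminator. With attackers this simple the count drops to zero once their outputs are in the corpus; an adaptive attacker, as the abstract notes, would not be neutralized so easily.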