Transformer-Based Approaches for Source Code Vulnerability Detection
Koushik Tonmoy ⋅ Afridi Hassan ⋅ Shahriyar Zaman Ridoy
Abstract
Detecting software vulnerabilities is critical for the security of systems written in memory-unsafe languages such as C and C++, yet traditional static analyzers rely on rigid rules and syntactic pattern matching that generalize poorly across codebases and produce many false positives. We study pretrained code transformers as a data-driven alternative for multi-class vulnerability classification on the Draper VDISC dataset, comparing three pipelines: classical classifiers trained over frozen code embeddings, end-to-end fine-tuning of the transformers, and ensemble stacking of model logits through a meta-classifier. We evaluate CodeBERT, GraphCodeBERT, and UniXcoder, which differ in their pretraining objectives and inductive biases. Fine-tuning substantially outperforms frozen and classical baselines, with UniXcoder achieving the strongest single-model results, and logit-stacking ensembles---particularly UniXcoder paired with GraphCodeBERT---improve further to $82.9\%$ accuracy. Our findings highlight the value of architectural diversity for static vulnerability detection and provide a reproducible benchmark across four modeling strategies.