

Poster

Naive Bayes Classifiers over Missing Data: Decision and Poisoning

Song Bian · Xiating Ouyang · Zhiwei Fan · Paris Koutris


Abstract:

Machine learning models such as the Naive Bayes Classifier (NBC) are increasingly applied across diverse domains. However, their performance can be significantly degraded by low-quality training data, especially when the dataset is dirty or incomplete. This paper studies certifiable robustness for NBC on datasets with missing values. A test point is certifiably robust for an ML model if its prediction remains consistent across all (exponentially many) possible cleaned versions of the dataset on which the model is trained. We propose an efficient algorithm that decides the certifiable robustness of multiple test points for NBC. Furthermore, we explore the vulnerability of NBC to targeted data poisoning attacks, in which missing values are intentionally inserted to disrupt the model's certifiability. Our theoretical analysis shows that an attack rendering a single test point certifiably non-robust for NBC can be crafted optimally, whereas rendering multiple test points certifiably non-robust is NP-hard. Extensive experiments demonstrate the efficacy of our approaches, which outperform existing baselines.
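To make the robustness definition concrete, below is a minimal brute-force sketch in Python: it enumerates every completion of the missing cells, fits an NBC with Laplace smoothing on each, and checks whether the test point's prediction is unanimous. All identifiers (`MISSING`, `nbc_predict`, `is_certifiably_robust`) are illustrative assumptions, and the exponential enumeration only mirrors the definition itself; it is not the efficient decision algorithm the paper proposes.

```python
from itertools import product

# Hypothetical sentinel for a missing attribute value (not from the paper).
MISSING = None

def nbc_predict(data, labels, x, domains):
    """Naive Bayes prediction for test point x, with Laplace smoothing."""
    best_label, best_score = None, float("-inf")
    for c in sorted(set(labels)):  # sorted for deterministic tie-breaking
        rows = [r for r, y in zip(data, labels) if y == c]
        score = len(rows) / len(labels)           # class prior P(c)
        for j, v in enumerate(x):                 # product of P(x_j | c)
            count = sum(1 for r in rows if r[j] == v)
            score *= (count + 1) / (len(rows) + len(domains[j]))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

def completions(data, domains):
    """Yield every possible cleaned version of a dataset with missing cells."""
    holes = [(i, j) for i, row in enumerate(data)
             for j, v in enumerate(row) if v is MISSING]
    for fill in product(*(domains[j] for _, j in holes)):
        world = [list(row) for row in data]
        for (i, j), v in zip(holes, fill):
            world[i][j] = v
        yield world

def is_certifiably_robust(data, labels, x, domains):
    """True iff x receives the same NBC prediction on every completion."""
    preds = {nbc_predict(w, labels, x, domains)
             for w in completions(data, domains)}
    return len(preds) == 1

# Tiny demo: one missing cell over binary attribute domains.
data    = [[1, MISSING], [0, 1], [1, 0]]
labels  = ["+", "-", "+"]
domains = [[0, 1], [0, 1]]
print(is_certifiably_robust(data, labels, [1, 1], domains))
```

With k missing cells over domains of size d, this sketch examines d^k completed datasets, which is exactly the exponential blow-up that the paper's decision algorithm is designed to avoid.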
