Workshop: Machine Learning for Data: Automated Creation, Privacy, Bias

Benchmarking Differential Privacy and Federated Learning for BERT Models

Priyam Basu · Rakshit Naidu · Zumrut Muftuoglu · Sahib Singh · FatemehSadat Mireshghallah

Keywords: [ Non-convex Optimization ]


Depression is a serious medical illness that can adversely affect how one feels, thinks, and acts, and can lead to emotional and physical problems. Natural Language Processing (NLP) techniques can help with the diagnosis of such illnesses by analyzing people's utterances and writings. Because such data is sensitive, privacy measures are needed when handling it and when training models on it. In this work, we study the effects of Differential Privacy (DP) and Federated Learning (FL) on training contextualized language models (BERT, ALBERT, RoBERTa, and DistilBERT) and offer insights on how to privately train NLP models. We envisage this work being used in the healthcare and mental health industry to keep medical history private, and we therefore provide an open-source implementation. To observe how the privacy implementations behave on different datasets, we also apply the work to a Sexual Harassment Twitter dataset.
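For readers unfamiliar with the two privacy techniques named in the abstract, the following is a minimal, illustrative sketch of their core arithmetic. It is not the authors' implementation: the function names (`clip`, `dp_average_gradient`, `federated_average`) and all parameters are hypothetical, gradients and weights are plain Python lists rather than BERT tensors, and the sketch assumes the common DP-SGD recipe (per-example gradient clipping plus Gaussian noise) and FedAvg-style weighted averaging, neither of which the abstract confirms in detail.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """DP-SGD style step (assumed recipe): clip each per-example gradient,
    sum them, add Gaussian noise scaled to the clipping bound, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    sigma = noise_multiplier * max_norm  # noise std tied to the sensitivity bound
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


def federated_average(client_weights, client_sizes):
    """FedAvg style aggregation (assumed recipe): average client weight
    vectors, weighting each client by its local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]  # toy per-example gradients
    print(dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3]))
```

In the differentially private step, clipping bounds any single example's influence on the update, and the noise masks what remains; in the federated step, only model weights leave each client, never the raw text. In practice a library would handle per-example gradients and privacy accounting, which this sketch omits.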
