TEM: High Utility Metric Differential Privacy on Text
Ricardo Silva Carvalho · Theodore Vasiloudis · Oluwaseyi Feyisetan

Ensuring the privacy of users whose data are used to train Natural Language Processing (NLP) models is necessary to build and maintain customer trust. Differential Privacy (DP) has emerged as the most successful method to protect the privacy of individuals.

However, applying DP to the NLP domain comes with unique challenges. The most successful previous methods use a generalization of DP for metric spaces, and apply the privatization by adding noise to inputs in the metric space of word embeddings. These methods, however, assume that one specific distance measure is being used, ignore the density of the space around the input, and assume that the embeddings have been trained on non-sensitive data.
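
For reference, the generalization of DP to metric spaces mentioned above is usually stated as follows. The notation here (mechanism $M$, metric $d$, privacy parameter $\varepsilon$, input space $\mathcal{X}$, output space $\mathcal{Y}$) is assumed for illustration and is not taken from the abstract.

```latex
% Metric differential privacy: a randomized mechanism M : X -> Y satisfies
% epsilon * d privacy if, for all inputs x, x' in X and all output sets S,
\[
  \Pr[M(x) \in S] \;\le\; e^{\varepsilon\, d(x, x')} \,\Pr[M(x') \in S]
  \qquad \text{for all } x, x' \in \mathcal{X},\; S \subseteq \mathcal{Y}.
\]
% Inputs that are close under d must produce nearly indistinguishable outputs;
% the guarantee weakens as d(x, x') grows.
```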

In this work we propose the Truncated Exponential Mechanism (TEM), a general method that allows the privatization of words using any distance metric, on embeddings that can be trained on sensitive data. Our method makes use of the exponential mechanism to turn the privatization step into a selection problem. This allows the noise applied to be calibrated to the density of the embedding space around the input, and makes domain adaptation possible for the embeddings. In our experiments, we demonstrate that our method significantly outperforms the state of the art in terms of utility for the same level of privacy, while providing more flexibility in the choice of metric.
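
To make the idea of privatization as a selection problem concrete, the following is a minimal sketch of a plain exponential mechanism over a vocabulary, where each candidate word is scored by its negative distance to the input word. It is not the authors' TEM: the truncation step is omitted, and the function names, the toy embeddings, and the `epsilon / 2` scaling are assumptions made for illustration only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance; any other metric could be passed in instead."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def exponential_mechanism_word(word, embeddings, epsilon, distance, rng=None):
    """Sample a replacement for `word` with probability proportional to
    exp(-epsilon * d(word, candidate) / 2).

    Plain exponential mechanism only (hypothetical helper, no truncation).
    """
    rng = rng or random.Random()
    candidates = list(embeddings)
    x = embeddings[word]
    # Log-weights: closer candidates receive higher scores.
    scores = [-0.5 * epsilon * distance(x, embeddings[c]) for c in candidates]
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy 2-D embeddings, invented for this example.
toy_embeddings = {
    "cat": (0.0, 0.0),
    "dog": (1.0, 0.0),
    "car": (5.0, 5.0),
    "bus": (6.0, 5.0),
}

if __name__ == "__main__":
    rng = random.Random(0)
    for eps in (0.1, 1.0, 10.0):
        draws = [
            exponential_mechanism_word("cat", toy_embeddings, eps, euclidean, rng)
            for _ in range(5)
        ]
        print(eps, draws)
```

Because the selection probabilities are normalized over the actual candidates, a word in a dense region of the embedding space competes with many close neighbours, while an isolated word is more likely to be returned unchanged. This is the sense in which a selection-based mechanism can adapt to local density, and any distance function can be supplied in place of `euclidean`.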

Author Information

Ricardo Silva Carvalho (Simon Fraser University)
Theodore Vasiloudis (Amazon.com)

I completed my PhD on the topic of "Large-scale Machine Learning through Approximation and Distributed Computing" at KTH Royal Institute of Technology in Stockholm. Before that I did my MSc in Machine Learning, also at KTH, completing my thesis at Spotify on the topic "Extending recommendation algorithms by modeling user context". During my PhD I completed internships at Data Artisans, Pandora Media, and Amazon. I am currently an Applied Scientist at Amazon, working in the Search Science and AI group.

Oluwaseyi Feyisetan (Amazon)

