
Workshop: Theory and Practice of Differential Privacy

Disclosure avoidance in redistricting data: is $\epsilon=12.2$ private?

Abraham Flaxman

Abstract: As part of the 2020 decennial census, the US Census Bureau has developed a new approach to disclosure avoidance, based on differential privacy, called the TopDown Algorithm. The first data product to use this new approach will be the Public Law 94-171 redistricting file, and the Census Bureau recently released a demonstration of its algorithm on data from the 2010 census using a privacy loss budget of $\epsilon=12.2$. We conducted a simulation study to investigate the risk of re-identification in data like this. We reconstructed a microdata file based on the 2010 decennial census and used it to simulate a commercial database and a test database. We performed exact record linkage between the simulated commercial database and the demonstration data (as well as simulated demonstration data with larger and smaller privacy loss budgets) and counted the putative and confirmed re-identifications to investigate how much protection a range of values of $\epsilon$ might provide. In our simulation, we found 40.5 million putative re-identifications among the total US population, of which 38.6 million were confirmed. Among individuals who are not of the majority race/ethnicity of their census tract, we found 2.5 million putative re-identifications, of which 0.9 million were confirmed. Balancing the trade-off between privacy and accuracy is a policy decision, and we hope that this analysis can provide some additional evidence to inform stakeholders.
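To make the linkage step concrete, here is a minimal sketch of exact record linkage on toy data. It is not the study's code. The field names (`person_id`, `block`, `age`, `sex`, `race_eth`), the records, and the helper functions are all hypothetical. The sketch also assumes two definitions that the abstract does not spell out: a match is "putative" when a commercial record agrees on all linking keys with exactly one released record, and "confirmed" when the attribute learned from that match agrees with the test database.

```python
from collections import Counter


def link_exact(commercial, released, keys):
    """Pair each commercial record with the single released record that
    agrees with it on every linking key (assumed meaning of 'putative')."""
    def key_of(record):
        return tuple(record[k] for k in keys)

    counts = Counter(key_of(r) for r in released)
    by_key = {key_of(r): r for r in released}  # only read when the key is unique
    return [(c, by_key[key_of(c)]) for c in commercial if counts[key_of(c)] == 1]


def count_confirmed(putative, truth_by_id, attribute):
    """Count putative matches whose learned attribute equals the true value
    in the test database (assumed meaning of 'confirmed')."""
    return sum(
        1 for c, r in putative
        if truth_by_id[c["person_id"]][attribute] == r[attribute]
    )


# Hypothetical toy data: the commercial database has names/IDs but no race/ethnicity.
commercial = [
    {"person_id": 1, "block": "A", "age": 34, "sex": "F"},
    {"person_id": 2, "block": "A", "age": 51, "sex": "M"},
    {"person_id": 3, "block": "B", "age": 29, "sex": "F"},
]
# Hypothetical released (privacy-protected) microdata with race/ethnicity.
released = [
    {"block": "A", "age": 34, "sex": "F", "race_eth": "Hispanic"},
    {"block": "A", "age": 51, "sex": "M", "race_eth": "White"},
    {"block": "B", "age": 29, "sex": "F", "race_eth": "Black"},
    {"block": "B", "age": 29, "sex": "F", "race_eth": "Asian"},
]
# Hypothetical test database holding the true attribute for each person.
truth = {
    1: {"race_eth": "Hispanic"},
    2: {"race_eth": "Black"},
    3: {"race_eth": "Black"},
}

putative = link_exact(commercial, released, keys=("block", "age", "sex"))
confirmed = count_confirmed(putative, truth, "race_eth")
print(len(putative), "putative,", confirmed, "confirmed")
```

In this toy example, persons 1 and 2 link uniquely and count as putative, while person 3 matches two released records and is left unlinked. Only person 1 is confirmed, because the released record linked to person 2 carries a race/ethnicity that differs from the test database.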
