The Fundamental Law of Information Reconstruction, a.k.a. the Database Reconstruction Theorem, exposes a vulnerability in the way statistical agencies have traditionally published data. But it also exposes the same vulnerability in the way Amazon, Apple, Facebook, Google, Microsoft, Netflix, and other Internet giants publish data. We are all in this data-rich world together. And we all need to find solutions to the problem of how to publish information from these data while still providing meaningful privacy and confidentiality protections to the providers. Fortunately for the American public, the Census Bureau's curation of their data is already regulated by a very strict law that mandates publication for statistical purposes only and in a manner that does not expose the data of any respondent (person, household, or business) in a way that identifies that respondent as the source of specific data items. The Census Bureau has consistently interpreted that stricture on publishing identifiable data as governed by the laws of probability. An external user of Census Bureau publications should not be able to assert with reasonable certainty that particular data values were directly supplied by an identified respondent. Traditional methods of disclosure avoidance now fail because they cannot formalize and quantify that risk. Moreover, when traditional methods are assessed using current tools, the relative certainty with which specific values can be associated with identifiable individuals turns out to be orders of magnitude greater than anticipated at the time the data were released. In light of these developments, the Census Bureau has committed to an open and transparent modernization of its data publishing systems using formal methods like differential privacy. The intention is to demonstrate that statistical data, fit for their intended uses, can be produced when the entire publication system is subject to a formal privacy-loss budget.
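To make the idea of a system-wide privacy-loss budget concrete, here is a minimal sketch. It is not the Census Bureau's production algorithm. It assumes the textbook Laplace mechanism, counting queries in which any one respondent changes each count by at most 1, and an even split of the total budget across queries under sequential composition. The function names and the example counts are hypothetical, chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(true_counts, total_epsilon, rng):
    """Release all counts under one shared privacy-loss budget.

    Assumption: each count has sensitivity 1, and the total budget is
    divided evenly across the queries (sequential composition).
    """
    eps_per_query = total_epsilon / len(true_counts)
    scale = 1.0 / eps_per_query  # Laplace scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in true_counts]


def mean_abs_error(true_counts, total_epsilon, trials=2000, seed=0):
    # Average absolute error per published count over repeated releases.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        released = noisy_counts(true_counts, total_epsilon, rng)
        total += sum(abs(r - c) for r, c in zip(released, true_counts))
    return total / (trials * len(true_counts))


if __name__ == "__main__":
    counts = [120, 45, 8, 3]  # made-up table cells, for illustration only
    for eps in (0.1, 1.0, 10.0):
        print(f"total epsilon={eps}: mean abs error ~ {mean_abs_error(counts, eps):.2f}")
```

Under these assumptions the expected absolute error per count equals the number of queries divided by the total budget, so a smaller budget or a longer list of published tables both make every published number noisier. That is the accuracy versus privacy-loss trade-off in its simplest form.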
To date, the team developing these systems has demonstrated that differential privacy can be implemented for the data publications from the 2020 Census used to redraw every legislative district in the nation (PL94-171 tables). That team has also developed methods for quantifying and displaying the system-wide trade-offs between the accuracy of those data and the privacy-loss budget assigned to the tabulations. Considering that work began in mid-2016 and that no organization anywhere in the world has yet deployed a full, central differential privacy system, this is already a monumental achievement. But it is only the tip of the iceberg in terms of the statistical products historically produced from a decennial census. Demographic profiles, based on the detailed tables traditionally published in summary files following the publication of redistricting data, have far more diverse uses than the redistricting data. Summarizing those use cases in a set of queries that can be answered with a reasonable privacy-loss budget is the next challenge. Internet giants, businesses, and statistical agencies around the world should also step up to these challenges. We can learn from, and help, each other enormously.
John M. Abowd (U.S. Census Bureau)
John Abowd is the U.S. Census Bureau’s associate director for research and methodology, and chief scientist. He was named to the position in June 2016. The Research and Methodology Directorate leads critical work to modernize the Census Bureau’s operations and products. Abowd is leading the agency’s efforts to create a differentially private protection system for the 2020 Census and future data products. His long association with the Census Bureau began in 1998 when he joined the team that helped found the longitudinal employer-household dynamics program. In 2008, he led the team that created the world’s first application of a differentially private protection system for the program’s OnTheMap job location tool. Abowd is also the Edmund Ezra Day Professor of economics, statistics, and information science at Cornell University. He is a fellow and past president of the Society of Labor Economists. He is also a fellow of the American Statistical Association and the Econometric Society, as well as an elected member of the International Statistical Institute. He earned his Ph.D. from the University of Chicago and an A.B. from the Department of Economics at the University of Notre Dame.