Poster in Workshop: Theory and Practice of Differential Privacy
Reproducibility in Learning
Russell Impagliazzo · Rex Lei · Jessica Sorrell
Reproducibility is vital to ensuring scientific conclusions are reliable, and researchers have an obligation to ensure that their results are replicable. However, many scientific fields are suffering from a "reproducibility crisis," a term coined circa 2010 to refer to the failure of results from a variety of scientific disciplines to replicate. Within the subfields of machine learning and data science, there are similar concerns about the reliability of published findings. The performance of models produced by machine learning algorithms may be affected by the values of random seeds or hyperparameters chosen during training, and performance may be brittle to deviations from the values disseminated in published results.
In this work, we aim to initiate the study of reproducibility as a property of learning algorithms. We define a new notion of reproducibility, which informally says that a randomized algorithm is reproducible if two distinct runs of the algorithm, on two samples drawn from the same distribution and with the internal randomness fixed between both runs, produce the same output with high probability. We show that it is possible to efficiently simulate any statistical query algorithm reproducibly, but that the converse does not hold. We show that reproducibility implies differential privacy, and that reproducible algorithms permit data reuse under adaptive queries.
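The definition above can be illustrated with a minimal sketch (not the authors' construction): a toy "reproducible mean" that rounds the empirical mean to a randomly shifted grid. The helper name, the grid width, and the sample sizes are all assumptions for illustration; the key point is that the shift (the internal randomness) is derived from a shared seed, so two runs on fresh samples from the same distribution agree with high probability.

```python
import random

def reproducible_mean(sample, seed, grid_width=0.5):
    """Toy illustration of reproducibility (hypothetical helper, not
    the paper's algorithm): round the empirical mean to a grid whose
    offset is drawn from the shared internal randomness (seed)."""
    rng = random.Random(seed)              # internal randomness, fixed across runs
    offset = rng.uniform(0, grid_width)    # random shift of the rounding grid
    mean = sum(sample) / len(sample)
    # Snap the mean to the nearest shifted grid point.
    return round((mean - offset) / grid_width) * grid_width + offset

# Two independent samples from the same distribution...
pop = random.Random(0)
s1 = [pop.gauss(0.0, 1.0) for _ in range(10_000)]
s2 = [pop.gauss(0.0, 1.0) for _ in range(10_000)]

# ...run with the SAME internal randomness (seed) both times.
out1 = reproducible_mean(s1, seed=42)
out2 = reproducible_mean(s2, seed=42)
# With high probability over the sample draws, out1 == out2: both
# empirical means fall in the same randomly shifted grid cell.
```

The random shift matters: with a fixed (deterministic) grid, an adversarial distribution could place the true mean on a cell boundary, making the two runs disagree with constant probability; randomizing the offset over the shared seed makes a boundary hit unlikely for every distribution.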