Given data drawn from an unknown distribution D, to what extent is it possible to "amplify" this dataset and faithfully output an even larger set of samples that appear to have been drawn from D? We formalize this question as follows: an (n, m) amplification procedure takes as input n independent draws from an unknown distribution D, and outputs a set of m > n "samples" that must be indistinguishable from m samples drawn i.i.d. from D. We consider this sample amplification problem in two fundamental settings: the case where D is an arbitrary discrete distribution supported on k elements, and the case where D is a d-dimensional Gaussian with unknown mean and fixed covariance matrix. Perhaps surprisingly, we show that a valid amplification procedure exists for both of these settings, even in the regime where the size of the input dataset, n, is significantly smaller than what would be necessary to learn D to non-trivial accuracy. We also show that our procedures are optimal up to constant factors. Beyond these results, we describe potential applications of such sample amplification, and formalize a number of curious directions for future research along this vein.
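To make the (n, m) interface concrete, the sketch below shows one natural baseline amplifier in Python: return the n input samples together with m − n fresh draws from a fitted proxy for D (the empirical distribution in the discrete case; N(μ̂, I) in the identity-covariance Gaussian case). The function names and the choice of baseline are illustrative assumptions for exposition, not necessarily the paper's exact procedures, whose constructions and analysis are more delicate.

```python
# Illustrative sketch of the (n, m) sample amplification interface.
# The "append fresh draws from a fitted proxy" baseline shown here is an
# assumption for exposition; it is not claimed to match the paper's
# optimal procedures.
import numpy as np

def amplify_discrete(samples, m, rng=None):
    """Given n i.i.d. draws from an unknown discrete distribution D,
    output m > n "samples" intended to look like m i.i.d. draws from D.

    Baseline: keep the original samples and resample the remaining
    m - n points from the empirical distribution of the input."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples)
    n = len(samples)
    assert m > n, "amplification must output more samples than it reads"
    extra = rng.choice(samples, size=m - n, replace=True)
    out = np.concatenate([samples, extra])
    rng.shuffle(out)  # hide which points were original vs. resampled
    return out

def amplify_gaussian(samples, m, rng=None):
    """Analogous baseline for N(mu, I) with unknown mean mu: draw the
    m - n extra points from N(mu_hat, I), where mu_hat is the sample
    mean. (The paper's optimal Gaussian procedure is more subtle.)"""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(samples, dtype=float)
    if X.ndim == 1:                 # treat a flat array as d = 1
        X = X[:, None]
    n, d = X.shape
    assert m > n, "amplification must output more samples than it reads"
    mu_hat = X.mean(axis=0)
    extra = rng.normal(loc=mu_hat, scale=1.0, size=(m - n, d))
    out = np.concatenate([X, extra])
    rng.shuffle(out)                # shuffles rows in place
    return out
```

A brief usage example, with k, n, and m chosen arbitrarily for illustration: given n = 1000 draws from a large-support discrete distribution, `amplify_discrete(data, m=1030)` outputs a shuffled set of 1030 points. The point of the abstract's result is that, for appropriate parameter regimes, amplified sets of this general flavor can be information-theoretically indistinguishable (as a joint set) from m genuine i.i.d. draws, even when n is far too small to learn D.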
Author Information
Brian Axelrod (Stanford University)
Shivam Garg (Stanford University)
Vatsal Sharan (Stanford University)
Gregory Valiant (Stanford University)
More from the Same Authors
- 2021 : Sub-population Guarantees for Importance Weights and KL-Divergence Estimation »
  Parikshit Gopalan · Nina Narodytska · Omer Reingold · Vatsal Sharan · Udi Wieder
- 2023 : Lexinvariant Language Models »
  Qian Huang · Eric Zelikman · Sarah Chen · Yuhuai Wu · Greg Valiant · Percy Liang
- 2023 Poster: One-sided Matrix Completion from Two Observations Per Row »
  Steven Cao · Percy Liang · Greg Valiant
- 2021 Poster: Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training »
  Kai Sheng Tai · Peter Bailis · Gregory Valiant
- 2021 Spotlight: Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training »
  Kai Sheng Tai · Peter Bailis · Gregory Valiant
- 2021 Poster: Reward Identification in Inverse Reinforcement Learning »
  Kuno Kim · Shivam Garg · Kirankumar Shiragur · Stefano Ermon
- 2021 Spotlight: Reward Identification in Inverse Reinforcement Learning »
  Kuno Kim · Shivam Garg · Kirankumar Shiragur · Stefano Ermon
- 2020 Poster: Causal Strategic Linear Regression »
  Yonadav Shavit · Benjamin Edelman · Brian Axelrod
- 2019 Poster: Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data »
  Vatsal Sharan · Kai Sheng Tai · Peter Bailis · Gregory Valiant
- 2019 Poster: Equivariant Transformer Networks »
  Kai Sheng Tai · Peter Bailis · Gregory Valiant
- 2019 Oral: Equivariant Transformer Networks »
  Kai Sheng Tai · Peter Bailis · Gregory Valiant
- 2019 Oral: Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data »
  Vatsal Sharan · Kai Sheng Tai · Peter Bailis · Gregory Valiant
- 2019 Poster: Maximum Likelihood Estimation for Learning Populations of Parameters »
  Ramya Korlakai Vinayak · Weihao Kong · Gregory Valiant · Sham Kakade
- 2019 Oral: Maximum Likelihood Estimation for Learning Populations of Parameters »
  Ramya Korlakai Vinayak · Weihao Kong · Gregory Valiant · Sham Kakade
- 2017 Poster: Estimating the unseen from multiple populations »
  Aditi Raghunathan · Greg Valiant · James Zou
- 2017 Talk: Estimating the unseen from multiple populations »
  Aditi Raghunathan · Greg Valiant · James Zou
- 2017 Poster: Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use »
  Vatsal Sharan · Gregory Valiant
- 2017 Talk: Orthogonalized ALS: A Theoretically Principled Tensor Decomposition Algorithm for Practical Use »
  Vatsal Sharan · Gregory Valiant