Timezone: »
Oral
Robust Density Estimation from Batches: The Best Things in Life are (Nearly) Free
Ayush Jain · Alon Orlitsky
In many applications data are collected in batches, some potentially biased, corrupt, or even adversarial. Learning algorithms for this setting have therefore garnered considerable recent attention. In particular, a sequence of works has shown that all approximately piecewise polynomial distributionsand in particular all Gaussian, Gaussianmixture, logconcave, lowmodal, and monotonehazard distributionscan be learned robustly in polynomial time. However, these results left open the question, stated explicitly in~\cite{chen2020learning}, about the best possible sample complexity of such algorithms. We answer this question, showing that, perhaps surprisingly, up to logarithmic factors, the optimal sample complexity is the same as for genuine, nonadversarial, data! To establish the result, we reduce robust learning of approximately piecewise polynomial distributions to robust learning of the probability of all subsets of size at most $k$ of a larger discrete domain, and learn these probabilities in optimal sample complexity linear in $k$ regardless of the domain size. In simulations, the algorithm runs very quickly and estimates distributions to essentially the accuracy achieved when all adversarial batches are removed. The results also imply the first polynomialtime sampleoptimal algorithm for robust intervalbased classification based on batched data.
Author Information
Ayush Jain (UC San Diego)
Alon Orlitsky (UCSD)
Related Events (a corresponding poster, oral, or spotlight)

2021 Poster: Robust Density Estimation from Batches: The Best Things in Life are (Nearly) Free »
Wed. Jul 21st 04:00  06:00 PM Room
More from the Same Authors

2023 Poster: Efficient ListDecodable Regression using Batches »
Abhimanyu Das · Ayush Jain · Weihao Kong · Rajat Sen 
2022 Poster: TURF: TwoFactor, Universal, Robust, Fast Distribution Learning Algorithm »
Yi Hao · Ayush Jain · Alon Orlitsky · Vaishakh Ravindrakumar 
2022 Spotlight: TURF: TwoFactor, Universal, Robust, Fast Distribution Learning Algorithm »
Yi Hao · Ayush Jain · Alon Orlitsky · Vaishakh Ravindrakumar 
2021 Poster: Compressed Maximum Likelihood »
Yi Hao · Alon Orlitsky 
2021 Spotlight: Compressed Maximum Likelihood »
Yi Hao · Alon Orlitsky 
2020 Poster: Optimal Sequential Maximization: One Interview is Enough! »
Moein Falahatgar · Alon Orlitsky · Venkatadheeraj Pichapati 
2020 Poster: Optimal Robust Learning of Discrete Distributions from Batches »
Ayush Jain · Alon Orlitsky 
2020 Poster: Data Amplification: InstanceOptimal Property Estimation »
Yi Hao · Alon Orlitsky 
2019 Poster: DoublyCompetitive Distribution Estimation »
Yi Hao · Alon Orlitsky 
2019 Oral: DoublyCompetitive Distribution Estimation »
Yi Hao · Alon Orlitsky 
2018 Poster: The Limits of Maxing, Ranking, and Preference Learning »
Moein Falahatgar · Ayush Jain · Alon Orlitsky · Venkatadheeraj Pichapati · Vaishakh Ravindrakumar 
2018 Oral: The Limits of Maxing, Ranking, and Preference Learning »
Moein Falahatgar · Ayush Jain · Alon Orlitsky · Venkatadheeraj Pichapati · Vaishakh Ravindrakumar 
2017 Poster: A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions »
Jayadev Acharya · Hirakendu Das · Alon Orlitsky · Ananda Suresh 
2017 Poster: Maximum Selection and Ranking under Noisy Comparisons »
Moein Falahatgar · Alon Orlitsky · Venkatadheeraj Pichapati · Ananda Theertha Suresh 
2017 Talk: A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions »
Jayadev Acharya · Hirakendu Das · Alon Orlitsky · Ananda Suresh 
2017 Talk: Maximum Selection and Ranking under Noisy Comparisons »
Moein Falahatgar · Alon Orlitsky · Venkatadheeraj Pichapati · Ananda Theertha Suresh