Poster
Data Amplification: Instance-Optimal Property Estimation
Yi Hao · Alon Orlitsky
Tue Jul 14 07:00 AM -- 07:45 AM & Tue Jul 14 08:00 PM -- 08:45 PM (PDT) @ Virtual
The best-known and most commonly used technique for distribution-property estimation is the plug-in estimator, which replaces the underlying distribution with its empirical frequency. We present novel linear-time-computable estimators that significantly "amplify" the effective amount of data available. For a large variety of distribution properties, including four of the most popular ones, and for every underlying distribution, they achieve the accuracy that empirical-frequency plug-in estimators would attain using a logarithmic factor more samples. Specifically, for Shannon entropy and a broad class of Lipschitz properties, including the $L_1$ distance to a fixed distribution, the new estimators use $n$ samples to achieve the accuracy attained by the empirical estimators with $n\log n$ samples. For support size and coverage, the new estimators use $n$ samples to achieve the performance of empirical frequency with sample size $n$ times the logarithm of the property value. Significantly strengthening the traditional min-max formulation, these results hold not only for the worst distributions, but for each and every underlying distribution. Furthermore, the logarithmic amplification factors are optimal. Experiments on a wide variety of distributions show that the new estimators outperform the previous state-of-the-art estimators designed for each specific property.
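For concreteness, the baseline that the paper's estimators are compared against is the empirical-frequency plug-in estimator: replace the unknown distribution with the observed frequencies and evaluate the property on them. A minimal sketch for Shannon entropy (this illustrates only the plug-in baseline, not the paper's amplified estimators):

```python
import math
from collections import Counter

def empirical_entropy(samples):
    """Plug-in (empirical-frequency) estimate of Shannon entropy, in nats.

    Substitutes the empirical frequencies p_hat(x) = count(x) / n for the
    unknown distribution and evaluates H(p_hat) = -sum p_hat * log p_hat.
    """
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Example: three heads and one tail from a coin
print(round(empirical_entropy(["H", "H", "H", "T"]), 4))  # 0.5623
```

The abstract's claim is that, per distribution, the new estimators applied to $n$ samples match the accuracy this plug-in rule would reach with $n\log n$ samples.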
Author Information
Yi Hao (University of California, San Diego)
Alon Orlitsky (University of California, San Diego)
More from the Same Authors
- 2022 Poster: TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm
  Yi Hao · Ayush Jain · Alon Orlitsky · Vaishakh Ravindrakumar
- 2022 Spotlight: TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm
  Yi Hao · Ayush Jain · Alon Orlitsky · Vaishakh Ravindrakumar
- 2021 Poster: Compressed Maximum Likelihood
  Yi Hao · Alon Orlitsky
- 2021 Spotlight: Compressed Maximum Likelihood
  Yi Hao · Alon Orlitsky
- 2021 Poster: Robust Density Estimation from Batches: The Best Things in Life are (Nearly) Free
  Ayush Jain · Alon Orlitsky
- 2021 Oral: Robust Density Estimation from Batches: The Best Things in Life are (Nearly) Free
  Ayush Jain · Alon Orlitsky
- 2020 Poster: Optimal Sequential Maximization: One Interview is Enough!
  Moein Falahatgar · Alon Orlitsky · Venkatadheeraj Pichapati
- 2020 Poster: Optimal Robust Learning of Discrete Distributions from Batches
  Ayush Jain · Alon Orlitsky
- 2019 Poster: Doubly-Competitive Distribution Estimation
  Yi Hao · Alon Orlitsky
- 2019 Oral: Doubly-Competitive Distribution Estimation
  Yi Hao · Alon Orlitsky
- 2018 Poster: The Limits of Maxing, Ranking, and Preference Learning
  Moein Falahatgar · Ayush Jain · Alon Orlitsky · Venkatadheeraj Pichapati · Vaishakh Ravindrakumar
- 2018 Oral: The Limits of Maxing, Ranking, and Preference Learning
  Moein Falahatgar · Ayush Jain · Alon Orlitsky · Venkatadheeraj Pichapati · Vaishakh Ravindrakumar
- 2017 Poster: A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions
  Jayadev Acharya · Hirakendu Das · Alon Orlitsky · Ananda Suresh
- 2017 Poster: Maximum Selection and Ranking under Noisy Comparisons
  Moein Falahatgar · Alon Orlitsky · Venkatadheeraj Pichapati · Ananda Theertha Suresh
- 2017 Talk: A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions
  Jayadev Acharya · Hirakendu Das · Alon Orlitsky · Ananda Suresh
- 2017 Talk: Maximum Selection and Ranking under Noisy Comparisons
  Moein Falahatgar · Alon Orlitsky · Venkatadheeraj Pichapati · Ananda Theertha Suresh