Timezone: »
Talk
Estimating the unseen from multiple populations
Aditi Raghunathan · Greg Valiant · James Zou
Given samples from a distribution, how many new elements should we expect to find if we keep on sampling this distribution? This is an important and actively studied problem, with many applications ranging from species estimation to genomics. We generalize this extrapolation and related unseen estimation problems to the multiple population setting, where population $j$ has an unknown distribution $D_j$ from which we observe $n_j$ samples. We derive an optimal estimator for the total number of elements we expect to find among new samples across the populations. Surprisingly, we prove that our estimator's accuracy is independent of the number of populations. We also develop an efficient optimization algorithm to solve the more general problem of estimating multi-population frequency distributions. We validate our methods and theory through extensive experiments. Finally, on a real dataset of human genomes across multiple ancestries, we demonstrate how our approach for unseen estimation can enable cohort designs that can discover interesting mutations with greater efficiency.
Author Information
Aditi Raghunathan (Stanford)
Greg Valiant
James Zou (Stanford)
Related Events (a corresponding poster, oral, or spotlight)
-
2017 Poster: Estimating the unseen from multiple populations »
Wed. Aug 9th 08:30 AM -- 12:00 PM Room Gallery #2
More from the Same Authors
-
2021 : Stateful Performative Gradient Descent »
Zachary Izzo · James Zou · Lexing Ying -
2022 : On the nonlinear correlation of ML performance across data subpopulations »
Weixin Liang · Yining Mao · Yongchan Kwon · Xinyu Yang · James Zou -
2022 : MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts »
Weixin Liang · Xinyu Yang · James Zou -
2022 : Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning »
Weixin Liang · Yuhui Zhang · Yongchan Kwon · Serena Yeung · James Zou -
2023 Poster: Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value »
Yongchan Kwon · James Zou -
2023 Poster: One-sided Matrix Completion from Two Observations Per Row »
Steven Cao · Percy Liang · Greg Valiant -
2023 Poster: Data-Driven Subgroup Discovery for Linear Regression »
Zachary Izzo · Ruishan Liu · James Zou -
2023 Poster: Accuracy on the Curve: On the Nonlinear Correlation of ML Performance Between Data Subpopulations »
Weixin Liang · Yining Mao · Yongchan Kwon · Xinyu Yang · James Zou -
2023 Poster: Discover and Cure: Concept-aware Mitigation of Spurious Correlation »
Ying-Xin Wu · Mert Yuksekgonul · Linjun Zhang · James Zou -
2022 : Invited talk #2 James Zou (Title: Machine learning to make clinical trials more efficient and diverse) »
James Zou -
2022 : 7-UP: generating in silico CODEX from a small set of immunofluorescence markers »
James Zou -
2022 : Contributed Talk 2: MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts »
Weixin Liang · Xinyu Yang · James Zou -
2022 Poster: Meaningfully debugging model mistakes using conceptual counterfactual explanations »
Abubakar Abid · Mert Yuksekgonul · James Zou -
2022 Spotlight: Meaningfully debugging model mistakes using conceptual counterfactual explanations »
Abubakar Abid · Mert Yuksekgonul · James Zou -
2021 Poster: Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization »
John Miller · Rohan Taori · Aditi Raghunathan · Shiori Sagawa · Pang Wei Koh · Vaishaal Shankar · Percy Liang · Yair Carmon · Ludwig Schmidt -
2021 Spotlight: Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization »
John Miller · Rohan Taori · Aditi Raghunathan · Shiori Sagawa · Pang Wei Koh · Vaishaal Shankar · Percy Liang · Yair Carmon · Ludwig Schmidt -
2021 Poster: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn -
2021 Spotlight: Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices »
Evan Liu · Aditi Raghunathan · Percy Liang · Chelsea Finn -
2021 Poster: Just Train Twice: Improving Group Robustness without Training Group Information »
Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn -
2021 Oral: Just Train Twice: Improving Group Robustness without Training Group Information »
Evan Liu · Behzad Haghgoo · Annie Chen · Aditi Raghunathan · Pang Wei Koh · Shiori Sagawa · Percy Liang · Chelsea Finn -
2020 Poster: Understanding and Mitigating the Tradeoff between Robustness and Accuracy »
Aditi Raghunathan · Sang Michael Xie · Fanny Yang · John Duchi · Percy Liang -
2020 Poster: DROCC: Deep Robust One-Class Classification »
Sachin Goyal · Aditi Raghunathan · Moksh Jain · Harsha Vardhan Simhadri · Prateek Jain -
2019 Poster: Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits »
Martin Zhang · James Zou · David Tse -
2019 Oral: Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits »
Martin Zhang · James Zou · David Tse -
2017 Poster: Learning Latent Space Models with Angular Constraints »
Pengtao Xie · Yuntian Deng · Yi Zhou · Abhimanu Kumar · Yaoliang Yu · James Zou · Eric Xing -
2017 Talk: Learning Latent Space Models with Angular Constraints »
Pengtao Xie · Yuntian Deng · Yi Zhou · Abhimanu Kumar · Yaoliang Yu · James Zou · Eric Xing