Streaming Algorithms for High-Dimensional Robust Statistics

Ilias Diakonikolas · Daniel Kane · Ankit Pensia · Thanasis Pittas

Hall E #1120

Keywords: [ T: Learning Theory ]

[ Abstract ]
[ Slides [ Poster [ Paper PDF
Wed 20 Jul 3:30 p.m. PDT — 5:30 p.m. PDT
Spotlight presentation: T: Learning Theory/Domain Adaptation
Wed 20 Jul 7:30 a.m. PDT — 9 a.m. PDT


We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust statistics tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.

Chat is not available.