Timezone: »

TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm
Yi Hao · Ayush Jain · Alon Orlitsky · Vaishakh Ravindrakumar

Tue Jul 19 01:50 PM -- 01:55 PM (PDT) @ Hall G
Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial, where $t\ge1$ and $d\ge0$. Letting $c_{t,d}$ denote the smallest such factor, clearly $c_{1,0}=1$, and it can be shown that $c_{t,d}\ge 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t,1}\le 2.25$ and the bound rises quickly to $c_{t,d}\le 3$ for $d\ge 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t,d}=2$ for all $(t,d)\ne(1,0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with vastly varying number of pieces. We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.

Author Information

Yi Hao (UCSD)
Ayush Jain (UC San Diego)
Alon Orlitsky (UCSD)
Vaishakh Ravindrakumar (UC San Diego)

Related Events (a corresponding poster, oral, or spotlight)