ICML 2021 Near-Optimal Algorithms for Explainable k-Medians and k-Means Spotlight

Near-Optimal Algorithms for Explainable k-Medians and k-Means

Konstantin Makarychev · Liren Shan


Abstract: We consider the problem of explainable $k$-medians and $k$-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a threshold decision tree that partitions data into $k$ clusters and minimizes the $k$-medians or $k$-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is $\tilde{O}(\log k)$ competitive with $k$-medians with $\ell_1$ norm and $\tilde{O}(k)$ competitive with $k$-means. This is an improvement over the previous guarantees of $O(k)$ and $O(k^2)$ by Dasgupta et al. (2020). We also provide a new algorithm which is $O(\log^{3/2} k)$ competitive for $k$-medians with $\ell_2$ norm. Our first algorithm is near-optimal: Dasgupta et al. (2020) showed a lower bound of $\Omega(\log k)$ for $k$-medians; in this work, we prove a lower bound of $\tilde{\Omega}(k)$ for $k$-means. We also provide a lower bound of $\Omega(\log k)$ for $k$-medians with $\ell_2$ norm.
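
To make the object of study concrete, the following is a minimal Python sketch of a threshold decision tree and of the $k$-medians cost with $\ell_1$ norm that such a tree induces. It is an illustration only and does not implement the algorithms proposed in the paper: the tree is written by hand, and the names `Leaf`, `Node`, `assign`, and `k_medians_l1_cost` are hypothetical helpers introduced here. The sketch assumes each leaf corresponds to one cluster and that each cluster's center is its coordinate-wise median, which minimizes the $\ell_1$ cost for a fixed partition.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Leaf:
    cluster: int  # index of the cluster this leaf represents


@dataclass
class Node:
    feature: int      # the single coordinate this node inspects
    threshold: float  # points with x[feature] <= threshold go left
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def assign(tree: Tree, x: Sequence[float]) -> int:
    """Follow single-feature threshold tests from the root down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.cluster


def median(values: List[float]) -> float:
    """Upper median; any median minimizes the sum of absolute deviations."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def k_medians_l1_cost(tree: Tree, points: Sequence[Sequence[float]], k: int) -> float:
    """Sum of l1 distances from each point to its cluster's coordinate-wise median."""
    clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
    for p in points:
        clusters[assign(tree, p)].append(p)

    cost = 0.0
    for members in clusters:
        if not members:
            continue
        dim = len(members[0])
        center = [median([p[j] for p in members]) for j in range(dim)]
        cost += sum(abs(p[j] - center[j]) for p in members for j in range(dim))
    return cost


# Hand-built tree with k = 3 leaves: split on feature 0, then on feature 1.
tree = Node(
    feature=0, threshold=5.0,
    left=Leaf(0),
    right=Node(feature=1, threshold=5.0, left=Leaf(1), right=Leaf(2)),
)
points = [(0, 0), (1, 0), (10, 0), (11, 0), (10, 10), (11, 10)]
labels = [assign(tree, p) for p in points]
total_cost = k_medians_l1_cost(tree, points, k=3)
```

Every cluster in this toy example is described by at most two threshold tests, which is the sense in which the clustering is explainable. The competitive ratios in the abstract compare the cost of the best such tree-induced clustering against the cost of the optimal unconstrained clustering.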
