clustering

  • Shai Ben-David and Nika Haghtalab

    Clustering in the Presence of Background Noise (pdf)

    We address the problem of noise management in clustering algorithms. Namely, issues that arise when on top of some cluster structure the data also contains an unstructured set of points. We consider how clustering algorithms can be ``robustified" so that they recover the cluster structure in spite of the unstructured part of the input. We introduce some quantitative measures of such robustness that take into account the strength of the embedded cluster structure as well was the mildness of the noise subset. We propose a simple and efficient method to turn any centroid-based clustering algorithm into a noise-robust one, and prove robustness guarantees for our method with respect to these measures. We also prove that more straightforward ways of ``robustifying'' clustering algorithms fail to achieve similar guarantees.

  • Safiye Celik and Benjamin Logsdon and Su-In Lee

    Efficient Dimensionality Reduction for High-Dimensional Network Estimation (pdf)

    We propose module graphical lasso (MGL), an aggressive dimensionality reduction and network estimation technique for a high-dimensional Gaussian graphical model (GGM). MGL achieves scalability, interpretability and robustness by exploiting the modularity property of many real-world networks. Variables are organized into tightly coupled modules and a graph structure is estimated to determine the conditional independencies among modules. MGL iteratively learns the module assignment of variables, the latent variables, each corresponding to a module, and the parameters of the GGM of the latent variables. In synthetic data experiments, MGL outperforms the standard graphical lasso and three other methods that incorporate latent variables into GGMs. When applied to gene expression data from ovarian cancer, MGL outperforms standard clustering algorithms in identifying functionally coherent gene sets and predicting survival time of patients. The learned modules and their dependencies provide novel insights into cancer biology as well as identifying possible novel drug targets.

  • Claudio Gentile and Shuai Li and Giovanni Zappella

    Online Clustering of Bandits (pdf)

    We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation (``bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems.

2013-2014 ICML | International Conference on Machine Learning