Cluster Analysis of Heterogeneous Rank Data

Ludwig M. Busse - Institute of Computational Science, ETH Zurich, Switzerland
Peter Orbanz - Institute of Computational Science, ETH Zurich, Switzerland
Joachim M. Buhmann - Institute of Computational Science, ETH Zurich, Switzerland

Cluster analysis of ranking data, which arise in consumer questionnaires, voting forms, and other preference surveys, attempts to identify typical groups of rank choices. Empirically measured rankings are often incomplete: different numbers of filled rank positions make the data heterogeneous. We propose a mixture approach for clustering heterogeneous rank data. Rankings of different lengths can be described and compared by means of a single probabilistic model. A maximum entropy approach avoids hidden assumptions about missing rank positions. Parameter estimators and an efficient EM algorithm for unsupervised inference are derived for the ranking mixture model. Experiments on both synthetic and real-world data demonstrate significantly improved parameter estimates on heterogeneous data when the incomplete rankings are included in the inference process.