CHB: A Diagnostic Toolkit for Hardness-Aware Clustering Evaluation
Abstract
Clustering methods are commonly compared through leaderboards that collapse performance into a single aggregate ranking. Such summaries obscure why methods succeed, which data properties are associated with failure, and how conclusions shift under changes of representation and realistic tuning constraints. We present CHB, a diagnostic toolkit for hardness-aware clustering evaluation. CHB maps each dataset–representation pair to an interpretable hardness fingerprint that captures (i) separation, (ii) cohesion and scale heterogeneity, and (iii) topology, measured through scalable persistent-homology summaries. Using this diagnostic space, CHB evaluates clustering algorithms under standardized, compute-aware tracks. Conditioning results on hardness coordinates turns comparison into diagnosis: across a broad range of datasets and representations, CHB reveals reproducible structural regimes, uncovers regime-dependent ranking reversals across method families, and surfaces robustness signatures, including topology-linked breakdowns. CHB further enables representation auditing by attributing gains to measurable shifts in the hardness fingerprint rather than to changes in external performance alone. We release CHB as an open, extensible artifact for evaluating new clustering methods and embeddings within a shared diagnostic framework.