Automatic Unsupervised Ensemble Outlier Model Selection
Abstract
Unsupervised outlier detection is attractive because it eliminates the need for labeled data. Further, forming multi-model ensembles can improve robustness and detection performance. However, composing an ensemble without labeled data is challenging. Naive ensemble composition can cause ensemble saturation, where redundant or unreliable detection models degrade performance and incur unnecessary computation. We propose MetaEns, an automatic unsupervised framework for selecting outlier detection model ensembles. Using labeled meta-datasets, MetaEns learns a model that predicts marginal ensemble gains, i.e., the expected improvement from adding a candidate model to a partially constructed ensemble. At test time, this learned signal is combined with a submodular-inspired proxy objective that enforces diminishing returns through diversity-aware discounting and family-level risk regularization, thereby enabling greedy sequential selection with adaptive early stopping. As a result, MetaEns constructs compact, high-quality ensembles without access to ground-truth labels. Experiments on 39 real-world datasets show that MetaEns consistently outperforms state-of-the-art unsupervised selectors and ensemble baselines, achieving higher average precision while using fewer models.
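To make the selection procedure concrete, the sketch below illustrates one possible greedy loop of the kind the abstract describes: a learned predictor supplies marginal-gain estimates, which are discounted for redundancy with already-selected models and regularized at the family level, and selection stops once no candidate's adjusted gain remains positive. All function names, signatures, and hyperparameters (predict_gain, diversity_weight, family_penalty, min_gain) are hypothetical illustrations, not the paper's actual API.

```python
import numpy as np

def greedy_select(candidates, predict_gain, score_matrix, families,
                  diversity_weight=0.5, family_penalty=0.1, min_gain=0.0):
    """Illustrative greedy ensemble selection with diversity-aware discounting,
    family-level regularization, and adaptive early stopping.
    All names and parameters are assumptions for exposition only."""
    selected = []
    remaining = list(candidates)
    while remaining:
        best, best_adj = None, -np.inf
        for m in remaining:
            # learned marginal-gain estimate for adding model m to the partial ensemble
            gain = predict_gain(m, selected)
            if selected:
                # diversity-aware discount: penalize redundancy with selected models,
                # here measured via correlation of their outlier score vectors
                corrs = [abs(np.corrcoef(score_matrix[m], score_matrix[s])[0, 1])
                         for s in selected]
                gain -= diversity_weight * max(corrs)
                # family-level risk regularization: discourage over-reliance on one family
                same_family = sum(families[s] == families[m] for s in selected)
                gain -= family_penalty * same_family
            if gain > best_adj:
                best, best_adj = m, gain
        # adaptive early stopping: no remaining candidate offers a positive adjusted gain
        if best_adj <= min_gain:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

Here score_matrix maps each candidate model to its per-point outlier scores and families maps each model to its detector family; both are stand-in inputs assumed for this sketch.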