
Deep ensemble diversity and robustness on classification tasks
Zelda Mariet

Ensembles of neural networks have been shown to achieve state-of-the-art performance on a variety of ML benchmark tasks, and particularly on tasks evaluating robustness to dataset shift. Conventional wisdom attributes this success to the diversity of the neural networks within the ensemble: the more diverse the predictions, the more robust the aggregated output should be. Under the mean squared error loss, the influence of ensemble diversity is apparent from the bias-variance decomposition, which separates the ensemble loss into two terms: the first evaluates the individual model quality of ensemble members, and the second the overall ensemble diversity. Classification tasks, however, typically rely upon KL divergence-based losses with less tractable bias-variance decompositions, and thus several ad hoc metrics have been proposed as measures of classifier diversity. In this work, we a) show empirically that various metrics of ensemble diversity indeed correlate with improved performance on classification tasks, and b) leverage a generalization of the bias-variance decomposition to propose a theoretically motivated diversity metric with a strong correlation to ensemble loss. Diversity metrics also correlate with ensemble loss on out-of-distribution tasks, albeit to a lesser degree.
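
To make the mean squared error case concrete, here is a minimal sketch of one standard form of the two-term split, assuming a uniformly averaged ensemble and a single scalar target: the ensemble's squared error equals the average squared error of its members minus the average squared deviation of the members from the ensemble mean. The function name `ambiguity_decomposition` and the toy data are hypothetical, chosen for illustration only; this is not the metric proposed in the work.

```python
import random


def ambiguity_decomposition(preds, target):
    """Split the squared error of a uniformly averaged ensemble.

    preds:  list of scalar predictions, one per ensemble member.
    target: scalar ground truth.
    Returns (ensemble_loss, avg_member_loss, diversity).
    """
    m = len(preds)
    mean_pred = sum(preds) / m
    # Squared error of the averaged prediction.
    ensemble_loss = (mean_pred - target) ** 2
    # First term: average quality of the individual members.
    avg_member_loss = sum((p - target) ** 2 for p in preds) / m
    # Second term: spread of the members around the ensemble mean.
    diversity = sum((p - mean_pred) ** 2 for p in preds) / m
    return ensemble_loss, avg_member_loss, diversity


# Hypothetical toy ensemble: five noisy predictions of one target.
random.seed(0)
preds = [random.gauss(1.0, 0.5) for _ in range(5)]
ens, avg, div = ambiguity_decomposition(preds, target=1.2)

# Ensemble loss = average member loss - diversity (up to rounding).
assert abs(ens - (avg - div)) < 1e-12
```

Because the diversity term is subtracted, holding the average member loss fixed while increasing diversity lowers the ensemble loss in this setting. The identity depends on the squared error, which is why KL divergence-based classification losses need a different treatment.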

Author Information

Zelda Mariet (Google Inc.)
