On Estimating ROC Arc Length and Lower Bounding Maximal AUC for Imbalanced Classification
Song Liu

Many astrophysical datasets have extremely imbalanced classes, and ROC curves are often used to measure the performance of classifiers on imbalanced datasets because they are insensitive to class proportions. This paper studies the arc length of ROC curves and provides a novel way of lower bounding the maximal AUC. We show that when the data likelihood ratio is used as the score function, the arc length of the corresponding ROC curve gives rise to a novel $f$-divergence. This $f$-divergence can be expressed as a variational objective and estimated using only samples from the positive and negative data distributions. Moreover, we show that the area below the optimal ROC curve can be expressed as a similar variational objective that depends on the arctangent of the likelihood ratio. These insights lead to a novel two-step procedure for finding a good score function by lower bounding the maximal AUC. Experiments on RR Lyrae datasets show that the proposed two-step procedure achieves good AUC performance in imbalanced binary classification tasks while being less computationally demanding.
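The arc-length claim can be checked on a toy problem. The following short derivation is ours and should not be read as the paper's exact formulation. It assumes a continuous likelihood ratio $r(x) = p(x)/q(x)$, where $p$ and $q$ are the positive and negative densities. Thresholding $r$ at level $t$ gives an ROC curve whose slope at that point is $t$, so its arc length is $\int_0^1 \sqrt{1 + (d\,\mathrm{TPR}/d\,\mathrm{FPR})^2}\, d\,\mathrm{FPR} = \mathbb{E}_q[\sqrt{1 + r^2}] = \int \sqrt{p^2 + q^2}\,dx$, which lies between $\sqrt{2}$ and $2$. With the convex function $f(t) = \sqrt{1+t^2} - \sqrt{2}$, this is $\sqrt{2}$ plus an $f$-divergence. Its convex conjugate is $f^*(u) = \sqrt{2} - \sqrt{1-u^2}$ for $|u| \le 1$, which gives $\mathbb{E}_q[\sqrt{1+r^2}] = \sup_{|g| \le 1} \mathbb{E}_p[g] + \mathbb{E}_q[\sqrt{1-g^2}]$, attained at $g = \sin(\arctan r)$. The Python sketch below maximizes the sample version of this objective with $g = \sin\theta$ and $\theta(x) = \arctan(\exp(wx + b))$ on Gaussian toy data. The Gaussian pair, the log-linear family, the helper names, and the grid search are all illustrative assumptions. This is not the paper's estimator and does not reproduce its RR Lyrae setup.

```python
import numpy as np


def gaussian_pdf(x, mean):
    """Density of N(mean, 1) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)


def arc_length_numeric(mu, lo=-15.0, hi=15.0, num=300_001):
    """Population arc length E_q[sqrt(1 + r^2)] = integral of sqrt(p^2 + q^2)
    for the toy pair p = N(mu, 1), q = N(0, 1), via a Riemann sum."""
    x = np.linspace(lo, hi, num)
    dx = x[1] - x[0]
    p, q = gaussian_pdf(x, mu), gaussian_pdf(x, 0.0)
    return float(np.sum(np.sqrt(p ** 2 + q ** 2)) * dx)


def variational_objective(x_pos, x_neg, w, b):
    """Sample version of E_p[sin(theta)] + E_q[cos(theta)], where
    theta(x) = arctan(exp(w * x + b)) is a hypothetical log-linear guess for
    the likelihood ratio passed through arctan. Any theta in [0, pi/2] gives a
    population lower bound on the arc length."""
    theta_pos = np.arctan(np.exp(np.clip(w * x_pos + b, -50.0, 50.0)))
    theta_neg = np.arctan(np.exp(np.clip(w * x_neg + b, -50.0, 50.0)))
    return float(np.mean(np.sin(theta_pos)) + np.mean(np.cos(theta_neg)))


def estimate_arc_length(x_pos, x_neg, ws, bs):
    """Maximize the sample objective over a (w, b) grid (illustrative choice)."""
    best = (-np.inf, None, None)
    for w in ws:
        for b in bs:
            val = variational_objective(x_pos, x_neg, w, b)
            if val > best[0]:
                best = (val, w, b)
    return best


rng = np.random.default_rng(0)
mu = 1.0
x_pos = rng.normal(mu, 1.0, size=5000)   # samples from p (positive class)
x_neg = rng.normal(0.0, 1.0, size=5000)  # samples from q (negative class)

est, w_hat, b_hat = estimate_arc_length(
    x_pos, x_neg, ws=np.linspace(0.0, 3.0, 31), bs=np.linspace(-3.0, 2.0, 51)
)
true = arc_length_numeric(mu)
print(f"numeric: {true:.4f}  variational: {est:.4f}  (w={w_hat:.2f}, b={b_hat:.2f})")
```

For this toy pair the log likelihood ratio is exactly $\mu x - \mu^2/2$, so the assumed family contains the optimum ($w = \mu$, $b = -\mu^2/2$). Because $\sin\theta$ and $\cos\theta$ are bounded in $[0, 1]$, the sample objective has low variance, and the variational value should land close to the numerically integrated arc length.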