Cross-Entropy Loss Functions: Theoretical Analysis and Applications
Anqi Mao · Mehryar Mohri · Yutao Zhong

Wed Jul 26 02:00 PM -- 03:30 PM (PDT) @ Exhibit Hall 1 #739
Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss applied to the outputs of a neural network when the softmax is used. But what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of loss functions, *comp-sum losses*, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions. We give the first $H$-consistency bounds for these loss functions. These are non-asymptotic guarantees that upper-bound the zero-one loss estimation error in terms of the estimation error of a surrogate loss, for the specific hypothesis set $H$ used. We further show that our bounds are *tight*. These bounds depend on quantities called *minimizability gaps*. To make them more explicit, we give a specific analysis of these gaps for comp-sum losses. We also introduce a new family of loss functions, *smooth adversarial comp-sum losses*, that are derived from their comp-sum counterparts by adding a related smooth term. We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds. This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss. While our main purpose is a theoretical analysis, we also present an extensive empirical analysis comparing comp-sum losses. We further report the results of a series of experiments demonstrating that our adversarial robustness algorithms outperform the current state-of-the-art, while also achieving superior non-adversarial accuracy.
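To make the notion of a comp-sum family concrete, here is a minimal Python sketch of how cross-entropy, generalized cross-entropy and the mean absolute error can be obtained from a single formula. The abstract does not state the parameterization, so the form used here is an assumption: the loss is taken to be $\Phi^{\tau}\big(\sum_{y' \neq y} e^{h(x, y') - h(x, y)}\big)$ with $\Phi^{\tau}(u) = \frac{(1+u)^{1-\tau} - 1}{1-\tau}$ for $\tau \neq 1$ and $\Phi^{1}(u) = \log(1+u)$. The names `comp_sum_loss`, `softmax`, `tau` and the example scores are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Softmax probabilities for a list of real-valued scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def comp_sum_loss(scores, y, tau):
    """Comp-sum loss for one example (assumed parameterization, see lead-in).

    scores: list of scores h(x, y') for each class y'
    y:      index of the true class
    tau:    non-negative parameter selecting a member of the family
    """
    # u = sum over y' != y of exp(h(x, y') - h(x, y)).
    # Note: this naive form can overflow for very large score gaps.
    u = sum(math.exp(s - scores[y]) for k, s in enumerate(scores) if k != y)
    if tau == 1:
        return math.log1p(u)  # logistic loss, i.e. cross-entropy with softmax
    return ((1.0 + u) ** (1.0 - tau) - 1.0) / (1.0 - tau)


# Hypothetical scores for a 3-class problem; the true class is 0.
scores = [2.0, 0.5, -1.0]
y = 0
p_y = softmax(scores)[y]

ce = comp_sum_loss(scores, y, 1)     # matches -log(p_y)
mae = comp_sum_loss(scores, y, 2)    # matches 1 - p_y
gce = comp_sum_loss(scores, y, 1.5)  # matches (1 - p_y**q) / q with q = 0.5
```

Under this assumed form, $1 + u$ equals the reciprocal of the softmax probability of the true class, which is why $\tau = 1$ recovers $-\log p_y$, $\tau = 2$ recovers $1 - p_y$ (the mean absolute error up to a constant factor), and intermediate values give the generalized cross-entropy.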

Author Information

Anqi Mao (Courant Institute of Mathematical Sciences, NYU)
Mehryar Mohri (Google Research and Courant Institute of Mathematical Sciences)
Yutao Zhong (Courant Institute of Mathematical Sciences, NYU)

More from the Same Authors