Modern neural networks have recently been found to be poorly calibrated, primarily in the direction of over-confidence. Methods like entropy penalty and temperature smoothing improve calibration by clamping confidence, but in doing so compromise the many legitimately confident predictions. We propose a more principled fix that minimizes an explicit calibration error during training. We present MMCE, an RKHS kernel-based measure of calibration that is efficiently trainable alongside the negative log-likelihood loss without careful hyper-parameter tuning. Theoretically, MMCE is also a sound measure of calibration: it is minimized at perfect calibration, and its finite-sample estimates are consistent and enjoy fast convergence rates. Extensive experiments on several network architectures demonstrate that MMCE is a fast, stable, and accurate method to minimize calibration error while maximally preserving the number of high-confidence predictions.
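To illustrate what a kernel-based calibration measure of this kind can look like, here is a minimal Python sketch; it is not the authors' implementation. It assumes a squared estimator of the form (1/m^2) * sum over i, j of (c_i - r_i)(c_j - r_j) k(r_i, r_j), where r_i is the predicted confidence, c_i is 1 if the prediction is correct and 0 otherwise, and k is a Laplacian kernel. The function name `mmce_squared` and the default kernel width of 0.4 are assumptions made for illustration.

```python
import math


def mmce_squared(confidences, correct, width=0.4):
    """Illustrative squared kernel calibration error on one batch.

    confidences: predicted confidence r_i in [0, 1] for each example.
    correct:     1 if the prediction was correct, else 0 (c_i).
    width:       Laplacian kernel width (assumed value, not from the abstract).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    m = len(confidences)
    if m == 0:
        raise ValueError("need at least one example")

    # Residual between observed correctness and claimed confidence.
    residuals = [c - r for r, c in zip(confidences, correct)]

    total = 0.0
    for i in range(m):
        for j in range(m):
            # Laplacian kernel on the confidence values.
            k = math.exp(-abs(confidences[i] - confidences[j]) / width)
            total += residuals[i] * residuals[j] * k
    return total / (m * m)


# Four predictions, half of them correct.
overconfident = mmce_squared([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])
matched = mmce_squared([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(overconfident, matched)
```

In this toy batch the over-confident predictions receive a larger penalty than the ones whose confidence matches their accuracy. In a training setup, a term like this would presumably be written in a differentiable framework and added to the negative log-likelihood with a weighting coefficient.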
Aviral Kumar (IIT Bombay)
I am a final-year undergraduate in Computer Science at the Indian Institute of Technology Bombay, Mumbai, India. I will join UC Berkeley as a Ph.D. student starting Fall 2018.
Sunita Sarawagi (IIT Bombay)
Ujjwal Jain (IIT Bombay)
Related Events (a corresponding poster, oral, or spotlight)
2018 Oral: Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings
Fri Jul 13th 02:50 -- 03:00 PM Room A6