Tutorial
Validity, Reliability, and Significance: A Tutorial on Statistical Methods for Reproducible Machine Learning
Stefan Riezler · Michael Hagmann
Moderator: Pin-Yu Chen
Room 307
Scientific progress in machine learning is driven by empirical studies that evaluate the relative quality of models. The goal of such an evaluation is to compare machine learning methods themselves, not to reproduce single test-set evaluations of particular optimized instances of trained models. The practice of reporting performance scores of single best models is particularly inadequate for deep learning models because their performance depends strongly on various sources of randomness. Such an evaluation practice raises methodological questions of whether a model predicts what it purports to predict (validity), whether a model’s performance is consistent across replications of the training process (reliability), and whether a performance difference between two models is due to chance (significance). The goal of this tutorial is to answer these questions with concrete statistical tests. The tutorial is hands-on and accompanied by a textbook (Riezler and Hagmann, 2021) and a webpage including R and Python code: https://www.cl.uni-heidelberg.de/statnlpgroup/empirical_methods/
Schedule
Mon 6:30 a.m. - 6:35 a.m. | Opening remarks (Talk) | Stefan Riezler · Michael Hagmann
Mon 6:35 a.m. - 6:45 a.m. | Introduction (Talk) | Stefan Riezler
Mon 6:45 a.m. - 7:00 a.m. | Mathematical Background: Linear Mixed Effects Model (LMEM) and Generalized Likelihood Ratio Test (GLRT) (Talk) | Michael Hagmann
Mon 7:00 a.m. - 7:15 a.m. | Significance (Talk) | Stefan Riezler
Mon 7:15 a.m. - 7:30 a.m. | Reliability (Talk) | Stefan Riezler
Mon 7:30 a.m. - 7:45 a.m. | Break
Mon 7:45 a.m. - 7:55 a.m. | Recap: A worked-through example (Talk) | Stefan Riezler
Mon 7:55 a.m. - 8:05 a.m. | Q&A (Discussion) | Stefan Riezler · Michael Hagmann
Mon 8:05 a.m. - 8:15 a.m. | Mathematical Background: Generalized Additive Model (GAM) (Talk) | Michael Hagmann
Mon 8:15 a.m. - 8:30 a.m. | Validity (Talk) | Stefan Riezler
Mon 8:30 a.m. - 8:35 a.m. | Closing remarks (Talk) | Stefan Riezler · Michael Hagmann
Mon 8:35 a.m. - 8:45 a.m. | Q&A and Discussion (Discussion) | Stefan Riezler · Michael Hagmann