Replicability of machine learning experiments measures how likely it is that the outcome of one experiment is repeated when performed with a different randomization of the data. In this paper, we present an estimator of replicability of an experiment that is efficient. More precisely, the estimator is unbiased and has lowest variance in the class of estimators formed by a linear combination of outcomes of experiments on a given data set. We gathered empirical data for comparing experiments consisting of different sampling schemes and hypothesis tests. Both factors are shown to have an impact on replicability of experiments. The data suggests that sign tests should not be used due to low replicability. Ranked sum tests show better performance, but the combination of a sorted runs sampling scheme with a t-test gives the most desirable performance judged on Type I and II error and replicability.
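To make the notion of replicability concrete, the following is a minimal sketch, not the estimator defined in the paper. It assumes each experiment yields a binary outcome (for example, whether a hypothesis test rejects the null hypothesis under one randomization of the data) and estimates replicability as the fraction of pairs of runs whose outcomes agree. The function name `pairwise_agreement` and the boolean encoding of outcomes are illustrative assumptions.

```python
from itertools import combinations


def pairwise_agreement(outcomes):
    """Fraction of unordered pairs of runs whose outcomes agree.

    `outcomes` is a sequence of hashable values, one per run of the same
    experiment on a different randomization of the data (assumed encoding:
    True if the test rejected the null hypothesis, False otherwise).
    """
    outcomes = list(outcomes)
    if len(outcomes) < 2:
        raise ValueError("need at least two experiment outcomes")
    # Enumerate every unordered pair of runs and count the agreeing ones.
    pairs = list(combinations(outcomes, 2))
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


if __name__ == "__main__":
    # Hypothetical outcomes of ten runs: seven rejections, three non-rejections.
    runs = [True] * 7 + [False] * 3
    print(pairwise_agreement(runs))
```

Under this sketch, a value of 1.0 means every run reached the same conclusion, while a value near 0.5 means the conclusion depends heavily on the randomization.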