Workshop: Subset Selection in Machine Learning: From Theory to Applications

A Practical Notation for Information-Theoretic Quantities between Outcomes and Random Variables

Andreas Kirsch · Yarin Gal


Information theory is important to machine learning, but the notation for information-theoretic quantities is not always clear. The right notation can convey valuable intuitions and express new ideas concisely. We propose such a notation for information-theoretic quantities between events (outcomes) and random variables. We apply this notation to BALD (Bayesian Active Learning by Disagreement), an acquisition function in Bayesian active learning that selects the most informative unlabelled samples for labelling by an expert. We then extend BALD to the core-set problem, which consists of selecting the most informative samples given the labels.
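
To make the BALD acquisition function concrete, here is a minimal sketch. It assumes the commonly used formulation of BALD as the mutual information between a sample's prediction and the model parameters, estimated from a finite set of posterior samples (for example, stochastic forward passes). The function names `entropy` and `bald_scores`, the array layout, and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats along `axis`, treating 0 * log 0 as 0."""
    p = np.asarray(p, dtype=float)
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0, so zero-probability terms vanish
    return -np.sum(p * np.log(safe), axis=axis)


def bald_scores(probs):
    """Estimate BALD scores from sampled predictive distributions.

    probs has shape (n_posterior_samples, n_pool, n_classes) (assumed layout).
    Score = entropy of the mean prediction - mean entropy of each prediction.
    """
    probs = np.asarray(probs, dtype=float)
    mean_prediction = probs.mean(axis=0)            # (n_pool, n_classes)
    predictive_entropy = entropy(mean_prediction)   # (n_pool,)
    expected_entropy = entropy(probs).mean(axis=0)  # (n_pool,)
    return predictive_entropy - expected_entropy


# Toy pool: two posterior samples, two unlabelled points, two classes.
# Point 0: the samples confidently disagree. Point 1: both samples are equally unsure.
probs = np.array([
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5]],
])
scores = bald_scores(probs)
best = int(np.argmax(scores))  # index of the point to send for labelling
```

In this toy pool both points have the same entropy under the averaged prediction, but only the first has posterior samples that disagree with each other, so it receives the higher score. That disagreement is what the acquisition function is named after.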