
Off-Policy Confidence Sequences
Nikos Karampatziakis · Paul Mineiro · Aaditya Ramdas

Thu Jul 22 06:25 AM -- 06:30 AM (PDT)

We develop confidence bounds that hold uniformly over time for off-policy evaluation in the contextual bandit setting. These confidence sequences are based on recent ideas from martingale analysis and are non-asymptotic, non-parametric, and valid at arbitrary stopping times. We provide algorithms for computing these confidence sequences that strike a good balance between computational and statistical efficiency. We empirically demonstrate the tightness of our approach in terms of failure probability and width and apply it to the "gated deployment" problem of safely upgrading a production contextual bandit system.
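To make the idea of a bound that holds uniformly over time concrete, the following is a minimal sketch and not the authors' algorithm. It swaps in a much cruder technique: a Hoeffding interval at each step, with the failure budget split across time by a union bound (alpha_t = 6·alpha / (pi² t²), which sums to alpha). It assumes i.i.d. logged data, rewards in [0, 1], and importance weights bounded by a known `w_max`. The function name `ips_confidence_sequence` and all parameter names are hypothetical.

```python
import math


def ips_confidence_sequence(weights, rewards, w_max, alpha=0.05):
    """Return a list of (t, lower, upper) bounds on the target policy's value.

    Assumptions (illustrative, not from the paper):
      - weights[t] = pi(a_t | x_t) / mu(a_t | x_t), each in [0, w_max]
      - rewards[t] in [0, 1], so each term w * r lies in [0, w_max]
      - observations are i.i.d.
    The bounds hold simultaneously for all t with probability >= 1 - alpha,
    by a union bound over per-step Hoeffding intervals.
    """
    total = 0.0
    lower, upper = 0.0, w_max  # trivial bounds before any data
    bounds = []
    for t, (w, r) in enumerate(zip(weights, rewards), start=1):
        total += w * r
        mean = total / t  # importance-weighted (IPS) value estimate
        # Spend alpha_t at step t; the alpha_t sum to alpha over all t >= 1.
        alpha_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        # Two-sided Hoeffding radius for variables with range w_max.
        radius = w_max * math.sqrt(math.log(2.0 / alpha_t) / (2.0 * t))
        # Running intersection keeps the sequence nested.
        lower = max(lower, mean - radius)
        upper = min(upper, mean + radius)
        bounds.append((t, lower, upper))
    return bounds
```

Because the guarantee covers every step at once, one could stop collecting data as soon as `lower` exceeds a baseline value, which is the flavor of decision that the gated deployment problem calls for. The paper's martingale-based sequences are designed to be considerably tighter than this union-bound construction.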

Author Information

Nikos Karampatziakis (Microsoft)
Paul Mineiro (Microsoft)
Aaditya Ramdas (Carnegie Mellon University)

Aaditya Ramdas is an assistant professor in the Departments of Statistics and Machine Learning at Carnegie Mellon University. His research currently follows three main directions: (1) selective and simultaneous inference (interactive, structured, post-hoc control of false discovery/coverage rate,…), (2) sequential uncertainty quantification (confidence sequences, always-valid p-values, bias in bandits,…), and (3) assumption-free black-box predictive inference (conformal prediction, calibration,…).
