

Poster

Bayesian Counterfactual Risk Minimization

Ben London · Ted Sandler

Pacific Ballroom #113

Keywords: [ Bandits ] [ Statistical Learning Theory ] [ Theory and Algorithms ]


Abstract: We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard L2 regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
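As a rough illustration of the estimator named in the abstract, the sketch below computes a truncated inverse propensity score estimate of a target policy's mean reward from logged bandit feedback. It assumes one common form of truncation, in which each logged propensity is lower-bounded by a threshold `tau` before dividing. The function name, argument names, and default `tau` are hypothetical and are not taken from the paper.

```python
def truncated_ips(rewards, target_probs, logging_probs, tau=0.05):
    """Truncated IPS estimate of a target policy's mean reward.

    rewards[i]       -- observed reward for the i-th logged action
    target_probs[i]  -- probability the target policy assigns to that action
    logging_probs[i] -- propensity with which the logging policy chose it
    tau              -- truncation threshold (hypothetical default), 0 < tau <= 1
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one logged interaction")
    if not (n == len(target_probs) == len(logging_probs)):
        raise ValueError("all inputs must have the same length")
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")

    total = 0.0
    for r, pi, p in zip(rewards, target_probs, logging_probs):
        # Lower-bounding the propensity caps each importance weight at 1 / tau,
        # trading some bias for a bounded, lower-variance estimate.
        total += r * pi / max(p, tau)
    return total / n


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0]
    target_probs = [0.5, 0.2, 0.9]
    logging_probs = [0.25, 0.5, 0.01]  # the last action was rarely logged
    print(truncated_ips(rewards, target_probs, logging_probs, tau=0.1))
```

In this toy data the third interaction has a logged propensity of 0.01, so without truncation its importance weight would be 90; with `tau=0.1` the weight is capped at 9, which is the kind of bounded weighting that makes a generalization bound tractable.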
