We present a Bayesian view of counterfactual risk minimization (CRM), also known as offline policy optimization from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated IPS estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard $L_2$ regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
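For readers unfamiliar with the estimator named in the abstract, the following is a minimal sketch of one common form of truncated inverse propensity scoring (IPS), in which each logging propensity is clipped from below at a threshold before reweighting. The function name `truncated_ips`, its argument names, and the choice of a reward-based (rather than loss-based) formulation are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def truncated_ips(
    rewards: Sequence[float],
    target_probs: Sequence[float],
    logging_probs: Sequence[float],
    tau: float,
) -> float:
    """Estimate a target policy's expected reward from logged bandit feedback.

    Assumed form (hypothetical, for illustration):
        (1/n) * sum_i  r_i * pi(a_i | x_i) / max(p_i, tau)
    where p_i is the logging policy's propensity for the logged action.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    n = len(rewards)
    if n == 0 or len(target_probs) != n or len(logging_probs) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for r, pi, p in zip(rewards, target_probs, logging_probs):
        # Clipping the propensity at tau bounds each importance weight by 1/tau,
        # trading some bias for lower variance.
        total += r * pi / max(p, tau)
    return total / n


# Toy logged data: the third sample has a very small logging propensity.
rewards = [1.0, 0.0, 1.0]
target_probs = [0.5, 0.2, 0.9]
logging_probs = [0.25, 0.5, 0.01]

estimate_clipped = truncated_ips(rewards, target_probs, logging_probs, tau=0.1)
estimate_unclipped = truncated_ips(rewards, target_probs, logging_probs, tau=0.01)
```

In this toy data, raising the threshold from 0.01 to 0.1 caps the third sample's importance weight, which is the variance-control effect that truncation is meant to provide.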
Ben London (Amazon)
Ted Sandler (Amazon.com)
Related Events (a corresponding poster, oral, or spotlight)
2019 Poster: Bayesian Counterfactual Risk Minimization
Tue Jun 11th, 06:30 -- 09:00 PM, Room Pacific Ballroom