Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits

Qingyue Zhao ⋅ Kaixuan Ji ⋅ Heyang Zhao ⋅ Tong Zhang ⋅ Quanquan Gu

Abstract
