Poster in Workshop: Reinforcement Learning for Real Life

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

Haruka Kiyohara · Yuta Saito · Tatsuya Matsuhiro · Yusuke Narita · Nobuyuki Shimizu · Yasuo Yamamoto


Abstract:

In real-world recommender systems, we often aim to optimize ranking decision making. In these applications, off-policy evaluation (OPE) is beneficial because it enables performance estimation of unknown ranking policies using only logged data. However, naively applying OPE to ranking policies suffers from a critical variance issue. To tackle this issue, user behavior assumptions are often introduced to make the combinatorial item space tractable. However, a strong assumption may in turn cause serious bias in the performance estimation. It is therefore important to control the bias-variance tradeoff by imposing a reasonable assumption. To achieve this, we propose a doubly robust (DR) estimator for ranking policies that works under the cascade assumption. Because the cascade assumption posits that a user interacts with items sequentially from the top position to the bottom, it is more reasonable than assuming that a user interacts with items independently. The proposed estimator is unbiased in more cases than the existing estimator built on the independence assumption. Furthermore, compared to the previous estimator built on the same cascade assumption, DR reduces the variance under a reasonable condition. Finally, experiments show that the proposed estimator performs favorably in various synthetic settings.
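The abstract does not spell out the estimator, but its description pins down the general shape: position-wise importance weights restricted to the top-l prefix of the ranking (the cascade assumption), combined with a reward regression as a control variate (the DR part). Below is a minimal NumPy sketch consistent with that description; the function and argument names (`cascade_dr`, `expected_q_hat`, etc.) are illustrative assumptions, and the arrays stand in for quantities the paper defines formally, so this is a sketch of the general technique rather than the authors' exact implementation.

```python
import numpy as np

def cascade_dr(eval_probs, behavior_probs, rewards, q_hat, expected_q_hat):
    """Cascade-style doubly robust OPE estimate of a ranking policy's value.

    All arrays have shape (n_samples, ranking_length). For sample i and position l:
      eval_probs[i, l]     -- pi_e(a_{i,l} | x_i, a_{i,1:l-1}), evaluation policy
      behavior_probs[i, l] -- pi_0(a_{i,l} | x_i, a_{i,1:l-1}), logging policy
      rewards[i, l]        -- observed position-wise reward (e.g., click)
      q_hat[i, l]          -- Q-hat(x_i, a_{i,1:l}), regression for the logged prefix
      expected_q_hat[i, l] -- E_{a ~ pi_e}[Q-hat(x_i, a_{i,1:l-1}, a)], control variate
    """
    # Cascade assumption: the weight at position l covers only the top-l prefix,
    # w_{1:l} = prod_{l' <= l} pi_e / pi_0, not the full combinatorial ranking.
    w = np.cumprod(eval_probs / behavior_probs, axis=1)
    # Shifted weights w_{1:l-1}, with the convention w_{1:0} = 1.
    w_prev = np.concatenate([np.ones((w.shape[0], 1)), w[:, :-1]], axis=1)
    # DR decomposition at each position: importance-weighted residual
    # plus a model-based baseline weighted by the previous prefix weight.
    per_position = w * (rewards - q_hat) + w_prev * expected_q_hat
    return per_position.sum(axis=1).mean()

# Toy usage on synthetic logged data (all quantities randomly generated).
rng = np.random.default_rng(0)
n, L = 1000, 5
pi_0 = rng.uniform(0.2, 0.8, size=(n, L))
pi_e = rng.uniform(0.2, 0.8, size=(n, L))
clicks = rng.binomial(1, 0.3, size=(n, L)).astype(float)
q = rng.uniform(0.0, 1.0, size=(n, L))
eq = rng.uniform(0.0, 1.0, size=(n, L))
print(cascade_dr(pi_e, pi_0, clicks, q, eq))
```

Setting `q_hat` and `expected_q_hat` to zero recovers a pure prefix-weighted inverse propensity estimator, which illustrates the bias-variance role of the regression term: an accurate Q-hat shrinks the weighted residual and hence the variance, while unbiasedness rests on the cascade assumption rather than on Q-hat being correct.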
