Incentivized Exploration with Stochastic Covariates: A Two-Stage Mechanism Design for Recommender Systems
Yuantong Li ⋅ Guang Cheng ⋅ Xiaowu Dai
Abstract
Recommender systems play a crucial role in internet economies by connecting users with relevant products. However, designing effective recommender systems faces a key challenge: balancing the \textit{exploration-exploitation} tradeoff while providing users, whose preferences are self-interested, with an \textit{incentive} to explore new products. While prior work addresses Bayesian Incentive Compatibility (BIC) in fixed-design linear bandits \citep{sellke2023price}, we tackle the setting of stochastic user covariates sampled online. Unlike standard black-box reductions \citep{mansour2020bayesian}, our framework exploits the linear reward structure to achieve sublinear regret while satisfying incentive constraints. Specifically, we propose a two-stage algorithm that integrates incentivized exploration with \textit{any efficient plug-in offline learning algorithm}. In the first stage, the algorithm explores products while maintaining incentive compatibility to gather informative samples. The second stage employs an \textit{inverse proportional gap sampling strategy}, combined with any efficient learning method, to secure sublinear regret. Theoretically, we prove that our algorithm \recon{} achieves $\tilde{O}(\sqrt{KdT})$ regret while simultaneously satisfying the incentive constraints, and we characterize the tradeoff between the incentive budget and regret, which we validate in experiments. We demonstrate RCB's strong incentive gain, sublinear regret, and robustness through simulations and a real application to personalized warfarin dosing. To the best of our knowledge, this is the first BIC analysis in online preference learning settings.
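The second-stage sampling rule mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; it is a generic inverse-proportional-gap (inverse-gap-weighted) scheme in the style of SquareCB, where each suboptimal arm is played with probability inversely proportional to its predicted reward gap, and the greedy arm absorbs the remaining mass. The function name and the exploration parameter `gamma` are illustrative assumptions.

```python
import numpy as np

def inverse_gap_weights(predicted_rewards, gamma):
    """Illustrative inverse-proportional-gap sampling distribution.

    Each suboptimal arm a is played with probability
    1 / (K + gamma * gap_a), where gap_a is the predicted-reward gap
    to the greedy arm; the greedy arm gets the leftover mass.
    Larger gamma shifts mass toward exploitation.
    """
    r = np.asarray(predicted_rewards, dtype=float)
    K = len(r)
    best = int(np.argmax(r))          # greedy arm under the plug-in estimate
    gaps = r[best] - r                # predicted gaps (greedy arm has gap 0)
    probs = 1.0 / (K + gamma * gaps)  # inverse-proportional-gap weights
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()   # greedy arm absorbs remaining mass
    return probs

# Example: three products, plug-in reward estimates, gamma = 10
p = inverse_gap_weights([0.5, 0.2, 0.1], gamma=10.0)
```

The distribution always sums to one, and as `gamma` grows the policy concentrates on the greedy arm while never fully abandoning suboptimal arms, which is what yields sublinear regret in gap-weighted schemes.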