Skip to yearly menu bar Skip to main content


Poster
in
Workshop: Interactive Learning with Implicit Human Feedback

Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation

Thomas Kleine Büning · Aadirupa Saha · Christos Dimitrakakis · Haifeng Xu


Abstract: We study a strategic variant of the multi-armed bandit problem, which we coin the \emph{strategic click-bandit}. This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards. Like in classical bandits, rewards follow a fixed unknown distribution. However, we assume that the click-through rate of each arm is chosen strategically by the arm (e.g., a host on Airbnb) in order to maximize the number of times it gets clicked. The algorithm designer does not know the post-click rewards nor the arms' actions (i.e., strategically chosen click-rates) in advance, and must learn both values over time. To solve this problem, we design an incentive-aware learning algorithm, UCB-S, which achieves two goals simultaneously: (a) aligning incentives by incentivizing desirable arm actions under uncertainty; (b) learning unknown parameters. We approximately characterize all Nash equilibria among arms under UCB-S and show a $\tilde O(\sqrt{KT})$ regret bound in every equilibrium. We also show that incentive-unaware algorithms generally fail to achieve low regret in the strategic click-bandit setup.

Chat is not available.