Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation
Thomas Kleine Büning · Aadirupa Saha · Christos Dimitrakakis · Haifeng Xu
Event URL: https://openreview.net/forum?id=iIhXNqNh1c
We study a strategic variant of the multi-armed bandit problem, which we coin the \emph{strategic click-bandit}. This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards. As in classical bandits, rewards follow a fixed unknown distribution. However, we assume that the click-through rate of each arm is chosen strategically by the arm (e.g., a host on Airbnb) in order to maximize the number of times it gets clicked. The algorithm designer knows neither the post-click rewards nor the arms' actions (i.e., strategically chosen click-rates) in advance, and must learn both over time. To solve this problem, we design an incentive-aware learning algorithm, UCB-S, which achieves two goals simultaneously: (a) aligning incentives by incentivizing desirable arm actions under uncertainty; (b) learning the unknown parameters. We approximately characterize all Nash equilibria among arms under UCB-S and show a $\tilde O(\sqrt{KT})$ regret bound in every equilibrium. We also show that incentive-unaware algorithms generally fail to achieve low regret in the strategic click-bandit setup.
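To make the interaction model concrete, the sketch below runs a plain UCB1 loop in which each pull yields a Bernoulli click at the arm's click-rate and, on a click, that arm's post-click reward. This is only an illustrative baseline under assumed dynamics: the click-rates here are fixed inputs rather than strategically chosen by the arms, and the incentive-alignment screening that distinguishes the paper's UCB-S from vanilla UCB is not modeled. All function and parameter names are hypothetical.

```python
import math
import random


def ucb_click_bandit(click_rates, post_click_rewards, horizon, seed=0):
    """Generic UCB1 over K arms where the realized reward of a pull is
    (Bernoulli click at click_rates[i]) * post_click_rewards[i].

    Illustrative only: in the strategic click-bandit, click_rates would be
    chosen adversarially/strategically by the arms themselves, and UCB-S
    additionally eliminates arms whose observed behavior deviates from the
    incentivized actions. Neither aspect is modeled here.
    """
    rng = random.Random(seed)
    k = len(click_rates)
    counts = [0] * k          # number of pulls per arm
    means = [0.0] * k         # empirical mean reward per arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # initialize: pull each arm once
        else:
            # UCB1 index: empirical mean + exploration bonus
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        clicked = rng.random() < click_rates[arm]
        reward = post_click_rewards[arm] if clicked else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        total_reward += reward

    return total_reward / horizon, counts
```

With two arms of equal click-rate but very different post-click reward, the loop concentrates its pulls on the arm with the higher expected combined reward, which is the quantity the algorithm designer in the abstract ultimately cares about.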

Author Information

Thomas Kleine Büning (University of Oslo)
Aadirupa Saha (TTIC)

Bio: Aadirupa Saha is currently a visiting faculty member at the Toyota Technological Institute at Chicago (TTIC). She obtained her PhD from the Department of Computer Science, Indian Institute of Science, Bangalore, advised by Aditya Gopalan and Chiranjib Bhattacharyya. She spent two years at Microsoft Research New York City as a postdoctoral researcher. During her PhD, Aadirupa interned at Microsoft Research, Bangalore; Inria, Paris; and Google AI, Mountain View. Her research interests include bandits, reinforcement learning, optimization, learning theory, and algorithms. She has organized various workshops and tutorials and served as a reviewer at top ML conferences. Research Interests: Machine learning theory (specifically online learning, bandits, and reinforcement learning), optimization, game theory, and algorithms. She has recently become interested in exploring ML problems at the intersection of fairness, privacy, game theory, and mechanism design.

Christos Dimitrakakis (Chalmers / Harvard / Lille / Oslo)
Haifeng Xu (University of Chicago)
