We study a strategic variant of the multi-armed bandit problem, which we coin the \emph{strategic click-bandit}. This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards. As in classical bandits, rewards follow a fixed unknown distribution. However, we assume that the click-through rate of each arm is chosen strategically by the arm (e.g., a host on Airbnb) in order to maximize the number of times it gets clicked. The algorithm designer knows neither the post-click rewards nor the arms' actions (i.e., their strategically chosen click-rates) in advance and must learn both over time. To solve this problem, we design an incentive-aware learning algorithm, UCB-S, which achieves two goals simultaneously: (a) aligning incentives by incentivizing desirable arm actions under uncertainty; (b) learning the unknown parameters. We approximately characterize all Nash equilibria among arms under UCB-S and show a $\tilde O(\sqrt{KT})$ regret bound in every equilibrium. We also show that incentive-unaware algorithms generally fail to achieve low regret in the strategic click-bandit setup.
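The model in the abstract is concrete enough to simulate. Below is a minimal, illustrative sketch (not the paper's UCB-S): it assumes Bernoulli post-click rewards, holds each arm's strategically chosen click-rate fixed, and runs a plain, incentive-unaware UCB index on the realized click-and-reward feedback. All variable names, the number of arms and rounds, and the specific distributions are assumptions made for illustration.

```python
# Illustrative simulation of the strategic click-bandit environment described above.
# NOTE: this is a plain UCB learner, not the paper's incentive-aware UCB-S.
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 10_000
mu = rng.uniform(0.2, 0.9, size=K)   # unknown mean post-click rewards (Bernoulli, assumed)
s = rng.uniform(0.5, 1.0, size=K)    # click-rates chosen strategically by the arms
                                     # (held fixed here for simplicity)

counts = np.zeros(K)                 # how often each arm was recommended
reward_sum = np.zeros(K)             # realized reward: 0 if no click, post-click reward if clicked

best = np.max(s * mu)                # benchmark: best expected click-through x post-click reward
regret = 0.0

for t in range(1, T + 1):
    # Optimistic index on the product s_i * mu_i; unplayed arms get index +inf.
    est = np.where(counts > 0, reward_sum / np.maximum(counts, 1), np.inf)
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1))
    i = int(np.argmax(est + bonus))

    counts[i] += 1
    if rng.random() < s[i]:                      # click occurs with the arm's chosen rate
        reward_sum[i] += rng.binomial(1, mu[i])  # post-click reward from a fixed distribution
    regret += best - s[i] * mu[i]                # expected (pseudo-)regret of this round

print(f"average regret per round: {regret / T:.4f}")
```

Because the click-rates are held fixed, this sketch only exercises the learning side of the problem. The strategic side, in which arms best-respond to the deployed algorithm by choosing their click-rates, is precisely what the incentive-aware design of UCB-S addresses and what incentive-unaware learners like the one above fail to handle, per the abstract's final claim.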
Author Information
Thomas Kleine Büning (University of Oslo)
Aadirupa Saha (TTIC)
Bio: Aadirupa Saha is currently a visiting faculty member at the Toyota Technological Institute at Chicago (TTIC). She obtained her PhD from the Department of Computer Science, Indian Institute of Science, Bangalore, advised by Aditya Gopalan and Chiranjib Bhattacharyya. She spent two years at Microsoft Research New York City as a postdoctoral researcher. During her PhD, Aadirupa interned at Microsoft Research, Bangalore; Inria, Paris; and Google AI, Mountain View. Her research interests include bandits, reinforcement learning, optimization, learning theory, and algorithms. She has organized various workshops and tutorials and has served as a reviewer at top ML conferences. Research Interests: Machine Learning Theory (specifically Online Learning, Bandits, Reinforcement Learning), Optimization, Game Theory, and Algorithms. She has recently become interested in exploring ML problems at the intersection of Fairness, Privacy, Game Theory, and Mechanism Design.
Christos Dimitrakakis (Chalmers / Harvard / Lille / Oslo)
Haifeng Xu (University of Chicago)
More from the Same Authors
- 2022: Policy Fairness in Sequential Allocations under Bias Dynamics
  Meirav Segal · Anne-Marie George · Christos Dimitrakakis
- 2023: Rethinking Incentives in Recommender Systems: Are Monotone Rewards Always Beneficial?
  Fan Yao · Chuanhao Li · Karthik Abinav Sankararaman · Yiming Liao · Yan Zhu · Qifan Wang · Hongning Wang · Haifeng Xu
- 2023: Inverse Game Theory for Stackelberg Games: the Blessing of Bounded Rationality
  Jibang Wu · Weiran Shen · Fei Fang · Haifeng Xu
- 2023: Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts
  Chaoqi Wang · Ziyu Ye · Zhe Feng · Ashwinkumar Badanidiyuru · Haifeng Xu
- 2023: Learning from a Learning User for Optimal Recommendations
  Fan Yao · Chuanhao Li · Denis Nekipelov · Hongning Wang · Haifeng Xu
- 2023 Workshop: The Many Facets of Preference-Based Learning
  Aadirupa Saha · Mohammad Ghavamzadeh · Robert Busa-Fekete · Branislav Kveton · Viktor Bengs
- 2023 Oral: How Bad is Top-$K$ Recommendation under Competing Content Creators?
  Fan Yao · Chuanhao Li · Denis Nekipelov · Hongning Wang · Haifeng Xu
- 2023 Poster: Federated Online and Bandit Convex Optimization
  Kumar Kshitij Patel · Lingxiao Wang · Aadirupa Saha · Nati Srebro
- 2023 Poster: How Bad is Top-$K$ Recommendation under Competing Content Creators?
  Fan Yao · Chuanhao Li · Denis Nekipelov · Hongning Wang · Haifeng Xu
- 2022 Workshop: Complex feedback in online learning
  Rémy Degenne · Pierre Gaillard · Wouter Koolen · Aadirupa Saha
- 2022 Poster: Interactive Inverse Reinforcement Learning for Cooperative Games
  Thomas Kleine Buening · Anne-Marie George · Christos Dimitrakakis
- 2022 Poster: Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences
  Aadirupa Saha · Pierre Gaillard
- 2022 Spotlight: Versatile Dueling Bandits: Best-of-both World Analyses for Learning from Relative Preferences
  Aadirupa Saha · Pierre Gaillard
- 2022 Spotlight: Interactive Inverse Reinforcement Learning for Cooperative Games
  Thomas Kleine Buening · Anne-Marie George · Christos Dimitrakakis
- 2022 Poster: Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models
  Viktor Bengs · Aadirupa Saha · Eyke Hüllermeier
- 2022 Poster: Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
  Aadirupa Saha · Shubham Gupta
- 2022 Spotlight: Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
  Aadirupa Saha · Shubham Gupta
- 2022 Spotlight: Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models
  Viktor Bengs · Aadirupa Saha · Eyke Hüllermeier
- 2021 Poster: Confidence-Budget Matching for Sequential Budgeted Learning
  Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor
- 2021 Poster: Adversarial Dueling Bandits
  Aadirupa Saha · Tomer Koren · Yishay Mansour
- 2021 Spotlight: Confidence-Budget Matching for Sequential Budgeted Learning
  Yonathan Efroni · Nadav Merlis · Aadirupa Saha · Shie Mannor
- 2021 Spotlight: Adversarial Dueling Bandits
  Aadirupa Saha · Tomer Koren · Yishay Mansour
- 2021 Poster: Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization
  Aadirupa Saha · Nagarajan Natarajan · Praneeth Netrapalli · Prateek Jain
- 2021 Spotlight: Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization
  Aadirupa Saha · Nagarajan Natarajan · Praneeth Netrapalli · Prateek Jain
- 2021 Poster: Dueling Convex Optimization
  Aadirupa Saha · Tomer Koren · Yishay Mansour
- 2021 Spotlight: Dueling Convex Optimization
  Aadirupa Saha · Tomer Koren · Yishay Mansour
- 2020 Poster: From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model
  Aadirupa Saha · Aditya Gopalan
- 2020 Poster: Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards
  Aadirupa Saha · Pierre Gaillard · Michal Valko