Timezone: »
In this paper, we study how to learn from inconsistent human feedback in the setting of combinatorial bandits with semi-bandit feedback -- where an online learner in every time step chooses a size-$k$ set of arms, observes a stochastic reward for each arm, and endeavors to maximize the sum of the per-arm rewards in the set. We consider the challenging setting where these per-arm rewards are not only set-dependent, but also {\em inconsistent:} the expected reward of arm "a" can be larger than arm "b" in one set, but smaller in another. Inconsistency is often observed in practice, falls outside the purview of many popular semi-bandit models, and in general can result in it being combinatorially hard to find the optimal set.Motivated by the observed practice of using UCB-based algorithms even in settings where they are not strictly justified, our main contribution is to present a simple assumption - weak optimal set consistency. We show that this assumption allows for inconsistent set-dependent arm rewards, and also subsumes many widely used models for semi-bandit feedback. Most importantly, we show that it ensures that a simple UCB-based algorithm finds the optimal set, and achieves $O\left(\min(\frac{k^3 n \log T}{\epsilon}, k^2\sqrt{n T \log T})\right)$ regret (which nearly matches the lower bound).
Author Information
Shuo Yang (University of Texas at Austin)
Tongzheng Ren (UT Austin / Google Brain)
Inderjit Dhillon (UT Austin & Amazon)
Inderjit Dhillon is the Gottesman Family Centennial Professor of Computer Science and Mathematics at UT Austin, where he is also the Director of the ICES Center for Big Data Analytics. His main research interests are in big data, machine learning, network analysis, linear algebra and optimization. He received his B.Tech. degree from IIT Bombay, and Ph.D. from UC Berkeley. Inderjit has received several awards, including the ICES Distinguished Research Award, the SIAM Outstanding Paper Prize, the Moncrief Grand Challenge Award, the SIAM Linear Algebra Prize, the University Research Excellence Award, and the NSF Career Award. He has published over 160 journal and conference papers, and has served on the Editorial Board of the Journal of Machine Learning Research, the IEEE Transactions of Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning and the SIAM Journal for Matrix Analysis and Applications. Inderjit is an ACM Fellow, an IEEE Fellow, a SIAM Fellow and an AAAS Fellow.
Sujay Sanghavi (UT Austin)
More from the Same Authors
-
2022 : Positive Unlabeled Contrastive Representation Learning »
Anish Acharya · Sujay Sanghavi · Li Jing · Bhargav Bhushanam · Michael Rabbat · Dhruv Choudhary · Inderjit Dhillon -
2023 : Contextual Set Selection Under Human Feedback With Model Misspecification »
Shuo Yang · Rajat Sen · Sujay Sanghavi -
2023 : Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity »
Charlie Hou · Kiran Thekumparampil · Michael Shavlovsky · Giulia Fanti · Yesh Dattatreya · Sujay Sanghavi -
2023 Poster: Beyond Uniform Lipschitz Condition in Differentially Private Optimization »
Rudrajit Das · Satyen Kale · Zheng Xu · Tong Zhang · Sujay Sanghavi -
2023 Poster: Understanding Self-Distillation in the Presence of Label Noise »
Rudrajit Das · Sujay Sanghavi -
2022 Poster: Making Linear MDPs Practical via Contrastive Representation Learning »
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai -
2022 Poster: Asymptotically-Optimal Gaussian Bandits with Side Observations »
Alexia Atsidakou · Orestis Papadigenopoulos · Constantine Caramanis · Sujay Sanghavi · Sanjay Shakkottai -
2022 Spotlight: Asymptotically-Optimal Gaussian Bandits with Side Observations »
Alexia Atsidakou · Orestis Papadigenopoulos · Constantine Caramanis · Sujay Sanghavi · Sanjay Shakkottai -
2022 Spotlight: Making Linear MDPs Practical via Contrastive Representation Learning »
Tianjun Zhang · Tongzheng Ren · Mengjiao Yang · Joseph E Gonzalez · Dale Schuurmans · Bo Dai -
2022 Poster: Linear Bandit Algorithms with Sublinear Time Complexity »
Shuo Yang · Tongzheng Ren · Sanjay Shakkottai · Eric Price · Inderjit Dhillon · Sujay Sanghavi -
2022 Spotlight: Linear Bandit Algorithms with Sublinear Time Complexity »
Shuo Yang · Tongzheng Ren · Sanjay Shakkottai · Eric Price · Inderjit Dhillon · Sujay Sanghavi -
2021 Poster: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon -
2021 Spotlight: Top-k eXtreme Contextual Bandits with Arm Hierarchy »
Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon -
2020 : Discussion Panel »
Krzysztof Dembczynski · Prateek Jain · Alina Beygelzimer · Inderjit Dhillon · Anna Choromanska · Maryam Majzoubi · Yashoteja Prabhu · John Langford -
2020 : Invited Talk 5 Q&A - Inderjit Dhillon »
Inderjit Dhillon -
2020 : Invited Talk 5 - Multi-Output Prediction: Theory and Practice - Inderjit Dhillon »
Inderjit Dhillon -
2020 Poster: Learning to Encode Position for Transformer with Continuous Dynamical Model »
Xuanqing Liu · Hsiang-Fu Yu · Inderjit Dhillon · Cho-Jui Hsieh -
2020 Poster: Accountable Off-Policy Evaluation With Kernel Bellman Statistics »
Yihao Feng · Tongzheng Ren · Ziyang Tang · Qiang Liu -
2020 Poster: Extreme Multi-label Classification from Aggregated Labels »
Yanyao Shen · Hsiang-Fu Yu · Sujay Sanghavi · Inderjit Dhillon -
2019 Poster: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling »
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar -
2019 Poster: Learning with Bad Training Data via Iterative Trimmed Loss Minimization »
Yanyao Shen · Sujay Sanghavi -
2019 Oral: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling »
Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar -
2019 Oral: Learning with Bad Training Data via Iterative Trimmed Loss Minimization »
Yanyao Shen · Sujay Sanghavi -
2018 Poster: Learning long term dependencies via Fourier recurrent units »
Jiong Zhang · Yibo Lin · Zhao Song · Inderjit Dhillon -
2018 Poster: Towards Fast Computation of Certified Robustness for ReLU Networks »
Tsui-Wei Weng · Huan Zhang · Hongge Chen · Zhao Song · Cho-Jui Hsieh · Luca Daniel · Duane Boning · Inderjit Dhillon -
2018 Oral: Towards Fast Computation of Certified Robustness for ReLU Networks »
Tsui-Wei Weng · Huan Zhang · Hongge Chen · Zhao Song · Cho-Jui Hsieh · Luca Daniel · Duane Boning · Inderjit Dhillon -
2018 Oral: Learning long term dependencies via Fourier recurrent units »
Jiong Zhang · Yibo Lin · Zhao Song · Inderjit Dhillon -
2018 Poster: Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization »
Jiong Zhang · Qi Lei · Inderjit Dhillon -
2018 Oral: Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization »
Jiong Zhang · Qi Lei · Inderjit Dhillon -
2017 Poster: Gradient Boosted Decision Trees for High Dimensional Sparse Output »
Si Si · Huan Zhang · Sathiya Keerthi · Dhruv Mahajan · Inderjit Dhillon · Cho-Jui Hsieh -
2017 Talk: Gradient Boosted Decision Trees for High Dimensional Sparse Output »
Si Si · Huan Zhang · Sathiya Keerthi · Dhruv Mahajan · Inderjit Dhillon · Cho-Jui Hsieh -
2017 Poster: Recovery Guarantees for One-hidden-layer Neural Networks »
Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon -
2017 Poster: Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization »
Qi Lei · En-Hsu Yen · Chao-Yuan Wu · Inderjit Dhillon · Pradeep Ravikumar -
2017 Talk: Doubly Greedy Primal-Dual Coordinate Descent for Sparse Empirical Risk Minimization »
Qi Lei · En-Hsu Yen · Chao-Yuan Wu · Inderjit Dhillon · Pradeep Ravikumar -
2017 Talk: Recovery Guarantees for One-hidden-layer Neural Networks »
Kai Zhong · Zhao Song · Prateek Jain · Peter Bartlett · Inderjit Dhillon