Contextual Set Selection Under Human Feedback With Model Misspecification
Shuo Yang · Rajat Sen · Sujay Sanghavi
Event URL: https://openreview.net/forum?id=6Z2uBx5bpZ
A common and efficient way to elicit human feedback is to present users with a set of options and record their relative preferences over the presented options. The contextual combinatorial bandits problem captures this setting algorithmically; however, it implicitly assumes an underlying consistent reward model for the options. The setting of human feedback (which, e.g., may use different reviewers for different samples) means that there may not be any such model -- it is *misspecified*. We first derive a lower bound for our setting, and then show that model misspecification can lead to catastrophic failure of the C$^2$UCB algorithm (which is otherwise near-optimal when there is no misspecification). We then propose two algorithms: the first (MC$^2$UCB) requires knowledge of the level of misspecification $\epsilon$ (i.e., the absolute deviation from the closest well-specified model); the second is a general framework that extends to unknown $\epsilon$. Our theoretical analysis shows that both algorithms achieve near-optimal regret. Empirical evaluations, conducted both in a synthetic environment and in a real-world movie recommendation application, demonstrate the adaptability of our algorithms to various degrees of misspecification. This highlights their ability to learn effectively from human feedback, even under model misspecification.
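For intuition, here is a minimal sketch of the kind of optimistic set-selection step described in the abstract. It is not the authors' MC$^2$UCB implementation: the linear reward model, the function name `select_set`, and the choice to simply add the (assumed known) misspecification level $\epsilon$ to the usual elliptical confidence width are illustrative assumptions.

```python
# Minimal, illustrative sketch of a misspecification-aware optimistic set
# selection step (NOT the paper's MC^2UCB implementation). The linear reward
# model and the additive `eps` slack on the confidence width are assumptions
# made for exposition only.
import numpy as np

def select_set(theta_hat, V_inv, contexts, k, alpha, eps):
    """Return the indices of the k options with the highest optimistic score.

    theta_hat : ridge-regression estimate of the linear reward parameter, shape (d,)
    V_inv     : inverse of the regularized design matrix, shape (d, d)
    contexts  : feature vectors of the options offered this round, shape (n, d)
    k         : size of the set shown to the user
    alpha     : standard confidence-width multiplier
    eps       : assumed bound on per-option misspecification
    """
    means = contexts @ theta_hat
    # Elliptical confidence width sqrt(x_i^T V_inv x_i) per option, inflated
    # by eps to account for the deviation from the closest well-specified model.
    widths = alpha * np.sqrt(np.einsum("id,de,ie->i", contexts, V_inv, contexts)) + eps
    ucb = means + widths
    return np.argsort(ucb)[-k:][::-1]
```

For example, `select_set(theta_hat, V_inv, X, k=3, alpha=1.0, eps=0.1)` would return the three option indices to present in the current round; when $\epsilon$ is unknown, the paper's second algorithm provides a general framework rather than relying on a fixed slack like the one above.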
Author Information
Shuo Yang (University of Texas at Austin)
Rajat Sen (Google Research)
I am a fourth-year PhD student in WNCG, UT Austin. I am advised by [Dr. Sanjay Shakkottai](http://users.ece.utexas.edu/~shakkott/Sanjay_Shakkottai/Contact.html). I received my Bachelor's degree in ECE from IIT Kharagpur in 2013. I spent most of my childhood in Durgapur and Kolkata, West Bengal, India. My research interests include online learning (especially Multi-Armed Bandit problems), causality, learning in queueing systems, recommendation systems, and social networks. I like to work on real-world problems that allow rigorous theoretical analysis.
Sujay Sanghavi (UT Austin)
More from the Same Authors
- 2022 : Positive Unlabeled Contrastive Representation Learning
  Anish Acharya · Sujay Sanghavi · Li Jing · Bhargav Bhushanam · Michael Rabbat · Dhruv Choudhary · Inderjit Dhillon
- 2023 : UCB Provably Learns From Inconsistent Human Feedback
  Shuo Yang · Tongzheng Ren · Inderjit Dhillon · Sujay Sanghavi
- 2023 : Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity
  Charlie Hou · Kiran Thekumparampil · Michael Shavlovsky · Giulia Fanti · Yesh Dattatreya · Sujay Sanghavi
- 2023 Poster: Beyond Uniform Lipschitz Condition in Differentially Private Optimization
  Rudrajit Das · Satyen Kale · Zheng Xu · Tong Zhang · Sujay Sanghavi
- 2023 Poster: Understanding Self-Distillation in the Presence of Label Noise
  Rudrajit Das · Sujay Sanghavi
- 2023 Poster: Efficient List-Decodable Regression using Batches
  Abhimanyu Das · Ayush Jain · Weihao Kong · Rajat Sen
- 2022 Poster: Asymptotically-Optimal Gaussian Bandits with Side Observations
  Alexia Atsidakou · Orestis Papadigenopoulos · Constantine Caramanis · Sujay Sanghavi · Sanjay Shakkottai
- 2022 Spotlight: Asymptotically-Optimal Gaussian Bandits with Side Observations
  Alexia Atsidakou · Orestis Papadigenopoulos · Constantine Caramanis · Sujay Sanghavi · Sanjay Shakkottai
- 2022 Poster: On Learning Mixture of Linear Regressions in the Non-Realizable Setting
  Soumyabrata Pal · Arya Mazumdar · Rajat Sen · Avishek Ghosh
- 2022 Spotlight: On Learning Mixture of Linear Regressions in the Non-Realizable Setting
  Soumyabrata Pal · Arya Mazumdar · Rajat Sen · Avishek Ghosh
- 2022 Poster: Linear Bandit Algorithms with Sublinear Time Complexity
  Shuo Yang · Tongzheng Ren · Sanjay Shakkottai · Eric Price · Inderjit Dhillon · Sujay Sanghavi
- 2022 Spotlight: Linear Bandit Algorithms with Sublinear Time Complexity
  Shuo Yang · Tongzheng Ren · Sanjay Shakkottai · Eric Price · Inderjit Dhillon · Sujay Sanghavi
- 2021 Poster: Top-k eXtreme Contextual Bandits with Arm Hierarchy
  Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon
- 2021 Spotlight: Top-k eXtreme Contextual Bandits with Arm Hierarchy
  Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon
- 2020 Poster: Extreme Multi-label Classification from Aggregated Labels
  Yanyao Shen · Hsiang-Fu Yu · Sujay Sanghavi · Inderjit Dhillon
- 2019 Poster: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
  Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar
- 2019 Poster: Learning with Bad Training Data via Iterative Trimmed Loss Minimization
  Yanyao Shen · Sujay Sanghavi
- 2019 Oral: Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
  Shanshan Wu · Alexandros Dimakis · Sujay Sanghavi · Felix Xinnan Yu · Daniel Holtmann-Rice · Dmitry Storcheus · Afshin Rostamizadeh · Sanjiv Kumar
- 2019 Oral: Learning with Bad Training Data via Iterative Trimmed Loss Minimization
  Yanyao Shen · Sujay Sanghavi
- 2018 Poster: Multi-Fidelity Black-Box Optimization with Hierarchical Partitions
  Rajat Sen · Kirthevasan Kandasamy · Sanjay Shakkottai
- 2018 Oral: Multi-Fidelity Black-Box Optimization with Hierarchical Partitions
  Rajat Sen · Kirthevasan Kandasamy · Sanjay Shakkottai
- 2017 Poster: Identifying Best Interventions through Online Importance Sampling
  Rajat Sen · Karthikeyan Shanmugam · Alexandros Dimakis · Sanjay Shakkottai
- 2017 Talk: Identifying Best Interventions through Online Importance Sampling
  Rajat Sen · Karthikeyan Shanmugam · Alexandros Dimakis · Sanjay Shakkottai