Timezone: »

Spotlight
Top-k eXtreme Contextual Bandits with Arm Hierarchy
Rajat Sen · Alexander Rakhlin · Lexing Ying · Rahul Kidambi · Dean Foster · Daniel Hill · Inderjit Dhillon

Wed Jul 21 06:25 PM -- 06:30 PM (PDT) @ None
Motivated by modern applications, such as online advertisement and recommender systems, we study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous, and the learner is allowed to select $k$ arms and observe all or some of the rewards for the chosen arms. We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy for selecting multiple arms. We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|F|T)})$, where $A$ is the total number of arms and $F$ is the class containing the regression function, while only requiring $\tilde{O}(A)$ computation per time step. In the extreme setting, where the total number of arms can be in the millions, we propose a practically-motivated arm hierarchy model that induces a certain structure in mean rewards to ensure statistical and computational efficiency. The hierarchical structure allows for an exponential reduction in the number of relevant arms for each context, thus resulting in a regret guarantee of $O(k\sqrt{(\log A-k+1)T \log (|F|T)})$. Finally, we implement our algorithm using a hierarchical linear function class and show superior performance with respect to well-known benchmarks on simulated bandit feedback experiments using extreme multi-label classification datasets. On a dataset with three million arms, our reduction scheme has an average inference time of only 7.9 milliseconds, which is a 100x improvement.

#### Author Information

I am a 4th year PhD. student in WNCG, UT Austin. I am advised by [Dr. Sanjay Shakkottai](http://users.ece.utexas.edu/~shakkott/Sanjay_Shakkottai/Contact.html). I received my Bachelors degree in ECE, IIT Kharagpur in 2013. I have spent most of my childhood in Durgapur and Kolkata, West Bengal, India. My research interests include online learning (especially Multi-Armed Bandit problems), causality, learning in queueing systems, recommendation systems and social networks. I like to work on real-world problems that allow rigorous theoretical analysis.

##### Inderjit Dhillon (UT Austin & Amazon)

Inderjit Dhillon is the Gottesman Family Centennial Professor of Computer Science and Mathematics at UT Austin, where he is also the Director of the ICES Center for Big Data Analytics. His main research interests are in big data, machine learning, network analysis, linear algebra and optimization. He received his B.Tech. degree from IIT Bombay, and Ph.D. from UC Berkeley. Inderjit has received several awards, including the ICES Distinguished Research Award, the SIAM Outstanding Paper Prize, the Moncrief Grand Challenge Award, the SIAM Linear Algebra Prize, the University Research Excellence Award, and the NSF Career Award. He has published over 160 journal and conference papers, and has served on the Editorial Board of the Journal of Machine Learning Research, the IEEE Transactions of Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning and the SIAM Journal for Matrix Analysis and Applications. Inderjit is an ACM Fellow, an IEEE Fellow, a SIAM Fellow and an AAAS Fellow.