Timezone: »

Faster graph bandit learning using information about the neighbors
Michal Valko

Thu Aug 10 10:30 PM -- 11:10 PM (PDT) @ None
We will talk about sequential learning problems where the available actions (arms) form a graph. These settings naturally arise in recommender systems, advertising, or sensor networks. Specifically, at each round, we are asked to pick a node representing an item to recommend or a sensor to take a reading from; after which we receive a bandit feedback in order to optimize a given performance measure (regret). To address these settings, we can always ignore the graph structure and use known algorithms for multi-armed bandits. However, their performance scales unfavorably with the number of nodes $N$, for example, $O(\sqrt{NT})$, which is undesirable when $N$ means a thousand of sensors or a million of movies. We will describe several graph-bandit problems and show how to use their graph structure to design new algorithms with faster learning rates, scaling not with $N$ but with graph-dependent quantities, often much smaller than $N$ in real-world graphs. In particular, when we receive side observations from the neighboring nodes, we can replace $N$ in the regret guarantees with the independence number, or, in the noisy case, with the effective independence number.

#### Author Information

##### Michal Valko (DeepMind)

Michal is a research scientist in DeepMind Paris and SequeL team at Inria Lille - Nord Europe, France, lead by Philippe Preux and Rémi Munos. He also teaches the course Graphs in Machine Learning at l'ENS Cachan. Michal is primarily interested in designing algorithms that would require as little human supervision as possible. This means 1) reducing the “intelligence” that humans need to input into the system and 2) minimising the data that humans need spend inspecting, classifying, or “tuning” the algorithms. Another important feature of machine learning algorithms should be the ability to adapt to changing environments. That is why he is working in domains that are able to deal with minimal feedback, such as bandit algorithms, semi-supervised learning, and anomaly detection. Most recently he has worked on sequential algorithms with structured decisions where exploiting the structure can lead to provably faster learning. In the past the common thread of Michal's work has been adaptive graph-based learning and its application to the real world applications such as recommender systems, medical error detection, and face recognition. His industrial collaborators include Adobe, Intel, Technicolor, and Microsoft Research. He received his PhD in 2011 from University of Pittsburgh under the supervision of Miloš Hauskrecht and after was a postdoc of Rémi Munos.