Oral
Exploiting structure of uncertainty for efficient matroid semi-bandits
Pierre Perrault · Vianney Perchet · Michal Valko

Thu Jun 13 04:25 PM -- 04:30 PM (PDT) @ Hall B
We improve the efficiency of algorithms for stochastic *combinatorial semi-bandits*. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of rewards, such as *independence*. However, while these algorithms are minimax optimal in terms of regret, they are computationally intractable. In this paper, we first reduce their implementation to a specific *submodular maximization* problem. Then, in the case of *matroid* constraints, we design adapted approximation routines, thereby providing the first efficient algorithms that exploit reward structure. In particular, we improve the state-of-the-art efficient gap-free regret bound by a factor of $\sqrt{k}$, where $k$ is the maximum action size. Finally, we show how our improvement translates to the more general setting of *budgeted combinatorial semi-bandits*.
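The abstract does not detail the adapted approximation routines. For orientation only, here is a minimal sketch of the textbook greedy algorithm for maximizing a monotone submodular function subject to a matroid constraint, with the matroid accessed through an independence oracle. It is not the authors' method; the names `greedy_matroid`, `coverage`, and `COVERS`, the coverage objective, and the rank-2 uniform matroid are hypothetical choices made for illustration.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Optional


def greedy_matroid(
    ground: Iterable[Hashable],
    f: Callable[[FrozenSet[Hashable]], float],
    is_independent: Callable[[FrozenSet[Hashable]], bool],
) -> FrozenSet[Hashable]:
    """Textbook greedy for max f(S) subject to S independent in a matroid.

    For a monotone submodular f this is a 1/2-approximation on a general
    matroid and a (1 - 1/e)-approximation on a uniform matroid.
    (Illustrative sketch, not the paper's routine.)
    """
    remaining: List[Hashable] = list(ground)  # a list keeps tie-breaking deterministic
    chosen: FrozenSet[Hashable] = frozenset()
    while True:
        best: Optional[Hashable] = None
        best_gain = 0.0
        for e in remaining:
            candidate = chosen | {e}
            if not is_independent(candidate):  # matroid independence oracle
                continue
            gain = f(candidate) - f(chosen)  # marginal gain of adding e
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no feasible element with positive gain is left
            return chosen
        chosen = chosen | {best}
        remaining.remove(best)


# Hypothetical example: each base arm covers some items; f counts covered items.
COVERS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}


def coverage(s: FrozenSet[Hashable]) -> float:
    covered = set()
    for e in s:
        covered |= COVERS[e]
    return float(len(covered))


# Uniform matroid of rank 2: any set of at most two arms is independent.
result = greedy_matroid(COVERS, coverage, lambda s: len(s) <= 2)
print(sorted(result))  # ['a', 'c']
```

Swapping the lambda for a different independence oracle (for example, a partition-matroid check) changes the constraint without touching the greedy loop.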

#### Author Information

##### Michal Valko (DeepMind)

Michal is a research scientist at DeepMind Paris and in the SequeL team at Inria Lille - Nord Europe, France, led by Philippe Preux and Rémi Munos. He also teaches the course Graphs in Machine Learning at l'ENS Cachan. Michal is primarily interested in designing algorithms that require as little human supervision as possible. This means 1) reducing the “intelligence” that humans need to input into the system and 2) minimising the data that humans need to spend time inspecting, classifying, or “tuning” the algorithms on. Another important feature of machine learning algorithms should be the ability to adapt to changing environments. That is why he works in areas that deal with minimal feedback, such as bandit algorithms, semi-supervised learning, and anomaly detection. Most recently, he has worked on sequential algorithms with structured decisions, where exploiting the structure can lead to provably faster learning. Previously, the common thread of Michal's work was adaptive graph-based learning and its application to real-world problems such as recommender systems, medical error detection, and face recognition. His industrial collaborators include Adobe, Intel, Technicolor, and Microsoft Research. He received his PhD in 2011 from the University of Pittsburgh under the supervision of Miloš Hauskrecht and afterwards was a postdoc with Rémi Munos.