Poster
Second-Order Kernel Online Convex Optimization with Adaptive Sketching
Daniele Calandriello · Alessandro Lazaric · Michal Valko

Mon Aug 07 01:30 AM -- 05:00 AM (PDT) @ Gallery #81
Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $O(t)$ time and space per iteration and, when the only information on the losses is their convexity, achieve a minimax optimal $O(\sqrt{T})$ regret. Nonetheless, many common losses in kernel problems, such as squared loss, logistic loss, and squared hinge loss, possess stronger curvature that can be exploited. In this case, second-order KOCO methods achieve $O(\log(\det(K)))$ regret, which we show scales as $O(d_{\mathrm{eff}} \log T)$, where $d_{\mathrm{eff}}$ is the effective dimension of the problem and is usually much smaller than $O(\sqrt{T})$. The main drawback of second-order methods is their much higher $O(t^2)$ space and time complexity. In this paper, we introduce kernel online Newton step (KONS), a new second-order KOCO method that also achieves $O(d_{\mathrm{eff}} \log T)$ regret. To address the computational complexity of second-order methods, we introduce a new matrix sketching algorithm for the kernel matrix $K$, and show that for a chosen parameter $\gamma \leq 1$ our Sketched-KONS reduces the space and time complexity by a factor of $\gamma^2$ to $O(t^2\gamma^2)$ space and time per iteration, while incurring only $1/\gamma$ times more regret.
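To make the trade-off controlled by $\gamma$ concrete, the following minimal sketch tabulates the leading-order scalings quoted in the abstract. It is only an illustration: the function name `sketched_kons_tradeoff` and its return keys are hypothetical, constants and logarithmic factors hidden by the $O(\cdot)$ notation are dropped, and nothing here implements KONS or the sketching algorithm itself.

```python
def sketched_kons_tradeoff(t, gamma):
    """Leading-order scalings from the abstract (illustrative only).

    Assumption: constants and log factors inside O(.) are ignored,
    so the numbers are relative scalings, not actual run times.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    exact_cost = float(t ** 2)              # O(t^2) per iteration, exact second-order KOCO
    sketched_cost = exact_cost * gamma ** 2  # O(t^2 * gamma^2) per iteration, Sketched-KONS
    regret_factor = 1.0 / gamma              # regret is inflated by a factor 1/gamma
    return {
        "exact_cost": exact_cost,
        "sketched_cost": sketched_cost,
        "regret_factor": regret_factor,
    }


if __name__ == "__main__":
    # Compare a few values of gamma at iteration t = 1000.
    for gamma in (1.0, 0.5, 0.1):
        print(gamma, sketched_kons_tradeoff(1000, gamma))
```

Under these scalings, halving $\gamma$ cuts the per-iteration cost by a factor of four while doubling the regret bound, and $\gamma = 1$ recovers the unsketched $O(t^2)$ cost.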

#### Author Information

##### Michal Valko (Inria Lille - Nord Europe)

Michal is a research scientist at DeepMind Paris and in the SequeL team at Inria Lille - Nord Europe, France, led by Philippe Preux and Rémi Munos. He also teaches the course Graphs in Machine Learning at l'ENS Cachan. Michal is primarily interested in designing algorithms that require as little human supervision as possible. This means 1) reducing the “intelligence” that humans need to input into the system and 2) minimising the time that humans need to spend inspecting, classifying, or “tuning” the algorithms. Another important feature of machine learning algorithms should be the ability to adapt to changing environments. That is why he works in domains that can deal with minimal feedback, such as bandit algorithms, semi-supervised learning, and anomaly detection. Most recently he has worked on sequential algorithms with structured decisions, where exploiting the structure can lead to provably faster learning. In the past, the common thread of Michal's work has been adaptive graph-based learning and its application to real-world problems such as recommender systems, medical error detection, and face recognition. His industrial collaborators include Adobe, Intel, Technicolor, and Microsoft Research. He received his PhD in 2011 from the University of Pittsburgh under the supervision of Miloš Hauskrecht and afterwards was a postdoc with Rémi Munos.