## ICML 2008 Tutorials Schedule, Saturday 5 July

### Place: Main building, University of Helsinki, Fabianinkatu 33

There will be three tutorial slots: morning, early afternoon, and late afternoon. Each slot will be 2 hours 30 minutes long, including a 10-minute comfort break. NB: Tutorials will only take place if a minimum of 20 participants have registered.

**Morning 9:00-11:30**

T1 Painless embeddings of distributions: the function space view

Alex Smola, Arthur Gretton, and Kenji Fukumizu (S13, 3rd floor)
[course webpage]

T2 Stochastic optimal control theory

Bert Kappen, Marc Toussaint (S5, 3rd floor)
[course webpage]

T3 Dimensionality Reduction the Probabilistic Way

Neil Lawrence (S1, 2nd floor)
[course webpage]

**Early afternoon 13:00-15:30**

T4 Graphical models and variational methods: Message-passing and relaxations

Martin Wainwright (S1, 2nd floor)
[course webpage]

T5 Playing Machines: Machine Learning Applications in Computer Games

Ralf Herbrich, Thore Graepel (S5, 3rd floor)
[course webpage]

T6 Beyond Convexity: Submodularity in Machine Learning

Andreas Krause, Carlos Guestrin (S13, 4th floor)
[course webpage]

**Late Afternoon 16:00-18:30**

T7 Tutorial on Theory and Applications of Online Learning

Shai Shalev-Shwartz, Yoram Singer (S1, 2nd floor)
[course webpage]

T8 Visual Object Recognition and Retrieval

Rob Fergus (S13, 3rd floor)
[course webpage]

T9 Sparse Linear Models: Bayesian Inference and Experimental Design

Matthias Seeger (SH, 4th floor)
[course webpage]

### Abstracts

**T1 Painless embeddings of distributions: the function space view**

Alex Smola, Arthur Gretton, and Kenji Fukumizu [course webpage]

In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear algorithms from linear ones. More recently, however, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher order statistics. For instance, in kernel independent component analysis, general nonlinear dependencies show up as linear correlations once they are computed in a suitable reproducing kernel Hilbert space. This tutorial provides an introduction to embeddings of probability distributions into reproducing kernel Hilbert spaces, as a way of painlessly dealing with high order statistics. We will cover both theoretical issues, such as conditions under which different probability distributions have unique mappings, and practical applications ranging from tests of distribution properties (homogeneity, independence, conditional independence) to density estimation and causal inference.
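
As a rough illustration of the embedding idea, the distance between the kernel mean embeddings of two samples (the maximum mean discrepancy, MMD) can be estimated in a few lines of NumPy; the Gaussian kernel, bandwidth, and synthetic data below are illustrative choices, not the tutorial's own material:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix between the rows of X and Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Squared distance between the kernel mean embeddings of the two samples
    # (a biased empirical estimate of MMD^2).
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 1)), rng.normal(0, 1, (200, 1)))
diff = mmd2(rng.normal(0, 1, (200, 1)), rng.normal(3, 1, (200, 1)))
print(same, diff)  # embeddings of different distributions lie farther apart
```

With a characteristic kernel such as the Gaussian, the population MMD is zero only when the two distributions coincide, which is what makes the embedding useful for homogeneity testing.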

**T2 Stochastic optimal control theory**

Bert Kappen, Radboud University, Nijmegen, the Netherlands
Marc Toussaint, Technical University, Berlin, Germany
[course webpage]

Stochastic optimal control theory concerns the problem of how to act optimally when reward is only obtained at a later time. The stochastic optimal control problem is central to modeling intelligent behavior in animals and machines. Examples are the control of multi-joint robot arms, navigation of vehicles, coordination of multi-agent systems, and decision making in financial applications. Classical optimal control theory is based on principles like the Hamilton-Jacobi-Bellman equation, the Pontryagin maximum principle, and special cases like the LQ case and the Riccati equations. More familiar to the machine learner are Reinforcement Learning and (Partially Observable) Markov Decision Processes, which can be viewed as special cases of stochastic control theory. This tutorial aims to introduce the classical principles as well as the more modern frameworks, and thereby to provide an integrative view of the different notions. Special emphasis is given to newer approaches that use inference techniques to solve stochastic optimal control problems. The tutorial is introductory and aimed at the 'average' machine learning researcher. No background in control theory or reinforcement learning is required, but a basic understanding of Bayesian networks and statistical inference is assumed.
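
As a taste of the classical LQ case mentioned above, here is a minimal sketch of finite-horizon discrete-time LQR solved by the backward Riccati recursion; the double-integrator dynamics and cost matrices are illustrative choices:

```python
import numpy as np

def lqr(A, B, Q, R, T):
    # Finite-horizon discrete-time LQR for x_{t+1} = A x_t + B u_t with
    # cost sum_t (x'Qx + u'Ru): backward Riccati recursion yields the
    # time-varying feedback gains K_t, with u_t = -K_t x_t.
    P = Q.copy()
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1], P   # gains ordered t = 0 .. T-1

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
gains, P = lqr(A, B, Q, R, T=50)

# Simulate the closed loop: the state is driven toward the origin.
x = np.array([[5.0], [0.0]])
for K in gains:
    x = (A - B @ K) @ x
print(np.linalg.norm(x))
```

The same recursion in continuous time leads to the (differential) Riccati equation, which is where the classical theory connects to the Hamilton-Jacobi-Bellman equation.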

**T3 Dimensionality Reduction the Probabilistic Way**

Neil Lawrence
[course webpage]

Synopsis: The main focus of this tutorial will be probabilistic interpretations of dimensionality reduction. It aims to complement the tutorial given by Lawrence Saul at NIPS 2005 on "Spectral Methods for Dimensional Reduction". Its particular focus will be probabilistic approaches to dimensionality reduction based on generative models. These approaches have become increasingly popular in graphics and vision through the Gaussian Process Latent Variable Model. However, there is also a history to these methods which is perhaps less widely known amongst the newer generation of researchers, in particular the Generative Topographic Mapping and density networks. This tutorial will ground these methods by unifying them in the context of probabilistic latent variable models. This will involve an introduction to these approaches through the mechanism of probabilistic PCA, then a discussion of density networks leading into the generative topographic mapping. Finally, the dual interpretation of probabilistic PCA and its extension to the GP-LVM will be given. Throughout the tutorial we will develop intuition about the methods with an ongoing set of example data sets. A particular focus of these example data sets will be motion capture data, which is a nice example to use because it is easy for the human eye to tell when samples from the model are realistic.

One aspect of the tutorial will be the difference between the probabilistic approaches and the more commonly applied spectral approaches. In particular we will emphasise the distance-preservation behaviour of the probabilistic approaches: local distances in the data are not necessarily preserved in the latent space. This contrasts with spectral algorithms, which typically aim to preserve such local distances. These different characteristics mean that the probabilistic approaches complement the spectral approaches, but they bring their own range of associated problems, in particular local minima in the optimisation space. Heuristics for avoiding these local minima will also be discussed.
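
As a concrete entry point matching the tutorial's outline, probabilistic PCA has a closed-form maximum-likelihood solution via the eigendecomposition of the sample covariance; the following sketch, with made-up three-dimensional data generated from one latent dimension, recovers the latent direction and the noise level:

```python
import numpy as np

def ppca(Y, q):
    # Maximum-likelihood probabilistic PCA (Tipping & Bishop):
    # sigma^2 is the average discarded variance, and the loading matrix W
    # spans the top-q eigenvectors, scaled by the excess variance.
    Yc = Y - Y.mean(0)
    S = Yc.T @ Yc / len(Y)                   # sample covariance
    vals, vecs = np.linalg.eigh(S)
    vals, vecs = vals[::-1], vecs[:, ::-1]   # descending eigenvalue order
    sigma2 = vals[q:].mean()
    W = vecs[:, :q] * np.sqrt(np.maximum(vals[:q] - sigma2, 0.0))
    return W, sigma2

rng = np.random.default_rng(1)
z = rng.normal(size=(500, 1))                          # 1-D latent variable
v = np.array([[2.0, -1.0, 0.5]])                       # true loading direction
Y = z @ v + 0.1 * rng.normal(size=(500, 3))            # observed 3-D data
W, sigma2 = ppca(Y, q=1)
print(W.ravel(), sigma2)
```

The dual interpretation covered in the tutorial swaps the roles of the data dimensions and the data points in this model, which is what leads to the GP-LVM.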

**T4 Graphical models and variational methods: Message-passing and relaxations**

Martin Wainwright
[course webpage]

Graphical models provide a flexible framework for capturing dependencies among large collections of random variables, and are by now an essential component of the statistical machine learning toolbox. Any application of graphical models involves a core set of computational challenges, centered around the problems of marginalization, mode-finding, parameter estimation, and structure estimation. Although efficiently solvable for graphs without cycles (trees) and, more generally, graphs of low treewidth, exact solutions to these core problems are computationally challenging for general graphical models with large numbers of nodes and/or large state spaces. Consequently, many applications of graphical models require efficient methods for computing approximate solutions to these core problems. The past decade and a half has witnessed an explosion of activity on approximate algorithms for graphical models. This tutorial will show how a wide class of methods, including mean field theory, sum-product or belief propagation algorithms, expectation propagation, and max-product algorithms, are all variational methods, meaning that they can be understood as algorithms for solving particular optimization problems on graphs. This perspective also forges connections to convex optimization, including linear programming and other types of conic relaxations.
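
As a small illustration of why trees are the easy case, the following sketch runs sum-product (belief propagation) on a binary four-node chain and checks the resulting node marginals against brute-force enumeration; the chain length and random potentials are illustrative:

```python
import numpy as np
from itertools import product

# Sum-product on a chain MRF: p(x) ∝ prod_i node[i][x_i] * prod_i edge[i][x_i, x_{i+1}].
# Messages pass left-to-right and right-to-left; marginals are exact on trees.
n, k = 4, 2
rng = np.random.default_rng(2)
node = rng.random((n, k)) + 0.1          # unary potentials
edge = rng.random((n - 1, k, k)) + 0.1   # pairwise potentials

fwd = [np.ones(k)]                       # fwd[i]: message into node i from the left
for i in range(n - 1):
    m = edge[i].T @ (node[i] * fwd[i])   # sum over x_i
    fwd.append(m / m.sum())
bwd = [np.ones(k)]                       # built right-to-left
for i in range(n - 1, 0, -1):
    m = edge[i - 1] @ (node[i] * bwd[0])  # sum over x_i, message to node i-1
    bwd.insert(0, m / m.sum())

marg = node * np.array(fwd) * np.array(bwd)
marg /= marg.sum(1, keepdims=True)

# Brute-force check: enumerate all k^n configurations of this small chain.
brute = np.zeros((n, k))
for x in product(range(k), repeat=n):
    p = np.prod([node[i, x[i]] for i in range(n)]) * \
        np.prod([edge[i, x[i], x[i + 1]] for i in range(n - 1)])
    for i in range(n):
        brute[i, x[i]] += p
brute /= brute.sum(1, keepdims=True)
print(np.allclose(marg, brute))  # True: BP is exact on this tree
```

On graphs with cycles the same message updates no longer terminate with exact marginals, which is where the variational view of the tutorial takes over.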

**T5 Playing Machines: Machine Learning Applications in Computer Games**

Ralf Herbrich, Thore Graepel
[course webpage]

The tutorial will give an introduction to the emerging area of applying machine learning to computer games and of using computer games as test beds for machine learning. One of the key problems in computer games is the creation of AI-driven agents that interact with the player so as to create a great interactive gaming experience. As a consequence, a substantial part of the tutorial will consider adaptive and learning game AI based on supervised and reinforcement learning. However, computer games also offer a great variety of other challenges, including problems in graphics, sound, networking, player rating and matchmaking, interface design, and narrative generation. Selected problems from some of these areas will be discussed together with machine learning approaches to solving them. Since this is an application area, the tutorial will focus on past and recent applications, open problems, and promising avenues for future research. It will also provide resources for people who would like to work in this fascinating and fun research space.

**T6 Beyond Convexity: Submodularity in Machine Learning**

Andreas Krause, Carlos Guestrin
[course webpage]

Convex optimization has become a main workhorse for many machine learning algorithms during the past ten years. When minimizing a convex loss function, e.g., to train a Support Vector Machine, we can rest assured that an optimal solution will be found efficiently, even for large problems. In recent years, another fundamental problem structure with similarly beneficial properties has emerged as very useful in a variety of machine learning applications: submodularity is an intuitive diminishing-returns property, stating that adding an element to a smaller set helps more than adding it to a larger set. Similarly to convexity, submodularity allows one to efficiently find provably (near-)optimal solutions. In this tutorial, we will give an introduction to the concept of submodularity, discuss algorithms for optimizing submodular functions and, as the main focus, illustrate their usefulness in solving difficult machine learning problems, such as active learning and sparse experimental design, informative path planning, structure learning, clustering, influence maximization, and ranking.
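
The diminishing-returns property, and the classic greedy algorithm that exploits it (with a (1 - 1/e) approximation guarantee for monotone submodular maximization under a cardinality constraint), can be sketched with a toy coverage function; the sets and budget below are made up for illustration:

```python
def coverage(sets, S):
    # Coverage is a canonical monotone submodular function:
    # the number of distinct elements covered by the chosen sets.
    return len(set().union(*(sets[i] for i in S))) if S else 0

def greedy(sets, budget):
    # At each step, add the set with the largest marginal gain.
    chosen = []
    for _ in range(budget):
        gains = {i: coverage(sets, chosen + [i]) - coverage(sets, chosen)
                 for i in range(len(sets)) if i not in chosen}
        chosen.append(max(gains, key=gains.get))
    return chosen

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 5}]
picked = greedy(sets, budget=2)
print(picked, coverage(sets, picked))

# Diminishing returns: the marginal gain of set 1 shrinks as the base grows.
assert (coverage(sets, [1]) - coverage(sets, [])) >= \
       (coverage(sets, [2, 1]) - coverage(sets, [2]))
```

The same greedy template, with different submodular objectives, underlies many of the applications listed above, from sensor placement to influence maximization.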

**T7 Tutorial on Theory and Applications of Online Learning**

Shai Shalev-Shwartz, Yoram Singer
[course webpage]

Online learning is a well-established learning paradigm with both theoretical and practical appeal. The goal of online learning is to make a sequence of accurate predictions given knowledge of the correct answer to previous prediction tasks and possibly additional available information. The roots of online learning go back to Hannan's work in the 1950s. Online learning has become of great interest to practitioners due to the recent emergence of large-scale web applications; notable examples are online advertisement placement and online web ranking. The tutorial is targeted at people from all areas of machine learning and covers the formal foundations along with algorithmic and practical aspects of online learning. The goal is to provide a high-level, broad, and rigorous overview of the formal online framework. By the end of the tutorial the attendees should have acquired enough knowledge to pinpoint the online algorithm that best matches an application they face.

The tutorial starts with a simple example of predicting the next element of a binary sequence. We then formally introduce the basic definitions of online learning and the notion of regret analysis. Next we describe the problem of prediction with expert advice, analyzing a few algorithms and contrasting them with an impossibility result. This basic setting is then re-examined in the context of online learning of general linear predictors. We give a recent analysis which reveals an underlying primal-dual apparatus for the analysis of online algorithms. We conclude the formal part of the tutorial with a description of extensions and generalizations of online learning tasks, underscoring connections to game theory, information theory, and reinforcement learning. We recap the tutorial with two complete examples that demonstrate the use of online learning for portfolio selection and for text filtering.
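
A minimal sketch of prediction with expert advice is the multiplicative-weights (exponentially weighted average, or Hedge) forecaster, whose expected regret against the best expert is at most sqrt(T ln N / 2) for losses in [0, 1]; the synthetic losses and seed below are illustrative:

```python
import numpy as np

# Hedge forecaster: maintain a weight per expert, play the normalized
# weights, and exponentially downweight experts in proportion to their loss.
rng = np.random.default_rng(3)
T, N = 1000, 5
eta = np.sqrt(8 * np.log(N) / T)         # learning rate for the regret bound
losses = rng.random((T, N))              # per-round losses in [0, 1)
losses[:, 0] *= 0.3                      # expert 0 is reliably better

w = np.ones(N)
learner_loss = 0.0
for t in range(T):
    p = w / w.sum()
    learner_loss += p @ losses[t]        # expected loss of sampling an expert
    w *= np.exp(-eta * losses[t])

best = losses.sum(0).min()               # cumulative loss of the best expert
regret = learner_loss - best
print(regret, np.sqrt(T * np.log(N) / 2))  # regret vs. theoretical bound
```

The regret bound holds for any loss sequence, not just stochastic ones, which is the hallmark of the adversarial online framework the tutorial develops.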

**T8 Visual Object Recognition and Retrieval**

Rob Fergus
[course webpage]

The tutorial will address the problem of recognizing visual object classes in images, currently the focus of much interest in Computer Vision. As recent innovations in the area draw heavily on machine learning concepts, the tutorial will attempt to highlight the growing intersection between the two areas. The material will be divided into five sections, covering (i) bag-of-words models; (ii) parts-and-structure models; (iii) discriminative methods; (iv) combined recognition and segmentation; and (v) retrieval schemes for large datasets. The emphasis will be on important general concepts rather than in-depth coverage of contemporary papers. The tutorial is a revised version of the prize-winning short course given at ICCV 2005 and CVPR 2007 in conjunction with Fei-Fei Li (Princeton) and Antonio Torralba (MIT).

**T9 Sparse Linear Models: Bayesian Inference and Experimental Design**

Matthias Seeger
[course webpage]

Sparse linear models are cornerstones of applied statistics, embodying fundamental ideas such as feature selection, shrinkage, and automatic relevance determination. While much progress has been made recently in understanding point estimation of sparse signals, Bayesian inference is needed to drive higher-level tasks such as experimental design, where valid uncertainties and covariances are more important than point estimates. In this tutorial, the major deterministic inference approximations to date (expectation propagation, sparse Bayesian learning, variational mean field Bayes) will be introduced for the sparse linear model, and their mathematics (scale mixtures, convex duality, moment matching) will be clarified. Sequential Bayesian design, with an application to optimizing an image measurement architecture, serves as motivation for this effort.
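
As a simplified illustration of sequential Bayesian design, the following sketch uses a Gaussian (rather than sparse) prior on a linear model, where the posterior covariance has a closed form, and greedily picks the candidate measurement with the largest predictive variance; the candidates, prior, and noise level are made up, and the sparse setting of the tutorial requires the approximate-inference machinery listed above instead:

```python
import numpy as np

# Bayesian linear model y = x' w + noise with Gaussian prior w ~ N(0, Sigma).
# Each measurement x shrinks the posterior covariance via a rank-one
# (Sherman-Morrison) update; greedily pick the most uncertain candidate.
d = 3
noise_var = 0.1
prior_cov = np.eye(d)
candidates = np.vstack([np.eye(d), np.ones((1, d))])  # candidate designs

Sigma = prior_cov.copy()
order = []
for _ in range(3):
    var = np.array([x @ Sigma @ x for x in candidates])  # predictive variances
    i = int(np.argmax(var))
    order.append(i)
    x = candidates[i][:, None]
    Sigma = Sigma - (Sigma @ x @ x.T @ Sigma) / (noise_var + x.T @ Sigma @ x)
print(order, np.trace(Sigma))  # uncertainty (trace) shrinks at every step
```

With a Gaussian prior the posterior covariance does not depend on the observed responses, so the whole design sequence can be planned ahead; with sparsity-inducing priors this convenience is lost, which is one reason approximate inference is central to the tutorial.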

ICML 2008 Tutorials chair: Chris Williams