Keynote by Sham Kakade: Prediction, Learning, and Memory
Sham Kakade

Fri Jun 14 10:40 AM -- 11:10 AM (PDT)

Building accurate language models that capture meaningful long-term dependencies is a core challenge in language processing. We consider the problem of predicting the next observation given a sequence of past observations, specifically focusing on the question of how to make accurate predictions that explicitly leverage long-range dependencies. Empirically, and perhaps surprisingly, we show that state-of-the-art language models, including LSTMs and Transformers, do not capture even basic properties of natural language: the entropy rates of their generations drift dramatically upward over time. We also provide provable methods to mitigate this phenomenon: specifically, we provide a calibration-based approach to improve an estimated model based on any measurable long-term mismatch between the estimated model and the true underlying generative distribution. More generally, we will also present fundamental information theoretic and computational limits of sequential prediction with a memory.
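
For concreteness, the entropy-rate drift described above can be probed directly from model generations. The sketch below is not from the talk; the scoring interface is an assumption. It computes a running estimate of the per-token entropy rate, i.e. the average negative log-probability the model assigns to its own generated tokens, over consecutive windows; an upward trend in this curve is the drift in question.

```python
import numpy as np

def entropy_rate_curve(token_log_probs, window=256):
    """Windowed estimate of the entropy rate of a generated sequence.

    token_log_probs: 1-D array holding log p(x_t | x_<t) for each generated
    token, scored by the same model that produced the sequence (this scoring
    interface is an assumption for illustration, not part of the talk).
    Returns the average negative log-probability (nats/token) in each window.
    """
    nll = -np.asarray(token_log_probs, dtype=float)
    n_windows = len(nll) // window
    return np.array([nll[i * window:(i + 1) * window].mean()
                     for i in range(n_windows)])

# Usage sketch: if `gen_lp` and `human_lp` hold per-token log-probabilities for
# a long model generation and for held-out human text, a well-calibrated model
# would keep entropy_rate_curve(gen_lp) roughly flat and close to
# entropy_rate_curve(human_lp); an upward-drifting generation curve is the
# long-term mismatch that a calibration-based correction targets.
```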

Bio: Sham Kakade is a Washington Research Foundation Data Science Chair, with a joint appointment in the Department of Computer Science and the Department of Statistics at the University of Washington. He works on the theoretical foundations of machine learning, focusing on designing provable and practical algorithms that are statistically and computationally efficient. Among his contributions, made with a diverse set of collaborators, are: establishing principled approaches in reinforcement learning (including the natural policy gradient, conservative policy iteration, and the PAC-MDP framework); optimal algorithms for the stochastic and non-stochastic multi-armed bandit problems (including the widely used linear bandit and Gaussian process bandit models); computationally and statistically efficient tensor decomposition methods for estimating latent variable models (including mixtures of Gaussians, latent Dirichlet allocation, hidden Markov models, and overlapping communities in social networks); and faster algorithms for large-scale convex and nonconvex optimization (including how to escape from saddle points efficiently). He is the recipient of the IBM Goldberg Best Paper Award (2007) for contributions to fast nearest neighbor search and the INFORMS Revenue Management and Pricing Section Best Paper Prize (2014). He has served as program chair for COLT 2011.

Sham completed his Ph.D. at the Gatsby Computational Neuroscience Unit at University College London under the supervision of Peter Dayan, and was a postdoc in the Department of Computer Science at the University of Pennsylvania under the supervision of Michael Kearns. As an undergraduate at Caltech, he studied physics under the supervision of John Preskill. Sham has been a Principal Research Scientist at Microsoft Research, New England; an associate professor in the Department of Statistics at the Wharton School, University of Pennsylvania; and an assistant professor at the Toyota Technological Institute at Chicago.

Author Information

Sham Kakade (University of Washington)

Sham Kakade is a Washington Research Foundation Data Science Chair, with a joint appointment in the Department of Computer Science and the Department of Statistics at the University of Washington, and a co-director of the Algorithmic Foundations of Data Science Institute. He works on the mathematical foundations of machine learning and AI. Sham's thesis helped lay the foundations of the PAC-MDP framework for reinforcement learning. With his collaborators, his additional contributions include: one of the first provably efficient policy search methods for reinforcement learning, Conservative Policy Iteration; the mathematical foundations for the widely used linear bandit and Gaussian process bandit models; tensor and spectral methodologies for provable estimation of latent variable models (applicable to mixtures of Gaussians, HMMs, and LDA); and the first sharp analysis of the perturbed gradient descent algorithm, along with the design and analysis of numerous other convex and non-convex algorithms. He is the recipient of the IBM Goldberg Best Paper Award (2007) for contributions to fast nearest neighbor search and the INFORMS Revenue Management and Pricing Section Best Paper Prize (2014). He has served as program chair for COLT 2011. Sham was an undergraduate at Caltech, where he studied physics and worked on quantum computing under the guidance of John Preskill. He then completed his Ph.D. in computational neuroscience at the Gatsby Unit, University College London, under the supervision of Peter Dayan, and was a postdoc in the Department of Computer Science at the University of Pennsylvania, where he broadened his studies to include computational game theory and economics under the guidance of Michael Kearns. Sham has been a Principal Research Scientist at Microsoft Research, New England; an associate professor in the Department of Statistics at the Wharton School, University of Pennsylvania; and an assistant professor at the Toyota Technological Institute at Chicago.
