Invited Talk: David Duvenaud

July 24, 2021, 2:45 p.m.

July 24, 2021, 9:45 a.m.

Bayesian multiscale models exploit variants of the “decouple/recouple” concept to enable advances in forecasting and monitoring of increasingly large-scale time series. Recent and current applications include financial and commercial forecasting, as well as dynamic network studies. I overview some recent developments via examples from applications in large-scale consumer demand and sales forecasting with intersecting marketing-related goals. Two coupled applied settings involve (a) models for forecasting daily sales of each of many items in every supermarket of a large national chain, and (b) models for understanding and forecasting customer/household-specific purchasing behavior to inform decisions about personalized pricing and promotions on a continuing basis. The multiscale concept is applied in each setting to define new classes of hierarchical Bayesian state-space models customized to the application. In each area, micro-level, individual time series are represented via customized model forms that also involve aggregate-level factors, the latter being modelled and forecast separately. The implied conditional decoupling of many time series enables computational scalability, while the effects of shared multiscale factors define recoupling to appropriately reflect cross-series dependencies. The ideas are of course relevant to other applied settings involving large-scale, hierarchically structured time series.

July 24, 2021, 9 a.m.

Mihaela van der Schaar

Professor van der Schaar is John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge, a Turing Faculty Fellow at The Alan Turing Institute in London, and Chancellor's Professor at UCLA. She was elected IEEE Fellow in 2009. She has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), an NSF Career Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award. She holds 35 granted USA patents. In 2019, she was identified by National Endowment for Science, Technology and the Arts as the female researcher based in the UK with the most publications in the field of AI. She was also elected as a 2019 "Star in Computer Networking and Communications". Her current research focus is on machine learning, AI and operations research for healthcare and medicine. For more details, see her website:

July 23, 2021, 5:30 a.m.

Speaker's bio: Professor Richard Susskind OBE is an author, speaker, and independent adviser to major professional firms and to national governments. His main area of expertise is the future of professional service and, in particular, the way in which IT and the Internet are changing the work of lawyers. He has worked on legal technology for over 30 years. He lectures internationally, has written many books, and has advised on numerous government inquiries.

July 23, 2021, 11:30 a.m.

Kiante Brantley

July 23, 2021, 1:30 p.m.

Pieter Abbeel

July 23, 2021, 1 p.m.

Chelsea Finn

Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Finn's research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has included deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement methods for learning reward functions underlying behavior, and meta-learning algorithms that can enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Finn received her Bachelor's degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley. Her research has been recognized through the ACM doctoral dissertation award, the Microsoft Research Faculty Fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the MIT Technology Review 35 under 35 Award, and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. Throughout her career, she has sought to increase the representation of underrepresented minorities within CS and AI by developing an AI outreach camp at Berkeley for underprivileged high school students, a mentoring program for underrepresented undergraduates across four universities, and leading efforts within the WiML and Berkeley WiCSE communities of women researchers.

July 23, 2021, 9 a.m.

Rosemary Nan Ke

I am a PhD student at Mila, advised by Chris Pal and Yoshua Bengio. My research interests are efficient credit assignment, causal learning and model-based reinforcement learning. Here is my homepage

July 23, 2021, 8:30 a.m.

July 23, 2021, 6:30 a.m.

Invited talk: Invited Talk by David Ha

July 23, 2021, 6 a.m.

David Ha

July 23, 2021, 9 a.m.

Dorsa Sadigh

July 23, 2021, 12:30 p.m.

Yisong Yue

Yisong Yue is a Professor of Computing and Mathematical Sciences at Caltech and (via sabbatical) a Principal Scientist at Latitude AI. His research interests span both fundamental and applied pursuits, from novel learning-theoretic frameworks all the way to deep learning deployed in autonomous driving on public roads. His work has been recognized with multiple paper awards and nominations, including in robotics, computer vision, sports analytics, machine learning for health, and information retrieval. At Latitude AI, he is working on machine learning approaches to motion planning for autonomous driving.

July 21, 2021, 8 p.m.

This talk will review recent advances to understand how speech, a unique and defining human behavior, is processed by the cerebral cortex. We will discuss new neuroscientific knowledge on how the brain represents vocal tract movements to give rise to all consonants and vowels, and how this knowledge has been applied to development of a “speech neuroprosthesis” to restore communication for persons living with paralysis.

Edward Chang

July 23, 2021, 6 a.m.

July 24, 2021, 11:30 a.m.

Digital tools have been proven effective to deliver mental health screening and intervention, but uptake is usually very low, severely limiting generalizability of findings and impact of tools. The COVID-19 pandemic has substantially increased the urgency to develop nimble and valid screening instruments and tools that effectively address users' needs. We have developed a framework of digital data triangulation, intervention co-development, and integration with brick and mortar systems, and will present preliminary results.

Daniel Vigo

July 23, 2021, 12:35 p.m.

Member privacy is a priority at LinkedIn and a key consideration in product design. We aim to protect member privacy by using innovative privacy engineering techniques such as differential privacy to provide valuable insights to members and policy makers. The differential privacy team at LinkedIn has developed new algorithms that work with existing real time data analytics systems, allowing us to develop systems at scale while protecting member privacy. This talk will cover several deployments of LinkedIn products that incorporate differential privacy.

Ryan Rogers

July 24, 2021, 12:20 p.m.

July 24, 2021, 11 a.m.

Quantification of causal influence is a non-trivial conceptual problem. Well-known concepts like Granger causality and transfer entropy are arguably correct for detecting the presence of causal influence (subject to assumptions like causal sufficiency and positive probability density), but following [2] I argue that taking them as a measure of the strength of causal influence is conceptually flawed. To discuss this, I consider the more general question of quantifying the strength of an edge (or a set of edges) in a causal DAG. I describe a few postulates that we [1] would expect from a measure of causal influence and describe the information-theoretic causal strength that we proposed in [1]. References: [1] D. Janzing, D. Balduzzi, M. Grosse-Wentrup, B. Schölkopf: Quantifying causal influences. Annals of Statistics, 2013. [2] N. Ay and D. Polani: Information flow in causal networks, 2008.

Dominik Janzing

July 20, 2021, 8 a.m.

Modern medicine has given us effective tools to treat some of the most significant and burdensome diseases. At the same time, it is becoming consistently more challenging and more expensive to develop new therapeutics. A key factor in this trend is that the drug development process involves multiple steps, each of which involves a complex and protracted experiment that often fails. We believe that, for many of these phases, it is possible to develop machine learning models to help predict the outcome of these experiments, and that those models, while inevitably imperfect, can outperform predictions based on traditional heuristics. To achieve this goal, we are bringing together high-quality data from human cohorts, while also developing cutting edge methods in high throughput biology and chemistry that can produce massive amounts of in vitro data relevant to human disease and therapeutic interventions. Those are then used to train machine learning models that make predictions about novel targets, coherent patient segments, and the clinical effect of molecules. Our ultimate goal is to develop a new approach to drug development that uses high-quality data and ML models to design novel, safe, and effective therapies that help more people, faster, and at a lower cost.

Daphne Koller

Daphne Koller is CEO and Founder of insitro, a machine-learning enabled drug discovery company transforming the way drugs are discovered and delivered to patients. She is the co-founder of online education platform Engageli and of Coursera, the largest platform for massive open online courses (MOOCs), where she was co-CEO and President. Daphne was the Rajeev Motwani Professor of Computer Science at Stanford University, where she served on the faculty for 18 years. She has also been Chief Computing Officer of Calico, an Alphabet company in the healthcare space. She is the author of over 200 refereed publications appearing in venues such as Science, Cell, and Nature Genetics. Daphne was recognized as one of TIME Magazine's 100 most influential people in 2012 and Newsweek's 10 most important people in 2010. She has been honored with multiple awards and fellowships during her career including the Sloan Foundation Faculty Fellowship in 1996, the ONR Young Investigator Award in 1998, the Presidential Early Career Award for Scientists and Engineers (PECASE) in 1999, the IJCAI Computers and Thought Award in 2001, the MacArthur Foundation Fellowship in 2004, and the ACM Prize in Computing in 2008. Daphne was inducted into the National Academy of Engineering in 2011 and elected a fellow of the American Association for Artificial Intelligence in 2004, the American Academy of Arts and Sciences in 2014 and of the International Society of Computational Biology in 2017. Her teaching was recognized via the Stanford Medal for Excellence in Fostering Undergraduate Research, and as a Bass University Fellow in Undergraduate Education.

July 22, 2021, 8 a.m.

I present an overview of the different ways machine learning is making an impact in molecular science. In particular I focus on theoretical and computational biophysics at the molecular scale, and how machine learning is revolutionizing molecular simulation techniques. I present some of the methods developed in the last few years, the results that have been obtained and the challenges ahead. I describe in some detail the application of machine learning to the development of molecular models for biological macromolecules at resolutions coarser than atomistic, that can accurately reproduce the behavior of the system as described by atomistic models or experimental measurements.

Cecilia Clementi

Cecilia Clementi is a Professor of Chemistry, and Chemical and Biomolecular Engineering, and Senior Scientist in the Center for Theoretical Biological Physics at Rice University. Cecilia received her Laurea (B.S.) degree in Physics from the University of Florence, Italy, in 1995, and her PhD in Physics from the International School for Advanced Studies (SISSA/ISAS) in Trieste, Italy, in 1998. After a postdoctoral fellowship in the La Jolla Interfaces in Science (LJIS) program at the University of California San Diego, she joined the Rice faculty in 2001, where she leads an interdisciplinary group working on multiscale macromolecular modeling. Cecilia's work has been recognized with multiple awards, such as the Norman Hackerman Award in Chemical Research of the Welch Foundation, and the NSF-CAREER award.

Cecilia is serving as co-Director of the NSF-funded Molecular Sciences Software Institute (MolSSI), where she is responsible for the activities in biomolecular simulation and in international engagement. She is also an Einstein Visiting Fellow at the Freie Universität in Berlin, Germany.

July 24, 2021, 4 p.m.

Inspired by the demands of real-time subseasonal climate forecasting, we develop optimistic online learning algorithms that require no parameter tuning and have optimal regret guarantees under delayed feedback. Our algorithms -- DORM, DORM+, and AdaHedgeD -- arise from a novel reduction of delayed online learning to optimistic online learning that reveals how optimistic hints can mitigate the regret penalty caused by delay. We pair this delay-as-optimism perspective with a new analysis of optimistic learning that exposes its robustness to hinting errors and a new meta-algorithm for learning effective hinting strategies in the presence of delay. We conclude by benchmarking our algorithms on four subseasonal climate forecasting tasks, demonstrating low regret relative to state-of-the-art forecasting models.

July 20, 2021, 8 p.m.

The cryosphere is the layer of the Earth that exists in a negative temperature state, with continuous distribution and a certain thickness. The Earth's cryosphere can be divided into three types (continental, marine, and aerial) and includes glaciers and ice sheets, permafrost, snow cover, lake and river ice, sea ice, ice shelves, icebergs, and solid precipitation. The cryosphere is one of the five major spheres of the climate system. It plays an important role in the Earth system through its huge freshwater reserves, the latent heat of phase transitions, carbon storage, and unique species habitats and cultural forms.

The presentation starts with an introduction to the main IPCC conclusions on human-induced climate change and its extremes since the Industrial Revolution, especially in recent decades. The cryosphere is a sensitive indicator of climate change. Against the background of global warming, the impacts of rapid cryospheric change have received increasing attention since the start of the 21st century, extending research to the interactions among the Earth's multiple spheres, including the anthroposphere. As a result, cryospheric science has developed rapidly into a new interdisciplinary field covering the formation of the cryosphere, its change processes and mechanisms, its interactions with the atmosphere, hydrosphere, biosphere, and lithosphere, the impacts of cryospheric change and adaptation to them, and its changing functions in serving regional and global economies and societies. Cryospheric science is an essential part of international research on Earth and environmental change, as well as on sustainable human development.

The study of the Chinese cryosphere has developed rapidly within the scope of cryospheric science over the past 20 years, especially in the last decade. It has produced systematic results on changes in the cryosphere and their impacts on ecology, hydrology, climate, environment, society, and economy, yielded a systematic understanding of the connotation and extension of cryospheric science, and made important contributions to establishing and developing the research framework and disciplinary system of cryospheric science.

The presentation will also show some case studies that apply machine learning to the cryosphere, such as data mining, permafrost mapping and soil organic carbon estimation, Arctic sea ice prediction, estimation of outlet glacier instability in ice sheets, and paleoclimatic proxy reconstruction. Machine learning is a promising tool for studying both the natural and the socioeconomic aspects of cryospheric impacts, such as services and hazards. There are complex linkages between cryospheric impacts and the UN 2030 Sustainable Development Goals (SDGs) in regions influenced by the cryosphere, and big data and machine learning are promising ways to deepen our knowledge of them.

Key words: IPCC, cryospheric science, sustainable development, machine learning

Qin Dahe

Xiao Cunde

Dr. Cunde Xiao is the Director of State Key Laboratory of Earth Surface Processes and Resources Economy, Beijing Normal University, China. He graduated from Lanzhou University (China) in 1992 and received a Ph.D. in glaciology in 1997. He has worked in the fields of polar glaciology and meteorology since then. His major research focus has been ice core studies relating to paleoclimate and paleoenvironment, and present-day cold region meteorological and glaciological processes that impact environmental and climatic changes, and more recently cryospheric functions and their socioeconomic services. Dr. Xiao is the former Vice-president of International Association of Cryospheric Sciences (IACS), IUGG; Review editor of both IPCC AR5 WG1 and Special Report of Ocean and Cryosphere under Changing Climate (SROCC); Council member of International Glaciology Society (IGS); member of the Steering Committee for the international program Antarctica in the Global Climate System (AGCS) of the Scientific Committee on Antarctica Research (SCAR); member of the Scientific Steering Committee of the World Climate Research Programme (WCRP) Climate and the Cryosphere initiative (CliC); and member of Expert Committee of Polar and High Mountain Observation, Research and Services (EC-PHORS), WMO. He is now Coordinating Lead Author (CLA) of Chapter 9, IPCC AR6 WG1. He has published more than 170 scientific papers.

July 24, 2021, 2:45 p.m.

Secure aggregation is a critical component in federated learning, which enables the server to learn the aggregate model of the users without observing their local models. Conventionally, secure aggregation algorithms focus only on ensuring the privacy of individual users in a single training round. We contend that such designs can lead to significant privacy leakages over multiple training rounds, due to partial user selection/participation at each round of federated learning. In fact, we empirically show that the conventional random user selection strategies for federated learning lead to leaking users' individual models within a number of rounds linear in the number of users. To address this challenge, we introduce a secure aggregation framework with multi-round privacy guarantees. In particular, we introduce a new metric to quantify the privacy guarantees of federated learning over multiple training rounds, and develop a structured user selection strategy that guarantees the long-term privacy of each user (over any number of training rounds). Our framework also carefully accounts for the fairness and the average number of participating users at each round. We perform several experiments on various datasets in the IID and the non-IID settings to demonstrate the performance improvement over the baseline algorithms, both in terms of privacy protection and test accuracy. We conclude the talk by discussing several open problems in this domain. (This talk is based on the following paper:
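As a rough illustration of the single-round secure aggregation that this abstract takes as its starting point, the sketch below uses pairwise additive masking: each pair of users shares a random mask that one adds and the other subtracts, so the masks cancel in the sum. This is a generic construction, not the framework proposed in the talk. The function names, the modulus, and the seeded `random.Random` standing in for a real pairwise key agreement are all assumptions made for illustration.

```python
import random

MODULUS = 2**31 - 1  # assumed modulus for the masked arithmetic


def mask_updates(updates, seed=0):
    """Return masked copies of integer updates; pairwise masks cancel in the sum.

    updates: list of equal-length lists of ints (one list per user).
    A seeded RNG stands in for pairwise shared secrets (illustrative only).
    """
    n, dim = len(updates), len(updates[0])
    rng = random.Random(seed)
    masked = [[v % MODULUS for v in u] for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            # Mask shared by users i and j: i adds it, j subtracts it.
            pair_mask = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] = (masked[i][k] + pair_mask[k]) % MODULUS
                masked[j][k] = (masked[j][k] - pair_mask[k]) % MODULUS
    return masked


def aggregate(masked):
    """Server-side sum of the masked updates, coordinate by coordinate."""
    dim = len(masked[0])
    return [sum(u[k] for u in masked) % MODULUS for k in range(dim)]


updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masked = mask_updates(updates)
total = aggregate(masked)  # equals the coordinate-wise sum of the raw updates
```

The server only sees `masked`, yet `total` matches the sum of the raw updates. A guarantee of this kind covers one round only: if the set of participants changes between rounds while the models change slowly, differences between aggregates may expose an individual user, which appears to be the multi-round leakage the abstract refers to.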

July 24, 2021, 7:20 a.m.

Adversarial machine learning is often used as a tool to assess the negative impacts and failure modes of a machine learning system. In this talk, I will present model reprogramming, a new paradigm of data-efficient transfer learning motivated by studying the adversarial robustness of deep learning models.

July 23, 2021, 7 a.m.

Kelsey Allen

July 21, 2021, 8 a.m.

In this talk, I discuss how approaches that may seem very different (randomized controlled trials and machine learning) can in fact be complementary. RCTs can serve as a useful benchmark to evaluate the real-world performance of ML strategies to recover causal effects. ML methods can be used to investigate treatment effect heterogeneity, sort through a large number of possible treatments, etc. The talk concludes with a wish list for machine learning specialists.

Esther Duflo

Esther Duflo is the Abdul Latif Jameel Professor of Poverty Alleviation and Development Economics in the Department of Economics at the Massachusetts Institute of Technology and a co-founder and co-director of the Abdul Latif Jameel Poverty Action Lab (J-PAL). In her research, she seeks to understand the economic lives of the poor, with the aim to help design and evaluate social policies. She has worked on health, education, financial inclusion, environment and governance.

Professor Esther Duflo’s first degrees were in history and economics from Ecole Normale Superieure, Paris. She subsequently received a Ph.D. in Economics from MIT in 1999.

Duflo has received numerous academic honors and prizes including 2019 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel (with co-Laureates Abhijit Banerjee and Michael Kremer), the Princess of Asturias Award for Social Sciences (2015), the A.SK Social Science Award (2015), Infosys Prize (2014), the David N. Kershaw Award (2011), a John Bates Clark Medal (2010), and a MacArthur “Genius Grant” Fellowship (2009). With Abhijit Banerjee, she wrote Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty, which won the Financial Times and Goldman Sachs Business Book of the Year Award in 2011 and has been translated into more than 17 languages, and the recently released Good Economics for Hard Times.

Duflo is the Editor of the American Economic Review, a member of the National Academy of Sciences and a Corresponding Fellow of the British Academy.

July 24, 2021, 10:50 a.m.

How can we quantify the accuracy and uncertainty of predictions that we make in online decision problems? Standard approaches, like asking for calibrated predictions or giving prediction intervals using conformal methods give marginal guarantees --- i.e. they offer promises that are averages over the history of data points. Guarantees like this are unsatisfying when the data points correspond to people, and the predictions are used in important contexts --- like personalized medicine.

In this work, we study how to give stronger than marginal ("multivalid") guarantees for estimates of means, moments, and prediction intervals. Guarantees like this are valid not just as averaged over the entire population, but also as averaged over an enormous number of potentially intersecting demographic groups. We leverage techniques from game theory to give efficient algorithms promising these guarantees even in adversarial environments.
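To make the notion of a marginal guarantee concrete, here is a minimal sketch of split conformal prediction, one standard conformal method. It is not the multivalid algorithm from this work. The function names, the absolute-residual score, and the toy numbers are assumptions for illustration, and coverage is measured on the calibration points themselves purely to keep the example short.

```python
import math


def conformal_radius(residuals, alpha):
    """Radius q so that [pred - q, pred + q] has marginal coverage >= 1 - alpha
    on exchangeable data (split conformal with absolute-residual scores)."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the calibration quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(residuals)[k - 1]


def coverage(residuals, radius):
    """Fraction of points whose true value falls inside the interval."""
    return sum(r <= radius for r in residuals) / len(residuals)


# Toy residuals |y - prediction| for two groups (made-up numbers).
group_a = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3,
           0.1, 0.2, 0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
group_b = [1.0, 1.5, 2.0, 2.5]

q = conformal_radius(group_a + group_b, alpha=0.1)
overall = coverage(group_a + group_b, q)                               # 0.95
per_group = {"a": coverage(group_a, q), "b": coverage(group_b, q)}     # 1.0 and 0.75
```

With these numbers the interval covers 95% of all points but only 75% of the smaller group, below the 90% target. That gap between the population average and a subgroup is the weakness of marginal guarantees that the abstract describes.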

July 24, 2021, 11:30 a.m.

July 24, 2021, 12:30 p.m.

This talk gives an overview of recent results in a line of theoretical work that started three decades ago in statistical physics. We will first discuss the teacher-student setting of generalized linear regression. We illustrate the presence of the interpolation peak for classification with ridge loss and its vanishing with regularization. We show that, in the spherical perceptron, the optimally regularized logistic regression approaches very closely the Bayes-optimal accuracy. We contrast this with the non-convex case of phase retrieval, where the canonical empirical risk minimization performs poorly compared to the Bayes-optimal error. We then move towards learning with hidden units and analyze double descent in learning with generic fixed features and any convex loss. The formulas we obtain are generic enough to describe the learning of the last layer of neural networks for realistic data and networks. Finally, for phase retrieval, we are able to analyze gradient descent in the feature-learning regime of a two-layer neural network, where we show that overparametrization allows a considerable reduction of the sample complexity. Concretely, an overparametrized neural network only needs twice the input dimension of samples, while a non-overparametrized network needs constant times more, and kernel regression needs quadratically many samples in the input dimension.

Lenka Zdeborova

July 24, 2021, 1:25 p.m.

I consider two-layer neural networks trained with square loss in the linear (lazy) regime. Under overparametrization, gradient descent converges to the minimum norm interpolant, and I consider this as well as the whole ridge regularization path. From a statistical viewpoint, these approaches are random features models, albeit of a special type. They are also equivalent to kernel ridge regression, with a random kernel of rank N*d (where N is the number of hidden neurons, and d the input dimension). I will describe a precise characterization of the generalization error when N, d and the sample size are polynomially related (and for covariates that are uniform on the d-dimensional sphere). I will then discuss the limitations of these approaches. I will explain how sparse random feature models can be learnt efficiently to try to address these limitations. [Based on joint work with Michael Celentano, Song Mei, Theodor Misiakiewicz, Yiqiao Zhong]
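The statement that gradient descent converges to the minimum norm interpolant can be checked on a toy overparametrized linear model. The sketch below is only an illustration under simplifying assumptions (a single training point, a fixed feature vector standing in for the random features, zero initialization); it does not reproduce the analysis in the talk, and all names and numbers are made up.

```python
def min_norm_interpolant(phi, y):
    """Closed form for one sample: the smallest-norm w with dot(phi, w) == y."""
    sq_norm = sum(p * p for p in phi)
    return [p * y / sq_norm for p in phi]


def gradient_descent(phi, y, lr, steps):
    """Gradient descent on 0.5 * (dot(phi, w) - y)**2, starting from w = 0."""
    w = [0.0] * len(phi)
    for _ in range(steps):
        residual = sum(p * wi for p, wi in zip(phi, w)) - y
        # The gradient is residual * phi, so w never leaves the span of phi.
        w = [wi - lr * residual * p for p, wi in zip(phi, w)]
    return w


phi = [3.0, 4.0, 0.0]  # assumed feature vector: 3 parameters, 1 sample
y = 10.0

w_gd = gradient_descent(phi, y, lr=0.01, steps=500)
w_star = min_norm_interpolant(phi, y)  # approximately [1.2, 1.6, 0.0]
```

Many weight vectors fit the single data point, for example `[0.0, 2.5, 0.0]`, but gradient descent from zero only ever moves along `phi`, so it ends at the interpolant with the smallest Euclidean norm.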

Andrea Montanari

July 24, 2021, 3:55 p.m.

A frequent criticism from the statistics community to modern machine learning is the lack of rigorous uncertainty quantification. Instead, the machine learning community would argue that conventional uncertainty quantification based on idealized distributional assumptions may be too restrictive for real data. This paper will make progress in resolving the above inference dilemma. We propose a computationally efficient method to construct nonparametric, heteroskedastic prediction bands for uncertainty quantification, with or without any user-specified predictive model. The data-adaptive prediction band is universally applicable with minimal distributional assumptions, with strong non-asymptotic coverage properties, and easy to implement using standard convex programs. Our approach can be viewed as a novel variance interpolation with confidence and further leverages techniques from semi-definite programming and sum-of-squares optimization. Theoretical and numerical performances for the proposed approach for uncertainty quantification are analyzed.

Tengyuan Liang

July 24, 2021, 4:50 p.m.

The magnitude of the weights of a neural network is a fundamental measure of complexity that plays a crucial role in the study of implicit and explicit regularization. For example, in recent work, gradient descent updates in overparameterized models asymptotically lead to solutions that implicitly minimize the ell2 norm of the parameters of the model, resulting in an inductive bias that is highly architecture-dependent. To investigate the properties of learned functions, it is natural to consider a function space view given by the minimum ell2 norm of weights required to realize a given function with a given network. We call this the “induced regularizer” of the network. Building on a line of recent work, we study the induced regularizer of linear convolutional neural networks with a focus on the role of kernel size and the number of channels. We introduce an SDP relaxation of the induced regularizer, that we show is tight for networks with a single input channel. Using this SDP formulation, we show that the induced regularizer is independent of the number of the output channels for single-input channel networks, and for multi-input channel networks, we show independence given sufficiently many output channels. Moreover, we show that as the kernel size increases, the induced regularizer interpolates between a basis-invariant norm and a basis-dependent norm that promotes sparse structures in Fourier space. Based on joint work with Meena Jagadeesan and Ilya Razenshteyn.
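One way to write the verbal definition of the induced regularizer as a formula is given below. The notation is assumed for illustration ($\theta$ for the network weights, $f_\theta$ for the function the network computes with those weights, $R$ for the regularizer) and may differ from the talk, for instance in whether the norm is squared.

```latex
R(f) \;=\; \min_{\theta}\,\bigl\{\, \|\theta\|_2 \;:\; f_\theta = f \,\bigr\}
```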

July 24, 2021, 9:05 a.m.

Because the phenomenon of adversarial examples in deep networks poses a serious barrier to the reliable and robust application of this methodology, there has been considerable interest in why it arises. We consider ReLU networks of constant depth with independent Gaussian parameters, and show that small perturbations of input vectors lead to large changes of outputs. Building on insights of Daniely and Schacham (2020) and of Bubeck et al (2021), we show that adversarial examples arise in these networks because the functions that they compute are very close to linear. The main result is for networks with a constant depth, but we also show that some constraint on depth is necessary for a result of this kind: there are suitably deep networks that, with constant probability, compute a function that is close to constant. This, combined with results characterizing benign overfitting in linear regression, suggests two potential mechanisms behind adversarial examples in overparameterized settings, one arising from label noise and the other from random initialization. Joint work with Sébastien Bubeck and Yeshwanth Cherapanamjeri.

Peter Bartlett

July 24, 2021, 10 a.m.

The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. In this talk, I will discuss some general mathematical principles allowing for efficient optimization in over-parameterized non-linear systems, a setting that includes deep neural networks. I will show that the optimization problems corresponding to these systems are not convex, even locally, but instead satisfy the Polyak-Lojasiewicz (PL) condition on most of the parameter space, allowing for efficient optimization by gradient descent or SGD. I will connect the PL condition of these systems to the condition number associated with the tangent kernel and show how a non-linear theory for those systems parallels classical analyses of over-parameterized linear equations. As a separate related development, I will discuss a perspective on the remarkable recently discovered phenomenon of transition to linearity (constancy of NTK) in certain classes of large neural networks. I will show how this transition to linearity results from the scaling of the Hessian with the size of the network controlled by certain functional norms. Combining these ideas, I will show how the transition to linearity can be used to demonstrate the PL condition and convergence for a general class of wide neural networks. Finally, I will comment on systems that are "almost" over-parameterized, which appear to be common in practice.

Based on joint work with Chaoyue Liu and Libin Zhu
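For reference, the PL condition named in this abstract is usually stated as follows, together with the standard linear convergence rate it gives for gradient descent. The notation is assumed here ($\mathcal{L}$ for the loss, $\mathcal{L}^*$ for its infimum, $\mu$ for the PL constant, $\beta$ for the smoothness constant, $\eta$ for the step size) and may differ from the talk, which applies the condition on most of the parameter space and not globally.

```latex
% PL condition with constant \mu > 0:
\frac{1}{2}\,\|\nabla \mathcal{L}(w)\|^2 \;\ge\; \mu\,\bigl(\mathcal{L}(w) - \mathcal{L}^*\bigr)

% If \mathcal{L} is also \beta-smooth, gradient descent with step size \eta = 1/\beta satisfies
\mathcal{L}(w_{t+1}) - \mathcal{L}^* \;\le\; \Bigl(1 - \frac{\mu}{\beta}\Bigr)\bigl(\mathcal{L}(w_t) - \mathcal{L}^*\bigr)
```

The second line follows from the descent inequality for smooth functions, $\mathcal{L}(w_{t+1}) \le \mathcal{L}(w_t) - \frac{1}{2\beta}\|\nabla \mathcal{L}(w_t)\|^2$, combined with the PL condition. No convexity is needed, which is why the condition suits the non-convex setting described here.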

Mikhail Belkin

July 23, 2021, 1:30 p.m.

Novelty detection, i.e., identifying whether a given sample is drawn from outside the training distribution, is essential for reliable machine learning. To this end, there have been many attempts at learning a representation well-suited for novelty detection and designing a score based on such representation. In this talk, I will present a simple, yet effective method named contrasting shifted instances (CSI), inspired by the recent success on contrastive learning of visual representations. Specifically, in addition to contrasting a given sample with other instances as in conventional contrastive learning methods, our training scheme contrasts the sample with distributionally-shifted augmentations of itself. Based on this, we propose a new detection score that is specific to the proposed training scheme. Our experiments demonstrate the superiority of our method under various novelty detection scenarios, including unlabeled one-class, unlabeled multi-class and labeled multi-class settings, with various image benchmark datasets. This is a joint work with Jihoon Tack, Sangwoo Mo and Jongheon Jeong (all from KAIST).

July 23, 2021, 12:30 p.m.

Aggregate evaluations of deep learning models on popular benchmarks have incentivized the creation of bigger models that are more accurate on iid data. As the research community is realizing that these models do not generalize out of distribution, the trend has shifted to evaluations on adversarially constructed, unnatural datasets. However, both of these extremes have limitations when it comes to meeting the goals of evaluation. In this talk, I propose that the goal of evaluation is to inform the user's next action, in the form of 1) further analysis or 2) model patching. Thinking of evaluation as an iterative process dovetails with these goals. Our work on Robustness Gym (RG) proposes an iterative process of evaluation and explains how that enables a user to iterate on their model development process. I will give two concrete examples in NLP demonstrating how RG supports the aforementioned evaluation goals. Towards the end of the talk, I will discuss some caveats associated with evaluating pre-trained language models (PLMs) and in particular focus on the problem of input contamination, giving examples from our work on SummVis. Using these examples from RG and SummVis, I hope to draw attention to the limitations of current evaluations and the need for a more thorough process that helps us gain a better understanding of our deep learning models.

July 23, 2021, 11:45 a.m.

Machine learning models deployed in the real world constantly face distribution shifts, yet current models are not robust to these shifts; they can perform well when the train and test distributions are identical, but still have their performance plummet when evaluated on a different test distribution. In this talk, I will discuss methods and benchmarks for improving robustness to distribution shifts. First, we consider the problem of spurious correlations and show how to mitigate it with a combination of distributionally robust optimization (DRO) and controlling model complexity---e.g., through strong L2 regularization, early stopping, or underparameterization. Second, we present WILDS, a curated and diverse collection of 10 datasets with real-world distribution shifts, that aims to address the under-representation of real-world shifts in the datasets widely used in the ML community today. We observe that existing methods fail to mitigate performance drops due to these distribution shifts, underscoring the need for new training methods that produce models which are more robust to the types of distribution shifts that arise in practice.
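To make the contrast between average-case training and distributionally robust optimization concrete, here is a minimal sketch of one common DRO objective over predefined groups: the worst per-group average loss. The function names, the group labels, and the toy loss values are illustrative assumptions, and the talk may use a different formulation.

```python
from collections import defaultdict


def group_average_losses(losses, groups):
    """Return a dict mapping each group label to its mean per-example loss."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        totals[group] += loss
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def worst_group_loss(losses, groups):
    """Group-DRO style objective: the largest per-group average loss."""
    return max(group_average_losses(losses, groups).values())


def average_loss(losses):
    """Standard empirical-risk objective: the mean loss over all examples."""
    return sum(losses) / len(losses)


# Toy values (assumed for illustration): a model that does well on a
# majority group and poorly on a small minority group.
losses = [0.1, 0.2, 0.1, 2.0]
groups = ["majority", "majority", "majority", "minority"]

avg = average_loss(losses)
worst = worst_group_loss(losses, groups)
```

On these toy values the average loss looks moderate while the worst-group loss is dominated by the minority group, which is the gap a robust objective is meant to expose.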

July 23, 2021, 8:15 a.m.

OOD generalization is a very difficult problem. Instead of tackling it head on, this talk argues that, when considering the current strengths and weaknesses of deep learning, we should consider an alternative approach which tries to dodge the problem altogether. If we can develop scalable pre-training methods that can leverage large and highly varied data sources, there is a hope that many examples (which would have been OOD for standard ML datasets) will have at least some relevant training data, removing the need for elusive OOD capabilities.

Alec Radford

July 23, 2021, 6:15 a.m.

I'll talk about one specific problem I have with the field: scale. Many papers fix an architecture and try to improve log-likelihood, comparing to the original base architecture regardless of how much additional compute is used to outperform the original model. Yet, if we adjust for scale—for example, compare an ensemble of size 10 to a model scaled up 10x—we'd see improvements significantly diminish or vanish altogether. Ultimately, we should be examining the frontier of uncertainty-robustness performance as a function of compute. I'll substantiate this perspective with a few works with colleagues. These works advance the frontier with efficient ensembles alongside priors and inductive biases; and we'll examine uncertainty properties of existing giant models.

Dustin Tran

Invited Talk: Lora Aroyo

July 23, 2021, 8:50 a.m.

July 24, 2021, 7:30 a.m.

Coresets are small data summaries that are sufficient for model training. They can be maintained online, enabling efficient handling of large data streams under resource constraints. However, existing constructions are limited to simple models such as k-means and logistic regression. In this work, we propose a novel coreset construction via cardinality-constrained bilevel optimization. We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual and streaming deep learning, as well as active semi-supervised learning.

Joint work with Zalán Borsos, Mojmír Mutny and Marco Tagliasacchi

Andreas Krause

Andreas Krause is a Professor of Computer Science at ETH Zurich, where he leads the Learning & Adaptive Systems Group. He also serves as Academic Co-Director of the Swiss Data Science Center and Chair of the ETH AI Center, and co-founded the ETH spin-off LatticeFlow. Before that he was an Assistant Professor of Computer Science at Caltech. He received his Ph.D. in Computer Science from Carnegie Mellon University (2008) and his Diplom in Computer Science and Mathematics from the Technical University of Munich, Germany (2004). He is a Max Planck Fellow at the Max Planck Institute for Intelligent Systems, an ELLIS Fellow, a Microsoft Research Faculty Fellow and a Kavli Frontiers Fellow of the US National Academy of Sciences. He received the Rössler Prize, ERC Starting Investigator and ERC Consolidator grants, the German Pattern Recognition Award, an NSF CAREER award as well as the ETH Golden Owl teaching award. His research has received awards at several premier conferences and journals, including the ACM SIGKDD Test of Time award 2019 and the ICML Test of Time award 2020. Andreas Krause served as Program Co-Chair for ICML 2018, and currently serves as General Chair for ICML 2023 and as Action Editor for the Journal of Machine Learning Research.

July 23, 2021, 11 a.m.

Olga Troyanskaya

July 24, 2021, 10:30 a.m.

In this talk (Part I and II), we will cover the different functionalities of DECILE, which includes modules such as (a) SUBMODLIB, (b) CORDS, (c) TRUST, (d) DISTIL, and (e) SPEAR. SUBMODLIB is a library for submodular optimization which implements a number of submodular optimization algorithms and functions (including the submodular mutual information and conditional gain functions). CORDS is a library for data subset selection and coresets for compute-efficient training of deep models. TRUST is targeted subset selection for personalization and model remediation. DISTIL is an active learning toolkit for deep models and SPEAR is a library for weak supervision via labeling functions. We will also focus on the different SOTA algorithms implemented and the benchmarks enabled through these toolkits.

In Part I, we will cover submodlib (a toolkit for submodular optimization), and CORDS (a toolkit for data subset selection and coresets for efficient training of deep models).


Rishabh Iyer

July 24, 2021, 10:44 a.m.

In this talk (Part I and II), we will cover the different functionalities of DECILE, which includes modules such as (a) SUBMODLIB, (b) CORDS, (c) TRUST, (d) DISTIL, and (e) SPEAR. SUBMODLIB is a library for submodular optimization which implements a number of submodular optimization algorithms and functions (including the submodular mutual information and conditional gain functions). CORDS is a library for data subset selection and coresets for compute-efficient training of deep models. TRUST is targeted subset selection for personalization and model remediation. DISTIL is an active learning toolkit for deep models and SPEAR is a library for weak supervision via labeling functions. We will also focus on the different SOTA algorithms implemented and the benchmarks enabled through these toolkits.

In Part II, we will cover TRUST, DISTIL, and SPEAR.


Ganesh Ramakrishnan

July 24, 2021, 6:30 a.m.

When we want to discover patterns in data, we usually use the best available algorithm/software or try to improve it. In recent years we have started exploring a different approach: instead of improving the algorithm, reduce the input data and run the existing algorithm on the reduced data to obtain the desired output much faster. A core-set for a given problem is a semantic compression of its input, in the sense that a solution for the problem with the (small) core-set as input yields an approximate solution to the problem with the original (Big) data. Using existing techniques it can be computed on a streaming input that may be distributed among several machines on device, and in parallel (e.g. clouds or GPUs).

Dan Feldman

July 24, 2021, 2:15 p.m.

Dimitris Papailiopoulos

July 24, 2021, 12:30 p.m.

Filip Hanzely

July 24, 2021, 6:45 a.m.

July 24, 2021, 2:30 p.m.

Data selection methods, such as active learning and core-set selection, improve the data efficiency of machine learning by identifying the most informative data points to label or train on. Across the data selection literature, there are many ways to identify these training examples. However, classical data selection methods are prohibitively expensive to apply in deep learning because of the larger datasets and models. To make these methods tractable, we propose (1) “selection via proxy” (SVP) to avoid expensive training and reduce the computation per example and (2) “similarity search for efficient active learning and search” (SEALS) to reduce the number of examples processed. Both methods lead to order of magnitude performance improvements, making techniques like active learning on billions of unlabeled images practical for the first time.

Cody Coleman

Cody recently completed a computer science PhD at Stanford University, advised by Professors Matei Zaharia and Peter Bailis. His research focuses on democratizing machine learning by reducing the cost of producing state-of-the-art models and creating novel abstractions that simplify machine learning development and deployment. His work spans from performance benchmarking of hardware and software systems (i.e., DAWNBench and MLPerf) to computationally efficient methods for active learning and core-set selection. He completed his B.S. and M.Eng. in electrical engineering and computer science at MIT.

July 24, 2021, 11:30 a.m.

In this talk, I will present novel theoretical results to bridge the gap between theory and practice for interpretable dimensionality reduction, aka feature selection. Specifically, I will show that feature selection satisfies a weaker form of submodularity. Because of this connection, for any function, one can provide constant-factor approximation guarantees that depend solely on the condition number of the function. Moreover, I will discuss that the "cost of interpretability" accrued because of selecting features as opposed to principal components is not as high as previously thought.

Rajiv Khanna

July 24, 2021, 8 a.m.

While constraints are ubiquitous in artificial intelligence and constraints are also commonly used in machine learning and data mining, the problem of learning constraints from examples has received less attention. I will discuss the problem of constraint learning in general, and present some recent contributions in the learning of various types of constraints and models for combinatorial optimisation. This includes the learning of formulae in Excel, learning SMT formulae, MAX-SAT and Linear Programs.

Luc De Raedt

Invited Talk: Greedy and Its Friends

July 24, 2021, 8:30 a.m.

In this talk, I will introduce 3 close friends of the greedy algorithm who can maximize a general submodular function (monotone or not) subject to very general constraints. They come in different flavors and guarantees.
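As background for the abstract above, the following is a minimal sketch of the classic greedy algorithm itself, applied to a set-coverage objective (a monotone submodular function) under a cardinality constraint. It is the textbook baseline, not any of the three algorithms from the talk; the function names and the toy sets are assumptions made for illustration.

```python
def coverage(selected, sets):
    """Number of distinct elements covered by the chosen sets."""
    covered = set()
    for index in selected:
        covered |= sets[index]
    return len(covered)


def greedy_max(sets, k):
    """Pick up to k sets, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        current = coverage(selected, sets)
        best, best_gain = None, 0
        for index in range(len(sets)):
            if index in selected:
                continue
            gain = coverage(selected + [index], sets) - current
            if gain > best_gain:
                best, best_gain = index, gain
        if best is None:  # no remaining set adds anything new
            break
        selected.append(best)
    return selected


# Toy instance (assumed for illustration).
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen = greedy_max(sets, 2)
```

For monotone submodular objectives under a cardinality constraint, this greedy rule is known to achieve a (1 - 1/e) approximation; non-monotone functions and more general constraints need different algorithms, which is where the variants discussed in the talk come in.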

Amin Karbasi

Amin Karbasi is currently an assistant professor of Electrical Engineering, Computer Science, and Statistics at Yale University. He has been the recipient of the National Science Foundation (NSF) Career Award 2019, Office of Naval Research (ONR) Young Investigator Award 2019, Air Force Office of Scientific Research (AFOSR) Young Investigator Award 2018, DARPA Young Faculty Award 2016, National Academy of Engineering Grainger Award 2017, Amazon Research Award 2018, Google Faculty Research Award 2016, Microsoft Azure Research Award 2016, Simons Research Fellowship 2017, and ETH Research Fellowship 2013. His work has also been recognized with a number of paper awards, including Medical Image Computing and Computer Assisted Interventions Conference (MICCAI) 2017, International Conference on Artificial Intelligence and Statistics (AISTATS) 2015, IEEE ComSoc Data Storage 2013, International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2011, ACM SIGMETRICS 2010, and IEEE International Symposium on Information Theory (ISIT) 2010 (runner-up). His Ph.D. thesis received the Patrick Denantes Memorial Prize 2013 from the School of Computer and Communication Sciences at EPFL, Switzerland.

July 24, 2021, 7 a.m.

In recent years, there has been rising interest in learning under algorithmic triage, a new learning paradigm that seeks to develop machine learning models that operate under different automation levels: models that make decisions for a given fraction of instances and leave the remaining ones to human experts. However, the interplay between the prediction accuracy of the models and the human experts under algorithmic triage is not well understood. In this talk, we will start by formally characterizing under which circumstances a predictive model may benefit from algorithmic triage. In doing so, we will also demonstrate that models trained for full automation may be suboptimal under triage. Then, given any model and desired level of triage, we will show that the optimal triage policy is a deterministic threshold rule in which triage decisions are derived by thresholding the difference between the model and human errors on a per-instance level. Building upon these results, we will introduce a practical gradient-based algorithm that is guaranteed to find a sequence of triage policies and predictive models of increasing performance. Finally, we will use real data from two important applications, content moderation and scientific discovery, to illustrate our theoretical results and show that the models and triage policies provided by our gradient-based algorithm outperform those provided by several competitive baselines.
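The threshold rule described in the abstract can be illustrated with a small sketch. It assumes that per-instance model and human errors are already available (in practice they would have to be estimated), and it treats the triage level as an upper bound on the fraction of instances sent to humans. The function name, the budget handling, and the positive-gap cutoff are assumptions made for illustration, not the authors' algorithm.

```python
def triage_policy(model_errors, human_errors, max_fraction):
    """Return a list of booleans: True means the instance is deferred to a human.

    Instances are ranked by (model error - human error); the ones with the
    largest positive gap are deferred, up to max_fraction of all instances.
    """
    n = len(model_errors)
    budget = int(max_fraction * n)
    gaps = [m - h for m, h in zip(model_errors, human_errors)]
    # Indices ordered from largest to smallest gap.
    order = sorted(range(n), key=lambda i: gaps[i], reverse=True)
    to_human = [False] * n
    for i in order[:budget]:
        if gaps[i] > 0:  # defer only where the human is expected to do better
            to_human[i] = True
    return to_human


# Toy per-instance errors (assumed for illustration).
model_errors = [0.9, 0.1, 0.5, 0.3]
human_errors = [0.2, 0.2, 0.1, 0.4]
decisions = triage_policy(model_errors, human_errors, max_fraction=0.5)
```

Here the first and third instances are deferred because the model's error exceeds the human's by the largest margins; raising the budget further changes nothing, since the remaining gaps are negative.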

Manuel Gomez-Rodriguez

Manuel Gomez Rodriguez is a faculty member at the Max Planck Institute for Software Systems. Manuel develops human-centric machine learning models and algorithms for the analysis, modeling and control of social, information and networked systems. He has received several recognitions for his research, including an outstanding paper award at NeurIPS’13 and a best research paper honorable mention at KDD’10 and WWW’17. He has served as track chair for FAT* 2020 and as area chair for every major conference in machine learning, data mining and the Web. Manuel has co-authored over 50 publications in top-tier conferences (NeurIPS, ICML, WWW, KDD, WSDM, AAAI) and journals (PNAS, Nature Communications, JMLR, PLOS Computational Biology). Manuel holds a BS in Electrical Engineering from Carlos III University, an MS and PhD in Electrical Engineering from Stanford University, and has received postdoctoral training at the Max Planck Institute for Intelligent Systems.

July 24, 2021, 12:35 p.m.

Digital phenotyping and machine learning technologies have shown the potential to measure objective behavioral and physiological markers, provide risk assessment for people who might be at high risk of poor mental health and wellbeing, and support better decisions or behavioral changes that improve health and wellbeing. I will introduce a series of studies, algorithms, and systems we have developed for measuring, predicting, and supporting personalized health and wellbeing. I will also discuss challenges, lessons learned, and potential future directions in mental health and wellbeing research.

Akane Sano

July 24, 2021, 10:40 a.m.

Social media data is being increasingly used to computationally learn about and infer the mental health states of individuals and populations. Despite being touted as a powerful means to shape interventions and impact mental health recovery, little do we understand about the theoretical, domain, and psychometric validity of this novel information source, or its underlying biases, when appropriated to augment conventionally gathered data, such as surveys and verbal self-reports. This talk presents a critical analytic perspective on the pitfalls of social media signals of mental health, especially when they are derived from “proxy” diagnostic indicators, often removed from the real-world context in which they are likely to be used. Then, to overcome these pitfalls, this talk presents results from two case studies, where machine learning algorithms to glean mental health insights from social media were developed in a context-sensitive and human-centered way, in collaboration with domain experts and stakeholders. The first of these case studies, a collaboration with a health provider, focuses on the individual-perspective, and reveals the ability and implications of using social media data of consented schizophrenia patients to forecast relapse and support clinical decision-making. Scaling up to populations, in collaboration with a federal organization and towards influencing public health policy, the second case study seeks to forecast nationwide rates of suicide fatalities using social media signals, in conjunction with health services data. The talk concludes with discussions of the path forward, emphasizing the need for a collaborative, multi-disciplinary research agenda while realizing the potential of social media data and machine learning in mental health -- one that incorporates methodological rigor, ethics, and accountability, all at once.

Munmun De Choudhury

July 24, 2021, 7:20 a.m.

There is growing interest in understanding the evolution of depressive symptomatology over time, that is, the dynamics of how symptoms interact during periods of wellness and as one approaches an episode of illness. It is thought that by understanding these dynamics we can develop tools to identify early warning signs of depression before it takes hold. But this sort of research is prohibitively challenging; it requires research participants to actively log their thoughts, feelings, and emotions regularly, over months or even years, to capture critical transitions into a depressed state. An alternative is to use sources of data, such as social media posts, that people produce routinely in the course of their everyday life. Recent data has shown this might be possible; depressed individuals use language differently, for example, using more first-person singular pronouns (I, me, my) and more emotional negative words (hurt, ugly, nasty). In a set of two studies, I will present research testing whether we can use social media posts to detect depression, how specific such findings are to depression versus other aspects of mental health, and finally whether these ‘linguistic symptoms’ can be used to test core theories about the network structure of depression and how it changes during episodes of illness.

Claire Gillan

July 24, 2021, 6:30 a.m.

When outcomes are not completely certain, we have to grapple with risk. Different individuals have characteristically different attitudes to risk - something that has been extensively investigated in psychology and psychiatry, albeit largely using venerable measures that lack certain axiomatically-desirable properties. Here we consider a modern risk measure for modeling human and animal decision-making called conditional value at risk (CVaR) which is particularly apposite because of its preferential focus on worst-case outcomes. We discuss theoretical characteristics of CVaR in single and multi-step decision-making problems, relating our findings to avoidance and worry. This is joint work with Chris Gagne.
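As a concrete illustration of the risk measure named above, here is a minimal empirical estimate of CVaR for a sample of outcomes, where lower values are worse. It averages the worst alpha-fraction of the sample; the function name, the rounding of the tail size, and the toy outcomes are assumptions made for illustration, and conventions for CVaR (losses versus rewards, tail level) vary across the literature.

```python
import math


def cvar(outcomes, alpha):
    """Mean of the worst alpha-fraction of outcomes (lower outcome = worse).

    alpha = 1 recovers the ordinary mean; as alpha shrinks, the estimate
    approaches the single worst outcome in the sample.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(outcomes)
    # Simple estimator: round the tail size up and ignore fractional weights.
    k = max(1, math.ceil(alpha * len(ordered)))
    worst = ordered[:k]
    return sum(worst) / k


# Toy outcomes (assumed for illustration).
outcomes = [-10, -2, 0, 1, 3, 4, 5, 8]
tail_risk = cvar(outcomes, 0.25)
mean_outcome = cvar(outcomes, 1.0)
```

On this sample the mean outcome is positive while the estimate at alpha = 0.25 is strongly negative, which shows how a low alpha shifts attention toward the worst outcomes.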

Peter Dayan

July 23, 2021, 12:42 p.m.

July 23, 2021, 12:20 p.m.

Margarita Moreno-Betancur

Margarita Moreno-Betancur is a Senior Research Fellow at the University of Melbourne and the Murdoch Children's Research Institute, supported by an Australian Research Council fellowship. She leads a research team combining methodological research in causal inference and missing data with collaborations in studies of child and adolescent health.

July 23, 2021, 10:45 a.m.

Frederick Eberhardt

July 23, 2021, 10:20 a.m.

Distribution-Free Assessment of Population Overlap in Observational Studies

Overlap in baseline covariates between treated and control groups, also known as positivity or common support, is a common assumption in observational causal inference. Assessing this assumption is often ad hoc, however, and can give misleading results. For example, the common practice of examining the empirical distribution of estimated propensity scores is heavily dependent on model specification and has poor uncertainty quantification. In this paper, we propose a formal statistical framework for assessing the extrema of the population propensity score; e.g., the propensity score lies in [0.1, 0.9] almost surely. We develop a family of upper confidence bounds, which we term O-values, for this quantity. We show these bounds are valid in finite samples so long as the observations are independent and identically distributed, without requiring any further modeling assumptions on the data generating process. We also use extensive simulations to show that these bounds are reasonably tight in practice. Finally, we demonstrate this approach using several benchmark observational studies, showing how to build our proposed method into the observational causal inference workflow.

Avi Feller

Lihua Lei

July 23, 2021, 8:27 a.m.

Optimal Dynamic Treatment Rule Estimation and Evaluation with Application to Criminal Justice Interventions in the United States

The optimal dynamic treatment rule (ODTR) framework offers an approach for understanding which kinds of individuals respond best to specific interventions. Recently, there has been a proliferation of methods for estimating the ODTR. One such method is an extension of the SuperLearner algorithm – an ensemble method to optimally combine candidate algorithms extensively used in prediction problems – to ODTRs. Following the "Causal Roadmap," in this talk we causally and statistically define the ODTR, and different parameters to evaluate it. We show how to estimate the ODTR with SuperLearner and evaluate it using cross-validated targeted maximum likelihood estimation. We apply the ODTR SuperLearner to the "Interventions" study, a randomized trial that is currently underway aimed at reducing recidivism among justice-involved adults with mental illness in the United States. Specifically, we show preliminary results for the ODTR SuperLearner applied to this data, which aims to learn for whom Cognitive Behavioral Therapy (CBT) treatment works best to reduce recidivism, instead of Treatment As Usual (TAU; psychiatric services). This is joint work with Drs. Maya Petersen, Mark van der Laan, and Jennifer Skeem.

Lina Montoya

July 23, 2021, 8 a.m.

Noa Dagan

Noam Barda

July 23, 2021, 11:01 a.m.

Kim Montgomery

Data scientist, applied mathematician, Kaggle grandmaster.

July 23, 2021, 4 p.m.

Kumar Chellapilla

July 23, 2021, 2:10 p.m.

Dawn Song

Dawn Song is a Professor in the Department of Electrical Engineering and Computer Science at UC Berkeley. Her research interest lies in deep learning, security, and blockchain. She has studied diverse security and privacy issues in computer systems and networks, including areas ranging from software security, networking security, distributed systems security, applied cryptography, blockchain and smart contracts, to the intersection of machine learning and security. She is the recipient of various awards including the MacArthur Fellowship, the Guggenheim Fellowship, the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, the MIT Technology Review TR-35 Award, the George Tallman Ladd Research Award, the Okawa Foundation Research Award, the Li Ka Shing Foundation Women in Science Distinguished Lecture Series Award, the Faculty Research Award from IBM, Google and other major tech companies, and Best Paper Awards from top conferences in Computer Security and Deep Learning. She obtained her Ph.D. degree from UC Berkeley. Prior to joining the UC Berkeley faculty, she was a faculty member at Carnegie Mellon University from 2002 to 2007.

July 23, 2021, 3:20 p.m.

Alex Ratner

July 23, 2021, 11 a.m.

July 23, 2021, 10:20 a.m.

July 23, 2021, 9:50 a.m.

The number of applications relying on inference from Machine Learning (ML) models is already large and expected to keep growing. Facebook, for instance, serves tens-of-trillions of inference queries per day. Distributed inference dominates ML production costs: on AWS, it accounts for over 90% of ML infrastructure cost. Despite existing work in machine learning inference serving, ease-of-use and cost efficiency remain challenges at large scales. Developers must manually search through thousands of model-variants—versions of already-trained models that differ in hardware, resource footprints, latencies, costs, and accuracies—to meet the diverse application requirements. Since requirements, query load, and applications themselves evolve over time, these decisions need to be made dynamically for each inference query to avoid excessive costs through naive autoscaling. To avoid navigating through the large and complex trade-off space of model-variants, developers often fix a variant across queries, and replicate it when load increases. However, given the diversity across variants and hardware platforms in the cloud, a lack of understanding of the trade-off space can incur significant costs to developers.

In this talk, I will primarily focus on INFaaS, an automated model-less system for distributed inference serving, where developers simply specify the performance and accuracy requirements for their applications without needing to specify a specific model-variant for each query. INFaaS generates model-variants from already trained models, and efficiently navigates the large trade-off space of model-variants on behalf of developers to meet application-specific objectives: (a) for each query, it selects a model, hardware architecture, and model optimizations, (b) it combines VM-level horizontal autoscaling with model-level autoscaling, where multiple, different model-variants are used to serve queries within each machine. By leveraging diverse variants and sharing hardware resources across models, INFaaS achieves significant improvement in performance (throughput and latency of model serving) while saving costs compared to existing inference serving systems. I will conclude the talk with a brief discussion on future directions.
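The idea of specifying requirements rather than a model can be sketched as a simple selection rule: among the available variants, pick the cheapest one that satisfies the accuracy and latency requirements. Everything below, including the `ModelVariant` fields, the `select_variant` function, and the example numbers, is hypothetical and only illustrates the concept; it is not the INFaaS API or its actual selection policy.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelVariant:
    """Hypothetical description of one already-trained model variant."""
    name: str
    hardware: str
    accuracy: float        # fraction in [0, 1]
    latency_ms: float      # expected per-query latency
    cost_per_query: float  # arbitrary cost units


def select_variant(
    variants: Sequence[ModelVariant],
    min_accuracy: float,
    max_latency_ms: float,
) -> Optional[ModelVariant]:
    """Return the cheapest variant meeting both requirements, or None."""
    feasible = [
        v for v in variants
        if v.accuracy >= min_accuracy and v.latency_ms <= max_latency_ms
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda v: v.cost_per_query)


# Made-up variants for illustration only.
variants = [
    ModelVariant("small-cpu", "cpu", accuracy=0.70, latency_ms=40.0, cost_per_query=1.0),
    ModelVariant("large-cpu", "cpu", accuracy=0.78, latency_ms=300.0, cost_per_query=2.0),
    ModelVariant("large-gpu", "gpu", accuracy=0.78, latency_ms=25.0, cost_per_query=5.0),
]

choice = select_variant(variants, min_accuracy=0.75, max_latency_ms=50.0)
relaxed = select_variant(variants, min_accuracy=0.75, max_latency_ms=500.0)
```

With a tight latency bound only the GPU variant qualifies; relaxing the bound lets the cheaper CPU variant win. A real system would also have to account for load and autoscaling, which the abstract describes and this sketch omits.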

Neeraja J Yadwadkar

July 23, 2021, 9:10 a.m.

Speaker: Shalmali Joshi, Postdoctoral Fellow at the Center for Research on Computation and Society, Harvard University (SEAS)

Shalmali Joshi

July 23, 2021, 4:40 a.m.

The automated design of chips is facing growing challenges due to the high volume of smartphones, their increasing functionality, and the corresponding heterogeneity of the chips. In this talk, I will survey how machine learning has recently emerged as a core technique that promises to rescue the diminishing gains in power, performance, and area in this field. In particular, I will focus on the challenges in deploying learning algorithms in electronic design automation and outline the solution that we take at Qualcomm, which combines machine learning with combinatorial optimization solvers.

Speaker: Roberto Bondesan, Qualcomm

Roberto Bondesan

July 23, 2021, 2:50 a.m.

Can machine learning help to reduce costs and facilitate access to justice within the legal system? We first consider constitutional constraints to which deployment of ML in the legal system is subject; in particular, the protection of fundamental rights and the need to give reasons in legal decisions. We then turn to the technical state of the art with ML analysis of caselaw decisions. It is possible to predict outcomes of cases, given a set of facts, with a high degree of accuracy, but the explainability of these methods is limited. The research frontier therefore explores ways to provide legal reasons for case predictions.

Speaker: John Armour is Professor of Law and Finance at Oxford University and a Fellow of the British Academy and the European Corporate Governance Institute.

John Armour

John Armour is Professor of Law and Finance at Oxford University and a Fellow of the British Academy and the European Corporate Governance Institute. He serves as an Executive Editor of the Journal of Corporate Law Studies and the Journal of Law, Finance and Accounting, and has been involved in policy-related projects commissioned by the UK’s Department of Trade and Industry (now BEIS), Financial Services Authority (now FCA) and Insolvency Service, the Commonwealth Secretariat, and the World Bank. He served as a member of the European Commission’s Informal Company Law Expert Group from 2014-19.

July 23, 2021, 2:10 a.m.

Speaker: Engineer Bainomugisha

Bio: I am an Associate Professor of Computer Science and the Chair of the Department of Computer Science at Makerere University. My research focuses on Computer Science-driven solutions to the prevailing world challenges. I am also passionate about contributing to quality Computer Science education that is of sufficient breadth and depth, practical and fast enough. Currently, I lead several innovative and research initiatives that aim to create and apply computational methods and tools that can improve the quality of life especially in the developing world setting.

Engineer Bainomugisha

Engineer Bainomugisha is an Associate Professor and Chair of Computer Science at Makerere University. Engineer leads the AirQo research team, which develops and deploys a network of low-cost air quality monitors and uses machine learning for modelling and analysis. Engineer is also co-founder of Sunbird AI, an African technology initiative to create open, practical AI systems for community benefit.

July 24, 2021, 10:45 a.m.

Todd Coleman

July 24, 2021, 2:30 p.m.

Invited Talk: David Tse

July 24, 2021, 5:15 p.m.

David Tse

July 24, 2021, 8:45 a.m.

Alexandros Dimakis

Alex Dimakis is an Associate Professor at the Electrical and Computer Engineering department, University of Texas at Austin. He received his Ph.D. in electrical engineering and computer sciences from UC Berkeley.

He received an ARO young investigator award in 2014, the NSF Career award in 2011, a Google faculty research award in 2012 and the Eli Jury dissertation award in 2008. He is the co-recipient of several best paper awards including the joint Information Theory and Communications Society Best Paper Award in 2012. His research interests include information theory, coding theory and machine learning.

July 24, 2021, 1:45 p.m.

July 24, 2021, 4 p.m.

Lalitha Sankar

July 23, 2021, 2 p.m.

Su-In Lee

July 23, 2021, 12:30 p.m.

Alan L Yuille

July 23, 2021, 6:30 a.m.

Mihaela van der Schaar

Professor van der Schaar is John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge, a Turing Faculty Fellow at The Alan Turing Institute in London, and Chancellor's Professor at UCLA. She was elected IEEE Fellow in 2009. She has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), an NSF Career Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award. She holds 35 granted USA patents. In 2019, she was identified by National Endowment for Science, Technology and the Arts as the female researcher based in the UK with the most publications in the field of AI. She was also elected as a 2019 "Star in Computer Networking and Communications". Her current research focus is on machine learning, AI and operations research for healthcare and medicine. For more details, see her website:

July 24, 2021, 11 a.m.

In this talk, we introduce several features of Summary Analytics, a new company that makes summarization-like abilities accessible to the masses, and scalable to petabyte-size datasets. We demonstrate wide applicability to different data modalities via a number of case studies. We demonstrate scalability by showing that it is possible to summarize 100 million records, each with 1700 features, down to 1000 records in 60 seconds using one CPU thread, and 14 seconds using four CPU threads, on a 2019-era laptop.

Jeff Bilmes

July 24, 2021, 2 p.m.

Large datasets have been crucial to the success of modern machine learning models. However, training on massive data has two major limitations. First, it is contingent on exceptionally large and expensive computational resources, and it incurs a substantial cost due to significant energy consumption. Second, in many real-world applications such as medical diagnosis, self-driving cars, and fraud detection, big data contains highly imbalanced classes and noisy labels. In such cases, training on the entire dataset does not result in a high-quality model.

In this talk, I will argue that we can address the above limitations by developing techniques that identify and extract representative subsets for learning from massive datasets. Training on representative subsets not only reduces the substantial costs of learning from big data, but also improves the accuracy of the learned models and their robustness against noisy labels. I will discuss how we can develop theoretically rigorous techniques that provide strong guarantees for the quality of the extracted subsets, as well as for the learned models’ quality and robustness against noisy labels. I will also show the effectiveness of such methods in practice for data-efficient and robust learning.

Baharan Mirzasoleiman

July 24, 2021, 8:20 a.m.

Anxiety is associated with elevated self-report of aversion to uncertainty and ambiguity. However, there has been relatively little attempt to characterize the underlying mechanisms. Over recent years, computational modelling has been used to advance our understanding of human decision-making and the brain mechanisms that support it. This approach can help us to formalize and understand how choice behaviours can be optimally adapted to different situations and the ways in which individuals may deviate from optimal behaviour.

In everyday life, our decision-making often takes place under some form of uncertainty. We can distinguish ‘first-order’ uncertainty, which occurs when a given action only leads to a given outcome on a proportion of occasions, from ‘second-order’ uncertainty, which describes uncertainty regarding the action-outcome contingency itself. Two sources of second-order uncertainty are contingency volatility and contingency ambiguity. In experiment 1, we manipulated contingency volatility and revealed that elevated trait anxiety is linked to a deficit in adjusting probabilistic learning to changes in volatility and also to reduced peripheral (pupil dilation) responses to volatility. In experiment 2, through bifactor modelling of internalizing symptoms and hierarchical modelling of task performance, we determined that this difficulty in optimizing probabilistic learning under volatility is common to both anxiety and depression. In experiment 3, we investigated another source of second-order uncertainty. Here, we manipulated the level of ambiguity, or missing information, present on each trial. High trait-anxious individuals showed elevated ambiguity aversion, being especially sensitive to increases in the amount of missing information when choosing between two options. Analysis of fMRI data revealed that participants show elevated activity in the dorsal anterior cingulate and inferior frontal sulcus at the time of choice on trials with high missing information when they subsequently engaged with versus avoided the ambiguous option; this pattern was strongest in high trait-anxious individuals. One possibility is that these frontal regions support rational evaluation of alternate actions as opposed to simple heuristic-based avoidance of options characterized by high second-order uncertainty.

Sonia J Bishop

July 23, 2021, 6:46 a.m.

Ellen Vitercik

Ellen Vitercik is a PhD student in computer science at Carnegie Mellon University. Her primary research interests are artificial intelligence, machine learning, theoretical computer science, and computational economics. Her honors include a National Science Foundation Graduate Research Fellowship and a Microsoft Research Women's Fellowship.

July 24, 2021, 8 a.m.

Maxim Raginsky

July 24, 2021, 10 a.m.

July 23, 2021, 8:30 a.m.

Abhijit Guha Roy

Jim Winkens