### Invited Talk: "Latent Space Optimization with Deep Generative Models"

#### Jose Miguel Hernandez-Lobato

### Invited talk: From random matrices to kernel quadrature: how repulsiveness can speed up Monte Carlo integration

Eigenvalues of many models of random matrices tend to repel each other. They sometimes repel so much that sample averages over these eigenvalues converge much faster than if the eigenvalues were i.i.d. I will first show how to transform this kind of result into a generic importance sampler with mean square error decreasing as $N^{-1-1/d}$, where $N$ is the number of integrand evaluations, and $d$ is the ambient dimension. This result crucially depends on a repulsive point process for the integration nodes, called an orthogonal polynomial ensemble, itself a particular case of determinantal point process (DPP). With more assumptions on the integrand and more general DPPs, I will then show how to obtain faster Monte Carlo rates. Further generalizing to mixtures of DPPs, I will finally show how to obtain tight integration rates for integrands in a large class of reproducing kernel Hilbert spaces. This last result involves a continuous equivalent to volume sampling, a discrete point process of recent interest in numerical linear algebra. This talk is intended to connect to a few other talks of the day, and is based on the following papers: https://arxiv.org/abs/1605.00361 https://arxiv.org/abs/1906.07832 https://arxiv.org/abs/2002.09677

#### Rémi Bardenet

### Invited talk: Scaling DPP MAP Inference

DPP MAP inference, the problem of finding the highest probability set under the distribution defined by a DPP, is of practical importance for a variety of areas such as recommender systems, active learning, and data compression. Unfortunately, finding the exact MAP solution is NP-hard. Often though, the standard greedy submodular maximization algorithm works well in practice for approximating the solution. In this talk, we discuss ways to speed up this simple greedy algorithm, as well as slower, but more accurate alternatives to it. We also discuss how to scale greedy for customized DPPs, where we want to solve the MAP problem multiple times with different weightings of item features. We conclude with a brief note on the complexity of MAP for nonsymmetric DPPs, where we show that greedy scales fairly well if we assume a particular kernel decomposition.
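As an illustration of the greedy algorithm mentioned above, here is a minimal sketch of its naive, unaccelerated form. It assumes a small symmetric positive semi-definite kernel given as nested lists and a fixed budget `k`; the function names `logdet` and `greedy_dpp_map` and the toy kernel are hypothetical, not taken from the talk. At each step it adds the item that maximizes the log-determinant of the selected submatrix.

```python
import math


def logdet(matrix):
    """Log-determinant of a small symmetric matrix via Cholesky.

    Returns -inf if the matrix is not numerically positive definite.
    """
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][m] * chol[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return float("-inf")
                chol[i][i] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def greedy_dpp_map(kernel, k):
    """Greedily pick up to k items, maximizing log det of the kernel submatrix."""
    selected = []
    for _ in range(k):
        best_item, best_value = None, float("-inf")
        for item in range(len(kernel)):
            if item in selected:
                continue
            subset = selected + [item]
            sub = [[kernel[a][b] for b in subset] for a in subset]
            value = logdet(sub)
            if value > best_value:
                best_item, best_value = item, value
        if best_item is None:
            break  # every remaining item makes the submatrix singular
        selected.append(best_item)
    return selected


# Toy kernel (assumed): items 0 and 1 are near-duplicates, item 2 is distinct.
kernel = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(greedy_dpp_map(kernel, 2))  # prints [0, 2]
```

This version recomputes a determinant for every candidate at every step, which is the cost that faster variants of greedy aim to avoid.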

#### Anca Dragan

#### Pietro Perona

### Invited talk: Negative Dependence and Sampling

Probability distributions with strong notions of negative dependence arise in various forms in machine learning. Examples include diversity-inducing probabilistic models, interpretability, exploration and active learning, and randomized algorithms. While, perhaps surprisingly, being more delicate than its positive counterpart, negative dependence enjoys rich mathematical connections and properties that offer a promising toolbox for machine learning. In this talk, I will summarize some recently important notions of negative dependence, and their implications for sampling algorithms. These results exploit connections to the geometry of polynomials, log concavity, and submodular optimization. We will conclude with an example application of sampling minibatches for optimization.

#### Stefanie Jegelka

### Invited Talk: Invited Talk 4: Debora Marks

Presentation of the invited talk by Debora (Live only)

#### Workshop CompBio

Machine learning has found increasing use in the real world, and yet a framework for productionizing machine learning algorithms is lacking. This talk discusses how companies can bridge the gap between research and production in machine learning. It starts with the key differences between the research and production environments: data, goals, compute requirements, and evaluation metrics. It also breaks down the different phases of a machine learning production cycle, the infrastructure currently available for the process, and the industry best practices.

Live presentation

A major challenge in deploying machine learning algorithms for decision-making problems is the lack of guarantee for the performance of their resulting policies, especially those generated during the initial exploratory phase of these algorithms. Online decision-making algorithms, such as those in bandits and reinforcement learning (RL), learn a policy while interacting with the real system. Although these algorithms will eventually learn a good or an optimal policy, there is no guarantee for the performance of their intermediate policies, especially at the very beginning, when they perform a large amount of exploration. Thus, in order to increase their applicability, it is important to control their exploration and to make it more conservative.

To address this issue, we define a notion of safety that we refer to as safety w.r.t. a baseline. In this definition, a policy is considered to be safe if it performs at least as well as a baseline, which is usually the current strategy of the company. We formulate this notion of safety in bandits and RL and show how it can be integrated into these algorithms as a constraint that must be satisfied uniformly in time. We derive contextual linear bandit and RL algorithms that minimize their regret while ensuring that, at any given time, their expected sum of rewards remains above a fixed percentage of the expected sum of rewards of the baseline policy. This fixed percentage depends on the amount of risk that the manager of the system is willing to take. We prove regret bounds for our algorithms and show that the cost of satisfying the constraint (conservative exploration) can be controlled. Finally, we report experimental results to validate our theoretical analysis. We conclude the talk by discussing a few other constrained bandit formulations.
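The baseline constraint described above can be sketched as a simple check made before each exploratory action. This is a simplified illustration, not the algorithm from the talk: it assumes the baseline's expected per-round reward is known, and that pessimistic (lower-confidence) estimates of the learner's rewards are supplied by the caller. The function name and all parameters are hypothetical.

```python
def is_safe_to_explore(
    pessimistic_reward_so_far,
    candidate_reward_lower_bound,
    baseline_mean_reward,
    rounds_played,
    fraction,
):
    """Check the conservative constraint for the next round.

    Returns True if, even under pessimistic estimates, the learner's
    cumulative reward after the next round stays at or above `fraction`
    times what the baseline policy would have collected in expectation.
    """
    pessimistic_total = pessimistic_reward_so_far + candidate_reward_lower_bound
    baseline_total = baseline_mean_reward * (rounds_played + 1)
    return pessimistic_total >= fraction * baseline_total


# Assumed numbers: after 10 rounds, the baseline earns 1.0 per round on
# average, and we require at least 80% of its cumulative reward.
print(is_safe_to_explore(9.0, 0.0, 1.0, 10, 0.8))  # True: 9.0 >= 8.8
print(is_safe_to_explore(8.0, 0.0, 1.0, 10, 0.8))  # False: 8.0 < 8.8
```

When the check fails, a conservative learner would play the baseline action for that round instead of exploring.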

#### Mohammad Ghavamzadeh

Successful deployment of ML models tends to result from a good fit between the technology and the context. In this talk I will focus on the African context, which is often treated as synonymous with a developing context, although I will argue that there is a difference. I will expound on the opportunities and challenges that this unique context provides, the assumptions made when deploying in such a context, and how well those assumptions fit. Another angle of the talk will be deployment with a view to influencing societal good, which may differ from deployment in a production system. I will also draw insights from some projects I have been engaged in towards this end.

Live presentation

#### Ernest Mwebaze

### Invited Talk: Doing Some Good with Machine Learning

### Lester Mackey

This is the story of my assorted attempts to do some good with machine learning. Through its telling, I’ll highlight several models of organizing social good efforts, describe half a dozen social good problems that would benefit from our community's attention, and present both resources and challenges for those looking to do some good with ML.

#### Panelists

Ricard Gavalda |
Carla Gomes |
Rashida Richardson |

##### Speaker Bio

Lester Mackey is a machine learning researcher at Microsoft Research, where he develops new tools, models, and theory for large-scale learning tasks driven by applications from healthcare, climate, recommender systems, and the social good. Lester moved to Microsoft from Stanford University, where he was an assistant professor of Statistics and (by courtesy) of Computer Science. He earned his PhD in Computer Science and MA in Statistics from UC Berkeley and his BSE in Computer Science from Princeton University. He co-organized the second place team in the $1M Netflix Prize competition for collaborative filtering, won the $50K Prize4Life ALS disease progression prediction challenge, won prizes for temperature and precipitation forecasting in the yearlong real-time $800K Subseasonal Climate Forecast Rodeo, and received a best student paper award at the International Conference on Machine Learning.

#### Lester Mackey


### Invited talk: Exponentially Faster Algorithms for Machine Learning

In this talk I’ll describe a novel approach that yields algorithms whose parallel running time is exponentially faster than any algorithm previously known for a broad range of machine learning applications. The algorithms are designed for submodular function maximization which is the algorithmic engine behind applications such as clustering, network analysis, feature selection, Bayesian inference, ranking, speech and document summarization, recommendation systems, hyperparameter tuning, and many others. Since applications of submodular functions are ubiquitous across machine learning and data sets become larger, there is consistent demand for accelerating submodular optimization. The approach we describe yields simple algorithms whose parallel runtime is logarithmic in the size of the data rather than linear. I’ll introduce the frameworks we recently developed and present experimental results from various application domains.

#### Yaron Singer

Randomized Numerical Linear Algebra (RandNLA) is an area which uses randomness, most notably random sampling and random projection methods, to develop improved algorithms for ubiquitous matrix problems, such as those that arise in scientific computing, data science, machine learning, etc. A seemingly different topic, but one which has a long history in pure and applied mathematics, is that of Determinantal Point Processes (DPPs), which are stochastic point processes, the probability distribution of which is characterized by sub-determinants of some matrix. Recent work has uncovered deep and fruitful connections between DPPs and RandNLA. For example, random sampling with a DPP leads to new kinds of unbiased estimators for classical RandNLA tasks, enabling more refined statistical and inferential understanding of RandNLA algorithms; a DPP is, in some sense, an optimal randomized method for many RandNLA problems; and a standard RandNLA technique, called leverage score sampling, can be derived as the marginal distribution of a DPP. This work will be reviewed, as will recent algorithmic developments, illustrating that, while not quite as efficient as simply applying a random projection, these DPP-based algorithms are only moderately more expensive.
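To make leverage score sampling concrete, here is a minimal sketch of how the leverage scores of a tall matrix can be computed. It assumes a small dense matrix given as a list of rows and uses plain Gram-Schmidt orthogonalization rather than any particular RandNLA library; the function name is hypothetical. The leverage score of row `i` is the squared norm of row `i` of an orthonormal basis for the column space, and the scores sum to the rank of the matrix.

```python
import math


def leverage_scores(rows):
    """Leverage scores of the rows of a tall matrix (list of equal-length rows)."""
    n, d = len(rows), len(rows[0])
    columns = [[rows[i][j] for i in range(n)] for j in range(d)]
    basis = []  # orthonormal vectors spanning the column space
    for column in columns:
        v = column[:]
        for q in basis:
            # Modified Gram-Schmidt: remove the component of v along q.
            projection = sum(a * b for a, b in zip(q, v))
            v = [a - projection * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip columns that are numerically dependent
            basis.append([a / norm for a in v])
    return [sum(q[i] ** 2 for q in basis) for i in range(n)]


# Toy matrix (assumed): the first row is the only one carrying the first
# column, so it gets the largest score.
scores = leverage_scores([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(scores)
```

Dividing each score by the sum of all scores gives a sampling distribution over rows.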

#### Michael Mahoney

### Invited talk: Searching for Diverse Biological Sequences

A central challenge in biotechnology is to be able to predict functional properties of a protein from its sequence, and thus (i) discover new proteins with specific functionality and (ii) better understand the functional effect of genomic mutations. Experimental breakthroughs in our ability to read and write DNA allow data on the relationship between sequence and function to be rapidly acquired. This data can be used to train and validate machine learning models that predict protein function from sequence. However, the cost and latency of wet-lab experiments require methods that find good sequences in few experimental rounds, where each round contains large batches of sequence designs. In this setting, model-based optimization allows us to take advantage of sample-inefficient methods to find diverse optimal sequence candidates to be tested in the wet lab. These requirements are illustrated by a collaboration that involves the design and experimental validation of AAV capsid protein variants that assemble integral capsids and package their genome, for use in gene therapy applications.

#### Lucy Colwell

### Invited Talk: Beyond Being Accurate: Solving Real-World Recommendation Problems with Neural Modeling

#### Ed Chi

Ed H. Chi is a Principal Scientist at Google, leading several machine learning research teams on the Google Brain team that focus on neural modeling, inclusive ML, reinforcement learning, and recommendation systems. He has delivered significant improvements for YouTube, News, Ads, and the Google Play Store, with >230 product launches in the last 6 years. With 39 patents and over 120 research articles, he is also known for research on user behavior in web and social media.

Prior to Google, he was the Area Manager and a Principal Scientist at Palo Alto Research Center's Augmented Social Cognition Group, where he led the team in understanding how social systems help groups of people to remember, think and reason. Ed completed his three degrees (B.S., M.S., and Ph.D.) in 6.5 years from University of Minnesota. Recognized as an ACM Distinguished Scientist and elected into the CHI Academy, he recently received a 20-year Test of Time award for research in information visualization. He has been featured and quoted in the press, including the Economist, Time Magazine, LA Times, and the Associated Press. An avid swimmer, photographer and snowboarder in his spare time, he also has a blackbelt in Taekwondo.

### Invited Talk: The Unsung Heroes of Music Recommendation: an Essay

#### Matthias Mauch

### Invited Talk: Human and Machine Learning for Assistive Autonomy

### Brenna Argall

As need increases, access decreases. It is a paradox that as human motor impairments become more severe, and increasing assistance needs are paired with decreasing motor abilities, the very machines created to provide this assistance become less and less accessible to operate with independence. My lab addresses this paradox by incorporating robotics autonomy and intelligence into physically-assistive machines: leveraging robotics autonomy to advance human autonomy. Achieving the correct allocation of control between the human and the autonomy is essential, and critical for adoption. The allocation must be responsive to individual abilities and preferences, which moreover may change over time, and robust to human-machine information flow that is filtered and masked by motor impairment and control interface. As we see time and again in our work and within the field: customization and adaptation are key, and so the opportunities for machine learning are clear. However, the manner of its implementation is not. In this talk, I will discuss the needs of and need for machine learning within the domain of assistive machines that bridge gaps in human function, and give an overview of ongoing efforts within my lab that aim to tackle adaptation and learning in its many forms.

#### Panelists

Aude Billard |
Emma Brunskill |
Finale Doshi-Velez |

##### Speaker Bio

Brenna Argall is an associate professor of Mechanical Engineering, Computer Science, and Physical Medicine & Rehabilitation at Northwestern University. She is director of the assistive & rehabilitation robotics laboratory (argallab) at the Shirley Ryan AbilityLab (formerly the Rehabilitation Institute of Chicago), the #1 ranked rehabilitation hospital in the United States. The mission of the argallab is to advance human ability by leveraging robotics autonomy. Argall is a 2016 recipient of the NSF CAREER award, and was named one of the 40 under 40 by Crain’s Chicago Business. Her Ph.D. in Robotics (2009) was received from the Robotics Institute at Carnegie Mellon University, as well as her B.S. in Mathematics (2002). Prior to joining Northwestern and RIC, she was a postdoctoral fellow (2009-2011) at the École Polytechnique Fédérale de Lausanne (EPFL), and prior to graduate school (2002-2004) she held a Computational Biology position at the National Institutes of Health (NIH). More recently, she was a visiting fellow at the Wyss Center for Bio and Neuroengineering in Geneva, Switzerland (2019).

#### Brenna Argall


We explore the art of identifying and verifying assumptions as we build and deploy data science algorithms into production systems. These assumptions can take many forms, from the typical “have we properly specified the objective function?” to the much thornier “does my partner in engineering understand what data I need audited?”. Attendees from outside industry will get a glimpse of the complications that arise when we fail to tend to assumptions in deploying data science in production systems; those on the inside will walk away with some practical tools to increase the chances of successful deployment from day one.

#### Nevena Lalic

### Invited talk: System-wide Monitoring Architectures with Explanations

I present a new architecture for detecting and explaining complex system failures. My contribution is a system-wide monitoring architecture, which is composed of introspective, overlapping committees of subsystems. Each subsystem is encapsulated in a "reasonableness" monitor, an adaptable framework that supplements local decisions with commonsense data and reasonableness rules. This framework is dynamic and introspective: it allows each subsystem to defend its decisions in different contexts--to the committees it participates in and to itself.

For reconciling system-wide errors, I developed a comprehensive architecture that I call "Anomaly Detection through Explanations" (ADE). The ADE architecture contributes an explanation synthesizer that produces an argument tree, which in turn can be traced and queried to determine the support of a decision, and to construct counterfactual explanations. I have applied this methodology to detect incorrect labels in semi-autonomous vehicle data, and to reconcile inconsistencies in simulated anomalous driving scenarios.

In conclusion, I discuss the difficulties in *evaluating* these types of monitoring systems. I argue that meaningful evaluation tasks should be dynamic: designing collaborative tasks (between a human and machine) that require *explanations* for success.

#### Leilani Gilpin

#### Tom Rainforth

#### Aaditya Ramdas

Aaditya Ramdas is an assistant professor in the Departments of Statistics and Machine Learning at Carnegie Mellon University.

These days, he has 3 major directions of research: 1. selective and simultaneous inference (interactive, structured, post-hoc control of false discovery/coverage rate,…), 2. sequential uncertainty quantification (confidence sequences, always-valid p-values, bias in bandits,…), and 3. assumption-free black-box predictive inference (conformal prediction, calibration,…).

### Invited Talk: Invited Talk 3: Thomas Fuchs

Thomas Fuchs: Clinical-grade Artificial Intelligence: Hype and Hope for Cancer Care (Live only)

#### Workshop CompBio

### Invited talk: Diversity in reinforcement learning

Reinforcement learning has seen major success in games and other artificial environments, but its applications in industry and real life are still limited. This limited applicability is partly due to the large amount of training data that must be collected through trial and error, as well as the difficulty of dealing effectively with multiple or many agents. Diversity and negative dependence offer a promising approach to resolving some of the major challenges in today’s reinforcement learning and have gained increasing attention in recent years. In this talk, we will briefly review some of the approaches to introducing diversity in reinforcement learning, with a focus on the use of determinantal point processes for effective multi-agent reinforcement learning.

#### Takayuki Osogami

### Invited Talk: Quantum Machine Learning: Prospects and Challenges

### Iordanis Kerenidis

We will review recent work on Quantum Machine Learning and discuss the prospects and challenges of applying this new exciting computing paradigm to machine learning applications.

#### Panelists

Julia Kempe |
Krysta Svore |
Ronald de Wolf |

##### Speaker Bio

Iordanis Kerenidis (CNRS and QC Ware) received his Ph.D. from the Computer Science Department at the University of California, Berkeley, in 2004. After a two-year postdoctoral position at the Massachusetts Institute of Technology, he joined the Centre National de Recherche Scientifique in Paris as a permanent researcher. He has been the coordinator of a number of EU-funded projects including an ERC Grant, and he is the founder and director of the Paris Centre for Quantum Computing. His research is focused on quantum algorithms for machine learning and optimization, including work on recommendation systems, classification and clustering. He is currently working as the Head of Quantum Algorithms Int. at QC Ware Corp.

#### Iordanis Kerenidis


### Invited Talk: Invited Talk 1: Fabian Theis

Latent space learning and data integration in single-cell genomics