Workshop
Negative Dependence and Submodularity: Theory and Applications in Machine Learning
Zelda Mariet · Michal Derezinski · Mike Gartrell
Keywords: Recommender Systems submodularity negative dependence subset selection experimental design determinantal point processes
Models of negative dependence and submodularity are increasingly important in machine learning. Whether selecting training data, finding an optimal experimental design, exploring in reinforcement learning and Bayesian optimization, or designing recommender systems, selecting highquality yet diverse items has become a core challenge. This workshop aims to bring together researchers who, using theoretical or applied techniques, leverage negative dependence and submodularity in their work. Expanding upon last year's workshop, we will highlight recent developments in the rich mathematical theory of negative dependence, cover novel critical applications, and discuss the most promising directions for future research.
Schedule
Sat 5:00 a.m.  5:15 a.m.

Opening remarks
(
Talk
)

Zelda Mariet · Mike Gartrell · Michal Derezinski 🔗 
Sat 5:15 a.m.  5:45 a.m.

Diversity in reinforcement learning
(
Invited talk
)
link
SlidesLive Video Reinforcement learning has seen major success in games and other artificial environments, but its applications in industries and real life are still limited. This limited applicability is partly due to the requirement of the large amount of the training data that needs to be collected through trial and error as well as the difficulty in effectively dealing with multiple or many agents. Diversity and negative dependence are a promising approach to resolve some of the major challenges in today’s reinforcement learning and have gained increasing attention in recent years. In this talk, we will briefly review some of the approaches to introducing diversity in reinforcement learning with a focus on the use of determinantal point processes for effective multiagent reinforcement learning. 
Takayuki Osogami 🔗 
Sat 5:45 a.m.  5:50 a.m.

Diversity in reinforcement learning
(
Q&A
)

🔗 
Sat 5:50 a.m.  6:20 a.m.

From random matrices to kernel quadrature: how repulsiveness can speed up Monte Carlo integration
(
Invited talk
)
link
SlidesLive Video
Eigenvalues of many models of random matrices tend to repell each other. They sometimes repell so much that sample averages over these eigenvalues converge much faster than if the eigenvalues were i.i.d. I will first show how to transform this kind of result into a generic importance sampler with mean square error decreasing as $N^{11/d}$, where $N$ is the number of integrand evaluations, and $d$ is the ambient dimension. This result crucially depends on a repulsive point process for the integration nodes, called an orthogonal polynomial ensemble, itself a particular case of determinantal point process (DPP). With more assumptions on the integrand and more general DPPs, I will then show how to obtain faster Monte Carlo rates. Further generalizing to mixtures of DPPs, I will finally show how to obtain tight integration rates for integrands in a large class of reproducing kernel Hilbert spaces. This last result involves a continuous equivalent to volume sampling, a discrete point process of recent interest in numerical linear algebra. This talk is intended to connect to a few other talks of the day, and is based on the following papers:
https://arxiv.org/abs/1605.00361
https://arxiv.org/abs/1906.07832
https://arxiv.org/abs/2002.09677

Rémi Bardenet 🔗 
Sat 6:20 a.m.  6:25 a.m.

From random matrices to kernel quadrature: how repulsiveness can speed up Monte Carlo integration
(
Q&A
)

🔗 
Sat 6:25 a.m.  6:40 a.m.

Ensemble Kernel Methods, Implicit Regularization and Determinantal Point Processes
(
Contributed talk
)
By using the framework of Determinantal Point Processes (DPPs), some theoretical results concerning the interplay between diversity and regularization can be obtained. In this paper we show that sampling subsets with kDPPs results in implicit regularization in the context of ridgeless Kernel Regression. Furthermore, we leverage the common setup of stateoftheart DPP algorithms to sample multiple small subsets and use them in an ensemble of ridgeless regressions. Our first empirical results indicate that ensemble of ridgeless regressors can be interesting to use for datasets including redundant information. 
Joachim Schreurs · Michaël Fanuel · Johan Suykens 🔗 
Sat 7:10 a.m.  7:40 a.m.

Scaling DPP MAP Inference
(
Invited talk
)
link
SlidesLive Video DPP MAP inference, the problem of finding the highest probability set under the distribution defined by a DPP, is of practical importance for a variety of areas such as recommender systems, active learning, and data compression. Unfortunately, finding the exact MAP solution is NPhard. Often though, the standard greedy submodular maximization algorithm works well in practice for approximating the solution. In this talk, we discuss ways to speed up this simple greedy algorithm, as well as slower, but more accurate alternatives to it. We also discuss how to scale greedy for customized DPPs, where we want to solve the MAP problem multiple times with different weightings of item features. We conclude with a brief note on the complexity of MAP for nonsymmetric DPPs, where we show that greedy scales fairly well if we assume a particular kernel decomposition. 
Jennifer Gillenwater 🔗 
Sat 7:40 a.m.  7:45 a.m.

Scaling DPP MAP Inference Q&A
(
Q&A
)

🔗 
Sat 7:45 a.m.  8:15 a.m.

Negative Dependence and Sampling
(
Invited talk
)
link
SlidesLive Video Probability distributions with strong notions of negative dependence arise in various forms in machine learning. Examples include diversityinducing probabilistic models, interpretability, exploration and active learning, and randomized algorithms. While, perhaps surprisingly, being more delicate than its positive counterpart, negative dependence enjoys rich mathematical connections and properties that offer a promising toolbox for machine learning. In this talk, I will summarize some recently important notions of negative dependence, and their implications for sampling algorithms. These results exploit connections to the geometry of polynomials, log concavity, and submodular optimization. We will conclude with an example application of sampling minibatches for optimization. 
Stefanie Jegelka 🔗 
Sat 8:15 a.m.  8:20 a.m.

Negative Dependence and Sampling
(
Q&A
)

🔗 
Sat 8:20 a.m.  8:35 a.m.

Mode Finding for SLC Distributions via Regularized Submodular Maximization
(
Contributed talk
)
In this paper, we propose scalable methods for maximizing a regularized submodular function $f = g \ell$ expressed as the difference between a monotone submodular function $g$ and a modular function $\ell$. Indeed, submodularity is inherently related to the notions of diversity, coverage, and representativeness. In particular, finding the mode (i.e., the most likely configuration) of many popular probabilistic models of diversity, such as determinantal point processes, submodular probabilistic models, and strongly logconcave distributions, involves maximization of (regularized) submodular functions. Since a regularized function$ f$ can potentially take on negative values, the classic theory of submodular maximization, which heavily relies on the nonnegativity assumption of submodular functions, may not be applicable. To circumvent this challenge, we develop the first streaming and distributed algorithm for maximizing regularized submodular functions. Furthermore, we show that how can we find the mode of a strongly logconcave (SLC) distribution by regularized submodular maximization.

Ehsan Kazemi · Amin Karbasi · Moran Feldman 🔗 
Sat 9:30 a.m.  10:15 a.m.

Poster session
(
click for zoom links
)
Submodular maximization[ protected link dropped ]
Determinantal point processes[ protected link dropped ]
Negative dependence for inference and bipartite matching[ protected link dropped ]

🔗 
Sat 10:15 a.m.  10:45 a.m.

Exponentially Faster Algorithms for Machine Learning
(
Invited talk
)
In this talk I’ll describe a novel approach that yields algorithms whose parallel running time is exponentially faster than any algorithm previously known for a broad range of machine learning applications. The algorithms are designed for submodular function maximization which is the algorithmic engine behind applications such as clustering, network analysis, feature selection, Bayesian inference, ranking, speech and document summarization, recommendation systems, hyperparameter tuning, and many others. Since applications of submodular functions are ubiquitous across machine learning and data sets become larger, there is consistent demand for accelerating submodular optimization. The approach we describe yields simple algorithms whose parallel runtime is logarithmic in the size of the data rather than linear. I’ll introduce the frameworks we recently developed and present experimental results from various application domains. 
Yaron Singer 🔗 
Sat 10:45 a.m.  10:50 a.m.

Exponentially Faster Algorithms for Machine Learning
(
Q&A
)

🔗 
Sat 10:50 a.m.  11:20 a.m.

Searching for Diverse Biological Sequences
(
Invited talk
)
link
SlidesLive Video A central challenge in biotechnology is to be able to predict functional properties of a protein from its sequence, and thus (i) discover new proteins with specific functionality and (ii) better understand the functional effect of genomic mutations. Experimental breakthroughs in our ability to read and write DNA allows data on the relationship between sequence and function to be rapidly acquired. This data can be used to train and validate machine learning models that predict protein function from sequence. However, the cost and latency of wetlab experiments requires methods that find good sequences in few experimental rounds, where each round contains large batches of sequence designs. In this setting, modelbased optimization allows us to take advantage of sample inefficient methods to find diverse optimal sequence candidates to be tested in the wetlab. These requirements are illustrated by a collaboration that involves the design and experimental validation of AAV capsid protein variants that assemble integral capsids and package their genome, for use in gene therapy applications. 
Lucy Colwell 🔗 
Sat 11:20 a.m.  11:25 a.m.

Searching for Diverse Biological Sequences
(
Q&A
)

🔗 
Sat 11:25 a.m.  11:40 a.m.

Constrained Maximization of Lattice Submodular Functions
(
Contributed talk
)
Submodular optimization over the integer lattice has many applications in machine learning. Although the constrained maximization of submodular functions with coordinatewise concavity (also called {\em DRSubmodular} functions) is well studied, the maximization of {\em general} lattice submodular functions is considerably more challenging. In this work, we first show that we can optimize lattice submodular functions subject to a discrete (integer) polymatroid constraint using a recently proposed extension, called the Generalized Multilinear Extension. Then, we establish a bound on the rounding error for the discrete polymatroid constraint, which depends on the “distance” between the lattice submodular function to a DRSubmodular function. Lastly, we demonstrate the effectiveness of our algorithm on a Bayesian experimental design problem with repetition and a concave cost. 
Aytunc Sahin · Joachim Buhmann · Andreas Krause 🔗 
Sat 12:10 p.m.  12:40 p.m.

Determinantal Point Processes in Randomized Numerical Linear Algebra
(
Invited talk
)
link
SlidesLive Video Randomized Numerical Linear Algebra (RandNLA) is an area which uses randomness, most notably random sampling and random projection methods, to develop improved algorithms for ubiquitous matrix problems, such as those that arise in scientific computing, data science, machine learning, etc. A seemingly different topic, but one which has a long history in pure and applied mathematics, is that of Determinantal Point Processes (DPPs), which are stochastic point processes, the probability distribution of which is characterized by subdeterminants of some matrix. Recent work has uncovered deep and fruitful connections between DPPs and RandNLA. For example, random sampling with a DPP leads to new kinds of unbiased estimators for classical RandNLA tasks, enabling more refined statistical and inferential understanding of RandNLA algorithms; a DPP is, in some sense, an optimal randomized method for many RandNLA problems; and a standard RandNLA technique, called leverage score sampling, can be derived as the marginal distribution of a DPP. This work will be reviewed, as will recent algorithmic developments, illustrating that, while not quite as efficient as simply applying a random projection, these DPPbased algorithms are only moderately more expensive. 
Michael Mahoney 🔗 
Sat 12:40 p.m.  12:45 p.m.

Determinantal Point Processes in Randomized Numerical Linear Algebra
(
Q&A
)

🔗 
Sat 12:45 p.m.  1:00 p.m.

On the Relationship Between Probabilistic Circuits and Determinantal Point Processes
(
Contributed talk
)
Scaling probabilistic models to large realistic problems and datasets is a key challenge in machine learning. Central to this effort is the development of tractable probabilistic models (TPMs): models whose structure guarantees efficient probabilistic inference algorithms. The current landscape of TPMs is fragmented: there exist various kinds of TPMs with different strengths and weaknesses. Two of the most prominent classes of TPMs are determinantal point processes (DPPs) and probabilistic circuits (PCs). This paper provides the first systematic study of their relationship. We propose a unified analysis and shared language for discussing DPPs and PCs. Then we establish theoretical barriers for the unification of these two families, and prove that there are cases where DPPs have no compact representation as a class of PCs. We close with a perspective on the central problem of unifying these models. 
Honghua Zhang · Steven Holtzen · Guy Van den Broeck 🔗 
Sat 1:00 p.m.  1:45 p.m.

Panel discussion
(
Panel discussion
)

🔗 
Sat 1:45 p.m.  2:00 p.m.

Closing remarks
(
Talk
)

Zelda Mariet · Michal Derezinski · Mike Gartrell 🔗 