Session: Optimization (Combinatorial) 1
Weakly Submodular Maximization Beyond Cardinality Constraints: Does Randomization Help Greedy?
Lin Chen · Moran Feldman · Amin Karbasi
Submodular functions are a broad class of set functions that naturally arise in many machine learning applications. Due to their combinatorial structure, a myriad of algorithms has been developed for maximizing such functions under various constraints. Unfortunately, once a function deviates from submodularity (even slightly), the known algorithms may perform arbitrarily poorly. Amending this issue by obtaining approximation results for functions obeying properties that generalize submodularity has been the focus of several recent works. One such class, known as weakly submodular functions, has received a lot of recent attention from the machine learning community due to its strong connections to restricted strong convexity and sparse reconstruction. In this paper, we prove that a randomized version of the greedy algorithm achieves an approximation ratio of $(1+1/\gamma)^{-2}$ for weakly submodular maximization subject to a general matroid constraint, where $\gamma$ is a parameter measuring the distance from submodularity. To the best of our knowledge, this is the first algorithm with a non-trivial approximation guarantee for this constrained optimization problem. Moreover, our experimental results show that our proposed algorithm performs well on a variety of real-world problems, including regression, video summarization, splice site detection, and black-box interpretation.
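To make the flavor of the approach concrete, the following is a minimal Python sketch of a randomized greedy under a matroid independence oracle. The names (`f` for the set function, `is_independent` for the matroid oracle) and the specific randomization rule, sampling uniformly among the top-$k$ feasible marginal gains, are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import random

def randomized_greedy(ground_set, f, is_independent, k):
    """Hedged sketch of randomized greedy for (weakly) submodular
    maximization under a matroid constraint.  `f` evaluates a set,
    `is_independent` is a matroid independence oracle, and `k` is the
    matroid rank.  Illustrative only; not the paper's exact procedure."""
    S = set()
    for _ in range(k):
        # Elements that keep S independent in the matroid.
        candidates = [e for e in ground_set - S if is_independent(S | {e})]
        if not candidates:
            break
        # Marginal gain f(S + e) - f(S) for each feasible element.
        gains = {e: f(S | {e}) - f(S) for e in candidates}
        # Sample one of the (up to) k highest-gain elements uniformly at
        # random; this randomization is what distinguishes the approach
        # from plain greedy.
        top = sorted(candidates, key=gains.get, reverse=True)[:k]
        S.add(random.choice(top))
    return S
```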
Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams
Ashkan Norouzi-Fard · Jakub Tarnawski · Slobodan Mitrovic · Amir Zandieh · Aidasadat Mousavifar · Ola Svensson
Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, and clustering, require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm, Salsa, for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves on the factor of 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with a better-than-0.5 approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.
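For intuition, here is a hedged Python sketch of the kind of single-pass thresholding rule used by Badanidiyuru et al. (2014); Salsa itself layers several adaptive thresholds on top of this idea, which the sketch omits. The parameter `opt_guess` is an assumed guess of the optimal value; in practice one runs parallel copies over a geometric grid of guesses.

```python
def threshold_streaming(stream, f, k, opt_guess):
    """Hedged sketch of single-pass, threshold-based streaming submodular
    maximization under a cardinality constraint k.  `f` evaluates a set;
    `opt_guess` is an assumed guess of OPT.  Not Salsa itself."""
    S = set()
    for e in stream:
        if len(S) >= k:
            break
        # Keep an element only if its marginal gain clears the classic
        # sieve-style threshold (opt_guess/2 - f(S)) / (k - |S|).
        gain = f(S | {e}) - f(S)
        if gain >= (opt_guess / 2 - f(S)) / (k - len(S)):
            S.add(e)
    return S
```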
Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints
Ehsan Kazemi · Morteza Zadimoghaddam · Amin Karbasi
Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of deletion-robust submodular maximization, where part of the data may be deleted or masked due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against any number of adversarial deletions. We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber pick-up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) deletion-robust summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even $80\%$ of data deletion.
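As a rough illustration of the deletion-robust idea (not the paper's actual algorithms, which are considerably more refined), one can over-provision a summary so that up to $d$ adversarial deletions leave enough high-value elements behind. The names `f`, `k`, and `d` below are assumed inputs of this toy sketch.

```python
def greedy(candidates, f, k):
    """Plain greedy: repeatedly add the element with the largest marginal gain."""
    S = set()
    for _ in range(min(k, len(candidates))):
        e = max(candidates - S, key=lambda x: f(S | {x}) - f(S))
        S.add(e)
    return S

def deletion_robust_summary(ground_set, f, k, d):
    # Over-provision: keep roughly k + d high-value elements so that a
    # good size-k summary survives up to d adversarial deletions.
    return greedy(ground_set, f, k + d)

def summarize_after_deletions(core, deleted, f, k):
    # Once the deletions are revealed, rebuild the final size-k summary
    # from the surviving core elements only.
    return greedy(core - deleted, f, k)
```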
Data Summarization at Scale: A Two-Stage Submodular Approach
Marko Mitrovic · Ehsan Kazemi · Morteza Zadimoghaddam · Amin Karbasi
The sheer scale of modern datasets has resulted in a dire need for summarization techniques that can identify representative elements in a dataset. Fortunately, the vast majority of data summarization tasks satisfy an intuitive diminishing returns condition known as submodularity, which allows us to find nearly-optimal solutions in linear time. We focus on a two-stage submodular framework where the goal is to use some given training functions to reduce the ground set so that optimizing new functions (drawn from the same distribution) over the reduced set provides almost as much value as optimizing them over the entire ground set. In this paper, we develop the first streaming and distributed solutions to this problem. In addition to providing strong theoretical guarantees, we demonstrate both the utility and efficiency of our algorithms on real-world tasks including image summarization and ride-share optimization.
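Below is a hedged sketch of the two-stage idea in its simplest centralized form (the paper's streaming and distributed algorithms replace this naive loop): greedily pick a reduced ground set that is jointly valuable across the given training functions, then optimize fresh functions over that small set only. The size parameter `m` and the name `train_fns` are assumptions of this sketch.

```python
def reduce_ground_set(ground_set, train_fns, m):
    """Stage one (hedged sketch): build a reduced ground set of size m that
    carries high total value for the training functions."""
    reduced = set()
    for _ in range(min(m, len(ground_set))):
        # Pick the element whose marginal gain, summed over all training
        # functions, is largest.
        e = max(ground_set - reduced,
                key=lambda x: sum(f(reduced | {x}) - f(reduced)
                                  for f in train_fns))
        reduced.add(e)
    return reduced
```

Stage two then runs any off-the-shelf maximization routine for each new function over `reduced` instead of the full ground set, which is where the computational savings come from.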