Timezone: »

Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams
Ashkan Norouzi-Fard · Jakub Tarnawski · Slobodan Mitrovic · Amir Zandieh · Aidasadat Mousavifar · Ola Svensson

Wed Jul 11 09:15 AM -- 12:00 PM (PDT) @ Hall B #149

Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm Salsa for streaming submodular maximization. It is the first low-memory, singlepass algorithm that improves the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.

Author Information

Ashkan Norouzi-Fard (EPFL)
Jakub Tarnawski (EPFL)

I am a doctoral student in the Theory of Computation laboratory at the EPFL, fortunate to have Ola Svensson as my advisor. I am broadly interested in theoretical computer science and combinatorial optimization, particularly in graph algorithms and approximation algorithms. I received my MSc in Mathematics and Computer Science from the University of Wrocław, Poland.

Slobodan Mitrovic (EPFL)
Amir Zandieh (EPFL)
Aidasadat Mousavifar (EPFL)
Ola Svensson (EPFL)

Related Events (a corresponding poster, oral, or spotlight)

More from the Same Authors