We develop and analyze a novel utility function and a fast optimization algorithm for subset selection in sequential data that incorporates the dynamic model of data. We propose a cardinality-constrained sequential facility location function that finds a fixed number of representatives, where the sequence of representatives is compatible with the dynamic model and well encodes the data. As maximizing this new objective function is NP-hard, we develop a fast greedy algorithm based on submodular maximization. Unlike the conventional facility location, the computation of the marginal gain in our case cannot be done by operations on each item independently. We exploit the sequential structure of the problem and develop an efficient dynamic programming-based algorithm that computes the marginal gain exactly. We investigate conditions on the dynamic model, under which our utility function is ($\epsilon$-approximately) submodualr, hence, the greedy algorithm comes with performance guarantees. By experiments on synthetic data and the problem of procedure learning from instructional videos, we show that our framework significantly improves the computational time, achieves better objective function values and obtains more coherent summaries.
Ehsan Elhamifar (Northeastern University)
Ehsan Elhamifar is an Assistant Professor in the Khoury College of Computer Sciences and is the director of the Mathematical, Computational and Applied Data Science (MCADS) Lab at Northeastern University. He is a recipient of the DARPA Young Faculty Award and the NSF Career Research Initiation Initiative Award. Previously, he was a postdoctoral scholar in the Electrical Engineering and Computer Science department at UC Berkeley. He obtained his PhD from the Electrical and Computer Engineering department at the Johns Hopkins University. Dr. Elhamifar's research areas are machine learning, computer vision and optimization. He develops scalable, robust and provable algorithms that can address challenges of complex and massive high-dimensional data. He use these tools to solve real-world challenging problems, including big data summarization, procedure learning from instructional data, large-scale recognition with limited labeled data and active learning for visual data.
Related Events (a corresponding poster, oral, or spotlight)
2019 Oral: Sequential Facility Location: Approximate Submodularity and Greedy Algorithm »
Thu Jun 13th 04:25 -- 04:30 PM Room Room 104