We develop and analyze a novel utility function and a fast optimization algorithm for subset selection in sequential data that incorporates the underlying dynamic model of data. We propose a capacitated sequential facility location function that finds a fixed number of representatives that well encode the data, where the sequence of representatives are compatible with the dynamic model of data. As maximizing this new objective function is NP-hard, we develop a fast greedy algorithm based on submodular maximization. Unlike the conventional facility location, the computation of the marginal gain in our case cannot be done by operations on each item independently. We exploiting the sequential structure of the problem and develop an efficient dynamic programming-based algorithm that exactly computes the marginal gain. We investigate conditions on the dynamic transition model of data, under which our utility function is submodular or $\epsilon$-approximately submodualr, hence, the greedy algorithm comes with performance guarantees. By experiments on synthetic data and the real-world problem of procedure learning from instructional videos, we show that our framework not only significantly improves the computational time, but also achieves better objective function values and obtains more coherent summaries.
Ehsan Elhamifar (Northeastern University)
Ehsan Elhamifar is an Assistant Professor in the Khoury College of Computer Sciences and is the director of the Mathematical, Computational and Applied Data Science (MCADS) Lab at Northeastern University. He is a recipient of the DARPA Young Faculty Award and the NSF Career Research Initiation Initiative Award. Previously, he was a postdoctoral scholar in the Electrical Engineering and Computer Science department at UC Berkeley. He obtained his PhD from the Electrical and Computer Engineering department at the Johns Hopkins University. Dr. Elhamifar's research areas are machine learning, computer vision and optimization. He develops scalable, robust and provable algorithms that can address challenges of complex and massive high-dimensional data. He use these tools to solve real-world challenging problems, including big data summarization, procedure learning from instructional data, large-scale recognition with limited labeled data and active learning for visual data.
Related Events (a corresponding poster, oral, or spotlight)
2019 Poster: Sequential Facility Location: Approximate Submodularity and Greedy Algorithm »
Thu Jun 13th 06:30 -- 09:00 PM Room Pacific Ballroom