Workshop: Subset Selection in Machine Learning: From Theory to Applications

Abstract:
Classes of set functions, together with a choice of ground set, determine which variants of greedy algorithms can yield efficient approximate solutions to combinatorial optimization problems. Constrained submodular optimization has seen major advances in computational efficiency, versatility, and approximation guarantees, while unconstrained submodular maximization is NP-hard. What alternatives exist when submodularity does not hold? Can efficient and globally exact solutions be obtained? We introduce one such new frontier: the class of quasi-concave set functions, induced as a dual class to monotone linkage functions. We provide a parallel algorithm whose time complexity on $n$ processors is $\mathcal{O}(n^2 g) + \mathcal{O}(\log\log n)$, where $n$ is the cardinality of the ground set and $g$ is the cost of computing the monotone linkage function that induces the corresponding quasi-concave set function via duality. The complexity reduces to $\mathcal{O}(g n \log n)$ on $n^2$ processors and to $\mathcal{O}(g n)$ on $n^3$ processors, improving on the cubic complexity of existing approaches. Our algorithm returns a globally optimal solution to a maxi-min problem, in contrast to submodular optimization, which yields approximate solutions. We demonstrate the potential for widespread application with an example of diverse feature subset selection with exact global maxi-min guarantees, after showing that a statistical dependency measure called distance correlation can be used to induce a quasi-concave set function.
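
To make the maxi-min idea concrete, here is a minimal sequential sketch in Python. It is not the parallel algorithm of this work, and the abstract does not spell out its definitions, so the formulation below is an assumption: a linkage function `linkage(i, H, w)` that can only grow when elements are added to `H`, and a set function `F(H)` equal to the minimum linkage over the members of `H`. The weight matrix `w`, the choice of summed pairwise weights as the linkage, and all function names are hypothetical. Under this assumption, repeatedly discarding the weakest element and keeping the best set seen along the way reaches the global maximum of `F`, which the brute-force check confirms on a toy instance.

```python
from itertools import combinations


def linkage(i, H, w):
    """Hypothetical monotone linkage: total weight between i and the other
    members of H. With non-negative weights it cannot decrease as H grows."""
    return sum(w[i][j] for j in H if j != i)


def set_value(H, w):
    """Assumed set function F(H): the weakest linkage among members of H."""
    return min(linkage(i, H, w) for i in H)


def peel_maximin(w):
    """Sequential peeling: drop the weakest element at each step and
    remember the set with the largest F seen so far."""
    H = set(range(len(w)))
    best_set, best_val = set(H), set_value(H, w)
    while len(H) > 1:
        weakest = min(H, key=lambda i: linkage(i, H, w))
        H.remove(weakest)
        val = set_value(H, w)
        if val > best_val:
            best_set, best_val = set(H), val
    return best_set, best_val


def brute_force(w):
    """Exhaustive maximum of F over all non-empty subsets (tiny n only)."""
    n = len(w)
    return max(
        set_value(set(c), w)
        for r in range(1, n + 1)
        for c in combinations(range(n), r)
    )


# Toy symmetric weights: items 0, 1, 2 are tightly linked; 3 and 4 are not.
w = [
    [0, 5, 5, 1, 0],
    [5, 0, 5, 0, 1],
    [5, 5, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
]

best_set, best_val = peel_maximin(w)
print(best_set, best_val, brute_force(w))
```

The sketch makes on the order of $n^2$ linkage evaluations in sequence and recomputes each one from scratch, so it only illustrates why an exact maxi-min solution is attainable under the assumed monotonicity; it makes no attempt to reproduce the parallel complexities stated above.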