Oral
Faster Algorithms for Binary Matrix Factorization
Ravi Kumar · Rina Panigrahy · Ali Rahimi · David Woodruff

Thu Jun 13th 04:40 -- 05:00 PM @ Seaside Ballroom

We give faster approximation algorithms for well-studied variants of Boolean Matrix Factorization, where we are given a binary matrix $A \in \mathbb{R}^{m \times n}$ and would like to find binary matrices $U \in {0,1}^{m \times k}$ and $V \in {0,1}^{k \times n}$ so as to minimize $\|U \cdot V - A\|_F^2$. In the first setting, $U \cdot V$ denotes multiplication over the integers, and We give an algorithm outputting a constant factor approximation in $2^{O(k^2 \log k)} \textrm{poly}(mn)$ time, improving the previous $\min(2^{2^k}, 2^n) \textrm{poly}(mn)$ time.

Our techniques generalize to finding $U \in {0,1}^{m \times k}$ and $V \in {0,1}^{k \times n}$ so as to approximately minimize $\|U \cdot V - A\|p^p$, where $p \geq 1$ is any real number, in $2^{O(k^{\lceil p/2 \rceil + 1}\log k)} \textrm{poly}(mn)$ time. For $p = 1$, our results have a graph-theoretic interpretation. Namely, given an unweighted bipartite graph $G$ with $m$ vertices in the left part and $n$ vertices in the right part, how well can one approximate $G$ as a disjoint union $H$ of $k$ bipartite cliques (bicliques)? A natural notion of approximation is to find a disjoint union $H$ of $k$ bicliques so as to minimize the number of edges in the symmetric difference, i.e., to find $H$ with $$|E(G)\setminus E(H)| + |E(H)\setminus E(G)| \leq C \min{\textrm{a disjoint union }H' \textrm{ of } k \textrm{ cliques}} |E(G)\setminus E(H')| + |E(H') \setminus E(G)|,$$ where $C > 1$ is a constant approximation factor. It is known that deciding if a bipartite graph is equal to the disjoint union of $k$ bicliques requires $2^{\Omega(k)} \cdot \textrm{poly}(mn)$ time under standard complexity-theoretic assumptions, and so this lower bound also applies to the approximate case. While there are algorithms to decide if $G$ can be expressed as the disjoint union of $k$ bicliques in $2^{O(k^2)} \cdot \textrm{poly}(mn)$ time, surprisingly, no algorithms were known for the approximate version of the problem. We give the first constant factor approximation algorithm for this problem, running in $2^{O(k^2 \log k)} \textrm{poly}(mn)$ time.

Finally, we give the fastest known bicriteria constant factor approximation when the multiplication $U \cdot V$ is over the finite field $GF(2)$. We achieve $2^{O(k^3)} \poly(mn)$ time to output binary rank $O(k \log m)$ matrices $U$ and $V$ whose cost is as good as the best rank-$k$ approximation, improving the previous $\min(2^{2^k}mn, \min(m,n)^{O(\poly(k))} \textrm{poly}(mn))$ time.