Oral
Faster Algorithms for Binary Matrix Factorization
Ravi Kumar · Rina Panigrahy · Ali Rahimi · David Woodruff
Abstract:
We give faster approximation algorithms for
well-studied variants of Boolean Matrix Factorization,
where we are given a binary matrix $A \in \mathbb{R}^{m \times n}$
and would like to find binary matrices $U \in \{0,1\}^{m \times k}$
and $V \in \{0,1\}^{k \times n}$ so as to minimize $\|U \cdot V - A\|_F^2$.
In the first setting,
$U \cdot V$ denotes multiplication over the integers, and
We give an algorithm outputting
a constant factor approximation in $2^{O(k^2 \log k)} \textrm{poly}(mn)$ time, improving the previous $\min(2^{2^k}, 2^n) \textrm{poly}(mn)$ time.
Our techniques generalize to finding
$U \in \{0,1\}^{m \times k}$
and $V \in \{0,1\}^{k \times n}$ so as to approximately
minimize $\|U \cdot V - A\|_p^p$,
where $p \geq 1$ is any real number, in $2^{O(k^{\lceil p/2 \rceil + 1}\log k)} \textrm{poly}(mn)$ time.
For $p = 1$, our results have a graph-theoretic
interpretation. Namely,
given an unweighted bipartite graph $G$ with $m$ vertices in the left part and $n$ vertices in the right part,
how well can one approximate
$G$ as a disjoint union $H$ of $k$ bipartite cliques (bicliques)? A natural notion of approximation is to find a disjoint union $H$ of $k$ bicliques
so as to
minimize the number of edges in the symmetric difference, i.e.,
to find $H$ with
$$|E(G)\setminus E(H)| + |E(H)\setminus E(G)|
\leq C \min_{\textrm{a disjoint union }H' \textrm{ of } k \textrm{ cliques}}
|E(G)\setminus E(H')| + |E(H') \setminus E(G)|,$$
where $C > 1$ is a constant approximation factor.
It is known
that deciding if a bipartite graph is equal to the disjoint union of $k$
bicliques requires $2^{\Omega(k)} \cdot \textrm{poly}(mn)$ time under standard complexity-theoretic
assumptions, and so this lower bound also applies to the approximate case.
While there are algorithms to decide if $G$
can be expressed as the disjoint union of $k$ bicliques in $2^{O(k^2)} \cdot \textrm{poly}(mn)$ time,
surprisingly, no algorithms were known for the approximate version
of the problem. We give the first constant factor approximation
algorithm for this problem, running in $2^{O(k^2 \log k)} \textrm{poly}(mn)$ time.
Finally, we give the fastest known bicriteria constant factor approximation
when the multiplication $U \cdot V$ is over the finite field $GF(2)$.
We achieve $2^{O(k^3)} \poly(mn)$ time
to output binary rank $O(k \log m)$ matrices $U$ and $V$ whose cost is
as good as the best rank-$k$ approximation, improving the previous
$\min(2^{2^k}mn, \min(m,n)^{O(\poly(k))} \textrm{poly}(mn))$ time.
Chat is not available.
Successful Page Load