Paper ID: 380
Title: The Information-Theoretic Requirements of Subspace Clustering with Missing Data

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

The authors consider the Subspace Clustering with Missing Data (SCMD) problem, where the columns of a d \times N matrix X come from a union of subspaces, each of dimension at most r. Moreover, each column is only partially observed: we see \ell of its entries. The main contributions of this paper are two-fold. The authors give a tight bound on the sample complexity, showing that it suffices to have \ell \geq 2r and at least O(rd) columns generated from each subspace. The authors also use their tools to develop an algorithm for certifying that a candidate set of subspaces is actually the true set of subspaces from which X was generated.

Clarity - Justification:

This paper is very well written, and the authors have done a nice job showcasing how the steps of the proof fit together and providing examples to illustrate some complicated definitions that would otherwise be hard to follow.

Significance - Justification:

The SCMD problem has received considerable attention, and the authors resolve its sample complexity here. The tools they develop may also be of practical interest: there are a great number of heuristics for solving this problem, and one may be able to use the authors' certification algorithm as a post-processing step to verify that a heuristic's output is correct.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

I think this paper is already quite well written. As a constructive comment, it seems to me that condition (i) in Definition 2 is related to the notion of expansion, if one defines a bipartite graph based on the pattern of ones and zeros in \Omega^k (a toy rendering of this bipartite-graph view is sketched after this review). The role that condition (i) plays in ensuring that false subspaces do not accidentally fit the incomplete columns (under some conditions on which entries are observed) then seems similar to how expansion is used to ensure that the correct codeword is decoded in expander codes.

It would also be nice if the authors could explain how their approach for bounding the sample complexity relates to the bounds for the easier problem of matrix completion (where there is only one subspace). The authors give tight bounds here, with no extra log factors, but I am not sure the sample complexity of matrix completion is known except up to logarithmic factors. Is this because the sampling model in the SCMD problem requires that each column have at least \ell observations? Some further discussion of this early in the paper would be enlightening.
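To make the bipartite-graph analogy above concrete, here is a minimal sketch in Python. The condition it checks (every subset S of columns must collectively observe at least |S| + r distinct rows) is an illustrative guess at the expander intuition, not a verbatim statement of condition (i) in Definition 2, and the brute-force enumeration is purely for illustration.

```python
import itertools
import numpy as np

def expansion_check(Omega, r):
    """Expansion-style test on the bipartite graph whose left nodes are the
    columns of the binary observation pattern Omega (d x n) and whose right
    nodes are the rows; an edge means the entry is observed.  Checks, by
    brute force, that every nonempty subset S of columns collectively
    observes at least |S| + r distinct rows (an illustrative stand-in for
    condition (i), not the paper's actual condition)."""
    d, n = Omega.shape
    rows_per_col = [set(np.flatnonzero(Omega[:, j])) for j in range(n)]
    for size in range(1, n + 1):
        for S in itertools.combinations(range(n), size):
            neighborhood = set().union(*(rows_per_col[j] for j in S))
            if len(neighborhood) < size + r:
                return False  # subset S observes too few rows to "expand"
    return True

# Toy pattern: d = 6 coordinates, n = 3 columns, ell = 4 observations each.
rng = np.random.default_rng(0)
Omega = np.zeros((6, 3), dtype=int)
for j in range(3):
    Omega[rng.choice(6, size=4, replace=False), j] = 1
print(expansion_check(Omega, r=2))  # whether this random pattern expands
```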
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

This paper is of theoretical interest: it closes the gap between the sufficient and necessary conditions on the sample complexity of subspace clustering in the missing-data regime. The derived result is essentially information-theoretically optimal. The authors also dedicate some effort to intuitively interpreting their model and problem structure.

Clarity - Justification:

Well-written paper with illustrative examples.

Significance - Justification:

A near-optimal result on the underlying problem.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

The paper improves the sufficient condition on the sample complexity of subspace clustering. The results derived here are significant. The framework is transparent thanks to the many illustrative examples. I have some minor concerns, as follows:

- For Assumption A1, is there any specific requirement on \nu_k? Why should it be a continuous distribution? This seems inconsistent with results in compressed sensing, where sensing matrices can be either continuous (e.g., Gaussian) or discrete (e.g., Bernoulli). (A toy example bearing on this point is sketched after these comments.)
- From Figure 1, Assumption A1 seems closely related to the incoherence assumption in the literature, which, loosely speaking, presumes that each sample is encoded with sufficient information with respect to the underlying bases. Can this connection be made clearer?
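To illustrate the first concern, the following is a minimal sketch of the sampling model as I read it, using a Gaussian choice for \nu_k (Assumption A1 only asks for continuity). It shows why a continuous coefficient distribution puts the data in general position almost surely, while a discrete one such as Bernoulli can fail; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, K, cols_per_subspace = 20, 3, 2, 50

# Draw K subspaces of dimension r, then columns whose coefficients come
# from a continuous distribution (Gaussian here, standing in for nu_k).
X_blocks = []
for k in range(K):
    U_k, _ = np.linalg.qr(rng.standard_normal((d, r)))   # basis of subspace k
    theta = rng.standard_normal((r, cols_per_subspace))  # continuous nu_k
    X_blocks.append(U_k @ theta)
X = np.hstack(X_blocks)  # the d x N data matrix, N = K * cols_per_subspace

# Continuity puts the data in general position almost surely: any r columns
# from one subspace are linearly independent with probability 1.
print(np.linalg.matrix_rank(X_blocks[0][:, :r]))  # prints r

# Contrast with a discrete coefficient distribution: Bernoulli coefficients
# take only 2**r = 8 distinct values per column, so with 50 columns some
# columns coincide exactly (pigeonhole), violating general position.
theta_bern = rng.integers(0, 2, size=(r, cols_per_subspace))
print(np.unique(theta_bern, axis=1).shape[1] < cols_per_subspace)  # True
```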
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

The paper considers the problem of noiseless subspace clustering with missing data, in the case where the subspaces all have the same known dimension. The only assumption is the "general position" assumption, which is mild and is shown to be necessary for identifiability even in the fully observed setting. The main contributions are:

- A characterization of the identifiability of union-of-subspaces structures, for both i.i.d. uniform random observations and deterministic observation patterns.
- An efficient "algebraic" algorithm that can be used to certify whether a given collection of subspaces is correct or not. This verification requires data splitting.

The results are new and technically sound as far as I can tell. The bounds are sharp and the story is more or less complete. Although the paper does not address many of the important open problems for subspace clustering with missing data, I feel it makes a good contribution and is suitable for publication at ICML.

Clarity - Justification:

Results are very clearly presented. Connections and overlaps with other papers need to be further justified.

Significance - Justification:

New theoretical understanding of an old problem. The understanding for this particular setting is complete, but there are concerns about whether this setting of recovering the full subspaces is relevant to the clustering problem to begin with. Also, the paper does not propose an algorithm that achieves the given sample complexity (see below for details).

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

Pros:
- Sharp deterministic condition.
- Well written; easy to follow.

Cons:
- Studies only identifiability. The claim that the work characterizes the sample complexity is therefore a bit far-reaching, as no algorithm is proposed that provably clusters the data points correctly.
- Assumes the dimension r of each subspace to be known.
- Seems sensitive to noise (as most such algebraic algorithms are).

Two high-level comments:
- The clustering sample complexity is not the same as the subspace-recovery sample complexity considered in this paper. For instance, if all data points are observed on substantially overlapping coordinates, then no matter how large the ambient dimension is, the clustering sample complexity is on the order of rN (see, e.g., Wang, Wang and Singh, ICML 2015). In many problems where clustering is the final goal, identifiability of the subspaces may not be needed: leaving a few degrees of freedom of the subspace spanned by each cluster unspecified is often not a big deal.
- The missing-data problem for subspace clustering is challenging mainly because it is hard to construct a polynomial-time algorithm for it. An exponential-time algorithm that achieves the information-theoretic limit is to simply enumerate all column subsets with an identifiable observation pattern, run matrix completion on each, and then collect the resulting subspaces (a sketch of this procedure is appended after this review).

Other comments:
1. Similar deterministic characterizations of valid observation patterns have appeared before, e.g., in Kiraly and Tomioka (ICML 2012) and Pimentel-Alarcon et al. (2015a), both of which address the matrix completion problem. In terms of identifiability, the distinction between one subspace and K subspaces is very small. It would therefore help readers if the authors clearly described the differences and the novel ideas in this submission.
2. It looks like Algorithm 2 appeared earlier in Pimentel-Alarcon et al. (2015b). Again, it is important to point out which parts of the results are novel.
3. The general position assumption A1 first appeared in the subspace clustering context in Nasihatkon and Hartley, "Graph connectivity in sparse subspace clustering," CVPR 2011. A robustified version of the general position condition was studied in Wang, Wang and Singh, "Graph Connectivity in Noisy Sparse Subspace Clustering," AISTATS 2016.

=====
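As a concrete rendering of the brute-force baseline mentioned in Review #3, the following is a minimal Python sketch. The helpers `is_identifiable` and `complete_matrix` are hypothetical stand-ins for a pattern-identifiability test (e.g., a deterministic condition in the spirit of Pimentel-Alarcon et al.) and a rank-r matrix completion routine; neither is specified by the review, and this is only an illustration of the exhaustive procedure, not the paper's algorithm.

```python
import itertools
import numpy as np

def brute_force_scmd(X_obs, Omega, r, is_identifiable, complete_matrix):
    """Exponential-time baseline: enumerate column subsets, keep those whose
    observation pattern is identifiable for a single rank-r subspace, run
    matrix completion on each, and collect the distinct subspaces found.
    `is_identifiable(pattern, r)` and `complete_matrix(block, pattern, r)`
    are hypothetical; the latter returns None when no consistent rank-r
    completion of the block exists."""
    d, N = X_obs.shape
    subspaces = []
    for size in range(r + 1, N + 1):       # a subspace needs > r columns
        for S in itertools.combinations(range(N), size):
            idx = list(S)
            if not is_identifiable(Omega[:, idx], r):
                continue
            completed = complete_matrix(X_obs[:, idx], Omega[:, idx], r)
            if completed is None:          # no consistent rank-r fit
                continue
            U, _, _ = np.linalg.svd(completed, full_matrices=False)
            basis = U[:, :r]               # orthonormal basis of the fit
            # Record the subspace only if its projector is new.
            if not any(np.allclose(basis @ basis.T, B @ B.T)
                       for B in subspaces):
                subspaces.append(basis)
    return subspaces
```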