Paper ID: 409
Title: Correlation Clustering and Biclustering with Locally Bounded Errors

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

The paper provides a theoretical treatment of a generalized correlation clustering problem where, instead of minimizing the total sum of errors across clusters, we minimize some other function of the error vector. The authors restrict their attention to a class of problems where the relaxed (continuous) problem is approximately solvable, and give a rounding algorithm to obtain a discrete solution.

Clarity - Justification:

The paper is generally well written.

Significance - Justification:

This is a natural, yet previously unstudied, extension of the correlation clustering problem. The contribution would have been stronger still if the paper had included some empirical evaluation of the approach (that said, without baselines, an empirical evaluation would have been difficult).

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

The authors define a natural extension of the correlation clustering problem: consider the vector of errors, with one coordinate for each vertex. Standard correlation clustering asks to minimize the L_1 norm of this vector (the sum of all its coordinates). The authors ask: why not pick a different function f, e.g., the L_2 norm, L_\infty, etc.? They then proceed in a manner similar to other correlation clustering papers, first obtaining a fractional solution via a relaxation (feasible when f is convex), and then rounding the solution. The core of the contribution lies in the rounding approach -- the main difficulty being that the proof must charge the errors due to the rounding 'locally' to each vertex and its neighbors, and cannot maintain an overall bank account for the whole solution.
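For concreteness, the objective family this review describes can be written as follows (the notation err_v and f below is illustrative shorthand, not taken from the paper). For a clustering \mathcal{C} of the vertex set V, let err_v(\mathcal{C}) count the edges incident to v that disagree with \mathcal{C}:

    \mathrm{err}_v(\mathcal{C}) = \bigl|\{u : uv \in E^+,\ u \not\sim_{\mathcal{C}} v\}\bigr| + \bigl|\{u : uv \in E^-,\ u \sim_{\mathcal{C}} v\}\bigr|,

and the generalized problem is

    \min_{\mathcal{C}} \ f\bigl((\mathrm{err}_v(\mathcal{C}))_{v \in V}\bigr).

Taking f = \|\cdot\|_1 recovers standard correlation clustering, while f = \|\cdot\|_\infty gives the min-max variant \min_{\mathcal{C}} \max_{v \in V} \mathrm{err}_v(\mathcal{C}) discussed in Reviews #2 and #4.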
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

The paper introduces a new min-max objective variation of the Correlation Clustering problem. As in correlation clustering, the edges of a complete graph G(V,E) have labels + or -, depending on whether that pair of items should be in the same cluster or not. The goal is to find a clustering of the points that minimizes the maximum number of disagreements over all vertices. This is a min-max version of the usual sum objective in Correlation Clustering. The contributions of the paper are purely theoretical: the paper gives a constant-factor approximation to the objective based on rounding a natural LP for this problem. The authors also give similar approximations for some extensions.

Clarity - Justification:

The paper is written well in general. However, it seems written for a CS theorist rather than for the general ML audience (particularly the proof of Theorem 5).

Significance - Justification:

While I find the results interesting, I find the paper more suited to a TCS audience (e.g., SODA or APPROX) than to an ML conference. This is the main reason why I recommend a Reject.

1. This paper introduces the problem for the first time. The authors don't make much effort to explain why it is important/useful for an ML audience. Has such a problem been considered in the ML community before? They have a paragraph in the introduction that says why it is plausible that it is useful. But it needs a more concrete application that demonstrates why it would work better than the usual community detection / correlation clustering.

2. There are no experiments to demonstrate that this algorithm is useful/effective in practice. Unlike the work of [Ailon, Charikar, Newman], whose algorithm was simple and beautiful, this algorithm requires some involved LP rounding. Considering it is not a theoretical contribution that is foundational to ML, I feel experiments are necessary. I think it would make a much more compelling submission if the authors applied this problem and algorithm to a real-world application and demonstrated their effectiveness on real datasets.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

The authors should do a better literature review of work that involves such locally bounded errors. For example, the work of [Bansal et al.] on Min-Max Graph Partitioning finds a clustering of an unlabeled graph into k roughly equal pieces that minimizes the maximum expansion of the individual pieces. (The rest of the comments are covered in previous sections.)

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

The paper proposes a rounding algorithm that converts a fractional clustering of a complete graph into a discrete clustering, such that the node-wise error of the discrete clustering is upper bounded by a constant factor of the node-wise error of the fractional clustering.

Clarity - Justification:

The authors could add some illustrative examples, which would help readers understand their algorithm.

Significance - Justification:

The authors start with the goal of developing algorithms for general loss functions for correlation clustering. Their solution is a two-step procedure assuming the loss function is convex: 1) first use any generic convex optimization procedure (not discussed in this paper) to get a fractional clustering, and then 2) use the proposed rounding algorithm to convert the fractional clustering into a discrete clustering. The main result of this paper is to show that the node-wise rounding error is upper bounded by a constant. This may be a sharp bound for the original loss function if the loss is the minimax loss, i.e., the L-infinity norm. But for other loss functions, controlling the node-wise rounding error does not imply a sharp bound for the original loss function f. Although I'm not familiar with Charikar et al. 2003 and 2005, based on the discussion in the 2nd paragraph of Sec 3, I feel that the contribution of this newly proposed algorithm is negligible. There are no empirical studies or experiments to demonstrate the performance or potential applications of the proposed algorithm, so I'm unable to judge the practical significance of the proposed method. The authors keep mentioning an extended version of this paper. Is this extended version published somewhere? If it is published, or the authors intend to submit it for publication in the future, what is the new/additional intellectual contribution of this ICML paper?

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

See the comments at 5.
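To make the two-step procedure described in Review #3 (and the "natural LP" Review #2 refers to) concrete, below is a minimal Python sketch of step 1 for the min-max special case. The function name, the per-vertex constraint form, and the use of an off-the-shelf LP solver are assumptions made for illustration; the paper's exact program may differ.

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def solve_fractional_minmax(n, plus_edges):
        """LP relaxation for min-max correlation clustering on n >= 2 vertices.

        x[u, v] in [0, 1] is a pseudo-distance (0 = same cluster, 1 = split)
        and t upper-bounds every vertex's fractional disagreement. plus_edges
        is a set of pairs (u, v) with u < v; every other pair is a '-' edge.
        """
        pairs = list(itertools.combinations(range(n), 2))
        idx = {p: i for i, p in enumerate(pairs)}
        m = len(pairs)                      # variables: one x per pair, then t
        c = np.zeros(m + 1)
        c[m] = 1.0                          # objective: minimize t
        A, b = [], []
        # Triangle inequalities keep x a pseudo-metric: x_uv <= x_uw + x_vw.
        for u, v, w in itertools.combinations(range(n), 3):
            e = [idx[(u, v)], idx[(u, w)], idx[(v, w)]]
            for i in range(3):
                row = np.zeros(m + 1)
                row[e[i]] = 1.0
                row[e[(i + 1) % 3]] = -1.0
                row[e[(i + 2) % 3]] = -1.0
                A.append(row)
                b.append(0.0)
        # Per-vertex fractional disagreement:
        #   sum over '+' edges at v of x + sum over '-' edges at v of (1 - x) <= t
        for v in range(n):
            row = np.zeros(m + 1)
            n_minus = 0
            for u in range(n):
                if u == v:
                    continue
                p = (min(u, v), max(u, v))
                if p in plus_edges:
                    row[idx[p]] += 1.0
                else:
                    row[idx[p]] -= 1.0
                    n_minus += 1
            row[m] = -1.0                   # ... - t <= -(# of '-' edges at v)
            A.append(row)
            b.append(-float(n_minus))
        bounds = [(0.0, 1.0)] * m + [(0.0, None)]
        res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                      bounds=bounds, method="highs")
        return {p: res.x[idx[p]] for p in pairs}, res.x[m]

Step 2, rounding the fractional solution into a clustering, is sketched in pivot form at the end of this document.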
===== Review #4 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):

The authors consider a generalization of correlation clustering. In correlation clustering, given a complete graph with edges labeled + and -, the objective is to partition the graph into clusters while minimizing the number of - edges inside clusters and + edges between clusters. A 2.06-approximation is known, as well as a 3-approximation for complete bipartite graphs. In this paper, the authors consider a more general objective: intuitively, any convex function of the error at each vertex. One representative case is to minimize the maximum error over all vertices. The motivation for this objective is any clustering application (community detection, recommender systems) with individual vertex quality control. They give an LP-rounding algorithm which yields a 48-approximation to the optimum. For the case of a complete bipartite graph, they give a 10-approximation, where the guarantee is only for one side of the graph. The motivation for this problem is recommender systems in which one side of the graph represents customers and the other side represents movies; thus, quality control is only needed for the customers.

Their techniques are as follows. Given an objective function f and a complete graph, the first step is to solve the convex program to output a fractional solution. f must be convex for this step to work (if f is linear, the fractional solution is optimal; otherwise it may be a (1+epsilon) multiple of optimal). Then they give a rounding algorithm which returns a 48-approximation to the fractional solution and which works under mild assumptions on f. It is a pivoting algorithm similar to the previous algorithm of Charikar et al. '03, '05 (a generic pivot-rounding sketch appears after this review). Their algorithm for the bipartite case is similar to the previous argument (the authors note the analysis is actually simpler because they only make guarantees for one side of the bipartite graph).

Clarity - Justification:

The paper was clear and easy to follow.

Significance - Justification:

The authors show a constant approximation for a new, more general model of correlation clustering, which includes a natural minimax problem. They also show a constant approximation for a one-sided bipartite graph problem, and there are applications for the one-sided guarantee. The techniques are not completely novel.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):

The strengths are a new and interesting model, which includes natural, well-motivated problems. The one-sided 10-approximation is interesting. The weaknesses are that the techniques are not completely novel, and the constants are a bit high. The strengths outweigh the weaknesses.

=====
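Lastly, a minimal sketch of the pivot-style rounding step that Review #4 compares to Charikar et al. '03/'05: repeatedly pick a pivot and carve out every unclustered vertex within a fixed fractional radius. The 1/2 radius and the deterministic pivot rule are illustrative assumptions; the paper's actual rounding (with its 48- and 10-approximation constants) is more involved.

    import itertools

    def pivot_round(n, x, radius=0.5):
        """Round a fractional solution x, mapping each pair (u, v) with
        u < v to a distance in [0, 1], into a list of clusters."""
        unclustered = set(range(n))
        clusters = []
        while unclustered:
            pivot = min(unclustered)        # any deterministic rule works here
            ball = {v for v in unclustered
                    if v == pivot
                    or x[(min(pivot, v), max(pivot, v))] < radius}
            clusters.append(ball)
            unclustered -= ball
        return clusters

    # Tiny example: '+' edges form a triangle on {0, 1, 2}; vertex 3 is all '-'.
    plus = {(0, 1), (0, 2), (1, 2)}
    x = {(u, v): 0.0 if (u, v) in plus else 1.0
         for u, v in itertools.combinations(range(4), 2)}
    print(pivot_round(4, x))                # [{0, 1, 2}, {3}]

In practice x would come from an LP such as the one sketched after Review #3; the analysis then charges each vertex's rounding error locally, as Review #1 describes.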