We thank the reviewers for their valuable feedback.$
Reviewer 1 and Reviewer 4 both pointed out that the paper would have been strengthened by empirical evaluation. Without compromising confidentiality, we note that the authors have used these techniques on several problems in bioinformatics, including cancer driver gene detection and microbial phylogenetics; we allude to these applications when we mention that the broader class of objective functions is relevant to bioinformatics. We also mentioned recommender systems as a motivating application for minimax biclustering. However, the focus of this paper is to present the theoretical framework and tools, and we feel that the paper would be cluttered by an overemphasis on applications, some of which (bioinformatics, in particular) would require extensive background exposition.

Reviewer 4 stated that the paper is more suited to a theoretical CS audience than to ICML. However, machine learning conferences, including NIPS and ICML itself, have published many papers about correlation clustering, so the topic of correlation clustering is undoubtedly relevant to ICML. Furthermore, the conference attendees may find correlation clustering, especially the broader framework analyzed in this paper, to be an interesting alternative to probabilistic community detection approaches.

We gratefully acknowledge Reviewer 4's reference to the work of [Bansal et al.] and will update our literature review accordingly. We note that the paper of [Bansal et al] referred to by Reviewer 4 studies a model in which the number of clusters is fixed ahead of time, rather than being determined as part of the solution as in correlation clustering. The minimax objective function studied by [Bansal et al.] is a maximum taken over the clusters, rather than over the vertices as in this work.

Reviewer 4 contrasted the algorithm in this paper with the simpler algorithm of Ailon, Charikar, and Newman. As we pointed out in the introduction, the Ailon--Charikar--Newman algorithm breaks down in the minimax context, and there are significant obstacles to modifying the algorithm for this setting. (This issue was analyzed in the supplemental material in more detail, but we acknowledge that the reviewers were not required to look at it.)

Reviewer 6 stated that the techniques used in this paper were not entirely novel and that the approximation constants were rather high. It is true that we adapt the algorithm of Charikar, Guruswami and Wirth and that the changes to the algorithm itself appear relatively minor, consisting of a rule for choosing the pivot vertex (where previously the pivot vertex was arbitrary). However, our proof requires a significantly more detailed analysis of the algorithm's behavior in order to establish error bounds that are local rather than global. We have focused on overcoming these difficulties to obtain a constant-factor approximation, rather than trying to optimize the approximation constants. In light of the fact that the Ailon--Charikar--Newman algorithm fails in the minimax setting, it is not obvious that a constant-factor approximation should be possible at all.