Paper ID: 1031
Title: A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes a convex relaxation for Multiple Sequence Alignment and Motif Discovery, both of which are NP-hard. The approach is based on the atomic norm, and an optimization algorithm, the Greedy Direction Method of Multiplier, is developed to solve it. Experimental results show the superiority of the method compared with standard tools.

Clarity - Justification:
This paper gives a detailed introduction to multiple sequence alignment and motif discovery. It took me some time to digest, but it is readable.

Significance - Justification:
See detailed comments.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Overall, the paper is well written. My major concerns are technical novelty and the optimization bounds.

1. The relaxation technique is standard---via convex hull. To optimize any linear objective over a discrete domain, one can always relax the domain to its convex hull. For tractability of the LMO, the relaxation used in Eq 7 is not the convex hull of the discrete domain defined by the constraints in Eq 6. Instead, Eq 7 uses the intersection of the convex hulls of each of the constraints in Eq 6. This is a further relaxation, because the intersection of the convex hull of X and the convex hull of Y is generally a proper superset of the convex hull of the intersection of X and Y. So unless some bounds can be established on the optimality of the rounded result, it is hard to consider this relaxation novel.

2. Theorem 2 and its context are not rigorously stated. d^* is defined in the appendix, not in the main paper. The bound contains an expectation, yet Algorithms 1 and 2 have no randomness. How does the bound in Theorem 2 translate into the rate of finding a W that is \epsilon-accurate for the original convex problem in Eq 11?
How do the "constants" in the bound depend on the dimensions of the problem (N, \ell_n, |\Sigma|, etc.)?

===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
Multiple sequence alignment (MSA) and motif discovery (MD) are important NP-hard problems encountered frequently in computational biology; applications range from studies of repetitive elements in genomes to protein family identification. This paper brings to bear on these problems a relatively recent but well-known innovation by Chandrasekaran et al. that generalizes to arbitrary structured atoms ideas about sparsity traditionally applied to vectors and matrices; the approach revolves around the norm induced by the convex hull of the collection of atoms, i.e., the atomic norm. The authors consider convex relaxations of MSA and MD and compare the output of several MSA solvers with that of their ConvexMSA algorithm, which incorporates frequently invoked convex optimization techniques such as ADMM. ConvexMSA clearly tends to minimize nonmatches better than competitors that use, for example, EM and Gibbs sampling; indeed, in many cases, unlike its competitors, it actually finds the "ground truth" that was perturbed in simulation -- and sometimes it finds alignments with lower scores than the ground truth.

Clarity - Justification:
The text has some grammatical issues.

Significance - Justification:
Applications of convex programming to sequence alignment are scant and have not generally proved useful, though there have been efforts over the years to apply linear programming and associated algorithms to MSA with the sum-of-pairs objective. (On this front, the authors are encouraged to search for papers on branch-and-cut for solving the maximum trace problem applied to MSA by Kececioglu and collaborators.) The authors' work, by contrast, looks very promising: it clearly produces higher-quality alignments than its competitors.

Detailed comments.
(Explain the basis for your ratings while providing constructive feedback.):
The reader may be left wondering how fast an implementation of ConvexMSA is compared to, for example, Clustal. The paper would thus benefit from a speed comparison across tools. (It is fine if ConvexMSA is Matlab code that could be optimized -- the object of the paper is by no means to win the speed game.) The paper should include a link to ConvexMSA.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper introduces formulations of two major problems in bioinformatics as convex optimization programs: the multiple sequence alignment (MSA) problem and the motif discovery (MD) problem. It introduces algorithms to solve these problems via an augmented Lagrangian method, separating the convex constraints into two parts, each of them amenable to efficient algorithms. The authors propose a sufficient condition for their algorithm to converge and compare their method against competitors on synthetic and real data.

Clarity - Justification:
The paper is generally well written and technically sound. The only negative point regarding clarity is that the notation is heavy: the first three pages introduce a lot of intertwined notions, which makes reading more difficult. My understanding of the paper is not deep enough to suggest specific changes, but if the authors managed to make the notation lighter, I think it would help make the paper clearer. It was also unclear to me how precisely the validity constraints (the A_n sets) were defined.

Significance - Justification:
Overall, the paper is significant: the proposed approach is original, technically sound, and seems to yield good results. My main concern is the scale of the experiments.
MSA techniques are compared on at most 30 sequences of length 50. This is a borderline useful regime in some applications, so the contribution is significant for a machine learning conference, but it is also limited: the 1994 ClustalW paper reported experiments on 60 sequences of length 60, and the 2011 Clustal Omega paper on tens of thousands of sequences of thousands of elements. It would be important to comment on this aspect: how many sequences, and of which lengths, can the proposed method handle? Similarly, the MD problem seems to be simplified in the paper: the experiments consist of finding motifs in a single very short sequence. The general problem tackled by methods like MEME is to identify motifs present across several much longer sequences. Here again, it would be important to clearly state and discuss this limitation.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is clear, and the proposed method is novel and interesting. The scale of the experiments is, however, small: most bioinformatics applications require more, and longer, sequences. This point should be discussed in the paper.

=====
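[Editor's note] Review #1's remark that intersecting per-constraint convex hulls is strictly weaker than the convex hull of the joint feasible set can be checked on a toy instance. The sketch below uses two made-up discrete sets X and Y over {0,1}^2, not the paper's actual alignment polytope; it exhibits a point that lies in conv(X) ∩ conv(Y) even though conv(X ∩ Y) is empty.

```python
# Toy witness for: conv(X) ∩ conv(Y) can be a proper superset of conv(X ∩ Y).
# X and Y are hypothetical binary constraint sets, chosen only for illustration.

X = {(1, 0), (0, 1)}  # binary points satisfying x1 + x2 = 1
Y = {(0, 0), (1, 1)}  # binary points satisfying x1 == x2

# The joint discrete feasible set is empty, hence conv(X ∩ Y) is empty too.
assert X & Y == set()

# Yet p = (0.5, 0.5) is the midpoint of the two points of X, so p lies in
# conv(X); it is also the midpoint of the two points of Y, so p lies in
# conv(Y). Thus p lies in the intersection-of-hulls relaxation even though
# the hull of the joint feasible set contains nothing at all.
def midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

p = (0.5, 0.5)
assert p == midpoint((1, 0), (0, 1))  # p is in conv(X)
assert p == midpoint((0, 0), (1, 1))  # p is in conv(Y)
print("p =", p, "is in conv(X) ∩ conv(Y), but conv(X ∩ Y) is empty")
```

This is why a rounding step applied to a solution of the intersection-of-hulls relaxation needs its own optimality analysis, as the review requests.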