We thank the reviewers for detailed comments and suggestions that we will incorporate in the submission.$ We chose to focus on the Quadratic Assignment Problem (QAP) since this is a well-studied and general class of hard permutation problems, and includes many problems of interest such as max-cut and facility location. We ran our experiments on QAPLIB, the canonical set of test instances that encompasses a range of applications. We hope that demonstrating the use of this approach for QAP would spur potential applications to other hard permutation problems of interest to the community. Lim and Wright (2014) directly use the extended formulation of the permutahedron by Goemans (2009/2015). The relaxation we derive here also exploits the relationship between sorting networks and permutations, but is otherwise very different. It only has box constraints and applies to permutation matrices instead of just permutation vectors, significantly increasing the potential range of applications. [Experiments] In response to Reviewers 3 and 6's questions about experiments, we plan to discuss the following in the revision: * Include comparisons to a variant of CD without continuation * Test a greedy swap heuristic at the end of both algorithms * More synthetic experiments to assess scaling We ran preliminary tests on the first two: Experiments on CD without continuation or the local optimum heuristic (CDwoC). * CDwoC is often a few times faster than CD, and is faster than FAQ10/30 for every problem family * Compared to FAQ, CDwoC has better solutions for 3 problem families A greedy pairwise swap (GS) heuristic can be appended to any algorithm. Tests of this heuristic on FAQ (FAQGS) and CD (CDGS) show: * Out of the 7 problem families where CD fared better than FAQ30, CDGS is better than FAQGS30 for 4 and similar for 3. Of the other 9 families, FAQGS30 does slightly better for 2 compared to FAQ30 * Out of the 12 families where CD fared better than FAQ10, CDGS is better than FAQGS10 for 6, similar for 3, and worse for 3. FAQGS10 is better than FAQ10 for the other 4. Thus, our simple version of GS seems to benefit FAQ more than CD, but there are several problem families (in this comprehensive class) for which CD is still better. [Reviewer 3] Re: Vogelstein et al. and continuation - A continuation approach could also be applied, at the cost of increasing run time for FAQ. In this context, Zaslavskiy et al. (as we mention) interpolated between a pair of relaxations that use the Birkhoff polytope (like rQAP), but FAQ performs significantly better than their work on QAPLIB. Re: Regularization and bias - Regularization affects each permutation equally, so the solutions remain the same. We agree that the choice of sorting network may affect which local solutions CD would converge to (and the regularization term may exacerbate this). Re: Convergence rate - Empirically, FAQ often does not converge (i.e. the nearest permutation stays the same) within 30 iterations, whereas typically CD converges within a few cycles at each value of mu. Re: bur and els - These instances are small (n~20). The per-iteration cost of FAQ is a lot higher here than for other problems of similar n, suggesting that the linear assignment problems that FAQ needs to solve for Frank-Wolfe are 'hard'. [Reviewer 5] Re: Lines 660-674 - The second interpretation is correct. [Reviewer 6] Re: Givens rotations - We have noticed these similarities; it could be interesting to investigate the connection. Re: Proposition 3.1 - The sequence of iterates f(x^1),f(x^2)... is monotonically decreasing. Re: Lines 566-570 - X and Y should be replaced with S and T respectively. [Reviewer 7] Re: Convergence and Birkhoff FAQ (Frank-Wolfe, Birkhoff relaxation) has no convergence guarantees. A bound on the number of cycles would be interesting, but nonconvexity and difficulty of QAP makes this improbable. Re: Stability and solution quality We're not sure what is meant by 'stability' in this context. We do test one notion of stability: the effect of restarts and the choice of sorting networks on solution quality. For this we used 100 runs and provided both the best and median results. The main guarantee we have is that we terminate at a local minima when the objective is coordinatewise concave. It is known to be NP-hard to approximate QAP within any polynomial factor. Re: Scalability - We agree that scalability is important. Our synthetic QAP results indicate that CD scales slightly better than FAQ. We plan to test scaling on much larger instances. (We excluded such comparisons in the initial manuscript because FAQ runtimes are long for large instances.) Re: Line 305 - Yes. Re: Line 660 - Performing greedy swaps is a very natural heuristic, and is in fact the basis for many QAP metaheuristics (e.g. tabu search).