We are gratified that all three reviewers appreciate the novelty of our work. As summarized by R1 and R5, we present a new geometric regularization method for classification models. In response to R7’s comments, enforcing the regularizer on the class probability estimator, rather than on the real-valued functional predictor or the class label predictor, is a deliberate feature of our regularization process, for reasons motivated in the Introduction (the “small local oscillation” phenomenon). The final goal of this regularization approach is classification rather than estimation of the class probability. Accordingly, we feel that using classification accuracy as the evaluation metric in our experiments is reasonable; the alternative criteria R7 proposes are interesting but outside the focus of this paper.

It is also worth noting that we did not cherry-pick a specific subset of UCI; we followed exactly the same setup as previous work for the sake of comparison. As noted by the reviewers, “The experimental results on eight low-dimensional datasets clearly demonstrate its superior performance over other regularization methods” (R5), and “The results of this paper are a promising start” (R1).

Detailed responses:

R1: As the reviewer points out, the softmax of h initially guarantees that the image of f lies in the simplex, which lies in an affine subspace of Euclidean space. In the general theory of our algorithm, we would update f by moving in the optimal direction \nabla P_{\mathcal T_m,f} + \lambda \nabla P_{G,f} to a function f’. However, \nabla P_{G,f} may not lie in the tangent space to the simplex, so f’ may no longer take values in the simplex. To move in the right direction, we must project \nabla P_{G,f} back into the tangent space of the simplex, as described at lines 528-549; a concrete sketch of this projection step is given at the end of this rebuttal. In the RBF implementation, we must further determine the RBF functions h at each step. This involves applying (13) to the projection of \nabla P_{G,f}, not to \nabla P_{G,f} itself. We thank the reviewer for asking about this, and we will clarify it in the paper. We are also glad that the reviewer points out potentially fruitful applications of our novel regularization method beyond classification, and we plan to pursue these in future work.

R5: We will correct the typos and add details about RBFs to Section 3.2 accordingly (please also see our response to R1).

R7: As explained in the first paragraph of this rebuttal, we feel that the reviewer’s comments shift the focus away from the main contributions of this work. Further clarifications:

1. Our method follows the regularized empirical loss minimization setup to obtain a point estimate of the prediction function, rather than a probabilistic inference setup. The methods we compare against follow the same general setup, whereas PCVM follows a probabilistic inference setup and conformal prediction focuses on the confidence level of the predictor in an online setting.

2. We study the geometry of a specific manifold, namely the functional graph of a predictor, which is well defined given the predictor and unrelated to the distribution of the data. This is in contrast with the traditional manifold learning problem and the assumptions therein. Of course, the properties of the predictor will affect the properties of this manifold; for instance, the RBF representation in our experiments leads to a smooth predictor, which results in a smooth functional graph. However, this is not an a priori assumption made by our regularization approach.
We have specified in the Abstract two minimal requirements for applying this regularization approach: first, a class probability estimator can be obtained; second, first and second partial derivatives can be calculated.

3. In our judgment, the tangent distance method is not directly related to our work, as it focuses on a new class of kernels for SVMs, i.e., a different functional representation with which to approximate the problem. Even from a purely mathematical perspective, it is very different from our approach: it involves only linear and quadratic forms, whereas our method is based on differential-geometric formalisms, specifically the theory of geometric volume flow.

4. The most closely related work is covered in the Introduction and further discussed in Section 2.3.

We would be very grateful if Reviewer 7 would re-evaluate our submission based on the above clarifications, which we will also incorporate into our paper.
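
For concreteness, we close with a minimal sketch of the projection step discussed in our response to R1. The explicit formula below is an illustration assuming the standard orthogonal projection; the precise construction used in the paper is the one at lines 528-549. Writing K for the number of classes, the simplex lies in the affine subspace \{x \in \mathbb{R}^K : \sum_{k=1}^K x_k = 1\}, whose tangent space is T = \{v \in \mathbb{R}^K : \sum_{k=1}^K v_k = 0\}. The orthogonal projection of g = \nabla P_{G,f}, applied pointwise, is

\Pi_T(g) = g - \Big( \frac{1}{K} \sum_{k=1}^{K} g_k \Big) \mathbf{1}, \qquad \mathbf{1} = (1, \dots, 1)^\top,

and the update then moves f along \nabla P_{\mathcal T_m,f} + \lambda \, \Pi_T(\nabla P_{G,f}) instead of the unprojected direction, so that the regularization term no longer pushes f off the affine hull of the simplex.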