We are grateful to the reviewers for their thorough reading of our submission and the comments they provided for the betterment of this paper. Please find our response below:$
1. We wish to thank the reviewers for pointing out the typos in the paper and for their suggestions. We will certainly try to include all the suggestions provided by the reviewers in the final version of the paper. 

Points raised by Reviewer 2:
2. Alternate between scalar functions and matrix functions : Matrix functions are in a sense scalar functions applied to the eigenvalues of the matrix, i.e., f(A)=Uf(D)U^T, where A=UDU^T is the eigen-decomposition of A with eigenvalues d_1,...,d_n, and f(D)=diag([f(d_1),f(d_2),...,f(d_n)]). Hence, it is common to use scalar and matrix functions interchangeably.
For the Chebyshev polynomials (or the Hermite polynomials), we only need the matrix-vector products T_k(A)v. Hence, to obtain a kth degree polynomial, we only need to apply A to some vectors k-times (to form A^kv), and we do not form matrix-matrix products. An advantage with the orthogonal polynomials is that the three term recurrence allows an economical computation of such vectors. Unfortunately, such details were included only in the supplementary material due to the space constraints.

3. Transforming the spectrum to [-1,1] and rectangular matrices: We consider the input matrix A to be PSD with its spectrum in a general interval [\lambda_n,\lambda_1]. In this case, the numerical rank is equivalent to counting the number of eigenvalues in the interval [\epsilon,\lambda_1], which is then computed using the trace of the step function of A (which has a value one over this interval and zero elsewhere). Since the Chebyshev polynomials (used to approximate the step function) are defined over the interval [-1,1], we need to use a linear transformation on A to transform its spectrum to [-1,1], i.e., B=(A-cI)/h, where c=(\lambda_n+\lambda_1)/2 and h=(\lambda_1-\lambda_n)/2. Now B is no longer PSD, but the eigenvalue count of B in the interval [(\epsilon-c)/h,1] will be same as the eigenvalue count of A in [\epsilon,\lambda_1] due to the linear transformation. Therefore, the rank estimated will be the same. For a rectangular matrix A, we consider the matrix A^TA, but do not form the matrix-matrix product explicitly, since the method only requires matrix-vector products A^TAv. Again, such details had to be included only in the supplementary material due to the space constraints.

4. why can P be interpreted as a step function of A? : The projector P=U_{[a,b]}U_{[a,b]}^T, where U_{[a,b]} is the set of eigenvectors of A corresponding to the eigenvalues in [a,b]. Since h(A)=Uh(D)U^T, h(D) will have 1s for the eigenvalues in [a,b] and zeros for others. Therefore, P=h(A).

5. what are the \beta_j's? : \beta_j's are scalars whose values depend on the function f(t) considered. For the DOS, they are all equal to 1 (in fact, we just need their mean to be equal to one). This fact helps us compute f(t) as the average of v^TT_k(A)v, see Theorem 3.1 in [Lin et. al, 2016].
Where does F(p) come from? Why is this the right way to approach \phi(t) and why is an interpolation needed? : F(p) is the discrete cosine transform of f(t). Since f(t) is a sum of cosine functions, the discrete cosine transform of f(t) will be a sum of delta functions, which is exactly what the DOS function \phi(t) is. Interpolation is required since F(p) is computed only at m points using m-degree Chebyshev polynomials, but the output \phi(t) is a continuous function (as depicted in figure 2).
Since this method has already been presented in [Lin et. al, 2016], we have omitted detailed explanations in our paper. The Lanczos spectroscopic method was considered because it is based on Chebyshev polynomials, and we can reuse the vectors T_k(A)v for the rank (step function) estimation.

6. which delta was used? How sensitive is this choice? : We chose \delta to be |tol|. We mainly want the step function to push the relevant eigenvalues close to one, and the choice of \delta defines the sharpness of the filter. The sharpness required will depend on the spectral gap. Perhaps, we can define \delta automatically from the DOS plot. In our experiments, we found that any \delta<0.1 works fine.