Rebuttal for "Dealbreaker: A Nonlinear Latent Variable Model for Educational Data"
We thank the reviewers for their favorable comments on the novelty, exposition, and experimental results. We also greatly appreciate their constructive criticism. We have carefully considered all comments and detail our responses below.
# Reviewer 1
We agree that the proposed dealbreaker (DB) model does not significantly outperform existing methods in terms of prediction performance. As the reviewer points out, the key strength of the DB model (and the focus of this paper) is its interpretability. While all existing models yield similar prediction accuracy, prediction accuracy alone is not the most relevant metric in this application: even with high prediction accuracy, the underlying latent student knowledge remains inaccessible. We present accuracy results because they are the standard metric used to evaluate algorithms in the relevant applications.
# Reviewer 2
We agree that the DB model only slightly outperforms other competing algorithms for some datasets. We emphasize, however, that the key strength of the DB (and the focus of the paper) is the interpretability of the model parameters, which is crucial to the applications we study (see also our comments to Reviewer 1).
No single method can be expected to outperform all others on all datasets. We do, however, present an in-depth analysis revealing the situations in which DB outperforms other algorithms. As discussed at the bottom of p. 6 and in the first column of p. 7 (lines 656-706), all of our prediction results were computed on the entire dataset (averaged over every question). We also report that the DB model outperforms affine models on subsets of the dataset featuring questions with diverse response patterns across students (see Table 2 for details). This observation shows that the DB model is better suited than affine models to analyzing questions that require students to master more complex combinations of concepts in order to succeed.
We thank the reviewer for the helpful comments on factor models in collaborative filtering (CF). We note that the 3PL MIRT model and the Rasch model (both compared against) are affine factor models; we also discuss the relationship of DB to existing factor models for educational data in the Introduction (see the entire second column of p. 1). However, most existing CF algorithms in domains outside of education treat the data as real-valued and are thus not ideally suited to our application, in which the data are binary-valued. The 1-bit MC algorithm, one of the few exceptions, is included in our experiments. The experimental results in that paper show its superiority over most CF algorithms; we therefore focused on it rather than comparing against additional CF algorithms.
# Reviewer 3
We will add a citation regarding proximal methods to the final paper.
Regarding the proof of Theorem 1: several authors have studied the proximal operator of the infinity norm, which can be viewed as a special case of our Theorem 1. Parikh & Boyd give an excellent writeup of similar results, although they are certainly not the first to present this simple case (see, e.g., the discussion of the dual problem by Duchi, Shalev-Shwartz & Singer '08, as well as much older references). We will happily include these citations in the final paper and mention how they relate to our theoretical result. We emphasize, however, that Theorem 1 is not identical to these results, because our application requires considerably more generality: rather than simply considering the max of a set of variables (max x_i), we consider a max over functions of the variables (max g(x_i), where g is any differentiable non-increasing function). Nonetheless, we will note the close relationship between our result and previous L-infinity results in the literature.
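For the reviewer's convenience, here is a minimal NumPy sketch of the classical special case mentioned above (not the generalized operator of Theorem 1): the proximal operator of the infinity norm, computed via the Moreau decomposition together with a Duchi-style sort-based projection onto the L1 ball. The function names are our own illustrative choices.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {x : ||x||_1 <= radius},
    using the sort-based method in the style of Duchi et al. '08."""
    if np.sum(np.abs(v)) <= radius:
        return v.copy()  # already inside the ball
    u = np.sort(np.abs(v))[::-1]          # magnitudes, descending
    css = np.cumsum(u)                    # running sums of magnitudes
    # largest index rho with u[rho] * (rho+1) > css[rho] - radius
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam):
    """prox of lam * ||.||_inf at v, via the Moreau decomposition:
    prox_{lam*||.||_inf}(v) = v - projection of v onto the L1 ball of radius lam."""
    return v - project_l1_ball(v, lam)
```

For example, `prox_linf(np.array([3.0, -1.0, 0.5]), 1.0)` clips only the dominant coordinate, returning `[2.0, -1.0, 0.5]`; when `||v||_1 <= lam`, the prox collapses v to zero. Our Theorem 1 replaces the plain coordinates x_i inside the max with functions g(x_i).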
Regarding convergence of the ADMM scheme: convergence of ADMM for non-convex problems has only very recently been proven by Li and Pong ("Global convergence of splitting methods for non-convex composite optimization"), under the condition that a sufficiently large penalty parameter is used. We will cite this work in the final version and briefly discuss its implications for DB.
We would like to clarify that our experimental setting is sound: we repeated each experiment 20 times with different randomly divided training/testing sets and averaged the results, so error bars are available (we omitted them from the tables for space and clarity of exposition). Nevertheless, we will add error bars to Tables 1-3 in the final version (which will require some care to keep the tables readable).
We are in the process of making our datasets available online; currently, we are awaiting our IRB’s approval.