Paper ID: 141
Title: Dealbreaker: A Nonlinear Latent Variable Model for Educational Data

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a new model for students' responses to assessment questions. The proposed model generalizes the well-studied Rasch model with three additional assumptions: (a) unlike the Rasch model, each student has k ability scores that capture his or her understanding of k different concepts; (b) similarly, each question has a difficulty level for each of the k concepts; and (c) the probability of answering a question correctly is a function of the hardest aspect of the question, where "hardest" means the largest gap between the student's ability and the ability the question requires on that concept. The authors then explain why learning the parameters of this model is non-trivial, and they discuss which optimization methods can be used to learn them. They also provide a "soft" version of the problem that approximates the model but is easier to fit. Finally, the authors present experimental results showing that the new model slightly outperforms the baselines in some cases. It is important to note, however, that the model also provides interpretable results by highlighting which aspect of a question is the most difficult.

Clarity - Justification:
I think the paper is well-written. The message is clear, and it is easy to follow all the details. The experiments are clearly honest and precise. It was a pleasure to read the article.

Significance - Justification:
I believe the new model is certainly interesting. While it may not outperform the other baselines significantly, it provides interpretable results. More precisely, it can highlight which aspect of a question is the most difficult, and it likewise learns each student's ability with respect to each aspect. The optimization section is also very interesting to read. I am aware that many of the proposed techniques are based on existing work, but demonstrating how they can be used for this particular application is interesting and can serve as a nice tutorial for approaching such optimization problems. Finally, I think the anecdotal results (on both the questions dataset and the movie dataset) show that this formulation is capable of unveiling some of the interesting patterns hidden in the data.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is very well-written. It is easy to read and follow all the details. The experiments are clearly accurate and honest. Overall, the presentation quality is high. Regarding the technical contributions, I must say that while the new proposed model does not do significantly better in terms of accuracy or AUC, it certainly provides more interpretable results, which I found very interesting. The anecdotal results provided by the authors show that the new model better captures the complexity of the students' responses and their correctness. The new IRT model is non-trivial to optimize, but the authors do a great job of explaining how existing optimization techniques can be used to learn the parameters of the model (as well as those of a more relaxed/soft version of the model). Overall, I would argue for acceptance, as I think the new model is interesting and practical, as the paper demonstrates.
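[Editor's note] To make the response rule summarized in Review #1 concrete, here is a minimal sketch of a dealbreaker-style prediction. It assumes a per-concept ability-minus-difficulty gap and a logistic link; the paper's exact link function and parameterization may differ.

```python
import numpy as np

def dealbreaker_prob(ability, difficulty):
    """Probability that a student answers a question correctly under a
    min-gap ("dealbreaker") response rule.

    ability    : (K,) vector of the student's mastery of K concepts
    difficulty : (K,) vector of the question's required level per concept

    The response is governed by the weakest concept, i.e. the smallest
    gap between mastery and required level; the logistic link is an
    assumption made here for illustration only.
    """
    weakest_gap = np.min(ability - difficulty)   # the "dealbreaker" concept
    return 1.0 / (1.0 + np.exp(-weakest_gap))    # logistic link (assumed)

# Example: a student strong on concept 0 but weak on concept 1.
p = dealbreaker_prob(np.array([2.0, -0.5]), np.array([0.5, 1.0]))
print(f"P(correct) = {p:.3f}")  # dominated by the concept-1 gap of -1.5
```

By contrast, the Rasch model discussed in the review uses a single scalar ability per student and a single scalar difficulty per question.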
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper proposes a nonlinear model for students' responses to questions. The authors work under the assumption that a student's ability to answer a question is limited by the minimum, over concepts, of the difference between the student's mastery of a topic and the level the question requires on that topic. They also propose a somewhat novel prox operator in the process of learning the parameters of their model. They evaluate their proposed model on a set of question-answering datasets and on a movie-ratings dataset.

Clarity - Justification:
The presentation is clear. The specific point where I got confused is that the authors never say anything about the convergence of the specified method, more specifically, whether ADMM will converge for the given non-convex optimization problem. Also, the convexity of P_{min} (Section 3.2) was not clear (although in this case one can get to the unique optimum directly).

Significance - Justification:
I liked the proposed method and its application to both student modeling and movie ratings.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Model: Overall, it is very well written and easy to follow. Theorem 1, which is presented as a major theoretical contribution, is an extension of Section 6.4 of "Proximal Algorithms" by Neal Parikh and Stephen Boyd, but that work is not cited where the result is proved. The authors also fail to say anything about the quality of the final stationary point or to give insight into convergence.

Experiments: *Major criticism:* As mentioned before, I did not like the absence of error bars in Tables 1, 2, and 3. This makes it very difficult to understand and interpret the results. It is known that the ADMM parameter $\rho$ must be set very high for convergence, but this is not discussed in the experiment section. I liked the detailed analysis of which algorithm performs better for different types of questions (hard vs. easy).

About the dataset: Is only one of the datasets publicly available, or will all of them be available?

Overall, I like the paper and its presentation, but due to the lack of a proper experimental procedure I am refraining from giving it a strong accept.
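[Editor's note] Review #2 describes Theorem 1 as an extension of a standard proximal-operator result. As a hedged illustration of that style of computation (not the paper's actual Theorem 1, whose operator may differ), the sketch below evaluates the classical proximal operator of the pointwise maximum f(x) = max_i x_i via the Moreau decomposition and a Euclidean projection onto the probability simplex.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1} (standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def prox_max(v, lam):
    """Proximal operator of f(x) = max_i x_i, scaled by lam > 0.
    Uses the Moreau decomposition: the conjugate of the max is the
    indicator of the probability simplex, so
        prox_{lam*f}(v) = v - lam * project_simplex(v / lam)."""
    return v - lam * project_simplex(v / lam)

# Sanity check: for a single coordinate, max(x) = x, so prox shifts v by -lam.
print(prox_max(np.array([3.0]), 0.5))            # -> [2.5]
print(prox_max(np.array([1.0, 0.2, -0.4]), 0.5)) # -> [0.5, 0.2, -0.4]
```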
===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper proposes a nonlinear latent variable model for predicting student responses to questions. Students and questions are represented as vectors in a common latent concept space, and the probability of a correct response is computed from the minimum element-wise difference between them. The proposed method is based on the intuition that a student's likelihood of success depends on the underlying concept of which the student has the weakest mastery. The paper proposes a hard and a soft algorithm for the resulting non-convex problem. The proposed dealbreaker algorithm is experimentally compared to three state-of-the-art baselines and to a matrix completion method. Some minor improvement (less than 1%) was observed.

Clarity - Justification:
The paper is accessible and relatively easy to read. It provides a nice overview of the related work and clearly positions the proposed work within the framework of existing approaches.

Significance - Justification:
While the proposed approach is nicely justified and there is some novelty in it, the experimental results are unconvincing and barely move the needle compared to simpler baselines.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Positives:
+ well-justified approach
+ paper is nicely written
+ experimental design is sufficiently comprehensive
+ the hard version of the model is technically well developed

Negatives:
- The results do not show a significant improvement over the baselines. More work is needed to understand why, and more thorough experimentation may be necessary to discover the situations in which the proposed approach is truly beneficial.
- Although the methodology is developed for educational data mining, the problem is closely related to latent factor models for collaborative filtering, and a more thorough comparison to those models is needed (the relation is illustrated in the sketch after this review).

=====
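[Editor's note] Review #3's last negative point draws a connection to latent factor models from collaborative filtering. The sketch below contrasts the two prediction rules being compared: a standard inner-product latent factor model, where concepts combine additively, versus the min-gap rule described in the reviews, where a single weak concept dominates. The vectors and values are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shared latent vectors for one student/user and one question/item (K concepts).
student = np.array([1.2, -0.3, 0.8])
question = np.array([0.5, 0.4, -0.1])

# Standard collaborative-filtering latent factor model: concepts combine
# additively through an inner product (bias terms omitted here).
p_inner = sigmoid(student @ question)

# Dealbreaker-style rule as summarized in the reviews: the prediction is
# driven entirely by the single weakest concept (the smallest gap).
p_min = sigmoid(np.min(student - question))

print(f"inner-product model:        {p_inner:.3f}")
print(f"min-gap (dealbreaker) rule: {p_min:.3f}")
```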