Paper ID: 1246
Title: The Label Complexity of Mixed-Initiative Classifier Training

===== Review #1 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper studies the problem of learning with a combination of labeled data chosen by a teacher and labeled data actively requested by the learner (it also discusses using purely one or the other). There are two main themes. The first is a discussion of the theoretical properties of statistical learning in this setting. The second is an empirical study of how effective human labelers actually are at providing helpful examples for teaching simple concepts. It additionally discusses a study in which the teachers are provided with examples of good teaching sets for other concepts of the same type, which reveals that such teacher-training may sometimes be quite helpful (though at other times, not as much).

Clarity - Justification:
The paper is well written and easy to follow.

Significance - Justification:
The topic seems important. The main contribution seems to be the empirical study, which, though simplistic, initiates an interesting direction of research.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
Overall, this seems to be an important topic, and I applaud these first steps toward understanding how to effectively employ labelers who are willing to provide helpful examples, but perhaps require some guidance or hints to help them choose such examples wisely.

The theoretical discussion is nice to have (e.g., the definition of the teaching dimension for obtaining epsilon excess error rates in the version space), but it doesn't really introduce anything new. These examples are well known in the literature, and the analysis of TD(epsilon) for them is essentially the same as the existing analysis of their empirical teaching dimensions (these quantities can be formally related to each other). The general results, such as Proposition 1, are fairly obvious. Thus, in my mind, the main contribution is the empirical study of the effectiveness of human teachers, which I found absolutely fascinating.

The limitation of this study is that it is restricted to very simple concept spaces. I would be curious to see a more involved scenario, such as learning a linear separator for an NLP or vision task or some other realistic application. I would imagine the types of inefficiencies in the human teachers become more interesting there, especially since the teachers would not have an intuitive understanding of the concept space (due to the primitive feature-space representation). Furthermore, the types of hints suggested in the present work would presumably no longer be feasible, since we could not communicate an example target concept to the teacher. For now, this paper provides a nice initial step, and I will be eager to see follow-up work that brings this closer to something practically useful.

Minor comments: The lower bound of Kulkarni et al. in paragraph 2 of Section 3.1 is missing a "+" before the "max". Also, in the sentence after that, the original source for this agnostic-case lower bound is Beygelzimer, Dasgupta & Langford (2009). In paragraph 4 of Section 3.1, the phrase "until it encounters a positive point" should be qualified by something like "or reaches grid spacing $\leq \epsilon$". In the last paragraph of Section 3.1, "blink" -> "blind".
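The reviews refer to the 1d threshold and 1d intervals classes as the paper's running examples. To make the teaching-versus-querying contrast concrete for the simpler of these, here is a minimal Python sketch of learner-initiated binary-search queries against a teacher-supplied two-example teaching set, under the usual noiseless, uniform-marginal assumptions on the unit interval. The interval, the oracle, and the function names are illustrative assumptions, not taken from the paper.

# Minimal sketch (illustrative, not the paper's algorithm): 1d threshold
# concepts h_t(x) = 1 if x >= t, else 0, on [0, 1], with a noiseless
# labeling oracle. Binary search halves the version space per query, so
# about log2(1/epsilon) labels suffice for excess error epsilon; a helpful
# teacher could instead hand over just two labeled points that straddle
# the threshold within epsilon of each other.

def active_learn_threshold(query_label, epsilon=1e-3):
    """Learner-initiated: binary-search label queries on [0, 1].
    query_label(x) returns 0 or 1; stops once the bracketing width
    is <= epsilon."""
    lo, hi = 0.0, 1.0              # threshold t is known to lie in (lo, hi]
    while hi - lo > epsilon:
        mid = (lo + hi) / 2.0
        if query_label(mid) == 1:  # mid is positive, so t <= mid
            hi = mid
        else:                      # mid is negative, so t > mid
            lo = mid
    return hi                      # within epsilon of the true threshold t

def teaching_set_threshold(t, epsilon=1e-3):
    """Teacher-initiated: a two-example teaching set for the same class,
    one negative and one positive point within epsilon of the threshold t."""
    return [(max(t - epsilon / 2.0, 0.0), 0), (min(t + epsilon / 2.0, 1.0), 1)]

if __name__ == "__main__":
    true_t = 0.37
    oracle = lambda x: int(x >= true_t)
    print(active_learn_threshold(oracle))   # ~0.37 after ~10 queries
    print(teaching_set_threshold(true_t))   # two labeled points near 0.37

Under these assumptions, either output pins the threshold to within epsilon, which illustrates the roughly log(1/epsilon)-versus-2 label gap between querying and teaching that the reviews discuss.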
===== Review #2 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
The paper provides a high-level overview of different settings of active learning, where the queries are made either by a computer program, by a human teacher, or in a combined regime. The authors compare the sample complexities of these three scenarios by reviewing several already known results from the literature on statistical learning theory and combining them with the notion of teaching dimension. Finally, they consider two well-known active learning classification settings (1d threshold and 1d intervals) and run real-life experiments based on these two problems using the Mechanical Turk service. The authors conclude that the numbers observed during the experiments are well aligned with the theory of active learning and teaching dimension. The paper contains no theorems, no propositions, no proofs.

Clarity - Justification:
The paper is written rather clearly. However, I had a hard time figuring out the main goals and results of the paper. The paper is organized in such a way that it contains no theorems, propositions, or other rigorous mathematical statements. The only novel contribution is, apparently, the experiment presented in Section 5. While the experiment involves human participants recruited through the Mechanical Turk service, which of course looks very curious and unusual, the main purpose of this experiment remains unclear to me.

Significance - Justification:
I failed to spot any novel results in this work. The text contains no precise statements (no theorems, propositions, or lemmas). The authors develop some discussion based on the existing literature, but I could not identify any interesting and novel contributions in these arguments.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
The paper is nicely written and reads smoothly, but I don't see any non-trivial and interesting contributions to the machine learning community. In the abstract the authors claim to provide a theoretical justification for mixed-initiative learning, but this is done in a rather hand-wavy form by reviewing previously known results from active learning. Considering these facts and everything said above, I don't think this paper can be accepted.

===== Review #3 =====

Summary of the paper (Summarize the main claims/contributions of the paper.):
This paper analyzes the label complexity of mixed-initiative classifier training, i.e., training a classifier by a combination of active learning (where the learning algorithm proposes examples it wishes a human to label) and human teaching (where a human provides a set of labeled examples to the learning algorithm). Label complexity is given in terms of the teaching dimension and the standard AL label complexity, and it is analyzed as a function of how good the human teacher is. Because the quality of human teaching is an important factor in the label complexity, the authors also conduct human experiments, which determined that human teachers can be taught (to some degree) how to provide good sets of examples to learning algorithms.

Clarity - Justification:
The paper is nicely written and well organized. The discussion is thorough and insightful.

Significance - Justification:
The paper contains novel theoretical contributions and experimental results that could impact many real-world learning systems.

Detailed comments. (Explain the basis for your ratings while providing constructive feedback.):
I recommend acceptance due to this being the first formal analysis of a topic that is strongly connected to real-world applications of active learning.

=====