What Drives Interactive Improvement from Feedback?
Abstract
Language models are increasingly used in settings where a failed first attempt is followed by natural-language feedback and revision. However, improvement over multiple turns does not by itself show that a model used the feedback: the same gains might arise from repeated sampling or from output-format corrections. We study what drives improvement from feedback using a controlled student-teacher protocol on four verifiable reasoning environments: Omni-MATH, Codeforces, BBEH Linguini, and ARC-AGI1. Across thirteen open-weight models, each evaluated as both student and teacher, we investigate the effects of external feedback, self-generated feedback, interaction history, and feedback generated with access to privileged task information. The resulting picture is nuanced. By analogy to the generator-verifier gap, one might expect self-feedback to outperform unguided self-refinement: a model that cannot solve a problem may still be able to diagnose its own failed attempt. We find that the advantage of self-feedback over unguided self-refinement is often small and varies substantially across environments. Feedback from stronger teacher models can yield larger gains, yet teacher strength is not the main source of variation. Across models, outcomes depend more on the choice of student than on the choice of teacher, indicating that improvement from feedback is primarily limited by the student's ability to turn an error diagnosis into a correct next attempt.