Large Language Models (LLMs) have shown remarkable aptitude in generating code from natural language specifications, but still struggle on challenging programming tasks. Self-repair---in which the user provides executable unit tests and the model uses these to debug and fix mistakes in its own code---may improve performance in these settings without significantly altering the way in which programmers interface with the system. However, existing studies on how and when self-repair works effectively have been limited in scope, and one might wonder how self-repair compares to keeping a software engineer in the loop to give feedback on the code model's outputs. In this paper, we analyze GPT-3.5 and GPT-4's ability to perform self-repair on APPS, a challenging dataset consisting of diverse coding challenges. We find that when the cost of generating both feedback and repaired code is taken into account, performance gains from self-repair are marginal and can only be seen with GPT-4. In contrast, when human programmers are used to provide feedback, the success rate of repair increases by as much as 57%. These findings suggest that self-repair still trails far behind what can be achieved with a feedback-giving human kept closely in the loop.
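The self-repair setting described in the abstract can be sketched roughly as the loop below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `run_tests`, `self_repair`, `generate`, `give_feedback`, and `repair`, the unit-test format, and the stub callables are all hypothetical. The feedback step is a plain callable, so it could stand in for either the model itself or a human programmer.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical unit-test format: (positional arguments, expected return value).
UnitTest = Tuple[tuple, object]


def run_tests(source: str, entry_point: str, tests: List[UnitTest]) -> Optional[str]:
    """Run the candidate program; return an error message, or None if all tests pass."""
    namespace: dict = {}
    try:
        # Illustration only: real candidate code should run in a sandbox.
        exec(source, namespace)
        func = namespace[entry_point]
        for args, expected in tests:
            actual = func(*args)
            if actual != expected:
                return f"{entry_point}{args} returned {actual!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def self_repair(
    generate: Callable[[str], str],
    give_feedback: Callable[[str, str, str], str],
    repair: Callable[[str, str, str, str], str],
    spec: str,
    entry_point: str,
    tests: List[UnitTest],
    max_repairs: int = 1,
) -> Optional[str]:
    """Generate a program, then alternate feedback and repair until the tests pass."""
    program = generate(spec)
    for _ in range(max_repairs):
        error = run_tests(program, entry_point, tests)
        if error is None:
            return program
        # Feedback may come from the model (self-repair) or from a human.
        feedback = give_feedback(spec, program, error)
        program = repair(spec, program, error, feedback)
    # Accept the last candidate only if it passes the tests.
    return program if run_tests(program, entry_point, tests) is None else None


# Stub "models" standing in for real LLM calls (assumptions for the demo).
def stub_generate(spec: str) -> str:
    return "def add(a, b):\n    return a - b\n"  # deliberately buggy


def stub_feedback(spec: str, program: str, error: str) -> str:
    return "The function subtracts instead of adding."


def stub_repair(spec: str, program: str, error: str, feedback: str) -> str:
    return program.replace("a - b", "a + b")


tests = [((1, 2), 3), ((0, 0), 0)]
fixed = self_repair(
    stub_generate, stub_feedback, stub_repair, "Add two integers.", "add", tests
)
```

Each call to `give_feedback` and `repair` would cost additional model samples in practice, which is why the abstract compares self-repair against simply drawing more initial programs under the same budget.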
Author Information
Theo X. Olausson (MIT)
Jeevana Priya Inala (MIT)
Chenglong Wang (Microsoft)
Jianfeng Gao (Microsoft Research AI)
Jianfeng Gao is Partner Research Manager at Microsoft Research AI. He leads the development of AI systems for machine reading comprehension (MRC), question answering (QA), social bots, goal-oriented dialogue, and business applications. From 2014 to 2017, he was Partner Research Manager at the Deep Learning Technology Center at Microsoft Research, Redmond, where he led research on deep learning for text and image processing. From 2006 to 2014, he was Principal Researcher in the Natural Language Processing Group at Microsoft Research, Redmond, where he worked on Web search, query understanding and reformulation, ads prediction, and statistical machine translation. From 2005 to 2006, he was a Research Lead in the Natural Interactive Services Division at Microsoft, where he worked on Project X, an effort to develop a natural user interface for Windows. From 2000 to 2005, he was Research Lead in the Natural Language Computing Group at Microsoft Research Asia, where he and his colleagues developed the first Chinese speech recognition system released with Microsoft Office, the Chinese/Japanese Input Method Editors (IMEs), which were the leading products in the market, and the natural language platform for Microsoft Windows.
Armando Solar-Lezama (MIT)
More from the Same Authors
- 2021: Safe Human-Interactive Control via Shielding
  Jeevana Priya Inala
- 2023: Self-verification improves few-shot clinical information extraction
  Zelalem Gero · Chandan Singh · Hao Cheng · Tristan Naumann · Michel Galley · Jianfeng Gao · Hoifung Poon
- 2023: Differentiable Tree Operations Promote Compositional Generalization
  Paul Soulos · Edward Hu · Kate McCurdy · Yunmo Chen · Roland Fernandez · Paul Smolensky · Jianfeng Gao
- 2023: Building Community Driven Libraries of Natural Programs
  Leonardo Hernandez Cano · Yewen Pu · Robert Hawkins · Josh Tenenbaum · Armando Solar-Lezama
- 2023: Prof. Armando Solar-Lezama (MIT): Neurosymbolic Learning as a Path to Learning with Guarantees
  Armando Solar-Lezama
- 2023 Poster: Understand and Modularize Generator Optimization in ELECTRA-style Pretraining
  Chengyu Dong · Liyuan Liu · Hao Cheng · Jingbo Shang · Jianfeng Gao · Xiaodong Liu
- 2023 Poster: Differentiable Tree Operations Promote Compositional Generalization
  Paul Soulos · Edward Hu · Kate McCurdy · Yunmo Chen · Roland Fernandez · Paul Smolensky · Jianfeng Gao
- 2022: Session 3: New Computational Technologies for Reasoning
  Armando Solar-Lezama · Guy Van den Broeck · Jan-Willem van de Meent · Charles Sutton
- 2022: Session 1: New Reasoning Problems and Modes of Reasoning
  Robert Ness · Rosemary Nan Ke · Armando Solar-Lezama
- 2022 Workshop: Beyond Bayes: Paths Towards Universal Reasoning Systems
  Zenna Tavares · Emily Mackevicius · Elias Bingham · Nan Rosemary Ke · Talia Ringer · Armando Solar-Lezama · Nada Amin · John Krakauer · Robert O Ness · Alexis Avedisian
- 2021 Poster: A Language for Counterfactual Generative Models
  Zenna Tavares · James Koppel · Xin Zhang · Ria Das · Armando Solar-Lezama
- 2021 Spotlight: A Language for Counterfactual Generative Models
  Zenna Tavares · James Koppel · Xin Zhang · Ria Das · Armando Solar-Lezama
- 2021 Poster: A large-scale benchmark for few-shot program induction and synthesis
  Ferran Alet · Javier Lopez-Contreras · James Koppel · Maxwell Nye · Armando Solar-Lezama · Tomas Lozano-Perez · Leslie Kaelbling · Josh Tenenbaum
- 2021 Spotlight: A large-scale benchmark for few-shot program induction and synthesis
  Ferran Alet · Javier Lopez-Contreras · James Koppel · Maxwell Nye · Armando Solar-Lezama · Tomas Lozano-Perez · Leslie Kaelbling · Josh Tenenbaum
- 2020 Poster: UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
  Hangbo Bao · Li Dong · Furu Wei · Wenhui Wang · Nan Yang · Xiaodong Liu · Yu Wang · Jianfeng Gao · Songhao Piao · Ming Zhou · Hsiao-Wuen Hon
- 2020 Poster: Mapping natural-language problems to formal-language solutions using structured neural representations
  Kezhen Chen · Qiuyuan Huang · Hamid Palangi · Paul Smolensky · Ken Forbus · Jianfeng Gao
- 2020 Poster: Feature Quantization Improves GAN Training
  Yang Zhao · Chunyuan Li · Ping Yu · Jianfeng Gao · Changyou Chen
- 2019 Poster: Learning to Infer Program Sketches
  Maxwell Nye · Luke Hewitt · Josh Tenenbaum · Armando Solar-Lezama
- 2019 Oral: Learning to Infer Program Sketches
  Maxwell Nye · Luke Hewitt · Josh Tenenbaum · Armando Solar-Lezama
- 2019 Poster: Predicate Exchange: Inference with Declarative Knowledge
  Zenna Tavares · Javier Burroni · Edgar Minasyan · Armando Solar-Lezama · Rajesh Ranganath
- 2019 Oral: Predicate Exchange: Inference with Declarative Knowledge
  Zenna Tavares · Javier Burroni · Edgar Minasyan · Armando Solar-Lezama · Rajesh Ranganath
- 2019 Tutorial: Neural Approaches to Conversational AI
  Michel Galley · Jianfeng Gao
- 2018 Poster: Selecting Representative Examples for Program Synthesis
  Yewen Pu · Zachery Miranda · Armando Solar-Lezama · Leslie Kaelbling
- 2018 Oral: Selecting Representative Examples for Program Synthesis
  Yewen Pu · Zachery Miranda · Armando Solar-Lezama · Leslie Kaelbling