The problem of fixing errors in programs has attracted substantial interest over the years. The key challenge in building an effective code-fixing tool is to capture a wide range of errors while maintaining high accuracy. In this paper, we address this challenge and present a new learning-based system called TFix. TFix works directly on program text and phrases the problem of code fixing as a text-to-text task. This, in turn, enables it to leverage a powerful Transformer-based model that is pre-trained on natural language and fine-tuned to generate code fixes, using a large, high-quality dataset obtained from GitHub commits. TFix is not specific to a particular programming language or class of defects; in fact, its precision improved when it was fine-tuned simultaneously on 52 different error types reported by a popular static analyzer. Our evaluation on a massive dataset of JavaScript programs shows that TFix is practically effective: it synthesizes code that fixes the error in ~67 percent of cases and significantly outperforms existing learning-based approaches.
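To make the text-to-text framing concrete, the following is a minimal sketch of how a static-analyzer report might be serialized into a single input string, with the fixed code as the target string. The `LintReport` fields, the `build_query` and `build_pair` helpers, and the exact string layout are illustrative assumptions; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class LintReport:
    """One static-analyzer finding (hypothetical structure, assumed for illustration)."""
    rule: str      # error type, e.g. an ESLint-style rule name such as "no-unused-vars"
    message: str   # human-readable message produced by the analyzer
    line: str      # the source line that the analyzer flagged
    context: str   # a few surrounding lines, including the flagged line


def build_query(report: LintReport) -> str:
    """Serialize a report into one input string for a text-to-text model.

    The layout (task prefix, error type, message, flagged line, context)
    is an assumed format, not one confirmed by the abstract.
    """
    return f"fix {report.rule} {report.message} {report.line.strip()}:\n{report.context}"


def build_pair(report: LintReport, fixed_context: str) -> tuple[str, str]:
    """Return a (source text, target text) pair suitable for fine-tuning."""
    return build_query(report), fixed_context


if __name__ == "__main__":
    report = LintReport(
        rule="no-unused-vars",
        message="'x' is defined but never used.",
        line="  var x = 1;",
        context="function f() {\n  var x = 1;\n  return 2;\n}",
    )
    source, target = build_pair(report, "function f() {\n  return 2;\n}")
    print(source)
    print("---")
    print(target)
```

Because both sides of the pair are plain strings, the same serialization can cover many error types with one model, which fits the abstract's point about fine-tuning on all error types simultaneously.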
Author Information
Berkay Berabi (ETH Zurich)
Jingxuan He (ETH Zurich)
Veselin Raychev (Snyk)
Martin Vechev (ETH Zurich)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer
  Thu. Jul 22nd 01:45 -- 01:50 PM Room
More from the Same Authors
- 2021: Automated Discovery of Adaptive Attacks on Adversarial Defenses
  Chengyuan Yao · Pavol Bielik · Petar Tsankov · Martin Vechev
- 2023: Incentivizing Honesty among Competitors in Collaborative Learning
  Florian Dorner · Nikola Konstantinov · Georgi Pashaliev · Martin Vechev
- 2023: Programmable Synthetic Tabular Data Generation
  Mark Vero · Mislav Balunovic · Martin Vechev
- 2023: Hiding in Plain Sight: Disguising Data Stealing Attacks in Federated Learning
  Kostadin Garov · Dimitar I. Dimitrov · Nikola Jovanović · Martin Vechev
- 2023: Large Language Models are Zero-Shot Multi-Tool Users
  Luca Beurer-Kellner · Marc Fischer · Martin Vechev
- 2023: LMQL Chat: Scripted Chatbot Development
  Luca Beurer-Kellner · Marc Fischer · Martin Vechev
- 2023: Large Language Models for Code: Security Hardening and Adversarial Testing
  Jingxuan He · Martin Vechev
- 2023: Connecting Certified and Adversarial Training
  Yuhao Mao · Mark Müller · Marc Fischer · Martin Vechev
- 2023: Understanding Certified Training with Interval Bound Propagation
  Yuhao Mao · Mark Müller · Marc Fischer · Martin Vechev
- 2023 Workshop: 2nd Workshop on Formal Verification of Machine Learning
  Mark Müller · Brendon G. Anderson · Leslie Rice · Zhouxing Shi · Shubham Ugare · Huan Zhang · Martin Vechev · Zico Kolter · Somayeh Sojoudi · Cho-Jui Hsieh
- 2023 Poster: FARE: Provably Fair Representation Learning with Practical Certificates
  Nikola Jovanović · Mislav Balunovic · Dimitar I. Dimitrov · Martin Vechev
- 2023 Poster: TabLeak: Tabular Data Leakage in Federated Learning
  Mark Vero · Mislav Balunovic · Dimitar I. Dimitrov · Martin Vechev
- 2022 Workshop: Workshop on Formal Verification of Machine Learning
  Huan Zhang · Leslie Rice · Kaidi Xu · Aditi Raghunathan · Wan-Yi Lin · Cho-Jui Hsieh · Clark Barrett · Martin Vechev · Zico Kolter
- 2022 Poster: On Distribution Shift in Learning-based Bug Detectors
  Jingxuan He · Luca Beurer-Kellner · Martin Vechev
- 2022 Spotlight: On Distribution Shift in Learning-based Bug Detectors
  Jingxuan He · Luca Beurer-Kellner · Martin Vechev
- 2021 Poster: Scalable Certified Segmentation via Randomized Smoothing
  Marc Fischer · Maximilian Baader · Martin Vechev
- 2021 Spotlight: Scalable Certified Segmentation via Randomized Smoothing
  Marc Fischer · Maximilian Baader · Martin Vechev
- 2021 Poster: PODS: Policy Optimization via Differentiable Simulation
  Miguel Angel Zamora Mora · Momchil Peychev · Sehoon Ha · Martin Vechev · Stelian Coros
- 2021 Spotlight: PODS: Policy Optimization via Differentiable Simulation
  Miguel Angel Zamora Mora · Momchil Peychev · Sehoon Ha · Martin Vechev · Stelian Coros
- 2020 Poster: Adversarial Robustness for Code
  Pavol Bielik · Martin Vechev
- 2020 Poster: Adversarial Attacks on Probabilistic Autoregressive Forecasting Models
  Raphaël Dang-Nhu · Gagandeep Singh · Pavol Bielik · Martin Vechev
- 2019 Poster: DL2: Training and Querying Neural Networks with Logic
  Marc Fischer · Mislav Balunovic · Dana Drachsler-Cohen · Timon Gehr · Ce Zhang · Martin Vechev
- 2019 Oral: DL2: Training and Querying Neural Networks with Logic
  Marc Fischer · Mislav Balunovic · Dana Drachsler-Cohen · Timon Gehr · Ce Zhang · Martin Vechev
- 2018 Poster: Training Neural Machines with Trace-Based Supervision
  Matthew Mirman · Dimitar Dimitrov · Pavle Djordjevic · Timon Gehr · Martin Vechev
- 2018 Oral: Training Neural Machines with Trace-Based Supervision
  Matthew Mirman · Dimitar Dimitrov · Pavle Djordjevic · Timon Gehr · Martin Vechev
- 2018 Poster: Differentiable Abstract Interpretation for Provably Robust Neural Networks
  Matthew Mirman · Timon Gehr · Martin Vechev
- 2018 Oral: Differentiable Abstract Interpretation for Provably Robust Neural Networks
  Matthew Mirman · Timon Gehr · Martin Vechev