Timezone: »
Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters with edges connecting each word to its occurrences in the code. We find that combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task --- with over 100% relative improvement on the latter --- at the cost of a moderate increase in computation time.
Author Information
Milan Cvitkovic (California Institute of Technology)
Badal Singh (Amazon Web Services)
Anima Anandkumar (Caltech)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Poster: Open Vocabulary Learning on Source Code with a Graph-Structured Cache »
Thu Jun 13th 01:30 -- 04:00 AM Room Pacific Ballroom
More from the Same Authors
-
2021 Workshop: Workshop on Socially Responsible Machine Learning »
Chaowei Xiao · Animashree Anandkumar · Mingyan Liu · Dawn Song · Raquel Urtasun · Jieyu Zhao · Xueru Zhang · Cihang Xie · Xinyun Chen -
2019 Poster: Minimal Achievable Sufficient Statistic Learning »
Milan Cvitkovic · Günther Koliander -
2019 Oral: Minimal Achievable Sufficient Statistic Learning »
Milan Cvitkovic · Günther Koliander -
2018 Poster: StrassenNets: Deep Learning with a Multiplication Budget »
Michael Tschannen · Aran Khanna · Animashree Anandkumar -
2018 Poster: Born Again Neural Networks »
Tommaso Furlanello · Zachary Lipton · Michael Tschannen · Laurent Itti · Anima Anandkumar -
2018 Oral: Born Again Neural Networks »
Tommaso Furlanello · Zachary Lipton · Michael Tschannen · Laurent Itti · Anima Anandkumar -
2018 Oral: StrassenNets: Deep Learning with a Multiplication Budget »
Michael Tschannen · Aran Khanna · Animashree Anandkumar