A Deep Learning Model for Source Code Generation
Top 10% of 2019 papers
Abstract
Natural Language Processing (NLP) models have been used extensively to study relationships among words in a corpus. Inspired by models such as the n-gram, we developed a model for analyzing source code via its Abstract Syntax Tree (AST). This study has applications in source code generation, code completion, and software forensics and investigation. It also benefits software developers and programmers striving to improve code efficiency and speed up the development process. For source code analysis, the Natural Language Toolkit (NLTK), though very useful in NLP, is severely limited by its inability to handle the semantic and syntactic properties of source code. Therefore, we processed the source code datasets as Abstract Syntax Trees (ASTs) rather than as raw source code text, to take advantage of the information-rich structure of the AST. The proposed model is built on the deep learning-based Long Short-Term Memory (LSTM) and Multilayer Perceptron (MLP) architectures. Results from our intrinsic evaluation on a corpus of Python projects demonstrate its ability to effectively predict a sequence of source code tokens and show an improvement over previous work in this field.
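The paper does not reproduce its exact preprocessing pipeline here, but the core idea of working on ASTs rather than raw text can be sketched with Python's built-in `ast` module: parse a snippet, then flatten the tree into a sequence of node-type names of the kind a sequence model such as an LSTM could be trained on. The function name and the flattening choice are illustrative assumptions, not the authors' implementation.

```python
import ast

def ast_node_sequence(source: str) -> list[str]:
    """Parse Python source and flatten its AST into a sequence of
    node-type names (an illustrative stand-in for the token streams
    a sequence model would be trained on)."""
    tree = ast.parse(source)
    # ast.walk yields the root first, then all descendant nodes
    return [type(node).__name__ for node in ast.walk(tree)]

code = "def add(a, b):\n    return a + b\n"
seq = ast_node_sequence(code)
print(seq)  # e.g. ['Module', 'FunctionDef', ...]
```

A sequence like this captures structural information (function definitions, returns, binary operations) that a plain word tokenizer over the source text would miss.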
Related Papers
- Vulnerability Prediction From Source Code Using Machine Learning (2020), 104 citations
- StructCoder: Structure-Aware Transformer for Code Generation (2023), 35 citations
- A random code generation method based on syntax tree layering model (2021), 2 citations
- Automatic Refactoring Method of Cloned Code Using Abstract Syntax Tree and Static Analysis (2009)