A survey on machine learning techniques applied to source code

Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Indira Vats, Hadi Moazen, Federica Sarro

Journal of Systems and Software (2024)

Abstract

The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number of studies hinders the community from understanding the current research landscape. This paper aims to summarize the current knowledge in applied machine learning for source code analysis. We review studies belonging to twelve categories of software engineering tasks and corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we conducted an extensive literature search and identified 494 studies. We summarize our observations and findings with the help of the identified studies. Our findings suggest that the use of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task and summarize machine learning techniques employed. We identify a comprehensive list of available datasets and tools useable in this context. Finally, the paper discusses perceived challenges in this area, including the availability of standard datasets, reproducibility and replicability, and hardware resources. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.

Keywords

Machine learning for software engineering, Source code analysis, Deep learning, Datasets, Tools
