Measurement of Embedding Choices on Cryptographic API Completion TasksJust Accepted

ACM Transactions on Software Engineering and Methodology(2022)

引用 0|浏览5
暂无评分
摘要
In this paper, we conduct a measurement study to comprehensively compare the accuracy impacts of multiple embedding options in cryptographic API completion tasks. Embedding is the process of automatically learning vector representations of program elements. Our measurement focuses on design choices of three important aspects, program analysis preprocessing , token-level embedding , and sequence-level embedding . Our findings show that program analysis is necessary even under advanced embedding. The results show 36.20% accuracy improvement on average when program analysis preprocessing is applied to transfer bytecode sequences into API dependence paths. With program analysis and the token-level embedding training, the embedding dep2vec improves the task accuracy from 55.80% to 92.04%. Moreover, only a slight accuracy advantage (0.55% on average) is observed by training the expensive sequence-level embedding compared with the token-level embedding. Our experiments also suggest the differences made by the data. In the cross-app learning setup and a data scarcity scenario, sequence-level embedding is more necessary and results in a more obvious accuracy improvement (5.10%).
更多
查看译文
关键词
neural code completion,embedding,deep learning,neural networks,program analysis,API dependency,cryptography,secure coding,Java
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要