Distinguishing AI- and Human-Generated Code: A Case Study

PROCEEDINGS OF THE 2023 WORKSHOP ON SOFTWARE SUPPLY CHAIN OFFENSIVE RESEARCH AND ECOSYSTEM DEFENSES, SCORED 2023(2023)

引用 0|浏览2
暂无评分
摘要
While the use of AI assistants for code generation has the potential to revolutionize the way software is produced, assistants may generate insecure code, either by accident or as a result of poisoning attacks. They may also inadvertently violate copyright laws by mimicking code protected by restrictive licenses. We argue for the importance of tracking the provenance of AI-generated code in the software supply chain, so that adequate controls can be put in place to mitigate risks. For that, it is necessary to have techniques that can distinguish between human- and AI-generate code, and we conduct a case study in regards to whether such techniques can reliably work. We evaluate the effectiveness of lexical and syntactic features for distinguishing AI- and human-generated code on a standardized task. Results show accuracy up to 92%, suggesting that the problem deserves further investigation.
更多
查看译文
关键词
supply chain security,AI code generation,program analysis
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要