Extracting compiler provenance from program binaries

PASTE '10: Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering(2010)

引用 86|浏览2
暂无评分
摘要
We present a novel technique that identifies the source compiler of program binaries, an important element of program provenance. Program provenance answers fundamental questions of malware analysis and software forensics, such as whether programs are generated by similar tool chains; it also can allow development of debugging, performance analysis, and instrumentation tools specific to particular compilers. We formulate compiler identification as a structured learning problem, automatically building models to recognize sequences of binary code generated by particular compilers. We evaluate our techniques on a large set of real-world test binaries, showing that our models identify the source compiler of binary code with over 90% accuracy, even in the presence of interleaved code from multiple compilers. A case study demonstrates the use of inferred compiler provenance to augment stripped binary parsing, reducing parsing errors by 18%.
更多
查看译文
关键词
program provenance,interleaved code,multiple compiler,compiler identification,binary code,compiler provenance,source compiler,program binary,binary parsing,particular compiler,code generation,forensics,languages
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要