基于行为motifs的多线程程序抄袭检测方法

田振洲; 王宁宁; 王清; 高聪; 刘烃; 郑庆华

doi:10.7544/issn1000-1239.2020.20180871

基于行为motifs的多线程程序抄袭检测方法

Plagiarism Detection of Multi-Threaded Programs by Mining Behavioral motifs

摘要

摘要: 软件动态胎记技术是实现混淆对抗的软件抄袭检测的有效手段之一.然而，多线程程序中线程交织的不确定性对其造成了不可忽视的影响；极端情况下，传统动态胎记技术甚至会判定同一个程序与其自身不存在抄袭关系.对此，提出从多线程程序在同一输入下的多条执行轨迹中进行相似部分的识别，并从中抽象出不易受线程交织影响的行为motifs来实现多线程程序的抄袭检测.该方法捕捉程序的动态执行轨迹，经过轨迹修剪、gram匹配以及扩展和抽象，从中提取motifs胎记建模多线程程序的行为；最终，通过衡量motifs胎记的相似性实现程序间潜在抄袭的判定.在一个包含234个不同版本多线程程序的公共数据集上开展的实验表明，motifs胎记是一种可靠的线程感知胎记方法，不仅可有效对抗当下主流的代码混淆技术，相比现有的2种多线程程序抄袭检测方法TreSB(thread-related system call birthmark)和TOB(thread-oblivious birthmark)，也体现出更优秀的检测能力.

Abstract: Thread scheduling nondeterminism in multi-threaded programs severely perturbs dynamic birthmarking techniques, which serves as one of the most promising approaches in solving obfuscation-resilient software plagiarism detection problem. In the extreme cases, a traditional dynamic birthmarking method even fails to detect plagiarism between a multi-threaded program and itself. To solve the problem, a novel method for plagiarism detection of multi-threaded programs is proposed by abstracting behavioral motifs, which are less susceptible to thread scheduling, from the similar parts of same-input-driven execution traces of a multi-threaded program. Specifically, the method captures execution traces during program runs, from which a series of motifs are inferred by performing trace pruning, gram-based matching, as well as bidirectional extending and abstraction, to depict the behavior of the multi-threaded program. Plagiarism between two programs is then determined by measuring the similarity of their motifs and a threshold. An empirical study conducted on a public benchmark consisting of totally 234 different versions of multi-threaded programs shows that the proposed method is resilient against most state-of-the-art code obfuscation techniques and outperforms two existing dynamic birthmarking methods TreSB (thread-related system call birthmark) and TOB (thread-oblivious birthmark) framework which also target multi-threaded program plagiarism detection. It indicates that the mined behavioral motifs form a new kind of robust thread-oblivious birthmark.

HTML全文

参考文献(0)

施引文献

资源附件(0)