ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2020, Vol. 57 ›› Issue (1): 202-213.doi: 10.7544/issn1000-1239.2020.20180871

• 软件技术 • 上一篇    下一篇

基于行为motifs的多线程程序抄袭检测方法

田振洲1,2,王宁宁1,王清1,高聪1,2,刘烃3,郑庆华3   

  1. 1(西安邮电大学计算机学院 西安 710121);2(陕西省网络数据分析与智能处理重点实验室(西安邮电大学) 西安 710121);3(西安交通大学电信学院 西安 710049) (tianzhenzhou@xupt.edu.cn)
  • 出版日期: 2020-01-01
  • 基金资助: 
    国家自然科学基金项目(61702414);陕西省自然科学基础研究计划项目(2018JQ6078);陕西省国际科技合作计划项目(2018KW-049,2019KW-008);陕西省重点研发计划项目(2019ZDLGY07-08)

Plagiarism Detection of Multi-Threaded Programs by Mining Behavioral motifs

Tian Zhenzhou1,2, Wang Ningning1, Wang Qing1, Gao Cong1,2, Liu Ting3, Zheng Qinghua3   

  1. 1(School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121);2(Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing (Xi’an University of Posts and Telecommunications), Xi’an 710121);3(School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049)
  • Online: 2020-01-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61702414), the Natural Science Basic Research Program of Shaanxi (2018JQ6078), the International Science and Technology Cooperation Program of Shaanxi Province (2018KW-049, 2019KW-008), and the Key Research and Development Program of Shaanxi Province (2019ZDLGY07-08).

摘要: 软件动态胎记技术是实现混淆对抗的软件抄袭检测的有效手段之一.然而,多线程程序中线程交织的不确定性对其造成了不可忽视的影响;极端情况下,传统动态胎记技术甚至会判定同一个程序与其自身不存在抄袭关系.对此,提出从多线程程序在同一输入下的多条执行轨迹中进行相似部分的识别,并从中抽象出不易受线程交织影响的行为motifs来实现多线程程序的抄袭检测.该方法捕捉程序的动态执行轨迹,经过轨迹修剪、gram匹配以及扩展和抽象,从中提取motifs胎记建模多线程程序的行为;最终,通过衡量motifs胎记的相似性实现程序间潜在抄袭的判定.在一个包含234个不同版本多线程程序的公共数据集上开展的实验表明,motifs胎记是一种可靠的线程感知胎记方法,不仅可有效对抗当下主流的代码混淆技术,相比现有的2种多线程程序抄袭检测方法TreSB(thread-related system call birthmark)和TOB(thread-oblivious birthmark),也体现出更优秀的检测能力.

关键词: 软件抄袭, 多线程程序抄袭检测, 动态胎记, 线程感知胎记, 行为motifs

Abstract: Thread scheduling nondeterminism in multi-threaded programs severely perturbs dynamic birthmarking techniques, which serves as one of the most promising approaches in solving obfuscation-resilient software plagiarism detection problem. In the extreme cases, a traditional dynamic birthmarking method even fails to detect plagiarism between a multi-threaded program and itself. To solve the problem, a novel method for plagiarism detection of multi-threaded programs is proposed by abstracting behavioral motifs, which are less susceptible to thread scheduling, from the similar parts of same-input-driven execution traces of a multi-threaded program. Specifically, the method captures execution traces during program runs, from which a series of motifs are inferred by performing trace pruning, gram-based matching, as well as bidirectional extending and abstraction, to depict the behavior of the multi-threaded program. Plagiarism between two programs is then determined by measuring the similarity of their motifs and a threshold. An empirical study conducted on a public benchmark consisting of totally 234 different versions of multi-threaded programs shows that the proposed method is resilient against most state-of-the-art code obfuscation techniques and outperforms two existing dynamic birthmarking methods TreSB (thread-related system call birthmark) and TOB (thread-oblivious birthmark) framework which also target multi-threaded program plagiarism detection. It indicates that the mined behavioral motifs form a new kind of robust thread-oblivious birthmark.

Key words: software plagiarism, multi-threaded program plagiarism detection, dynamic birthmark, thread-oblivious birthmark, behavioral motifs

中图分类号: