ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (1): 202-213.doi: 10.7544/issn1000-1239.2020.20180871

Previous Articles     Next Articles

Plagiarism Detection of Multi-Threaded Programs by Mining Behavioral motifs

Tian Zhenzhou1,2, Wang Ningning1, Wang Qing1, Gao Cong1,2, Liu Ting3, Zheng Qinghua3   

  1. 1(School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121);2(Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing (Xi’an University of Posts and Telecommunications), Xi’an 710121);3(School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049)
  • Online:2020-01-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61702414), the Natural Science Basic Research Program of Shaanxi (2018JQ6078), the International Science and Technology Cooperation Program of Shaanxi Province (2018KW-049, 2019KW-008), and the Key Research and Development Program of Shaanxi Province (2019ZDLGY07-08).

Abstract: Thread scheduling nondeterminism in multi-threaded programs severely perturbs dynamic birthmarking techniques, which serves as one of the most promising approaches in solving obfuscation-resilient software plagiarism detection problem. In the extreme cases, a traditional dynamic birthmarking method even fails to detect plagiarism between a multi-threaded program and itself. To solve the problem, a novel method for plagiarism detection of multi-threaded programs is proposed by abstracting behavioral motifs, which are less susceptible to thread scheduling, from the similar parts of same-input-driven execution traces of a multi-threaded program. Specifically, the method captures execution traces during program runs, from which a series of motifs are inferred by performing trace pruning, gram-based matching, as well as bidirectional extending and abstraction, to depict the behavior of the multi-threaded program. Plagiarism between two programs is then determined by measuring the similarity of their motifs and a threshold. An empirical study conducted on a public benchmark consisting of totally 234 different versions of multi-threaded programs shows that the proposed method is resilient against most state-of-the-art code obfuscation techniques and outperforms two existing dynamic birthmarking methods TreSB (thread-related system call birthmark) and TOB (thread-oblivious birthmark) framework which also target multi-threaded program plagiarism detection. It indicates that the mined behavioral motifs form a new kind of robust thread-oblivious birthmark.

Key words: software plagiarism, multi-threaded program plagiarism detection, dynamic birthmark, thread-oblivious birthmark, behavioral motifs

CLC Number: