Tian Zhenzhou, Wang Ningning, Wang Qing, Gao Cong, Liu Ting, Zheng Qinghua. Plagiarism Detection of Multi-Threaded Programs by Mining Behavioral motifs[J]. Journal of Computer Research and Development, 2020, 57(1): 202-213. DOI: 10.7544/issn1000-1239.2020.20180871
Citation:
Tian Zhenzhou, Wang Ningning, Wang Qing, Gao Cong, Liu Ting, Zheng Qinghua. Plagiarism Detection of Multi-Threaded Programs by Mining Behavioral motifs[J]. Journal of Computer Research and Development, 2020, 57(1): 202-213. DOI: 10.7544/issn1000-1239.2020.20180871
Tian Zhenzhou, Wang Ningning, Wang Qing, Gao Cong, Liu Ting, Zheng Qinghua. Plagiarism Detection of Multi-Threaded Programs by Mining Behavioral motifs[J]. Journal of Computer Research and Development, 2020, 57(1): 202-213. DOI: 10.7544/issn1000-1239.2020.20180871
Citation:
Tian Zhenzhou, Wang Ningning, Wang Qing, Gao Cong, Liu Ting, Zheng Qinghua. Plagiarism Detection of Multi-Threaded Programs by Mining Behavioral motifs[J]. Journal of Computer Research and Development, 2020, 57(1): 202-213. DOI: 10.7544/issn1000-1239.2020.20180871
1(School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121)
2(Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing (Xi’an University of Posts and Telecommunications), Xi’an 710121)
3(School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049)
Funds: This work was supported by the National Natural Science Foundation of China (61702414), the Natural Science Basic Research Program of Shaanxi (2018JQ6078), the International Science and Technology Cooperation Program of Shaanxi Province (2018KW-049, 2019KW-008), and the Key Research and Development Program of Shaanxi Province (2019ZDLGY07-08).
Thread scheduling nondeterminism in multi-threaded programs severely perturbs dynamic birthmarking techniques, which serves as one of the most promising approaches in solving obfuscation-resilient software plagiarism detection problem. In the extreme cases, a traditional dynamic birthmarking method even fails to detect plagiarism between a multi-threaded program and itself. To solve the problem, a novel method for plagiarism detection of multi-threaded programs is proposed by abstracting behavioral motifs, which are less susceptible to thread scheduling, from the similar parts of same-input-driven execution traces of a multi-threaded program. Specifically, the method captures execution traces during program runs, from which a series of motifs are inferred by performing trace pruning, gram-based matching, as well as bidirectional extending and abstraction, to depict the behavior of the multi-threaded program. Plagiarism between two programs is then determined by measuring the similarity of their motifs and a threshold. An empirical study conducted on a public benchmark consisting of totally 234 different versions of multi-threaded programs shows that the proposed method is resilient against most state-of-the-art code obfuscation techniques and outperforms two existing dynamic birthmarking methods TreSB (thread-related system call birthmark) and TOB (thread-oblivious birthmark) framework which also target multi-threaded program plagiarism detection. It indicates that the mined behavioral motifs form a new kind of robust thread-oblivious birthmark.