ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2022, Vol. 59 ›› Issue (9): 1914-1928. doi: 10.7544/issn1000-1239.20220014

• Software Technology •

Data Race Detection Method Based on Deep Learning

张杨,乔柳,东春浩,高鸿斌   

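  1. (College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018) (zhangyang@hebust.edu.cn)
  • Published: 2022-09-01
  • Funding: 
    National Natural Science Foundation of China (61440012); Key Scientific Research Project of Hebei Education Department (ZD2019093); Scientific Support Project of Hebei Province (16210312D); Innovative Ability Foundation for Graduates of Hebei Province (CXZZSS2022081)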

Deep Learning Based Data Race Detection Approach

Zhang Yang, Qiao Liu, Dong Chunhao, Gao Hongbin   

  1. (College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018)
  • Online: 2022-09-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61440012), the Key Scientific Research Project of Hebei Education Department (ZD2019093), the Scientific Support Project of Hebei Province (16210312D), and the Innovative Ability Foundation for Graduates of Hebei Province (CXZZSS2022081).

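Abstract: To address the single-feature extraction and low accuracy of existing deep-learning-based data race detection methods, this paper proposes DeleRace, a deep-learning-based data race detection method. DeleRace first uses the static program analysis tool WALA to extract features at several levels (instruction, method, and file) from multiple real-world applications, vectorizes them, and constructs the training samples. It then uses the ConRacer tool to identify real data races and label the samples accordingly, and applies the SMOTE augmentation algorithm to balance the distribution of positive and negative samples. Finally, it builds and trains a CNN-LSTM deep neural network to detect data races. A total of 26 benchmark programs from different application domains were selected from the DaCapo, JGF, IBM Contest, and PJBench benchmark suites for training-sample extraction and data race detection. The results show that DeleRace achieves a data race detection accuracy of 96.79%, an improvement of 4.65% over DeepRace, an existing deep-learning-based detection method. DeleRace is also compared with existing dynamic data race detection tools (Said and RVPredict) and static data race detection tools (SRD and ConRacer), which verifies its effectiveness.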

Keywords: data race, concurrent program, deep learning, feature extraction, CNN-LSTM model

Abstract: Existing deep-learning-based approaches to data race detection extract only a single kind of feature and have low accuracy. To improve on the state of the art, a novel approach called DeleRace is proposed that detects data races with a deep learning model. First, DeleRace uses the static analysis tool WALA to extract instruction-level, method-level, and file-level features from a variety of real-world applications. These features are vectorized to build the training dataset. Second, ConRacer, an existing data race detection tool, is employed to identify real races, and the positive samples in the training dataset are labelled on that basis. To further improve the dataset, DeleRace uses the SMOTE algorithm to balance the positive and negative samples. Finally, a CNN-LSTM model is constructed and a classifier is trained to detect data races. In the experiments, a total of 26 real-world applications from different fields are selected from the DaCapo, JGF, IBM Contest, and PJBench benchmark suites. The experimental results show that the accuracy of DeleRace is 96.79%, which is 4.65% higher than that of DeepRace, an existing deep-learning-based approach. Furthermore, DeleRace is compared with both dynamic tools (Said and RVPredict) and static tools (SRD and ConRacer), which demonstrates its effectiveness.

Key words: data race, concurrent program, deep learning, feature extraction, CNN-LSTM model
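
The class-balancing step named in the abstract can be illustrated with a minimal SMOTE sketch. This is not the authors' implementation: the function names `smote` and `balance`, the 0/1 label encoding, the choice of `k=5`, and the use of Euclidean distance are all assumptions made for illustration. The sketch oversamples whichever class is smaller by interpolating between a sample and one of its nearest same-class neighbours.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest same-class neighbours of the chosen sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(features, labels, k=5, seed=0):
    """Oversample the smaller class (labels assumed to be 0 or 1) until both are equal."""
    pos = [x for x, y in zip(features, labels) if y == 1]
    neg = [x for x, y in zip(features, labels) if y == 0]
    deficit = abs(len(pos) - len(neg))
    if deficit == 0:
        return list(features), list(labels)
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    extra = smote(minority, deficit, k=k, seed=seed)
    # Original samples come first, synthetic ones are appended at the end
    return list(features) + extra, list(labels) + [label] * len(extra)
```

Calling `balance(X, y)` on vectorized feature rows `X` with race labels `y` returns the original samples followed by the synthetic ones. In practice a maintained library implementation would usually be preferable to a hand-written one.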

CLC Number: