ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2020, Vol. 57, Issue (11): 2467-2477. doi: 10.7544/issn1000-1239.2020.20190747


Construction of Large-Scale Disease Terminology Graph with Common Terms

Zhang Chentong1, Zhang Jiaying1, Zhang Zhixing1, Ruan Tong1, He Ping2, Ge Xiaoling3    

  1. East China University of Science and Technology, Shanghai 200237
  2. Shanghai Hospital Development Center, Shanghai 200041
  3. Children's Hospital of Fudan University, Shanghai 201108
  • Online: 2020-11-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61772201) and the National Key Research and Development Program of China (2018YFC0910500).

Abstract: The National Health Planning Commission requires medical institutions to use ICD (international classification of diseases) codes. However, because clinical disease descriptions contain a large number of common terms, the rate at which clinical diagnosis names in electronic medical records can be matched directly to ICD codes is low. Based on real diagnostic data from a regional healthcare platform, this paper constructs a disease terminology graph that incorporates common terms. Specifically, this paper proposes a data-augmentation-based relationship recognition algorithm that combines a rule-based algorithm built on disease components with a pre-trained BERT (bidirectional encoder representations from transformers) model. The algorithm identifies synonymy and hypernymy relations between more than 50 000 common terms and the diseases in ICD10 (international classification of diseases 10th revision, Chinese version), and then further incorporates the hierarchical structure of ICD11 (international classification of diseases 11th revision, Chinese version). This paper also proposes a task allocation algorithm based on a disease-department association graph to support manual verification. The resulting large-scale disease terminology graph contains 94 478 disease entities, 1 460 synonymy relations, and 46 508 hypernymy relations. Evaluation experiments show that the coverage of clinical diagnostic data achieved with the disease terminology graph is 75.31% higher than that of direct mapping to ICD10. In addition, compared with manual coding by doctors, automatic coding with the disease terminology graph reduces coding time by 59.75%, with an accuracy of 85%.
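The abstract does not spell out the component rules, so the following is only a minimal, hypothetical sketch of how a component-based rule could label the relation between a common term and an ICD10 disease name. It assumes that each name can be decomposed into normalized components (course, body site, core disease, and so on) through a small lexicon. `COMPONENT_LEXICON`, `components`, and `relation` are illustrative names, not the authors' code, and the English example terms stand in for the Chinese originals. The sketch labels two names as synonyms when their component sets are identical, and labels the ICD term as a hypernym when its components are a strict subset of the common term's components. It does not cover the BERT model or the data augmentation step.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical lexicon mapping surface words to normalized disease components.
COMPONENT_LEXICON: Dict[str, str] = {
    "acute": "course:acute",
    "chronic": "course:chronic",
    "left": "side:left",
    "right": "side:right",
    "lung": "site:lung",
    "pulmonary": "site:lung",          # different word, same anatomical site
    "bronchitis": "core:bronchitis",
    "pneumonia": "core:pneumonia",
    "inflammation": "core:inflammation",
}


def components(term: str) -> FrozenSet[str]:
    """Split a disease name into normalized components; unknown words are kept verbatim."""
    return frozenset(COMPONENT_LEXICON.get(word, word) for word in term.lower().split())


def relation(common_term: str, icd_term: str) -> Optional[str]:
    """Label the relation of icd_term to common_term using component sets only."""
    c, i = components(common_term), components(icd_term)
    if c == i:
        return "synonym"   # same components, different wording
    if i < c:
        return "hypernym"  # ICD term is broader: its components strictly contained in c
    return None            # no relation detectable by this rule


print(relation("acute bronchitis", "bronchitis"))              # hypernym
print(relation("pulmonary inflammation", "lung inflammation"))  # synonym
```

A pure set comparison like this is easy to audit during manual verification, but it misses paraphrases that the lexicon does not cover. That gap is one plausible reason for pairing the rules with a learned model such as BERT.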

Key words: common terms, disease terminology graph, ICD(international classification of diseases), relationship recognition, verification
