ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (3): 513-527. doi: 10.7544/issn1000-1239.2021.20200402

• Artificial Intelligence •

Review on Text Mining of Electronic Medical Record

Wu Zongyou1, Bai Kunlong2,3,4, Yang Linrui3,4,5, Wang Yiqi2,3,4, Tian Yingjie1

  1. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100049
  2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049
  3. Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences (University of Chinese Academy of Sciences), Beijing 100190
  4. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences (University of Chinese Academy of Sciences), Beijing 100190
  5. Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049
  • E-mail: bossbit@126.com
  • Online: 2021-03-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (71731009, 61472390) and the Science and Technology Service Network Program of Chinese Academy of Sciences (KFJ-STS-ZDTP-060).

Abstract: Electronic medical records (EMRs) are a product of hospital informatization. They contain rich medical information and clinical knowledge and are an important resource for supporting clinical decision-making, drug mining, and related tasks. How to efficiently mine the information in large volumes of EMR data is therefore an important research topic. In recent years, the rapid development of computer technology, especially machine learning and deep learning, has placed higher demands on mining data from this specialized domain. This review analyzes the current state of EMR research in order to guide future work on EMR text mining. Specifically, it first introduces the characteristics of EMR data and the common methods for preprocessing it. It then summarizes four typical EMR data mining tasks (medical named entity recognition, relation extraction, text classification, and intelligent medical consultation), describing for each task the commonly used basic models and some of the approaches researchers have explored. Finally, taking two specific disease categories, diabetes and cardio-cerebrovascular diseases, as examples, it briefly introduces existing application scenarios of EMRs.

Key words: electronic medical records, natural language processing, data mining, machine learning, deep learning
