ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (7): 1432-1440. doi: 10.7544/issn1000-1239.2019.20180714

• Artificial Intelligence •

Joint Drug Entities and Relations Extraction Based on Neural Networks

Cao Mingyu, Yang Zhihao, Luo Ling, Lin Hongfei, Wang Jian

  1. (School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024) (caomingyu1997@mail.dlut.edu.cn)
  • Publication date (online): 2019-07-01
  • Funding:
    National Key Research and Development Program of China (2016YFC0901902); National Natural Science Foundation of China (61272373, 61340020, 61572102); Program for New Century Excellent Talents in University of the Ministry of Education (NCET-13-0084)

Abstract: Drug entity and relation extraction can accelerate biomedical research, and it is also the basis for building biomedical knowledge bases and for further research. Existing methods mainly follow a pipeline: they first identify entities in the text by NER (named entity recognition) and then perform RC (relation classification) on each entity pair. The pipeline method has three problems. The first is error propagation: wrong NER results lead to wrong relation classification results. The other two are that it ignores the interaction between the two subtasks and the interaction between different relations in the same sentence. To address these problems, this article proposes a joint drug entity and relation extraction method based on neural networks. The method employs a new tagging scheme whose tags encode both entity and relation information, which converts joint extraction into an end-to-end sequence tagging problem. It uses word embeddings and character embeddings as the word representation input and extracts drug entities and relations jointly with a BiLSTM-CRF model. Experimental results show that, on the DDI (drug-drug interactions) 2013 corpus, the method achieves an F-score of 89.9% for NER and 67.3% for RE (relation extraction), outperforming the pipeline method that uses the same model.

Key words: joint extraction, tagging scheme, drug-drug interactions (DDI), relation extraction, entity recognition
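The abstract says the new tagging scheme folds entity and relation information into per-token tags, so that a single sequence tagger (here a BiLSTM-CRF) performs joint extraction. The exact tag set is not given in this abstract, so the following is a minimal Python sketch under an assumed scheme: each tag joins an entity-position marker (B/I/E/S), a relation type, and a role index (1 for the first drug of a pair, 2 for the second), and all other tokens are tagged O. The helper names `encode`, `decode` and `prf`, the nearest-partner pairing rule in `decode`, the example sentence, and the relation label "effect" are hypothetical illustrations. `prf` scores exact-match triples, which is one plausible way to evaluate joint extraction and not necessarily the paper's evaluation protocol.

```python
# Minimal sketch of an ASSUMED joint tagging scheme (not the paper's exact tag set).
# Tag format: "<pos>-<relation>-<role>", e.g. "B-effect-1"; all other tokens are "O".
# Relation labels must not contain "-". Spans are (start, end) with `end` exclusive.


def encode(tokens, triples):
    """Turn (head_span, relation, tail_span) triples into one tag per token."""
    tags = ["O"] * len(tokens)
    for head, rel, tail in triples:
        for (start, end), role in ((head, "1"), (tail, "2")):
            for i in range(start, end):
                if end - start == 1:
                    pos = "S"  # single-token entity
                elif i == start:
                    pos = "B"  # beginning of a multi-token entity
                elif i == end - 1:
                    pos = "E"  # end of a multi-token entity
                else:
                    pos = "I"  # inside a multi-token entity
                tags[i] = f"{pos}-{rel}-{role}"  # a later triple overwrites an earlier one
    return tags


def decode(tags):
    """Recover triples from tags using a hypothetical nearest-partner pairing rule."""
    spans, open_span = [], None  # open_span = (start, rel, role) of a pending B tag
    for i, tag in enumerate(tags):
        if tag == "O":
            open_span = None
            continue
        pos, rel, role = tag.split("-")
        if pos == "S":
            spans.append((i, i + 1, rel, role))
            open_span = None
        elif pos == "B":
            open_span = (i, rel, role)
        elif pos == "I" and (open_span is None or open_span[1:] != (rel, role)):
            open_span = None  # ill-formed sequence: drop the pending entity
        elif pos == "E":
            if open_span is not None and open_span[1:] == (rel, role):
                spans.append((open_span[0], i + 1, rel, role))
            open_span = None
    triples = []
    for start, end, rel, role in spans:
        if role != "1":
            continue
        partners = [(s, e) for s, e, r, ro in spans if r == rel and ro == "2"]
        if partners:
            # Pair each role-1 entity with the closest role-2 entity of the same type.
            tail = min(partners, key=lambda span: abs(span[0] - start))
            triples.append(((start, end), rel, tail))
    return triples


def prf(gold, pred):
    """Precision, recall and F-score over exact-match (head, relation, tail) triples."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    tokens = ["aspirin", "potentiates", "warfarin", "sodium", "."]
    gold = [((0, 1), "effect", (2, 4))]
    tags = encode(tokens, gold)
    print(tags)  # ['S-effect-1', 'O', 'B-effect-2', 'E-effect-2', 'O']
    print(prf(gold, decode(tags)))  # (1.0, 1.0, 1.0)
```

In a full system the tagger would predict `tags` and only the decoding and scoring steps would run at inference time. Because every token carries exactly one tag in this sketch, a drug that takes part in two relations in the same sentence cannot be fully encoded; that is a limitation of any one-tag-per-token scheme, and the paper may handle it differently.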
