Joint Drug Entities and Relations Extraction Based on Neural Networks

Cao Mingyu; Yang Zhihao; Luo Ling; Lin Hongfei; Wang Jian

doi:10.7544/issn1000-1239.2019.20180714

Abstract

Extracting drug entities and their relations accelerates biomedical research and underpins the construction of biomedical knowledge bases. Existing methods mainly follow a pipeline: named entity recognition (NER) first identifies the entities in the text, and relation classification (RC) is then applied to each entity pair. The pipeline approach has three problems. First, errors propagate: incorrect NER results lead to incorrect relation classification results. Second, it ignores the interaction between the two subtasks. Third, it ignores the interaction between different relations in the same sentence. To address these problems, this article proposes a neural-network-based method for the joint extraction of drug entities and relations. The method uses a new tagging scheme whose tags encode both entity and relation information, converting joint extraction into an end-to-end sequence labeling task. Word embeddings and character embeddings serve as the input word representation, and a BiLSTM-CRF model performs the joint extraction. On the DDI (drug-drug interactions) 2013 corpus, the method achieves an F-score of 89.9% for entity recognition and 67.3% for relation extraction (RE), outperforming a pipeline method that uses the same model.
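The abstract does not spell out the tagging scheme, so the following is only a minimal sketch of the general idea of folding entity and relation information into one tag per token. The tag layout (`B/I-<entity type>-<relation>`), the function name `encode_joint_tags`, the label `none`, and the example sentence are all assumptions made for illustration and are not taken from the article.

```python
from typing import Dict, List, Tuple


def encode_joint_tags(
    tokens: List[str],
    entities: List[Tuple[int, int, str]],
    relations: Dict[int, str],
) -> List[str]:
    """Build one tag per token that carries entity and relation information.

    entities:  (start, end_exclusive, entity_type) token spans.
    relations: maps an entity's index in `entities` to a relation label;
               entities without an entry get the hypothetical label "none".
    """
    tags = ["O"] * len(tokens)  # tokens outside any entity stay "O"
    for idx, (start, end, entity_type) in enumerate(entities):
        relation = relations.get(idx, "none")
        for pos in range(start, end):
            prefix = "B" if pos == start else "I"  # begin / inside of the span
            tags[pos] = f"{prefix}-{entity_type}-{relation}"
    return tags


# Illustrative sentence with two drug mentions that take part in one relation.
tokens = ["Aspirin", "may", "increase", "the", "effect", "of", "warfarin", "."]
entities = [(0, 1, "drug"), (6, 7, "drug")]
relations = {0: "effect", 1: "effect"}

tags = encode_joint_tags(tokens, entities, relations)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

With tags of this kind, a single sequence labeler such as a BiLSTM-CRF can predict entities and relations in one pass, which is what removes the separate relation classification step of the pipeline. A scheme this simple cannot represent an entity that takes part in several different relations; how the article handles such cases is not stated in the abstract.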
