ISSN 1000-1239 CN 11-1777/TP

• Artificial Intelligence •

### Chemical-Induced Disease Relation Extraction Based on Biomedical Literature

Li Zhiheng¹, Gui Yingyi², Yang Zhihao¹, Lin Hongfei¹, Wang Jian¹

¹(School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024); ²(School of Optoelectronics, Beijing Institute of Technology, Beijing 100081) (zhihengli@mail.dlut.edu.cn)
• Published online: 2018-01-01
• Funding: National Natural Science Foundation of China (61272373, 61340020, 61572102, 61572098); Program for New Century Excellent Talents in University (NCET-13-0084); Fundamental Research Funds for the Central Universities (DUT14YQ213)

Abstract: Drug reactions between chemicals and diseases have made chemical-disease relations (CDRs) a topic of wide concern. Automatically extracting chemical-induced disease (CID) relations from the biomedical literature can support biocuration, new drug discovery, and drug safety surveillance. In this paper, we present CDRExtractor, a system that extracts CID relations from biomedical literature at both the sentence level and the document level. To extract CID relations whose chemical and disease occur in the same sentence, we first manually annotate a sentence-level training set and use it to train a sentence-level classifier. To improve this classifier, we apply the co-training algorithm to exploit unlabeled data, using a feature kernel and a graph kernel as two independent views. CDRExtractor then uses a document-level classifier to extract CID relations that span sentences. This classifier uses document-level information (features) about each chemical-disease pair and returns CID relations at the document level. Finally, post-processing rules are applied to the union of the two classifiers' outputs to generate the final results. Experimental results show that CDRExtractor achieves an F-score of 67.72% on the test set of the BioCreative V CDR CID subtask.
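The abstract does not spell out the co-training loop, so the following is only a minimal sketch of simplified two-view co-training, assuming a shared labeled pool. In each round, every view trains a classifier on the labeled data. That classifier labels the unlabeled instances it is most confident about, and those instances are added to the pool for both views. The names `co_train` and `centroid_classifier` are hypothetical, and so are the parameters `rounds` and `k`. A nearest-centroid classifier stands in for the paper's feature-kernel and graph-kernel classifiers. Its confidence is taken as the distance margin between the two closest class centroids. This is an illustrative assumption, not the authors' method.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Predictor = Callable[[Vector], Tuple[int, float]]


def centroid_classifier(X: Sequence[Vector], y: Sequence[int]) -> Predictor:
    """Nearest-centroid classifier: a hypothetical stand-in for a kernel SVM."""
    centroids: Dict[int, Vector] = {}
    for label in set(y):
        points = [x for x, t in zip(X, y) if t == label]
        centroids[label] = tuple(sum(dim) / len(points) for dim in zip(*points))

    def predict(x: Vector) -> Tuple[int, float]:
        # Squared Euclidean distance from x to every class centroid
        dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) for lab, c in centroids.items()}
        ranked = sorted(dist, key=dist.get)
        best = ranked[0]
        # Confidence = gap between the nearest and second-nearest centroid
        margin = dist[ranked[1]] - dist[best] if len(ranked) > 1 else float("inf")
        return best, margin

    return predict


def co_train(
    L1: Sequence[Vector], L2: Sequence[Vector], y: Sequence[int],
    U1: Sequence[Vector], U2: Sequence[Vector],
    rounds: int = 5, k: int = 1,
) -> Tuple[Predictor, Predictor, List[int]]:
    """Co-training over two views (L1/U1 and L2/U2 describe the same instances)."""
    X1, X2, labels = list(L1), list(L2), list(y)
    pool = list(range(len(U1)))  # indices of still-unlabeled instances
    for _ in range(rounds):
        for X_view, U_view in ((X1, U1), (X2, U2)):
            if not pool:
                break
            clf = centroid_classifier(X_view, labels)
            # Rank unlabeled instances by this view's confidence, most confident first
            ranked = sorted(pool, key=lambda i: clf(U_view[i])[1], reverse=True)
            for i in ranked[:k]:
                # The prediction of one view becomes a training label for BOTH views
                X1.append(U1[i])
                X2.append(U2[i])
                labels.append(clf(U_view[i])[0])
                pool.remove(i)
    return centroid_classifier(X1, labels), centroid_classifier(X2, labels), labels


# Toy usage: two labeled instances, three unlabeled ones, two feature views each
f1, f2, labels = co_train(
    [(0.0, 0.0), (10.0, 10.0)], [(0.0,), (5.0,)], [0, 1],
    [(1.0, 1.0), (9.0, 9.0), (2.0, 0.0)], [(0.5,), (4.5,), (1.0,)],
)
print(labels)  # [0, 1, 0, 1, 0]
```

Sharing one labeled pool between the views keeps the sketch short. The key co-training idea is still visible: each view supplies confident labels that the other view then trains on.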
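The combination and evaluation steps can be illustrated with plain sets. This sketch assumes that each extracted relation is a (document ID, chemical ID, disease ID) triple. Under that assumption, taking the union of the sentence-level and document-level outputs is a set union. Precision, recall, and F-score then follow from the overlap with the gold triples. The function names and toy identifiers are hypothetical. The paper's post-processing rules are not reproduced, because the abstract does not describe them.

```python
from typing import Set, Tuple

# (document ID, chemical ID, disease ID); the identifiers below are toy placeholders
Relation = Tuple[str, str, str]


def merge_predictions(sentence_level: Set[Relation], document_level: Set[Relation]) -> Set[Relation]:
    """Union of both classifiers' outputs; a pair found by both is kept once."""
    return sentence_level | document_level


def precision_recall_f(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score over relation triples."""
    tp = len(predicted & gold)  # true positives: predicted triples that are in the gold set
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


sent = {("doc1", "C1", "D1"), ("doc1", "C2", "D2")}
doc = {("doc1", "C1", "D1"), ("doc2", "C3", "D3")}
gold = {("doc1", "C1", "D1"), ("doc2", "C3", "D3"), ("doc2", "C4", "D4")}
merged = merge_predictions(sent, doc)
print(len(merged), precision_recall_f(merged, gold))
```

Using sets means a relation predicted by both classifiers counts only once. This mirrors the idea of a union of the two outputs.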