ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (1): 198-206. doi: 10.7544/issn1000-1239.2018.20160893


Chemical-Induced Disease Relation Extraction Based on Biomedical Literature

Li Zhiheng¹, Gui Yingyi², Yang Zhihao¹, Lin Hongfei¹, Wang Jian¹

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024
  2. School of Optoelectronics, Beijing Institute of Technology, Beijing 100081
  • Online: 2018-01-01

Abstract: Drug reactions between chemicals and diseases have made chemical-disease relations (CDRs) a focus of considerable concern. Automatic extraction of chemical-induced disease (CID) relations from the biomedical literature can support biocuration, new drug discovery, and drug safety surveillance. In this paper, we present a CID relation extraction system, called CDRExtractor, that extracts CID relations from biomedical literature at both the sentence level and the document level. To extract CID relations in which the chemical and the disease occur in the same sentence, we first manually annotate a sentence-level training set and use it to train a sentence-level classifier. To improve the performance of this classifier, the Co-training algorithm is used to exploit unlabeled data, with the feature kernel and the graph kernel serving as two independent views. CDRExtractor then uses a document-level classifier to extract CID relations that span sentences. This classifier uses document-level information (features) about each chemical-disease pair and returns the CID relations at the document level. Finally, post-processing rules are applied to the union of the two classifiers' results to generate the final output. Experimental results show that CDRExtractor achieves an F-score of 67.72% on the test set of the BioCreative V CDR CID subtask.
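
The Co-training step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature kernel and graph kernel are replaced here by two toy one-dimensional views, the classifier is a hypothetical nearest-centroid stand-in, and the names `CentroidClassifier` and `co_train` are assumptions made for illustration. The sketch only shows the general idea that each view's classifier labels the unlabeled examples it is most confident about and adds them to a shared training pool.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Labeled = Tuple[Vector, Vector, int]   # (view 1, view 2, label)
Unlabeled = Tuple[Vector, Vector]      # (view 1, view 2)


class CentroidClassifier:
    """Toy nearest-centroid binary classifier (stand-in for a kernel classifier)."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        # Assumes both classes (0 and 1) are present in the training data.
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def _dist(self, x: Vector, label: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label])) ** 0.5

    def predict_with_confidence(self, x: Vector) -> Tuple[int, float]:
        d0, d1 = self._dist(x, 0), self._dist(x, 1)
        label = 1 if d1 < d0 else 0
        # Confidence grows as the example gets closer to one centroid than the other.
        confidence = abs(d0 - d1) / (d0 + d1 + 1e-12)
        return label, confidence


def co_train(labeled: List[Labeled], unlabeled: List[Unlabeled],
             rounds: int = 5, per_round: int = 1):
    """Grow the labeled pool using the two views in turn."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            clf = CentroidClassifier().fit([ex[view] for ex in labeled],
                                           [ex[2] for ex in labeled])
            # Pick the unlabeled examples this view is most confident about.
            ranked = sorted(pool,
                            key=lambda ex: clf.predict_with_confidence(ex[view])[1],
                            reverse=True)
            for ex in ranked[:per_round]:
                label, _ = clf.predict_with_confidence(ex[view])
                labeled.append((ex[0], ex[1], label))  # pseudo-labeled example
                pool.remove(ex)
    clf1 = CentroidClassifier().fit([ex[0] for ex in labeled], [ex[2] for ex in labeled])
    clf2 = CentroidClassifier().fit([ex[1] for ex in labeled], [ex[2] for ex in labeled])
    return clf1, clf2, labeled


# Toy data: two seed examples and four unlabeled examples, each with two views.
seed = [((0.0,), (0.1,), 0), ((1.0,), (0.9,), 1)]
unlabeled_pool = [((0.1,), (0.2,)), ((0.9,), (0.8,)),
                  ((0.2,), (0.0,)), ((0.8,), (1.0,))]
clf_view1, clf_view2, grown = co_train(seed, unlabeled_pool)
```

In this toy setup the two views are cleanly separable, so every pseudo-label is correct. On real data the views are noisier, and the number of rounds and the number of examples added per round would need tuning.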

Key words: information extraction, text mining, semi-supervised learning, Co-training, chemical-disease relation (CDR)
