Abstract:
Drug reactions linking chemicals to diseases have made chemical-disease relations (CDRs) a topic of considerable interest. Automatic extraction of chemical-induced disease (CID) relations from the biomedical literature can support biocuration, new drug discovery, and drug safety surveillance. In this paper, we present CDRExtractor, a system that extracts CID relations from biomedical literature at both the sentence and document levels. To extract CID relations that occur within a single sentence, we first manually annotate a sentence-level training set and use it to train a sentence-level classifier. To improve this classifier's performance, we apply the co-training algorithm to exploit unlabeled data, using the feature kernel and the graph kernel as two independent views. CDRExtractor then uses a document-level classifier to extract CID relations that span multiple sentences. This classifier uses document-level information (features) about each chemical-disease pair and returns CID relations at the document level. Finally, post-processing rules are applied to the union of the two classifiers' outputs to generate the final results. Experimental results show that CDRExtractor achieves an F-score of 67.72% on the test set of the BioCreative V CDR CID subtask.
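The co-training step can be illustrated with a minimal sketch of the generic two-view loop. In this loop, each view's classifier labels the unlabeled examples it is most confident about, and those examples join a shared labeled pool. This is not the paper's implementation. Each candidate chemical-disease pair is assumed to be represented by two toy numeric vectors that stand in for the feature-kernel and graph-kernel views, and a simple nearest-centroid classifier replaces the kernel-based classifiers. The names `Example`, `NearestCentroid`, `co_train`, `rounds`, and `per_round` are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Example:
    view_a: Vector  # hypothetical stand-in for the feature-kernel view
    view_b: Vector  # hypothetical stand-in for the graph-kernel view


class NearestCentroid:
    """Toy classifier used here in place of the paper's kernel-based classifiers."""

    def fit(self, xs: Sequence[Vector], ys: Sequence[int]) -> "NearestCentroid":
        groups: Dict[int, List[Vector]] = {}
        for x, y in zip(xs, ys):
            groups.setdefault(y, []).append(x)
        # One centroid per class label: the coordinate-wise mean of its points.
        self.centroids = {
            y: tuple(sum(col) / len(col) for col in zip(*pts))
            for y, pts in groups.items()
        }
        return self

    def predict(self, x: Vector) -> Tuple[int, float]:
        """Return (label, confidence); confidence is the gap between the two nearest centroids."""
        ranked = sorted((dist(x, c), y) for y, c in self.centroids.items())
        if len(ranked) == 1:
            return ranked[0][1], float("inf")
        return ranked[0][1], ranked[1][0] - ranked[0][0]


def co_train(
    labeled: Sequence[Tuple[Example, int]],
    unlabeled: Sequence[Example],
    rounds: int = 5,
    per_round: int = 1,
) -> Tuple[NearestCentroid, NearestCentroid, List[Tuple[Example, int]]]:
    pool_labeled = list(labeled)  # shared labeled pool, grown by both views
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in ("view_a", "view_b"):
            if not pool:
                break
            # Train this view's classifier on the current labeled pool.
            clf = NearestCentroid().fit(
                [getattr(ex, view) for ex, _ in pool_labeled],
                [y for _, y in pool_labeled],
            )
            # Rank unlabeled examples by this view's confidence, highest first.
            scored = sorted(
                ((clf.predict(getattr(ex, view)), i) for i, ex in enumerate(pool)),
                key=lambda t: t[0][1],
                reverse=True,
            )
            chosen = scored[:per_round]
            for (label, _), i in chosen:
                pool_labeled.append((pool[i], label))
            taken = {i for _, i in chosen}
            pool = [ex for i, ex in enumerate(pool) if i not in taken]
    # Retrain one classifier per view on the augmented labeled pool.
    ys = [y for _, y in pool_labeled]
    clf_a = NearestCentroid().fit([ex.view_a for ex, _ in pool_labeled], ys)
    clf_b = NearestCentroid().fit([ex.view_b for ex, _ in pool_labeled], ys)
    return clf_a, clf_b, pool_labeled
```

Both classifiers write to the same labeled pool. As a result, an example that one view finds easy to label becomes new training data for the other view. This exchange is the reason co-training relies on the two views being reasonably independent.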