    Yang Pei, Yang Zhihao, Luo Ling, Lin Hongfei, Wang Jian. An Attention-Based Approach for Chemical Compound and Drug Named Entity Recognition[J]. Journal of Computer Research and Development, 2018, 55(7): 1548-1556. DOI: 10.7544/issn1000-1239.2018.20170506

    An Attention-Based Approach for Chemical Compound and Drug Named Entity Recognition

    Recognizing chemical compound and drug names in unstructured text is an important task in biomedical text mining. Currently popular approaches are based on CRF models, which require large numbers of hand-crafted features, and they suffer from tagging inconsistency: identical mentions within one document are tagged with different labels. In this paper, we propose an attention-based BiLSTM-CRF architecture to mitigate these drawbacks. First, word embeddings are trained on a large amount of unlabeled biomedical text. Then the characters of the current word are fed to a BiLSTM layer to learn a character-level representation of that word. Next, the word and character representations are fed to a second BiLSTM layer, which generates the adjacency context representation of the current word. An attention mechanism then computes the current word’s document-level context from the current word and the adjacency contexts of all words in the document. Finally, a CRF layer predicts the label sequence of the document from the combination of the adjacency context and the document-level context. Experimental results show that our method improves the label consistency of mentions within the same document, and that it outperforms state-of-the-art methods on the BioCreative IV CHEMDNER corpus, achieving an F-score of 90.77%.
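    The document-level attention step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the scoring function, so the dot-product similarity used here is an assumption, as are the names `document_attention`, `softmax`, and `dot` and the toy two-dimensional vectors; the BiLSTM and CRF layers are omitted. The sketch only shows how each word's adjacency context could be combined with an attention-weighted summary of all adjacency contexts in the document.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def document_attention(contexts):
    """Combine each word's adjacency context with a document-level context.

    contexts: one adjacency context vector (list of floats) per word.
    Scoring by dot product is an assumption; the abstract does not
    specify the similarity function.
    """
    combined = []
    for h_i in contexts:
        # Attention weights of word i over every word in the document.
        weights = softmax([dot(h_i, h_j) for h_j in contexts])
        # Document-level context: weighted sum of all adjacency contexts.
        g_i = [
            sum(w * h_j[k] for w, h_j in zip(weights, contexts))
            for k in range(len(h_i))
        ]
        # Concatenate adjacency context and document-level context;
        # in the described architecture this would feed the CRF layer.
        combined.append(h_i + g_i)
    return combined


# Toy document: words 0 and 2 stand for the same mention.
contexts = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
combined = document_attention(contexts)
```

    In this toy input, words with identical adjacency contexts receive identical combined representations, which gives an intuition for why a document-level context might encourage consistent labels for repeated mentions.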
