ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (9): 1851-1858. doi: 10.7544/issn1000-1239.2019.20180733

Domain Named Entity Recognition Combining GAN and BiLSTM-Attention-CRF

Zhang Han1,2, Guo Yuanbo1, Li Tao1   

  1. Department of Cryptogram Engineering, Strategic Support Force Information Engineering University, Zhengzhou 450001; 2. Software College, Zhengzhou University, Zhengzhou 450001
  • Online: 2019-09-10
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61501515), the Key Scientific and Technological Research Project of Henan Province (172102210002), and the Young Scholar Teachers Project of Zhengzhou University (2017ZDGGJS048).

Abstract: Domain named entity recognition typically faces two problems: a shortage of annotated domain data, and inconsistent annotation of the same entity within a document, caused by the diversity of entity names in the domain. To address these problems, we combine a generative adversarial network (GAN), which can generate data, with the BiLSTM-Attention-CRF model. First, BiLSTM-Attention serves as the generator of the GAN and a CNN serves as the discriminator. The two models are trained on crowd annotations and expert annotations, respectively, and the positive annotation data whose distribution is consistent with the expert annotations are integrated from the crowd annotations, which alleviates the shortage of annotated domain data. Second, we introduce a document-level global feature into the BiLSTM-Attention-CRF model to obtain a new feature representation for each word in the document, which addresses the inconsistent annotation of an entity within the same document caused by diverse entity names. Finally, taking crowd annotations from the information security field as samples, we learn their common features and use them to train the BiLSTM-Attention-CRF model for named entity recognition in the information security field, and carry out a comprehensive comparative evaluation. The results show that, compared with existing models and methods, the proposed model achieves substantial improvements on various evaluation metrics.
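
The abstract does not say how the document-level global feature is computed. As a rough illustration only, the sketch below assumes scaled dot-product attention over all word vectors in a document and concatenates the resulting context vector to each word's local vector. The names `add_document_context` and `hidden_states` are hypothetical, and the toy vectors merely stand in for BiLSTM outputs; this is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def add_document_context(hidden_states):
    """Append a document-level context vector to every word vector.

    Assumption: the global feature for word i is the attention-weighted
    sum of all word vectors in the document, with scaled dot-product
    scores. The source does not specify this formula.
    """
    dim = len(hidden_states[0])
    scale = math.sqrt(dim)
    augmented = []
    for h_i in hidden_states:
        # Attention weights of word i over every word in the document.
        weights = softmax([dot(h_i, h_j) / scale for h_j in hidden_states])
        # Weighted sum of all word vectors: the document-level feature.
        global_vec = [
            sum(w * h_j[k] for w, h_j in zip(weights, hidden_states))
            for k in range(dim)
        ]
        # New representation: local vector followed by global vector.
        augmented.append(list(h_i) + global_vec)
    return augmented


# Toy stand-ins for per-word BiLSTM outputs (three words, dimension 2).
doc = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
features = add_document_context(doc)
```

Each word's representation doubles in size: the first half is its own vector, and the second half summarizes the whole document, weighted toward words with similar vectors. In a tagger of this kind, such augmented vectors would be what the CRF layer scores, so that repeated mentions of an entity see shared document context.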

Key words: domain named entity recognition, generative adversarial network (GAN), crowd annotations, entity annotation consistency, BiLSTM-Attention-CRF model
