Noise Detection for Distant Supervised Named Entity Recognition

Wang Jiacheng; Wang Kai; Wang Haofen; Du Wen; He Zhidong; Ruan Tong; Liu Jingping

doi:10.7544/issn1000-1239.202220999

Wang Jiacheng, Wang Kai, Wang Haofen, Du Wen, He Zhidong, Ruan Tong, Liu Jingping. Noise Detection for Distant Supervised Named Entity Recognition[J]. Journal of Computer Research and Development, 2024, 61(4): 916-928. DOI: 10.7544/issn1000-1239.202220999

Abstract

Many approaches to distantly supervised named entity recognition (NER) are based on reinforcement learning, exploiting its strong decision-making ability to detect noise in the automatically labeled data generated by distant supervision. However, the policy networks they use typically have simple structures, which limits their ability to recognize noisy instances. Furthermore, they identify correct instances at the sentence level, so part of the useful information in a sentence is discarded. In this paper, we propose a new reinforcement learning based method for distantly supervised NER, named RLTL-DSNER, which detects correct instances at the token level in the noisy data generated by distant supervision, reducing the negative impact of noisy instances on the distantly supervised NER model. Specifically, we introduce a tag confidence function to identify correct instances accurately. In addition, we propose a novel pre-training strategy for the NER model. This strategy provides accurate state representations and effective reward values for the initial training of the reinforcement learning model, helping to guide its updates in the right direction. We conduct experiments on four datasets to verify the superiority of RLTL-DSNER, which achieves a 4.28% F1 improvement over state-of-the-art methods on the NEWS dataset.
