Abstract:
In distantly supervised named entity recognition (NER), many reinforcement learning based approaches exploit the decision-making ability of reinforcement learning to detect noise in the automatically labeled data generated by distant supervision. However, the policy network models they use are typically simple in structure, which limits their ability to recognize noisy instances. Furthermore, correct instances are identified at the sentence level, so part of the useful information within a sentence is discarded. In this paper, we propose a new reinforcement learning based method for distantly supervised NER, named RLTL-DSNER, which detects correct instances at the token level from the noisy data generated by distant supervision, aiming to reduce the negative impact of noisy instances on the distantly supervised NER model. Specifically, we introduce a tag confidence function to identify correct instances accurately. In addition, we propose a novel pretraining strategy for the NER model, which provides accurate state representations and effective reward values for the initial training of the reinforcement learning model and guides it to update in the right direction. We conduct experiments on four datasets to verify the superiority of RLTL-DSNER, achieving a 4.28% F1 improvement on the NEWS dataset over state-of-the-art methods.