Abstract:
Named Entity Recognition (NER) is a traditional task in natural language processing. The mainstream approach to nested NER is span-based classification, which generally obtains span representations by concatenating the representations of the entity boundaries. However, for long entities the semantic association between the two boundaries tends to weaken. In addition, single-scale spans cannot fully capture how entities behave in different contexts. To address these issues, an entity semantic enhancement method based on multi-scale box fusion is proposed. The method represents spans as boxes with boundary position information. First, multi-scale boxes are obtained by fusing features of different scales, which enhances the semantic features in the boxes and makes the boxes more context-dependent. Then, the boundary positions of the boxes are refined by a position-weighted attention mechanism to make the box information more accurate. Finally, the entity category of each box and its offset relative to the true entity are predicted simultaneously, supporting both the recognition and the localization of nested named entities. The method achieves new state-of-the-art F1 scores of 88.63% on the ACE04 English, 88.53% on the ACE05 English, and 73.86% on the Weibo Chinese datasets, which demonstrates the effectiveness of the model.
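To make the box view of spans concrete, the following is a minimal sketch and not the model itself. It assumes a box is an inclusive pair of token indices and that a predicted offset is a real-valued shift applied to each boundary; the abstract does not specify either parameterization. The names `Box`, `enumerate_boxes`, and `apply_offsets` are hypothetical, and the offsets are supplied by hand rather than by a trained network.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """A candidate span as a box over inclusive token indices [left, right]."""
    left: int
    right: int

    @property
    def width(self) -> int:
        return self.right - self.left + 1


def enumerate_boxes(num_tokens: int, max_width: int) -> List[Box]:
    """List every candidate box of width 1..max_width (assumed enumeration scheme)."""
    return [
        Box(i, j)
        for i in range(num_tokens)
        for j in range(i, min(i + max_width, num_tokens))
    ]


def apply_offsets(box: Box, d_left: float, d_right: float, num_tokens: int) -> Box:
    """Shift both boundaries by predicted offsets, then round and clamp to the sentence."""
    left = min(max(round(box.left + d_left), 0), num_tokens - 1)
    right = min(max(round(box.right + d_right), left), num_tokens - 1)
    return Box(left, right)


tokens = ["the", "University", "of", "California", "campus"]
candidates = enumerate_boxes(len(tokens), max_width=3)

# Nested entities appear as distinct boxes: the outer span and the inner one.
outer, inner = Box(1, 3), Box(3, 3)

# A slightly misplaced box is pulled onto the outer entity by hand-picked offsets.
refined = apply_offsets(Box(2, 3), d_left=-1.2, d_right=0.1, num_tokens=len(tokens))
```

Because each box carries its own boundaries, overlapping entities such as "University of California" and "California" are separate candidates, and each can be classified and adjusted independently.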