Entity Semantic Enhancement Method via Multi-Scale Box Fusion

吴灿; 陈艳平; 扈应; 黄瑞章; 秦永彬

doi:10.7544/issn1000-1239.202440665

Abstract

Named entity recognition (NER) is a long-standing task in natural language processing. Span-based classification is the mainstream approach to nested NER; it typically builds a span representation by concatenating the representations of the entity's two boundaries. However, for long entities the semantic association between the two boundaries tends to weaken, and a single-scale span cannot fully capture how an entity behaves in different contexts. To address this, an entity semantic enhancement method based on multi-scale box fusion is proposed. The method represents each span as a box that carries boundary position information. First, multi-scale boxes are obtained by fusing entity features at different scales, which enriches the semantic features of the boxes and strengthens their dependence on context. Then, a position-weighted attention mechanism refines the boundary positions of the boxes so that the box information is more accurate. Finally, the entity category of each box and its positional offset relative to the true entity are predicted simultaneously, which supports both the recognition and the localization of nested named entities. The method achieves new state-of-the-art F1 scores of 88.63% on the ACE2004 English dataset, 88.53% on the ACE2005 English dataset, and 73.86% on the Weibo Chinese dataset, demonstrating the effectiveness of the model.
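To make the idea of a box with boundary positions and a predicted offset concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a box is an inclusive pair of token indices and that the offset is a signed shift of each boundary; the abstract does not specify either parameterization, and the function names `enumerate_boxes`, `offset_target`, and `apply_offset` are hypothetical.

```python
from typing import List, Tuple

# Assumption: a box is an inclusive (start, end) pair of token indices.
Box = Tuple[int, int]


def enumerate_boxes(num_tokens: int, max_len: int) -> List[Box]:
    """List every candidate span of at most max_len tokens as a box."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start, min(start + max_len, num_tokens))
    ]


def offset_target(box: Box, gold: Box) -> Tuple[int, int]:
    """Signed boundary shifts that would move the box onto the gold entity."""
    return (gold[0] - box[0], gold[1] - box[1])


def apply_offset(box: Box, offset: Tuple[int, int]) -> Box:
    """Refine a box by shifting its left and right boundaries."""
    return (box[0] + offset[0], box[1] + offset[1])


# Illustrative sentence with a nested entity pair (made-up example data).
tokens = ["the", "University", "of", "Washington", "campus"]
gold_entities = [((1, 3), "ORG"), ((3, 3), "GPE")]

candidates = enumerate_boxes(len(tokens), max_len=4)
coarse_box = (0, 3)  # "the University of Washington", one token too wide
shift = offset_target(coarse_box, gold_entities[0][0])
refined_box = apply_offset(coarse_box, shift)
```

In this toy setting a model would predict both a category and a shift for each candidate box. Because nested entities such as `(1, 3)` and `(3, 3)` are distinct boxes, they can be classified and localized independently.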
