ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2022, Vol. 59 ›› Issue (9): 1980-1992. doi: 10.7544/issn1000-1239.20210078

• Artificial Intelligence •

A Method for Alleviating Exposure Bias in Joint Extraction of Entities and Relations

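Wang Zhen1, Fan Hongjie2, Liu Junfei3

  1. 1(School of Software and Microelectronics, Peking University, Beijing 100871); 2(Department of Science and Technology, China University of Political Science and Law, Beijing 102249); 3(National Engineering Research Center for Software Engineering, Peking University, Beijing 100871) (wang.zh@pku.edu.cn)
  • Published: 2022-09-01
  • Funding: 
    Research Innovation Project of China University of Political Science and Law (21FQ41001); Fundamental Research Funds for the Central Universities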

An Alleviate Exposure Bias Method in Joint Extraction of Entities and Relations

Wang Zhen1, Fan Hongjie2, Liu Junfei3   

  1. 1(School of Software and Microelectronics, Peking University, Beijing 100871); 2(Department of Science and Technology, China University of Political Science and Law, Beijing 102249); 3(National Engineering Research Center for Software Engineering, Peking University, Beijing 100871)
  • Online: 2022-09-01
  • Supported by: 
    This work was supported by the Research Innovation Project of China University of Political Science and Law (21FQ41001) and the Fundamental Research Funds for the Central Universities.

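Abstract (translated from the Chinese): Joint extraction of entities and relations aims to extract entity mentions and relational facts simultaneously from unstructured text. It is a key step in knowledge graph construction and the basis of many high-level tasks in natural language processing. Most existing work adopts staged joint extraction methods to handle triple extraction when a text contains multiple triples and overlapping entities. Although these methods achieve reasonable performance gains, they all suffer from a serious exposure bias problem. To address this, a new method called fusional relation expression embedding (FREE) is proposed, which effectively alleviates exposure bias by fusing relation expression embeddings. In addition, a new feature fusion layer called the conditional layer normalization layer is proposed to fuse prior information more effectively. Extensive comparative experiments were conducted on 2 widely used datasets. The results show that the method has significant advantages over current state-of-the-art baseline methods, handles various situations more effectively, and achieves performance comparable to current advanced methods that target exposure bias, without sacrificing efficiency.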

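Keywords (translated from the Chinese): joint extraction, exposure bias, entity-overlapping triples, fusional relation expression embedding, feature fusion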

Abstract: Joint extraction of entities and relations aims to discover entity mentions and relational facts simultaneously from unstructured text. It is a critical step in knowledge graph construction and serves as a basis for many high-level tasks in natural language processing. Joint extraction models have attracted widespread attention because they can model the correlation between entity recognition and relation extraction more effectively. Most existing work uses a staged joint extraction method to handle triple extraction in text that contains multiple triples and overlapping entities. Although these methods achieve reasonable performance improvements, they suffer from a serious exposure bias problem. In this paper, we propose a novel method called fusional relation expression embedding (FREE) to tackle the exposure bias problem by fusing relation expression information. In addition, a novel feature fusion layer called conditional layer normalization is proposed to fuse prior information more effectively. We conduct extensive comparative experiments on two widely used datasets. In-depth analysis of the experimental results shows that the proposed method has significant advantages over current state-of-the-art baseline models, handles various situations more effectively, and achieves performance competitive with current advanced models that target exposure bias, without sacrificing efficiency.

Key words: joint extraction, exposure bias, entity overlapped triplet, fusional relation expression embedding, feature fusion

CLC number: