
    Integrating External Entity Knowledge for Distantly Supervised Relation Extraction

    Abstract: Distantly supervised relation extraction aims to discover relational facts in unstructured text, which is important for many downstream tasks. Although distant supervision can automatically generate large numbers of labeled training instances, the automatic labeling process inevitably introduces noisy data (the wrong-label problem). Most current work focuses on denoising, trying to build a more effective bag-level representation by selecting valid sentences. Beyond the text corpus, however, there is a large amount of external entity knowledge that could help the model better understand the relationships between entities, and this knowledge has not been fully utilized. Based on this observation, we propose a novel distantly supervised relation extraction approach that exploits external entity knowledge to enhance the model's expressive ability. The approach generates knowledge-aware word embeddings that combine structural knowledge from external knowledge graphs with semantic knowledge from the text corpus, enriching sentence-level representations. Experimental results show that the proposed approach clearly outperforms state-of-the-art methods on both versions of the large-scale New York Times (NYT) benchmark dataset. We further investigate the differences between the two dataset versions through comparative experiments and find that the version with no entity intersection reflects model performance more effectively.
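The abstract does not give the model's equations, so the Python sketch below is only a rough illustration, under stated assumptions, of how its two ideas could fit together: knowledge-aware word embeddings feeding a sentence encoder, and a bag-level representation built over all sentences that mention an entity pair. Everything here is hypothetical and not the authors' method: the names (`knowledge_aware_embedding`, `kg_emb`, `sem_emb`, `know_dim`), the choice to average the structural and semantic entity vectors and concatenate the result to the word vector, the mean-pooling encoder standing in for a CNN/PCNN, and the toy vectors. For the bag-level step it uses selective attention (softmax over sentence-query scores), a common denoising technique in this line of work, in place of whatever selection mechanism the paper actually uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def scale(a: Sequence[float], c: float) -> Vector:
    """Multiply every component of a vector by c."""
    return [x * c for x in a]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def knowledge_aware_embedding(token: str, word_emb: Dict[str, Vector],
                              kg_emb: Dict[str, Vector], sem_emb: Dict[str, Vector],
                              know_dim: int) -> Vector:
    """Hypothetical fusion: [word vector ; knowledge vector].

    For an entity token, the knowledge vector is the mean of its structural (KG)
    vector and its corpus-derived semantic vector; other tokens get zeros.
    """
    if token in kg_emb and token in sem_emb:
        knowledge = scale(add(kg_emb[token], sem_emb[token]), 0.5)
    else:
        knowledge = [0.0] * know_dim
    return list(word_emb[token]) + knowledge  # list concatenation, not addition


def encode_sentence(tokens: List[str], word_emb: Dict[str, Vector],
                    kg_emb: Dict[str, Vector], sem_emb: Dict[str, Vector],
                    know_dim: int) -> Vector:
    """Mean-pool the knowledge-aware token vectors (stand-in for a CNN/PCNN encoder)."""
    vecs = [knowledge_aware_embedding(t, word_emb, kg_emb, sem_emb, know_dim) for t in tokens]
    total = [0.0] * len(vecs[0])
    for v in vecs:
        total = add(total, v)
    return scale(total, 1.0 / len(vecs))


def selective_attention(sent_vecs: List[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Bag vector = sum_i alpha_i * s_i, with alpha = softmax over (s_i . query)."""
    scores = [dot(s, query) for s in sent_vecs]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(x - m) for x in scores]
    weights = [e / sum(exps) for e in exps]
    bag = [0.0] * len(sent_vecs[0])
    for w, s in zip(weights, sent_vecs):
        bag = add(bag, scale(s, w))
    return bag, weights


# Toy, made-up vectors: 2-d word embeddings plus 2-d entity knowledge.
word_emb = {"Obama": [1.0, 0.0], "born": [0.0, 1.0], "in": [0.0, 0.0],
            "Hawaii": [1.0, 1.0], "visited": [0.5, 0.5]}
kg_emb = {"Obama": [0.2, 0.4], "Hawaii": [0.6, 0.0]}   # e.g. from a KG embedding model
sem_emb = {"Obama": [0.0, 0.2], "Hawaii": [0.2, 0.4]}  # e.g. from entity descriptions

# A bag: every sentence that mentions the entity pair (Obama, Hawaii).
bag_sentences = [["Obama", "born", "in", "Hawaii"], ["Obama", "visited", "Hawaii"]]
sent_vecs = [encode_sentence(s, word_emb, kg_emb, sem_emb, know_dim=2) for s in bag_sentences]
query = [-1.0, 1.0, 0.0, 0.0]  # hypothetical learned query vector for one relation
bag_vec, alphas = selective_attention(sent_vecs, query)
print(alphas)
```

With these made-up numbers the first sentence receives the larger attention weight (about 0.58 versus 0.42). In a trained model the embeddings, encoder and relation query would all be learned, and the fusion of structural and semantic knowledge would follow the paper's own design rather than the simple averaging assumed here.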

       
