    Named Entity Attribute-Value Extraction Incorporating Global Features

    Extracting Attribute Values for Named Entities Based on Global Feature

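    • Abstract: This paper addresses the extraction of attribute values for named entities from unstructured text. Current mainstream supervised attribute-value extraction methods use only local features, which limits extraction performance, so we study how global text features can improve attribute-value extraction. Using global features suited to Chinese attribute-value extraction, valuable information beyond the local features is exploited to improve extraction performance. On this basis, we propose a perceptron learning algorithm that incorporates global features. The algorithm can readily integrate global text features, and it combines global and local features in a unified way during model learning, which gives the model better feature representation capability. Experimental results show that the overall extraction performance of the proposed method is higher than that of a CRF model and an averaged perceptron model that use only local features. The method is applicable to open-domain attribute-value acquisition and generalizes well.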

       

      Abstract: Attribute-value extraction is an important and challenging task in information extraction that aims to automatically discover the values of attributes of named entities. In this paper, we focus on extracting these values from Chinese unstructured text. To keep models easy to compute, current mainstream methods of attribute-value extraction use only local features. As a result, they may not make full use of global information related to attribute values. We propose a novel approach based on global features to enhance the performance of attribute-value extraction. Two types of global features are defined to capture the extra information beyond local features: the boundary distribution feature and the value-name dependency feature. To our knowledge, this is the first attempt to acquire attribute values using global features. A new perceptron algorithm is then proposed that can use all types of global features and can learn the parameters of local and global features simultaneously. Experiments are carried out on different kinds of attributes of several entity categories. Experimental results show that both the precision and the recall of the proposed approach are significantly higher than those of a CRF model and an averaged perceptron that use only local features. The proposed approach generalizes well in the open domain.
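
The idea of learning local and global feature weights in a single perceptron update can be illustrated with a toy sketch. The abstract does not give the feature templates or the decoding procedure, so everything below is an assumption made for illustration: the O/B/I tag set, the two whole-sequence features (simple stand-ins, not the paper's boundary distribution or value-name dependency features), the exhaustive search over tag sequences, and the use of a plain rather than averaged perceptron.

```python
from collections import Counter
from itertools import product

TAGS = ("O", "B", "I")  # assumed tag set: B/I mark tokens of an attribute value


def local_features(tokens, tags):
    """Features that see only one token and the adjacent tag pair."""
    feats = Counter()
    prev = "<s>"
    for tok, tag in zip(tokens, tags):
        feats[("tok", tok, tag)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats


def global_features(tokens, tags):
    """Hypothetical whole-sequence features (stand-ins, not the paper's)."""
    feats = Counter()
    # Number of value spans in the whole sentence, capped at 2.
    spans = sum(1 for t in tags if t == "B")
    feats[("num_value_spans", min(spans, 2))] += 1
    # Does any "I" directly follow "O" (or start the sentence)?
    prevs = ("O",) + tuple(tags)
    orphan = any(t == "I" and p == "O" for p, t in zip(prevs, tags))
    feats[("orphan_I", orphan)] += 1
    return feats


def features(tokens, tags):
    # Local and global features live in one vector with one weight table.
    return local_features(tokens, tags) + global_features(tokens, tags)


def score(weights, feats):
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def decode(weights, tokens):
    # Exhaustive search over all tag sequences: viable only for toy inputs.
    # Global features break the locality that Viterbi decoding relies on.
    return max(product(TAGS, repeat=len(tokens)),
               key=lambda tags: score(weights, features(tokens, tags)))


def train(data, epochs=5):
    weights = {}
    for _ in range(epochs):
        for tokens, gold in data:
            pred = decode(weights, tokens)
            if pred != tuple(gold):
                # One update moves local and global weights together.
                for f, v in features(tokens, gold).items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in features(tokens, pred).items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights


# Toy training data (invented for this sketch).
data = [
    (["born", "in", "1990"], ("O", "O", "B")),
    (["height", "180", "cm"], ("O", "B", "I")),
]
weights = train(data)
for tokens, gold in data:
    print(tokens, decode(weights, tokens))
```

Because a global feature depends on the whole tag sequence, the sketch scores complete candidate sequences instead of decomposing the score over positions. A practical system would replace the exhaustive search with an approximate search such as beam search or n-best reranking.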

       
