ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (2): 284-302. doi: 10.7544/issn1000-1239.2016.20150842

Special Topic: 2016 Special Issue on Data Fusion and Knowledge Fusion

• Software Technology •

Chinese Entity Relation Extraction Based on Syntactic and Semantic Features

Gan Lixin, Wan Changxuan, Liu Dexi, Zhong Qing, Jiang Tengjiao

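  1. (School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013) (Jiangxi Key Laboratory of Data and Knowledge Engineering (Jiangxi University of Finance and Economics), Nanchang 330013) (spiderganxin@163.com)
  • Published: 2016-02-01
  • Supported by: 
    National Natural Science Foundation of China (61173146, 61562032, 61363039, 61363010, 61462037); Science and Technology Landing Program of Jiangxi Higher Education Institutions (KJLD12022); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ12733, GJJ13249)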

Chinese Named Entity Relation Extraction Based on Syntactic and Semantic Features

Gan Lixin, Wan Changxuan, Liu Dexi, Zhong Qing, Jiang Tengjiao   

  1. (School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013) (Jiangxi Key Laboratory of Data and Knowledge Engineering (Jiangxi University of Finance and Economics), Nanchang 330013)
  • Online: 2016-02-01

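Abstract: As a foundation of semantic networks and ontologies, entity relation extraction has been widely applied in information retrieval, machine translation, and automatic question answering systems. The core problem of entity relation extraction lies in selecting and extracting entity relation features. Long Chinese sentences have complex structures and often contain multiple entities; together with data sparsity, this poses challenges for Chinese relation detection and relation extraction. To address these problems, an entity relation extraction method based on syntactic and semantic features is proposed. The dependency relations of the two entities are combined to obtain a dependency relation combination feature, and dependency parsing together with part-of-speech tagging is used to select a nearest syntactically dependent verb feature. These two new features are added to feature-based relation detection and relation extraction using a support vector machine (SVM), and experiments are conducted on a corpus of real tourism-domain text. The experiments show that the two features, drawn from syntax and semantics, effectively improve the performance of entity relation detection and relation extraction, with precision, recall, and F1 all exceeding those of existing methods. In addition, the nearest syntactically dependent verb feature is highly effective, contributes most to relation types with sparse data, and outperforms the current classic verb-feature-based methods on both relation detection and relation extraction.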

Key words: relation extraction, relation detection, syntactic feature, semantic feature, support vector machine

Abstract: Named entity relation extraction is a foundation of semantic networks and ontologies, and is widely used in information retrieval, machine translation, and automatic question answering systems. The key issue in named entity relation extraction is the selection and extraction of relation features. Long Chinese sentences have complicated sentence patterns and often contain many entities; together with the data sparsity problem, this brings challenges for Chinese entity relationship detection and extraction. To deal with these problems, a novel method based on syntactic and semantic features is proposed. A dependency relation composition feature is obtained by combining the respective dependency relations of the two entities, and the verb feature with the nearest syntactic dependency is selected using dependency parsing and part-of-speech (POS) tagging. Both features are incorporated into feature-based relationship detection and extraction using a support vector machine (SVM). Evaluation on a real text corpus from the tourism domain shows that the two features, drawn from syntactic and semantic aspects, effectively improve the performance of entity relationship detection and extraction, and outperform existing methods in precision, recall, and F1 value. In addition, the verb feature with the nearest syntactic dependency is highly effective for both relationship detection and extraction, contributes most to relation types with sparse data, and outperforms the classic verb-feature-based methods.

Key words: relationship extraction, relationship detection, syntactic feature, semantic feature, support vector machine (SVM)
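
The two features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic detail, so the token layout, the LTP-style labels (SBV, VOB, HED), the "v" verb tag, the toy sentence, and the rule "climb head links until the first verb is reached" are all assumptions made for illustration. The function names (deprel_combination, nearest_dependent_verb, extract_features) are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    word: str
    pos: str      # POS tag; "v" marks a verb in this toy tag set (assumption)
    head: int     # index of the syntactic head, -1 for the root
    deprel: str   # dependency relation between this token and its head


def deprel_combination(tokens: List[Token], e1: int, e2: int) -> str:
    """Join the dependency relations of the two entity tokens into one feature."""
    return f"{tokens[e1].deprel}_{tokens[e2].deprel}"


def nearest_verb_above(tokens: List[Token], idx: int) -> Optional[Tuple[int, int]]:
    """Climb head links from idx; return (steps, verb index) for the first verb, else None."""
    steps = 0
    seen = set()  # guards against malformed parses that contain a cycle
    cur = tokens[idx].head
    while cur != -1 and cur not in seen:
        seen.add(cur)
        steps += 1
        if tokens[cur].pos == "v":
            return steps, cur
        cur = tokens[cur].head
    return None


def nearest_dependent_verb(tokens: List[Token], e1: int, e2: int) -> Optional[str]:
    """Pick the verb reachable from either entity with the fewest head links.

    Ties go to the verb with the smaller token index (an arbitrary choice here).
    """
    candidates = [
        c for c in (nearest_verb_above(tokens, e1), nearest_verb_above(tokens, e2))
        if c is not None
    ]
    if not candidates:
        return None
    _, verb_idx = min(candidates)
    return tokens[verb_idx].word


def extract_features(tokens: List[Token], e1: int, e2: int) -> dict:
    """Bundle both features for one entity pair."""
    return {
        "deprel_combo": deprel_combination(tokens, e1, e2),
        "nearest_verb": nearest_dependent_verb(tokens, e1, e2),
    }


# Hypothetical toy parse of "庐山 位于 江西省" (Lushan is located in Jiangxi Province).
SENTENCE = [
    Token("庐山", "ns", 1, "SBV"),
    Token("位于", "v", -1, "HED"),
    Token("江西省", "ns", 1, "VOB"),
]

features = extract_features(SENTENCE, 0, 2)
print(features)
```

In a feature-based pipeline of the kind the abstract describes, a dictionary like this would typically be one-hot encoded alongside other features and passed to an SVM classifier, first to decide whether a relation exists (detection) and then to label its type (extraction). That wiring is omitted here because the abstract does not specify it.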

CLC Number: