ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (5): 958-967. doi: 10.7544/issn1000-1239.2018.20170232

• Artificial Intelligence •

Rough Set Knowledge Discovery Based Open Domain Chinese Question Answering Retrieval

Han Zhao1,2,3, Miao Duoqian1,2, Ren Fuji3, Zhang Hongyun1,2

  1. College of Electronic and Information Engineering, Tongji University, Shanghai 201804
  2. Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai 201804
  3. Faculty of Engineering, Tokushima University, Tokushima, Japan 7708506
  (1990hanzhao@tongji.edu.cn)
  • Published: 2018-05-01
  • Online: 2018-05-01
  • Supported by:
    National Natural Science Foundation of China (61673301, 61273304, 61573255); Specialized Research Fund for the Doctoral Program of Higher Education (20130072130004); Anhui Provincial Outstanding Young Talent Fund for Universities (gxyq2017056)

Abstract: Open domain question answering (QA) systems based on information retrieval (IR) typically first use semantic analysis tools and knowledge bases to obtain deterministic semantic and knowledge information, and then compute a matching value between the question and each candidate answer sentence. In practical Chinese QA applications, however, both Chinese language expression and Chinese knowledge representation carry a great deal of uncertainty, and existing matching methods are poorly suited to such settings. To address this problem, this paper proposes a Chinese QA retrieval method based on rough set knowledge discovery. The method uses rough set equivalence partitioning to represent the knowledge contained in the QA pairs of an annotated QA corpus, and then applies attribute reduction to mine the upper approximation representations of all knowledge items. Given the resulting rough set QA knowledge base, the knowledge matching value of a QA pair is computed as a knowledge item similarity. Once the knowledge similarities between a question and all of its answer candidates are available, a final matching value that combines rough set knowledge similarity with traditional sentence similarity is used to rank the candidates. Experiments show that the proposed method improves both MAP and MRR over the baseline IR methods.
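For readers unfamiliar with the rough set terms used in the abstract, the following is a minimal sketch of equivalence partitioning and of the lower and upper approximations of a target set. It is not the authors' implementation: the attribute names (`qtype`, `has_person`), the toy QA pair identifiers, and the set of "correct" pairs are all hypothetical, chosen only to illustrate the standard definitions.

```python
from collections import defaultdict

def equivalence_classes(universe, attributes, value_of):
    """Partition objects so that each block shares the same values on `attributes`."""
    blocks = defaultdict(set)
    for obj in universe:
        key = tuple(value_of[obj][a] for a in attributes)
        blocks[key].add(obj)
    return list(blocks.values())

def lower_upper(classes, target):
    """Return the lower and upper approximations of `target` under the partition."""
    lower, upper = set(), set()
    for block in classes:
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

# Hypothetical toy data: four QA pairs described by two made-up attributes.
value_of = {
    "qa1": {"qtype": "who", "has_person": 1},
    "qa2": {"qtype": "who", "has_person": 1},
    "qa3": {"qtype": "who", "has_person": 0},
    "qa4": {"qtype": "when", "has_person": 0},
}
universe = set(value_of)
correct = {"qa1", "qa4"}  # pairs assumed to be annotated as correct

classes = equivalence_classes(universe, ["qtype", "has_person"], value_of)
lower, upper = lower_upper(classes, correct)
```

Here `qa1` and `qa2` are indistinguishable on the chosen attributes, so `qa1` cannot be placed in the lower approximation with certainty, while both pairs fall into the upper approximation. This is the sense in which an upper approximation captures knowledge that is possibly, rather than certainly, relevant.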

Key words: question answering (QA) system, information retrieval (IR), rough set, knowledge discovery, text mining
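The two evaluation measures named in the abstract, MAP (mean average precision) and MRR (mean reciprocal rank), can be sketched as follows. This uses the standard textbook definitions and is not taken from the paper; the relevance lists are hypothetical, with each list giving 0/1 labels for one question's ranked answer candidates.

```python
def reciprocal_rank(labels):
    """1 / rank of the first relevant candidate, or 0.0 if none is relevant."""
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0

def average_precision(labels):
    """Mean of precision@k taken at each rank k that holds a relevant candidate."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean(values):
    """Arithmetic mean that returns 0.0 for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Hypothetical ranked relevance labels for two questions (1 = correct answer).
rankings = [[0, 1, 0, 1], [1, 0, 0]]

mrr = mean(reciprocal_rank(r) for r in rankings)
map_score = mean(average_precision(r) for r in rankings)
```

MRR rewards only the position of the first correct answer, whereas MAP also accounts for how highly the remaining correct answers are ranked, which is why the two are usually reported together for answer ranking.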
