
    基于粗糙集知识发现的开放领域中文问答检索

    Rough Set Knowledge Discovery Based Open Domain Chinese Question Answering Retrieval

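    • Abstract (Chinese version, translated): Open-domain question answering systems based on information retrieval work by first using semantic analysis tools and knowledge bases to obtain deterministic semantic and knowledge information, and then computing the matching degree between question and answer sentences. In practical Chinese question answering applications, however, uncertainty is pervasive in both Chinese language expression and Chinese knowledge representation, and existing matching methods are ill-suited to scenarios with that much uncertainty. To address this problem, a Chinese question answering retrieval method based on rough set knowledge discovery is proposed. It uses rough set attribute reduction and the concept of upper approximation to discover and represent knowledge from an annotated question-answer corpus, and then combines this knowledge with traditional sentence similarity methods to compute the matching degree between a question and its candidate sentences. Experimental results show that, compared with traditional question answering retrieval methods, the proposed method improves both the MAP and MRR evaluation metrics.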

       

      Abstract: An information retrieval (IR) based open domain question answering (QA) system first uses semantic tools and a knowledge base to obtain semantic and knowledge information, and then calculates a matching value from both. In practical applications of Chinese question answering, however, current methods are not very effective because of the uncertainty in both Chinese language representation and Chinese knowledge representation. To solve this problem, this paper proposes a Chinese question answering method based on rough set knowledge discovery. It uses rough set equivalence partitioning to represent the rough set knowledge of the QA pairs, and then uses attribute reduction to mine the upper approximation representations of all knowledge items. Based on the resulting rough set QA knowledge base, the knowledge match value of a QA pair is calculated as a knowledge item similarity. Once the knowledge similarities between a question and its answer candidates have been computed, a final matching value that combines rough set knowledge similarity with traditional sentence similarity is used to rank the answer candidates. Experiments show that the proposed method improves MAP and MRR compared with the baseline information retrieval methods.
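
The rough set notions named in the abstract (equivalence partitioning, upper approximation) can be illustrated with a minimal sketch. It is not the paper's implementation: the attribute names `q_type` and `a_entity`, the toy QA pairs, and the helper functions are all hypothetical, chosen only to show how an equivalence partition yields lower and upper approximations of a target set.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Partition objects into classes that agree on every chosen attribute."""
    classes = defaultdict(set)
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(name)
    return list(classes.values())


def lower_approximation(classes, target):
    """Union of the classes that lie entirely inside the target set."""
    result = set()
    for cls in classes:
        if cls <= target:
            result |= cls
    return result


def upper_approximation(classes, target):
    """Union of the classes that overlap the target set at all."""
    result = set()
    for cls in classes:
        if cls & target:
            result |= cls
    return result


# Hypothetical toy data: four QA pairs described by two made-up attributes.
qa_pairs = {
    "p1": {"q_type": "who", "a_entity": "person"},
    "p2": {"q_type": "who", "a_entity": "person"},
    "p3": {"q_type": "who", "a_entity": "place"},
    "p4": {"q_type": "where", "a_entity": "place"},
}
# Hypothetical target set: the pairs annotated as correct answers.
correct = {"p1", "p4"}

classes = equivalence_classes(qa_pairs, ["q_type", "a_entity"])
lower = lower_approximation(classes, correct)
upper = upper_approximation(classes, correct)
```

Here `p1` and `p2` are indistinguishable on the chosen attributes, so `p2` enters the upper approximation even though only `p1` is annotated as correct. That gap between the lower and upper approximations is how rough sets express uncertainty.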

       
