ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (7): 1539-1547. doi: 10.7544/issn1000-1239.2018.20170507

• Information Processing •

Question Similarity Calculation Based on Key Information

Qi Le, Zhang Yu, Liu Ting

  1. (Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin 150001) (lqi@ir.hit.edu.cn)
  • Published: 2018-07-01
  • Funding:
    National Key Basic Research Program of China ("973" Program) (2014CB340503); National Natural Science Foundation of China (61472105, 61502120)

Abstract: Judging whether two questions are similar is an important research task in community question answering (CQA). Measuring the similarity between a new question and previously asked ones helps retrieve relevant question-answer pairs from a QA community. A CQA question usually consists of a topic and a description, and both matter for similarity calculation. Because CQA is open to everyone, questions vary widely in length. Many questions also contain background information that interferes with the judgment of whether two questions are similar. To reduce the influence of this irrelevant content and of the varying question length, the model treats the keywords and the topic of a question as its key information and computes question similarity from them. First, keywords are extracted from the question descriptions. They are then fed to a CNN-based model that captures both the similar and the dissimilar information between two texts. Second, to make better use of the question topics, the model also incorporates a feature based on the similarity between the two questions' topics. On the question similarity task of SemEval2017, the model reaches a mean average precision (MAP) of 49.65%, exceeding the best result reported in the evaluation.

Key words: question similarity, community question answering (CQA), keywords, question topic, convolutional neural network (CNN)
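The abstract does not say how keywords are extracted from a question description. The sketch below assumes plain TF-IDF weighting, a common baseline that is not necessarily the authors' extractor, and a small hypothetical stop-word list. It shows the idea of keeping only the highest-weighted terms, so that long background passages contribute little to the similarity computation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "i", "my", "to", "in", "of", "and",
              "for", "it", "on", "do", "does", "can", "what", "how", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphanumeric tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def idf_table(corpus: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}


def extract_keywords(description: str, idf: dict[str, float], k: int = 5) -> list[str]:
    """Return the k tokens of a description with the highest TF-IDF weight."""
    tf = Counter(tokenize(description))
    default_idf = max(idf.values(), default=1.0)  # unseen terms are treated as rare
    scores = {term: count * idf.get(term, default_idf) for term, count in tf.items()}
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


if __name__ == "__main__":
    corpus = [
        "My residence visa expires next week. How do I renew the visa in Doha?",
        "Where can I buy a cheap car in Doha?",
        "Family visa renewal is taking too long, any tips?",
    ]
    print(extract_keywords(corpus[0], idf_table(corpus), k=3))  # → ['visa', 'expires', 'next']
```

In this toy three-question corpus, `visa` ranks first because it appears twice in the description. `doha` drops to the bottom because it also occurs in another question. The corpus, the stop-word list and `k` are illustrative only.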

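The abstract says the underlying CNN models both the similar and the dissimilar information between two texts, but gives no formulation. One way to obtain such a split is shown below as an assumption, not as the paper's exact method. Each word of one question is matched to its most cosine-similar word in the other question. The word vector is then decomposed into its projection onto that match (the similar part) and the orthogonal remainder (the dissimilar part). The helper `topic_similarity` is likewise a hypothetical stand-in for the topic-similarity feature: it computes the cosine between averaged topic word vectors. All word vectors are assumed to be non-zero.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (m x d) and rows of b (n x d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def decompose(s: np.ndarray, t: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split every word vector of s into a part similar to t and a dissimilar part.

    Each s_i is paired with its most cosine-similar word t_k (max matching).
    The similar part is the projection of s_i onto t_k; the dissimilar part
    is the orthogonal remainder, so similar + dissimilar == s.
    """
    best = np.argmax(cosine_matrix(s, t), axis=1)  # index of the best match per word
    match = t[best]                                  # (m x d) matched vectors
    coef = (np.sum(s * match, axis=1, keepdims=True)
            / np.sum(match * match, axis=1, keepdims=True))
    similar = coef * match                           # projection onto the match
    dissimilar = s - similar                         # orthogonal remainder
    return similar, dissimilar


def topic_similarity(topic_a: np.ndarray, topic_b: np.ndarray) -> float:
    """Cosine similarity of the mean word vectors of two question topics."""
    u, v = topic_a.mean(axis=0), topic_b.mean(axis=0)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


if __name__ == "__main__":
    s = np.array([[1.0, 0.0], [1.0, 2.0]])  # toy 2-d "word vectors" of question A
    t = np.array([[1.0, 0.0], [0.0, 2.0]])  # toy 2-d "word vectors" of question B
    similar, dissimilar = decompose(s, t)
    # similar has rows [1, 0] and [0, 2]; dissimilar has rows [0, 0] and [1, 0]
    print(similar, dissimilar, topic_similarity(s, t))
```

In a complete model, the similar and dissimilar matrices would be stacked as input channels to a CNN, and the topic-similarity score would be combined with the CNN's output. Neither step is shown here, and the abstract does not specify how the two are fused.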
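Results are reported as mean average precision (MAP). For each original question, the candidate questions are ranked by predicted similarity and average precision (AP) is computed over the ranked list. The per-query AP values are then averaged. The sketch below follows the standard definition. The official SemEval scorer may differ in details, for example in how queries with no relevant candidate are treated; here they score 0, which is an assumption. `ranked_relevance` is a hypothetical helper that turns model scores into a ranked list of gold labels.

```python
def ranked_relevance(scores: list[float], labels: list[int]) -> list[int]:
    """Return the gold labels (1 = relevant) ordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]


def average_precision(relevance: list[int]) -> float:
    """AP of one ranked list; relevance[k] is 1 if the item at rank k + 1 is relevant."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision@rank at each relevant position
    # Assumption: a list with no relevant item gets AP = 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: list[list[int]]) -> float:
    """MAP: mean of the per-query average precision (rankings must be non-empty)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    query1 = ranked_relevance([0.9, 0.2, 0.5], [1, 0, 0])  # relevant item ranked first
    query2 = [0, 1, 0]                                      # relevant item ranked second
    print(mean_average_precision([query1, query2]))         # (1.0 + 0.5) / 2 = 0.75
```

For a query whose relevant candidates sit at ranks 1 and 3, AP = (1/1 + 2/3) / 2 ≈ 0.833.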