Question Similarity Calculation Based on Key Information

齐乐; 张宇; 刘挺

doi:10.7544/issn1000-1239.2018.20170507

Question Similarity Calculation Based on Key Information

Abstract

Abstract: Question similarity calculation is an important research direction in community question answering (CQA): measuring the similarity between questions helps retrieve relevant question-answer pairs from a QA community. Questions in CQA usually consist of a topic and a description, both of which matter for this task. Because CQA is open, questions vary widely in length and often contain large amounts of background information that interferes with judging whether two questions are similar. To reduce the influence of such irrelevant content and of varying question length, the model treats keywords and question topics as the key information of a question and uses them to calculate question similarity. First, keyword extraction is introduced on top of a CNN model based on the similar and dissimilar information between texts: keywords are extracted from the question descriptions and fed to the CNN-based model. In addition, to make better use of the question topics, the model incorporates a feature derived from the similarity between the topics. The model was evaluated on the question similarity task of SemEval2017, where its mean average precision (MAP) reached 49.65%, exceeding the best result in the evaluation.
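
The MAP figure reported above is a ranking metric: for each query question, the candidate questions are ranked by predicted similarity, and precision is averaged over the ranks at which relevant candidates appear. The following is a minimal sketch of that computation, not the official SemEval scorer. The function names are hypothetical, and the choice to score a query with no relevant candidates as 0 is an assumption that the official scorer may handle differently.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked candidate list.

    relevance[i] is True when the candidate at rank i + 1 is relevant.
    Assumption: a list with no relevant candidates scores 0.0.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant items seen so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two hypothetical queries with ranked relevance judgments.
    toy_rankings = [[True, False, True], [False, True]]
    print(f"MAP = {mean_average_precision(toy_rankings):.4f}")
```

For the first toy query, relevant candidates appear at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2, about 0.833. The second query has its only relevant candidate at rank 2, giving 0.5.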
