Abstract:
Question similarity calculation is a major task in community question answering (CQA). It helps retrieve relevant question-answer pairs from a QA community by leveraging the similarity among queries. Questions in CQA are usually composed of a topic and a description, both of which are important for question similarity calculation. In addition, because CQA platforms are open, question length varies widely, and background information in a question can interfere with the similarity judgment. To reduce the influence of irrelevant content and varying question length, keywords are extracted from the question descriptions. The keywords are then fed to a CNN-based model that extracts the similar and dissimilar information between the texts. To make better use of the question topics, the model also incorporates features derived from the similarity between the topics. In summary, the model treats keywords and topics as the key information of a question and uses this information to calculate the similarity between questions. The model is evaluated on the question similarity task of SemEval-2017. It reaches a mean average precision (MAP) of 49.65%, which exceeds the best result in the evaluation.
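
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean average precision is commonly computed over ranked candidate lists. It is not the authors' code, and the official SemEval scorer may differ in details such as rank cutoffs or how queries with no relevant candidates are handled. The function names and the binary relevance input format are assumptions made for illustration.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one ranked list.

    relevance[i] is 1 if the candidate at rank i + 1 is relevant, else 0
    (hypothetical input format chosen for this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    # Assumption: a query with no relevant candidates scores 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

In the question similarity setting, each inner list would hold the relevance labels of the candidate questions for one query question, ordered by the similarity score the model assigns.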