Chinese Sentence Similarity Computing Based on Frame Semantic Parsing

Li Ru (李茹); Wang Zhiqiang (王智强); Li Shuanghong (李双红); Liang Jiye (梁吉业); Collin Baker

Abstract

Sentence similarity computing plays an important role in many natural language processing tasks. Existing approaches to Chinese sentence similarity either rely on word-level information without considering semantic structure, or rely on sentence structure that describes the sentence's semantics only incompletely. As a result, their similarity scores are not accurate enough. To address this, this paper proposes a new similarity computing approach based on the Chinese FrameNet semantic resource. The approach measures the semantic similarity of two sentences through four key steps: multi-frame semantic parsing, importance measurement of frames, similarity matching of frames, and similarity computing between frames. Multi-frame semantic parsing describes a sentence's semantics comprehensively from the frame perspective: it identifies every target word in the sentence, selects the frame each target word evokes, and labels the frame elements. On that basis, the approach distinguishes the importance of different frames according to the semantic coverage of each frame within the sentence, which makes the similarity result more accurate. In addition, by extracting the semantic core word of each frame element, the approach improves the precision of similarity computing between frames whose elements take the form of chunks. Experiments on a set of sentences that contain multiple target words show that the multi-frame approach achieves better results than traditional approaches.
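The abstract names the pipeline steps but gives no formulas. The sketch below shows one way the last three steps (frame importance, frame matching, similarity between frames) could fit together; it is not the authors' method. The `Frame` class, the coverage-share weighting, the rule that only same-named frames match, the exact-match comparison of element core words, and the 0.5/0.5 split between frame match and element agreement are all hypothetical choices made for illustration. The frames are assumed to come from an already-run multi-frame semantic parser.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One evoked frame in a sentence (hypothetical representation).

    name:     frame evoked by a target word, e.g. "Commerce_buy"
    elements: frame element name -> semantic core word of that element
    coverage: number of tokens covered by the target word and its elements
    """
    name: str
    elements: dict[str, str] = field(default_factory=dict)
    coverage: int = 1


def frame_weights(frames: list[Frame]) -> list[float]:
    # Assumption: a frame's importance is its share of the sentence's total
    # frame coverage, so frames spanning more of the sentence count more.
    if not frames:
        return []
    total = sum(f.coverage for f in frames)
    if total == 0:
        return [1.0 / len(frames)] * len(frames)
    return [f.coverage / total for f in frames]


def frame_similarity(a: Frame, b: Frame) -> float:
    # Assumption: only frames with the same name are matched. Half the score
    # comes from the frame match, the rest from the fraction of frame
    # elements whose core words agree exactly.
    if a.name != b.name:
        return 0.0
    names = set(a.elements) | set(b.elements)
    if not names:
        return 1.0
    agree = sum(1 for n in names
                if n in a.elements and a.elements[n] == b.elements.get(n))
    return 0.5 + 0.5 * agree / len(names)


def _directional(src: list[Frame], dst: list[Frame]) -> float:
    # Importance-weighted best-match score of each frame in src against dst.
    return sum(w * max(frame_similarity(f, g) for g in dst)
               for f, w in zip(src, frame_weights(src)))


def sentence_similarity(s1: list[Frame], s2: list[Frame]) -> float:
    if not s1 or not s2:
        return 0.0
    # Average both directions so the score is symmetric.
    return (_directional(s1, s2) + _directional(s2, s1)) / 2


if __name__ == "__main__":
    # Hypothetical parser output for two sentences.
    s1 = [Frame("Commerce_buy", {"Buyer": "I", "Goods": "book"}, 4),
          Frame("Motion", {"Theme": "I", "Goal": "school"}, 2)]
    s2 = [Frame("Commerce_buy", {"Buyer": "he", "Goods": "book"}, 3)]
    print(round(sentence_similarity(s1, s2), 3))  # 0.625
```

In this toy example the shared Commerce_buy frame agrees on Goods but not on Buyer (frame score 0.75), and the unmatched Motion frame lowers the s1-to-s2 direction in proportion to its coverage share, which is the effect the importance weighting is meant to capture.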
