ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (7): 1499-1509.doi: 10.7544/issn1000-1239.2015.20140383

Previous Articles     Next Articles

Word Semantic Similarity Measurement Based on Nave Bayes Model

Wang Junhua1,2,3, Zuo Wanli1,2, Yan Zhao1,2   

  1. 1(College of Computer Science and Technology, Jilin University, Changchun 130012);2(Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012);3(School of Computer Science & Engineering, Changchun University of Technology, Changchun 130012)
  • Online:2015-07-01

Abstract: Measuring semantic similarity between words is a classical and hot problem in nature language processing, the achievement of which has great impact on many applications such as word sense disambiguation, machine translation, ontology mapping, computational linguistics, etc. A novel approach is proposed to measure words semantic similarity by combining Nave Bayes model with knowledge base. To start, extract attribute variables based on WordNet; then, generate conditional probability distribution by statistics and piecewise linear interpolation technique; after that, obtain posteriori through Bayesian inference; at last, quantify word semantic similarity. The main contributions are definition of distance and depth between word pairs with small amount of computation and high degree of distinguishing the characteristics from words’ sense, and word semantic similarity measurement based on nave Bayesian model. On benchmark data set R&G(65), the experiment is conducted through 5-fold cross validation. The sample Pearson correlation between test results and human judgments is 0.912, with 0.4% improvement over existing best practice, and 7%~13% improvement over classical methods. Spearman correlation between test results and human judgments is 0.873, with 10%~20% improvement over classical methods. And the computational complexity of the method is as efficient as the classical methods, which indicates that integrating Nave Bayes model with knowledge base to measure word semantic similarity is reasonable and effective.

Key words: word semantic similarity, semantic similarity, piecewise linear interpolation, Nave Bayes model, WordNet

CLC Number: