Abstract:
Measuring semantic similarity between words is a classical and hot problem in nature language processing, the achievement of which has great impact on many applications such as word sense disambiguation, machine translation, ontology mapping, computational linguistics, etc. A novel approach is proposed to measure words semantic similarity by combining Nave Bayes model with knowledge base. To start, extract attribute variables based on WordNet; then, generate conditional probability distribution by statistics and piecewise linear interpolation technique; after that, obtain posteriori through Bayesian inference; at last, quantify word semantic similarity. The main contributions are definition of distance and depth between word pairs with small amount of computation and high degree of distinguishing the characteristics from words’ sense, and word semantic similarity measurement based on nave Bayesian model. On benchmark data set R&G(65), the experiment is conducted through 5-fold cross validation. The sample Pearson correlation between test results and human judgments is 0.912, with 0.4% improvement over existing best practice, and 7%~13% improvement over classical methods. Spearman correlation between test results and human judgments is 0.873, with 10%~20% improvement over classical methods. And the computational complexity of the method is as efficient as the classical methods, which indicates that integrating Nave Bayes model with knowledge base to measure word semantic similarity is reasonable and effective.