
    Word Semantic Similarity Measurement Based on the Naïve Bayes Model

    • Abstract: Measuring word semantic similarity is a classical and active problem in natural language processing. A novel approach to measuring word semantic similarity is proposed by combining the naïve Bayes model with a knowledge base. First, attribute variables are obtained from the general-purpose ontology WordNet; then, conditional probability distributions are generated by statistics and piecewise linear interpolation; next, Bayesian inference fuses the evidence to yield a posterior probability, from which word semantic similarity is quantified. The main contributions are the definitions of word-pair distance and depth, and the application of the naïve Bayes model to word semantic similarity measurement. On the benchmark data set R&G(65), the correlation between the algorithm's judgments and human judgments is evaluated with 5-fold cross validation: the sample Pearson correlation reaches 0.912, 0.4% above the current best method and 7%–13% above classical algorithms; the Spearman correlation reaches 0.873, 10%–20% above classical algorithms; and the running efficiency is comparable to that of the classical algorithms. The results show that combining the naïve Bayes model with a knowledge base is a reasonable and effective way to address word semantic similarity.

       

      Abstract: Measuring semantic similarity between words is a classical and active problem in natural language processing, and progress on it benefits many applications such as word sense disambiguation, machine translation, ontology mapping, and computational linguistics. A novel approach to measuring word semantic similarity is proposed by combining the naïve Bayes model with a knowledge base. First, attribute variables are extracted from WordNet; then, conditional probability distributions are generated by statistics and piecewise linear interpolation; next, the posterior probability is obtained through Bayesian inference; finally, word semantic similarity is quantified. The main contributions are the definitions of word-pair distance and depth, which require little computation yet discriminate word senses well, and the application of the naïve Bayes model to word semantic similarity measurement. On the benchmark data set R&G(65), the experiment is conducted with 5-fold cross validation. The sample Pearson correlation between the test results and human judgments is 0.912, a 0.4% improvement over the existing best practice and a 7%–13% improvement over classical methods; the Spearman correlation is 0.873, a 10%–20% improvement over classical methods. The method is as efficient as the classical methods, which indicates that integrating the naïve Bayes model with a knowledge base to measure word semantic similarity is reasonable and effective.
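The inference pipeline described above — interpolated conditional distributions over word-pair distance and depth, fused by Bayes' rule into a posterior — can be sketched in plain Python. All knot points and probability values below are hypothetical placeholders, not the paper's actual tables, and the feature extraction from WordNet is omitted:

```python
from bisect import bisect_right

# Hypothetical conditional probability tables, sampled at a few knot points.
# Intermediate values are filled in by piecewise linear interpolation, as in
# the approach described in the abstract. All numbers are assumptions.
DIST_KNOTS = [0, 2, 4, 8, 16]                  # word-pair distance in WordNet
P_DIST_SIM = [0.50, 0.30, 0.12, 0.06, 0.02]    # P(distance | similar)
P_DIST_DIS = [0.05, 0.10, 0.20, 0.30, 0.35]    # P(distance | dissimilar)

DEPTH_KNOTS = [1, 4, 8, 12, 16]                # word-pair depth in WordNet
P_DEPTH_SIM = [0.05, 0.15, 0.30, 0.30, 0.20]   # P(depth | similar)
P_DEPTH_DIS = [0.35, 0.30, 0.20, 0.10, 0.05]   # P(depth | dissimilar)

def interp(x, xs, ys):
    """Piecewise linear interpolation of (xs, ys) at x, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def similarity(distance, depth, prior_sim=0.5):
    """Posterior P(similar | distance, depth) under the naive-Bayes
    conditional-independence assumption: the two features' likelihoods
    are multiplied and normalized by Bayes' rule."""
    like_sim = (interp(distance, DIST_KNOTS, P_DIST_SIM)
                * interp(depth, DEPTH_KNOTS, P_DEPTH_SIM))
    like_dis = (interp(distance, DIST_KNOTS, P_DIST_DIS)
                * interp(depth, DEPTH_KNOTS, P_DEPTH_DIS))
    num = prior_sim * like_sim
    return num / (num + (1 - prior_sim) * like_dis)

print(similarity(distance=1, depth=10))   # close, deep pair -> score near 1
print(similarity(distance=12, depth=2))   # distant, shallow pair -> score near 0
```

The posterior itself serves as the similarity score; a real implementation would derive `distance` and `depth` from WordNet paths and fit the tables from data rather than hard-coding them.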
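The evaluation protocol compares algorithm scores with human judgments via sample Pearson and Spearman correlation. A minimal stdlib sketch of both statistics (Spearman computed as Pearson over average ranks, with ties handled):

```python
from statistics import mean

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def ranks(xs):
    """1-based ranks of xs, assigning tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))

# Toy illustration (not the R&G(65) data): perfectly monotone scores give
# Spearman 1.0 even when Pearson is below 1.0.
model_scores = [0.1, 0.4, 0.5, 0.9]
human_scores = [1.0, 2.0, 2.5, 9.0]
print(pearson(model_scores, human_scores))
print(spearman(model_scores, human_scores))
```

In the paper's setup these statistics would be averaged over the 5 cross-validation folds against the human ratings of R&G(65).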

       

