• China Premium Sci-tech Journal
  • CCF-recommended Class A Chinese journal
  • High-quality sci-tech journal in computing, Class T1

An Adaptive Three-way Spam Filter with Similarity Measure

Xie Qin, Zhang Qinghua, Wang Guoyin

Citation: Xie Qin, Zhang Qinghua, Wang Guoyin. An Adaptive Three-way Spam Filter with Similarity Measure[J]. Journal of Computer Research and Development, 2019, 56(11): 2410-2423. DOI: 10.7544/issn1000-1239.2019.20180793
CSTR: 32373.14.issn1000-1239.2019.20180793


Funding: National Natural Science Foundation of China (61876201, 61772096); Chongqing Graduate Research and Innovation Project (CYS18244)
  • CLC number: TP301.6


  • Abstract: Spam filtering is an important research problem in the information age, because misclassifying an important email can incur an immeasurable cost. Improving filter performance is therefore the central problem in spam filtering. At present, spam filtering is usually handled with the binary classification models of machine learning. However, compared with three-way decisions, binary classification models usually incur a higher misclassification cost. As an important branch of three-way decisions, the three-way decision model based on decision-theoretic rough sets conforms to human cognition and can effectively reduce the misclassification cost, thereby improving filter performance. Nevertheless, few studies consider, when constructing the loss functions, how the differences among equivalence classes affect the classification results. Therefore, within the framework of three-way decisions with decision-theoretic rough sets, an adaptive three-way spam filter with similarity measure is proposed. The model first calculates the weights of the condition attributes from the set variance. It then establishes, based on a similarity measure of sets, a comprehensive evaluation function that describes the difference information among equivalence classes. Finally, it constructs a method for calculating adaptive threshold pairs according to Bayesian decision rules. Experimental results show that the proposed model performs well in spam filtering.
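The abstract does not state the paper's formulas. As background only, the sketch below shows the classical decision-theoretic rough set step that the proposed model builds on: a fixed threshold pair (α, β) is derived from a loss matrix by Bayesian decision rules, with α = (λ_PN − λ_BN) / ((λ_PN − λ_BN) + (λ_BP − λ_PP)) and β = (λ_BN − λ_NN) / ((λ_BN − λ_NN) + (λ_NP − λ_BP)), and an email is then filtered, deferred, or delivered according to Pr(spam | its equivalence class). This is not the authors' adaptive method: their variance-based attribute weights and similarity-based evaluation function are not reproduced, and the loss values and the names `Losses`, `thresholds`, `decide`, and `decide_by_risk` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Losses:
    """Loss of each action under each true state (illustrative values only).

    Actions: P = filter as spam, B = defer for inspection, N = deliver as ham.
    States:  P = the email is spam, N = the email is ham.
    Field `bn`, for example, is the loss of deferring a ham email.
    """
    pp: float
    bp: float
    np_: float  # trailing underscore avoids a name that reads like the NumPy alias
    pn: float
    bn: float
    nn: float


def thresholds(lam: Losses) -> tuple[float, float]:
    """Return the classical (alpha, beta) threshold pair of decision-theoretic rough sets."""
    if not (lam.pp <= lam.bp < lam.np_ and lam.nn <= lam.bn < lam.pn):
        raise ValueError("losses must satisfy pp <= bp < np_ and nn <= bn < pn")
    alpha = (lam.pn - lam.bn) / ((lam.pn - lam.bn) + (lam.bp - lam.pp))
    beta = (lam.bn - lam.nn) / ((lam.bn - lam.nn) + (lam.np_ - lam.bp))
    if not alpha > beta:
        raise ValueError("these losses give alpha <= beta, so there is no boundary region")
    return alpha, beta


def decide(p_spam: float, alpha: float, beta: float) -> str:
    """Three-way rule on p_spam = Pr(spam | equivalence class of the email)."""
    if p_spam >= alpha:
        return "spam"   # positive region
    if p_spam <= beta:
        return "ham"    # negative region
    return "defer"      # boundary region


def decide_by_risk(p_spam: float, lam: Losses) -> str:
    """Same decision made directly by minimising the Bayesian expected loss."""
    risks = {
        "spam": lam.pp * p_spam + lam.pn * (1 - p_spam),
        "defer": lam.bp * p_spam + lam.bn * (1 - p_spam),
        "ham": lam.np_ * p_spam + lam.nn * (1 - p_spam),
    }
    return min(risks, key=risks.get)


# Hypothetical losses: filtering a legitimate email (pn) is the most expensive error.
losses = Losses(pp=0, bp=2, np_=6, pn=10, bn=1, nn=0)
alpha, beta = thresholds(losses)
print(round(alpha, 3), round(beta, 3))  # 0.818 0.2
for p in (0.1, 0.5, 0.9):
    print(p, decide(p, alpha, beta))    # ham, defer, spam
```

Away from the exact threshold values, `decide` and `decide_by_risk` agree; the threshold form is preferred in practice because (α, β) are computed once and each email then needs only two comparisons. Making (α, β) adapt to differences among equivalence classes, as the abstract describes, would replace the single fixed `Losses` instance with class-dependent losses.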

Publication history
  • Published: 2019-10-31
