    Xu Junming, Jiang Yuan, and Zhou Zhihua. Bayesian Classifier Based on Frequent Item Sets Mining[J]. Journal of Computer Research and Development, 2007, 44(8): 1293-1300.

    Bayesian Classifier Based on Frequent Item Sets Mining

    • Abstract: The naive Bayesian classifier is a simple and efficient classification learning algorithm, but the attribute-independence assumption it requires is often violated in real-world applications. Researchers have done a great deal of work to relax this independence constraint and improve the generalization ability of the naive Bayesian classifier. This paper proposes FISC (frequent item sets classifier), a Bayesian classification learning algorithm based on frequent item set mining. In the training stage, FISC finds all frequent item sets and computes the probability estimates that may be needed. In the test stage, FISC constructs a classifier for each item set contained in the test instance and produces its prediction by ensembling these classifiers. Experimental results validate the effectiveness of FISC.
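For reference, the attribute-independence assumption being relaxed here is the standard naive Bayes decision rule (a textbook formulation, not taken from this paper):

```latex
% Naive Bayes: attributes x_1, ..., x_d are assumed conditionally
% independent given the class c, so the predicted class is
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{d} P(x_i \mid c)
```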

       

      Abstract: Nave Bayesian classifier provides a simple and effective way to classifier learning, but its assumption on attribute independence is often violated in real-world applications. To alleviate this assumption and improve the generalization ability of Nave Bayesian classifier, many works have been done cy researchers. AODE ensembles some one-dependence Bayesian classifiers and LB selects and combines long item sets providing new evidence to compute the class probability. Both of them achieve good performance, but higher order dependence relations may contain useful information for classification and limiting the number of item sets used in classifier may restricts the benefit of item sets. For this consideration, a frequent item sets mining-based Bayesian classifier, FISC (frequent item sets classifier), is proposed. At the training stage, FISC finds all the frequent item sets satisfying the minimum support threshold min_sup and computes all the probabilities that may be used at the classification time. At the test stage, FISC constructs a classifier for each frequent item set contained in the test instance, and then classifies the instance by ensembling all these classifiers. Experiments validate the effectiveness of FISC and show how the performance of FISC varies with different min_sup. Based on the experiment result, an experiential selection for min_sup is suggested.

       
