Study of Some Key Techniques in Mining Association Rule

Chen Geng; Zhu Yuquan; Yang Hebiao; Lu Jieping; Song Yuqing; Sun Zhihui

Abstract

Apriori-like algorithms have become the classic approach to mining association rules. Their technical difficulty and computational cost are concentrated in two areas: (1) how to generate candidate frequent itemsets and compute the support count of each itemset; and (2) how to reduce the number of candidate frequent itemsets and the number of database scans. Many improvements have been proposed for the second problem, with good results. The first problem, however, is still handled as in the original Apriori algorithm, which is computationally expensive. This paper proposes an algorithm, based on a binary representation, for generating candidate frequent itemsets and computing their support counts. It requires only logical operations such as OR, AND, and XOR on the data being mined, which markedly simplifies implementation. Combined with Apriori-like algorithms, it can further improve execution efficiency, and experimental results show that the algorithm is effective and fast.
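The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of how a binary representation could support both steps, not the authors' implementation. It assumes each transaction is packed into an integer bitmask with one bit per item; the function names (`encode`, `support_count`, `join`, `frequent_itemsets`) and the choice of a bit-difference test for the join step are illustrative assumptions, and items are assumed to be sortable values such as strings.

```python
from itertools import combinations


def encode(transaction, item_index):
    """Pack a transaction (iterable of items) into an integer bitmask."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]  # OR sets the bit for this item
    return mask


def support_count(candidate, encoded_db):
    """Count the transactions that contain every item of the candidate."""
    # AND keeps the candidate bits that the transaction also has; the
    # transaction contains the candidate iff no candidate bit was lost.
    return sum(1 for t in encoded_db if (t & candidate) == candidate)


def join(a, b):
    """Join two k-itemset masks into a (k+1)-candidate, or return None."""
    # XOR isolates the items held by exactly one of the two itemsets.
    # Two k-itemsets share k-1 items iff exactly two bits differ.
    if bin(a ^ b).count("1") == 2:
        return a | b  # OR merges the two itemsets
    return None


def frequent_itemsets(transactions, min_support):
    """Level-wise search; returns {frozenset(items): support count}."""
    items = sorted({item for t in transactions for item in t})
    item_index = {item: pos for pos, item in enumerate(items)}
    db = [encode(t, item_index) for t in transactions]

    def decode(mask):
        return frozenset(i for i, p in item_index.items() if (mask >> p) & 1)

    result = {}
    candidates = {1 << pos for pos in range(len(items))}  # all 1-itemsets
    while candidates:
        # Keep only the candidates that reach the minimum support count.
        level = {}
        for c in candidates:
            n = support_count(c, db)
            if n >= min_support:
                level[c] = n
        result.update((decode(c), n) for c, n in level.items())
        # Build the next level's candidates from pairs of frequent itemsets.
        candidates = set()
        for a, b in combinations(level, 2):
            c = join(a, b)
            if c is not None:
                candidates.add(c)
    return result
```

For example, `frequent_itemsets([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], 2)` returns a dictionary keyed by frozensets of items. This sketch omits the usual Apriori subset-pruning step and scans the encoded transactions once per candidate, so it illustrates the bitwise operations rather than the performance the paper reports.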
