
    An Algorithm for Mining Closed Frequent Itemsets Based on Apposition Assembly of Iceberg Concept Lattices

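    • Summary: Because a concept lattice is complete, the time and space complexity of constructing it has long been the main factor limiting concept-lattice-based data mining. Combining the semilattice property of iceberg concept lattices with the idea of lattice assembly, the paper first shows in theory that the iceberg concept lattice obtained by apposition assembly is isomorphic to the iceberg lattice obtained by pruning the complete concept lattice. It then analyzes the mapping relations that arise during the assembly of iceberg concept lattices, and finally proposes a novel closed frequent itemset mining algorithm based on the apposition of iceberg concept lattices (Icegalamera). The algorithm avoids computing the complete concept lattice and applies assembly and pruning strategies during construction, which markedly improves mining efficiency. Experiments confirm the completeness of the closed frequent itemsets it produces. Performance tests on dense and sparse data sets in single-site mode show a clear performance advantage on sparse data sets.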

       

      Abstract: Formal concept analysis, an unsupervised learning method for conceptual clustering, provides an appropriate framework for data mining. However, because a concept lattice is complete, constructing it is computationally expensive. The iceberg lattice of a context, a substructure of the complete concept lattice, serves as a condensed representation of frequent itemsets and is well suited to analyzing very large databases. In addition, building a concept lattice by merging factor lattices drawn from data fragments can be adapted to distributed data mining environments. Inspired by these ideas, a novel algorithm called Icegalamera is presented that assembles iceberg concept lattices from heterogeneous relational tables and uses them to mine closed frequent itemsets. The completeness of the closed frequent itemsets produced by Icegalamera is established both theoretically and empirically, and the merge mapping from the partial iceberg concept lattices to the global one is analyzed and implemented. The algorithm avoids constructing the complete concept lattice. Furthermore, it adopts merge and pruning strategies, so that it outperforms the Apriori algorithm at generating frequent itemsets on dense and sparse data sets.
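The idea of assembling a global iceberg lattice from partial ones can be illustrated with a small brute-force sketch. It is not the Icegalamera algorithm itself: the toy table, the attribute split, and the helper names `extent`, `intent`, `iceberg`, and `appose` are all assumptions made for illustration. It relies on a standard fact about apposition: every extent of the combined context is the intersection of one extent from each fragment, and any fragment extent containing a frequent extent is itself frequent, so only the partial iceberg lattices are needed.

```python
from itertools import combinations

def extent(rows, items):
    """Row ids whose item set contains every item in `items`."""
    return frozenset(g for g, row in rows.items() if items <= row)

def intent(rows, objs, attrs):
    """Items from `attrs` shared by every row in `objs`."""
    common = set(attrs)
    for g in objs:
        common &= rows[g]
    return frozenset(common)

def iceberg(rows, attrs, min_count):
    """Brute force: map each frequent extent to its closed itemset."""
    attrs = sorted(attrs)
    result = {}
    for k in range(len(attrs) + 1):
        for combo in combinations(attrs, k):
            ext = extent(rows, frozenset(combo))
            if len(ext) >= min_count:  # pruning by support
                result[ext] = intent(rows, ext, attrs)
    return result

def appose(left, right, min_count):
    """Merge two partial icebergs built over the same rows but disjoint
    attribute sets: intersect extents, prune by support, union intents."""
    merged = {}
    for e1, i1 in left.items():
        for e2, i2 in right.items():
            ext = e1 & e2
            if len(ext) >= min_count:
                merged[ext] = merged.get(ext, frozenset()) | i1 | i2
    return merged

# Hypothetical toy table: row id -> items present in that row.
rows = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c", "d"},
    4: {"b", "c", "d"},
    5: {"a", "b", "c", "d"},
}
min_count = 2

left = iceberg(rows, {"a", "b"}, min_count)    # fragment 1 attributes
right = iceberg(rows, {"c", "d"}, min_count)   # fragment 2 attributes
merged = appose(left, right, min_count)
direct = iceberg(rows, {"a", "b", "c", "d"}, min_count)

# The assembled iceberg matches the one mined from the full table.
assert merged == direct
for ext, items in sorted(merged.items(), key=lambda p: -len(p[0])):
    print(sorted(items), "support:", len(ext))
```

The brute-force `iceberg` enumeration is exponential in the number of attributes and serves only to check the merge on a small example; a practical implementation would build each partial lattice incrementally.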

       
