高级检索

    基于概率衰减窗口模型的不确定数据流频繁模式挖掘

    Frequent Patterns Mining over Uncertain Data Streams Based on Probability Decay Window Model

    • 摘要: 考虑到不确定数据流的不确定性,设计了一种新的概率频繁模式树PFP-tree和基于该树的概率频繁模式挖掘方法PFP-growth.PFP-growth使用事务性不确定数据流及概率衰减窗口模型,通过计算各概率数据项的期望支持度以发现概率频繁模式,其主要特点有:考虑到窗口内不同时间到达数据项的贡献度不同,采用概率衰减窗口模型计算期望支持度,以提高模式挖掘准确度;设置数据项索引表和事务索引表,以加快频繁模式树检索速度;通过剪枝删除不可能成为频繁模式的结点,以降低模式树的存储及检索开销;对每个结点都设立一个事务概率信息链表,以支持数据项在不同事务中具有不同概率的情形.实验结果表明,PFP-growth在保证挖掘模式准确度的前提下,在处理时间和内存空间等方面都具有较好的性能.

       

      Abstract: In recent years, a large amounts of uncertain data are emerging due to the wide usage of new technologies such as wireless sensor networks and radio frequency identification. Considering the uncertainty of uncertain data streams, a new kind of probability frequent pattern tree—PFP-tree and a probability frequent pattern mining method—PFP-growth are proposed in this paper. PFP-growth uses transactional uncertain data stream model and a time-based probability decay window model to find probability frequent patterns through calculating expected supports. The main characteristics of PFP-growth include: 1)Because the contributions on the expected supports of items arriving at different time within a window may be different, a time-based probability decay window model is used to improve mining precision ratios; 2)In order to enhance retrieval speed on PFP-tree,an item index table and a transaction index table are designed; 3)A pruning algorithm is designed to delete the nodes which are not possible to be frequent patterns, to reduce greatly the overhead of both time and space; 4)A transaction probability list is set for every node to meet the requirement that some data items may have different probabilities in different transactions. Experimental results have shown that the PFP-growth method can not only ensure a higher mining precision ratio, but also need less processing time and storage space than the existing methods.

       

    /

    返回文章
    返回