Abstract:
In recent years, a large amounts of uncertain data are emerging due to the wide usage of new technologies such as wireless sensor networks and radio frequency identification. Considering the uncertainty of uncertain data streams, a new kind of probability frequent pattern tree—PFP-tree and a probability frequent pattern mining method—PFP-growth are proposed in this paper. PFP-growth uses transactional uncertain data stream model and a time-based probability decay window model to find probability frequent patterns through calculating expected supports. The main characteristics of PFP-growth include: 1)Because the contributions on the expected supports of items arriving at different time within a window may be different, a time-based probability decay window model is used to improve mining precision ratios; 2)In order to enhance retrieval speed on PFP-tree,an item index table and a transaction index table are designed; 3)A pruning algorithm is designed to delete the nodes which are not possible to be frequent patterns, to reduce greatly the overhead of both time and space; 4)A transaction probability list is set for every node to meet the requirement that some data items may have different probabilities in different transactions. Experimental results have shown that the PFP-growth method can not only ensure a higher mining precision ratio, but also need less processing time and storage space than the existing methods.