Abstract:
A data stream is a continuous, time-varying sequence of data elements, and the information it carries changes over time. In some data stream applications, the information embedded in the most recently arrived data is of particular value. A time decay model (TDM) is therefore used when mining frequent patterns over data streams. Existing methods for setting the time decay factor either involve randomness, which makes the result set unstable, or target only 100% recall or only 100% precision and neglect the other measure. To balance high recall against high precision and to keep the result set stable, a novel way of setting an average decay factor is designed. To further increase the weights of the latest transactions and reduce the weights of historical transactions, another novel decay factor based on the Gaussian function is proposed. To compare the strengths and weaknesses of different decay factors, four time decay models are studied and designed, and algorithms based on these four models are designed to discover closed frequent patterns over data streams. The performance of the proposed methods in mining frequent patterns on high-density and low-density data streams is evaluated experimentally. The results show that the average time decay factor balances high recall and high precision, and that the Gaussian-based decay factor performs better than the other settings.
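
As a rough illustration of the general idea behind a time decay model, the sketch below multiplies a pattern's running count by a fixed decay factor each time a new transaction arrives, so that older occurrences contribute less than recent ones. The function name `decayed_support`, the constant factor 0.5, and the toy streams are assumptions made for illustration only; the sketch does not reproduce the average or Gaussian-based decay factors proposed in this work.

```python
def decayed_support(occurrences, decay):
    """Return the time-decayed count of a pattern (illustrative only).

    occurrences: iterable of 0/1 flags, oldest transaction first;
                 1 means the pattern appears in that transaction.
    decay:       assumed constant factor in (0, 1]; the old count is
                 multiplied by it on every new arrival.
    """
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    count = 0.0
    for present in occurrences:
        # Fade the history, then add the newest transaction's contribution.
        count = count * decay + (1 if present else 0)
    return count


# Two toy streams with the same raw support (3 occurrences each).
old_heavy = [1, 1, 1, 0, 0, 0]  # pattern seen only in old transactions
new_heavy = [0, 0, 0, 1, 1, 1]  # pattern seen only in recent transactions

print(decayed_support(old_heavy, 0.5))  # 0.21875
print(decayed_support(new_heavy, 0.5))  # 1.75
```

With a factor of 1.0 both streams would score 3.0, which is the undecayed support; a factor below 1 is what lets the recent pattern outrank the historical one.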