Abstract:
Frequent items detection has play an important role in many applications, such as network monitor, network intrusion detection, association rules mining, and so on. While is has been studied intensively in the field of static uncertain data, frequent items detection is still novel in the emerging uncertain data stream field. In this paper, based on the sliding window model, an efficient algorithm(sTopK-UFI) is proposed to detect frequent items on uncertain data stream. In order to reduce the computation cost, the algorithm incrementally updates the exact top-K results upon each window slide and thus eliminates the need for complete recomputation from scratch. On the other hand, according to the current data in the sliding window, items to be frequent in the future are predicated according to the Poisson distribution. By computing the upper and lower bound of the probability, the pruning rules are proposed, which can reduce the candidate set and improve the query efficiently. Extensive experiments on real and synthetic datasets show the sTopK-UFI algorithm is an effective way to solve the problem of frequent items detection on uncertain data stream; it could significantly reduce the candidate set, save the memory space, improve the execution time and meet the requirements of practical applications.