Abstract:
PageRank algorithm plays an important role in various areas such as information retrieval and social network analysis. The continuously increasing size of networks and the timeliness requirements make it bring enormous computational overhead to repeat calculating PageRank for a massive and evolving network. Therefore, IncPR, a PageRank algorithm based on the idea of incremental computation model is proposed. IncPR reuses existing results from previous computation and modifies PageRank values according to the changed part of data sets accumulatively via an extended Monte Carlo method, which can effectively avoid reduplicated computation and improve the performance of PageRank computing with no additional storage overhead. Theoretical analysis shows that the computational complexity of IncPR processing an evolving network with m nodes and n edges changed is O((cm+n)R/c\+2), where c is the reset probability and R is the number of random walks starting from each node. The computational complexity of IncPR is lower than other existing related algorithms. Our evaluations with typical real-world graphs upon a Hama cluster also demonstrate that IncPR obtains PageRank values with equal (or even higher) accuracy as the baseline Monte Carlo PageRank algorithm and reduces the amount of computation significantly. In experiments with 0.01% data changed, IncPR reduces over 99.9% amount of computation.