ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (8): 1806-1818.doi: 10.7544/issn1000-1239.2016.20160210

Special Issue: 2016数据挖掘前沿技术专题

Previous Articles     Next Articles

IncPR: An Incremental Parallel PageRank Algorithm

Jiang Shuangshuang, Liao Qun, Yang Yulu,Li Tao   

  1. (College of Computer and Control Engineering, Nankai University, Tianjin 300350)
  • Online:2016-08-01

Abstract: PageRank algorithm plays an important role in various areas such as information retrieval and social network analysis. The continuously increasing size of networks and the timeliness requirements make it bring enormous computational overhead to repeat calculating PageRank for a massive and evolving network. Therefore, IncPR, a PageRank algorithm based on the idea of incremental computation model is proposed. IncPR reuses existing results from previous computation and modifies PageRank values according to the changed part of data sets accumulatively via an extended Monte Carlo method, which can effectively avoid reduplicated computation and improve the performance of PageRank computing with no additional storage overhead. Theoretical analysis shows that the computational complexity of IncPR processing an evolving network with m nodes and n edges changed is O((cm+n)R/c\+2), where c is the reset probability and R is the number of random walks starting from each node. The computational complexity of IncPR is lower than other existing related algorithms. Our evaluations with typical real-world graphs upon a Hama cluster also demonstrate that IncPR obtains PageRank values with equal (or even higher) accuracy as the baseline Monte Carlo PageRank algorithm and reduces the amount of computation significantly. In experiments with 0.01% data changed, IncPR reduces over 99.9% amount of computation.

Key words: PageRank, Web mining, incremental computing, Monte Carlo algorithm, parallel and distributed processing

CLC Number: