ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (7): 1567-1579.

### Asyn-SimRank: An Asynchronous Large-Scale SimRank Algorithm

Wang Chunlei1, Zhang Yanfeng2, Bao Yubin1, Zhao Changkuan2, Yu Ge1, Gao Lixin3

1. 1(Institute of Computer Software, College of Information Science and Engineering, Northeastern University, Shenyang 110819);2(Computer Center, Northeastern University, Shenyang 110819);3(Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst, U.S. 01003)
• Online:2015-07-01

Abstract: The SimRank algorithm, which exploits network structure to measure the similarity between node pairs, has been widely used in many areas, such as online social networks and link prediction. In recent years, with the development of big data, the input data set of the SimRank algorithm is constantly increasing. People are utilizing distributed computing models, such as MapReduce, to design large-scale SimRank algorithm for solving the big data problems. However, since SimRank algorithm contains a high-cost iterative process with synchronization barriers between iterations and the computational complexity is high in each iteration, the large-scale SimRank computation does not result in the satisfactory performance. In this paper, 1)We propose Asyn-SimRank by employing the iterate-cumulate approach,which asynchronously executes the core iterative computation to avoid the high-cost synchronization barriers in large-scale distributed environments, and effectively reduces the amount of computation and communication.2) We further propose the keypoint priority scheduling mechanism to accelerate convergence. 3)We prove the accuracy and convergence property of Asyn-SimRank as well as the efficiency of the keypoint priority scheduling. 4)We then implement Asyn-SimRank on Maiter, which is a distributed framework supporting asynchronous iteration. Expremental results show that, compared with the SimRank and Delta-SimRank implementing on Hadoop and Spark, the large-scale Asyn-SimRank significantly promotes the computational efficiency and accelerates the convergence.

CLC Number: