• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
高级检索

Asyn-SimRank:一种可异步执行的大规模SimRank算法

王春磊, 张岩峰, 鲍玉斌, 赵长宽, 于戈, 高立新

王春磊, 张岩峰, 鲍玉斌, 赵长宽, 于戈, 高立新. Asyn-SimRank:一种可异步执行的大规模SimRank算法[J]. 计算机研究与发展, 2015, 52(7): 1567-1579. DOI: 10.7544/issn1000-1239.2015.20140313
引用本文: 王春磊, 张岩峰, 鲍玉斌, 赵长宽, 于戈, 高立新. Asyn-SimRank:一种可异步执行的大规模SimRank算法[J]. 计算机研究与发展, 2015, 52(7): 1567-1579. DOI: 10.7544/issn1000-1239.2015.20140313
Wang Chunlei, Zhang Yanfeng, Bao Yubin, Zhao Changkuan, Yu Ge, Gao Lixin. Asyn-SimRank: An Asynchronous Large-Scale SimRank Algorithm[J]. Journal of Computer Research and Development, 2015, 52(7): 1567-1579. DOI: 10.7544/issn1000-1239.2015.20140313
Citation: Wang Chunlei, Zhang Yanfeng, Bao Yubin, Zhao Changkuan, Yu Ge, Gao Lixin. Asyn-SimRank: An Asynchronous Large-Scale SimRank Algorithm[J]. Journal of Computer Research and Development, 2015, 52(7): 1567-1579. DOI: 10.7544/issn1000-1239.2015.20140313
王春磊, 张岩峰, 鲍玉斌, 赵长宽, 于戈, 高立新. Asyn-SimRank:一种可异步执行的大规模SimRank算法[J]. 计算机研究与发展, 2015, 52(7): 1567-1579. CSTR: 32373.14.issn1000-1239.2015.20140313
引用本文: 王春磊, 张岩峰, 鲍玉斌, 赵长宽, 于戈, 高立新. Asyn-SimRank:一种可异步执行的大规模SimRank算法[J]. 计算机研究与发展, 2015, 52(7): 1567-1579. CSTR: 32373.14.issn1000-1239.2015.20140313
Wang Chunlei, Zhang Yanfeng, Bao Yubin, Zhao Changkuan, Yu Ge, Gao Lixin. Asyn-SimRank: An Asynchronous Large-Scale SimRank Algorithm[J]. Journal of Computer Research and Development, 2015, 52(7): 1567-1579. CSTR: 32373.14.issn1000-1239.2015.20140313
Citation: Wang Chunlei, Zhang Yanfeng, Bao Yubin, Zhao Changkuan, Yu Ge, Gao Lixin. Asyn-SimRank: An Asynchronous Large-Scale SimRank Algorithm[J]. Journal of Computer Research and Development, 2015, 52(7): 1567-1579. CSTR: 32373.14.issn1000-1239.2015.20140313

Asyn-SimRank:一种可异步执行的大规模SimRank算法

基金项目: 国家自然科学基金项目(61300023,61272179,61033007,61173028);中央高校基本科研业务费基金项目(N120416001,N120816001);中国移动基金项目(MCM20122051);辽宁省科技计划基金项目(2013217004)
详细信息
  • 中图分类号: TP301.6

Asyn-SimRank: An Asynchronous Large-Scale SimRank Algorithm

  • 摘要: SimRank算法利用网络结构来评估网络中任意2点的相似性,它被广泛应用于社交网络和链接预测等诸多领域中.近年来,随着大数据技术的发展,SimRank算法处理的数据不断增大,人们利用MapReduce等分布式计算模型设计实现分布式的大规模SimRank算法来适应大数据处理的需求.但是,由于SimRank算法包含开销较大的迭代过程,每次迭代之后都需要一个全局同步,且每次迭代的计算复杂度高、通信量大,SimRank算法不能在分布式环境下高效地实现.1)提出Asyn-SimRank算法,该算法采用迭代-累积的方式完成迭代计算,异步执行SimRank的核心迭代过程,避免了大规模分布式计算中的大量同步开销,同时有效降低计算量并减少通信开销;2)提出关键点优先调度计算,提升了Asyn-SimRank算法的全局收敛速度;3)证明了Asyn-SimRank算法的正确性和收敛性以及关键点优先调度计算的有效性;4)支持异步迭代的分布式框架Maiter上实现了Asyn-SimRank算法.实验结果显示,相比较于Hadoop,Spark上实现的SimRank算法和Delta-SimRank算法,Asyn-SimRank算法大大提升了算法的计算效率,加速了算法收敛.
    Abstract: The SimRank algorithm, which exploits network structure to measure the similarity between node pairs, has been widely used in many areas, such as online social networks and link prediction. In recent years, with the development of big data, the input data set of the SimRank algorithm is constantly increasing. People are utilizing distributed computing models, such as MapReduce, to design large-scale SimRank algorithm for solving the big data problems. However, since SimRank algorithm contains a high-cost iterative process with synchronization barriers between iterations and the computational complexity is high in each iteration, the large-scale SimRank computation does not result in the satisfactory performance. In this paper, 1)We propose Asyn-SimRank by employing the iterate-cumulate approach,which asynchronously executes the core iterative computation to avoid the high-cost synchronization barriers in large-scale distributed environments, and effectively reduces the amount of computation and communication.2) We further propose the keypoint priority scheduling mechanism to accelerate convergence. 3)We prove the accuracy and convergence property of Asyn-SimRank as well as the efficiency of the keypoint priority scheduling. 4)We then implement Asyn-SimRank on Maiter, which is a distributed framework supporting asynchronous iteration. Expremental results show that, compared with the SimRank and Delta-SimRank implementing on Hadoop and Spark, the large-scale Asyn-SimRank significantly promotes the computational efficiency and accelerates the convergence.
  • 期刊类型引用(6)

    1. 王璐璐. 基于混合注意力机制的时间旋转知识图谱补全. 网络安全与数据治理. 2024(10): 42-48 . 百度学术
    2. 何鹏,周刚,陈静,章梦礼,宁原隆. 类型增强的时态知识图谱表示学习模型. 计算机研究与发展. 2023(04): 916-929 . 本站查看
    3. 陈小英,熊盛武,王盛,张士伟. 基于上下文时序关联的时序知识图谱嵌入方法. 武汉大学学报(理学版). 2023(02): 249-257 . 百度学术
    4. 周丽华,王家龙,王丽珍,陈红梅,孔兵. 异质信息网络表征学习综述. 计算机学报. 2022(01): 160-189 . 百度学术
    5. 杨大伟,周刚,卢记仓,宁原隆. 基于知识表示学习的知识图谱补全研究综述. 信息工程大学学报. 2021(05): 558-565 . 百度学术
    6. 王红,卢林燕,王童. 航空安全事件知识图谱补全方法. 西南大学学报(自然科学版). 2020(11): 31-42 . 百度学术

    其他类型引用(3)

计量
  • 文章访问数:  1725
  • HTML全文浏览量:  0
  • PDF下载量:  682
  • 被引次数: 9
出版历程
  • 发布日期:  2015-06-30

目录

    /

    返回文章
    返回