ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2016, Vol. 53 ›› Issue (11): 2530-2541.doi: 10.7544/issn1000-1239.2016.20150619

• 信息安全 • 上一篇    下一篇

面向磁盘驻留的类Pregel系统的多级容错处理机制

毕亚辉1,姜苏洋1,王志刚1,冷芳玲1,鲍玉斌1,于戈1,钱岭2   

  1. 1(东北大学计算机科学与工程学院 沈阳 110819); 2(中国移动(苏州)软件技术有限公司 江苏苏州 215163) (biyahui1990@163.com)
  • 出版日期: 2016-11-01
  • 基金资助: 
    国家自然科学基金重点项目(61433008);国家自然科学基金项目(61173028,61272179);中央高校基本科研业务费专项基金项目(N100704001);教育部-中国移动科研基金项目(MCM20125021) This work was supported by the Key Program of the National Natural Science Foundation of China (61433008), the National Natural Science Foundation of China (61173028,61272179), the Fundamental Research Funds for the Central Universities (N100704001), and Chinese Ministry of Education-China Mobile Communications Corporation Research Funds (MCM20125021).

A Multi-Level Fault Tolerance Mechanism for Disk-Resident Pregel-Like Systems

Bi Yahui1, Jiang Suyang1, Wang Zhigang1, Leng Fangling1, Bao Yubin1, Yu Ge1, Qian Ling2   

  1. 1(College of Computer Science and Engineering, Northeastern University, Shenyang 110819); 2(China Mobile (Suzhou) Software Technology Co, Ltd, Suzhou, Jiangsu 215163)
  • Online: 2016-11-01

摘要: 基于BSP模型的分布式框架已经成为大规模图高频迭代处理的有效工具.分布式系统可以通过增加集群节点数量的方式提供弹性的处理能力,但同时也增加了故障发生的概率,因此亟需开发高效的容错处理机制.现有工作主要是基于检查点机制展开研究,包括数据备份和故障恢复2部分:前者没有考虑迭代过程中参与计算的数据规模的动态变化,而是备份所有图数据,因此引入了冗余数据的写开销;后者通常是从远程存储节点上读取备份数据进行故障恢复,而没有考虑利用本地磁盘数据恢复某些场景下的故障,引入额外的网络开销.因此提出了一种多级容错处理机制,将故障分为计算任务故障和计算节点故障2类,并设计了不同的备份和恢复策略. 备份阶段利用了某些应用在迭代计算过程中参与计算的数据规模的动态变化特性,设计了完全备份和写变化log自适应选择的策略,可以显著减少冗余数据的写开销.故障恢复阶段,对任务故障,利用本地磁盘上保留的图数据和远程的消息数据完成恢复;而对节点故障,则利用备份在远程信息进行恢复.最后,通过在真实数据集上的大量实验,验证了提出的多级容错机制的有效性.

关键词: 容错, 大规模图, 迭代计算, BSP模型, 检查点

Abstract: The BSP-based distributed frameworks, such as Pregel, are becoming a powerful tool for handling large-scale graphs, especially for applications with iterative computing frequently. Distributed systems can guarantee a flexible processing capacity by adding computing nodes, however, they also increase the probability of failures. Therefore, an efficient fault-tolerance mechanism is essential. Existing work mainly focuses on the checkpoint policy, including backup and recovery. The former usually backups all graph data, which leads to the cost of writing redundant data since some data are static during iterations. The latter always loads backup data from remote machines to recovery iterations, ignoring the usage of data in the local disk in special scenarios, which incurs network costs. It proposes a multi-level fault tolerant mechanism, which distinguishes failures into computing task failures and node failures, and then designs different strategies for backup and recovery. For the latter, considering that the volume of data involved in computation varies with iterations, a complete backup policy and an adaptive log-based policy are presented to reduce the cost of writing redundant data. After that, at the stages of recovery, we utilize the local graph data and the remote message data to handle the recovery for task failures, but the remote data are used for node failures. Finally, extensive experiments on real datasets validate the efficiency of our solutions.

Key words: fault tolerance, large-scale graph, iterative computing, BSP model, checkpoint

中图分类号: