面向磁盘驻留的类Pregel系统的多级容错处理机制

毕亚辉; 姜苏洋; 王志刚; 冷芳玲; 鲍玉斌; 于戈; 钱岭

doi:10.7544/issn1000-1239.2016.20150619

面向磁盘驻留的类Pregel系统的多级容错处理机制

A Multi-Level Fault Tolerance Mechanism for Disk-Resident Pregel-Like Systems

摘要

摘要: 基于BSP模型的分布式框架已经成为大规模图高频迭代处理的有效工具.分布式系统可以通过增加集群节点数量的方式提供弹性的处理能力，但同时也增加了故障发生的概率，因此亟需开发高效的容错处理机制.现有工作主要是基于检查点机制展开研究，包括数据备份和故障恢复2部分：前者没有考虑迭代过程中参与计算的数据规模的动态变化，而是备份所有图数据，因此引入了冗余数据的写开销；后者通常是从远程存储节点上读取备份数据进行故障恢复，而没有考虑利用本地磁盘数据恢复某些场景下的故障，引入额外的网络开销.因此提出了一种多级容错处理机制，将故障分为计算任务故障和计算节点故障2类，并设计了不同的备份和恢复策略. 备份阶段利用了某些应用在迭代计算过程中参与计算的数据规模的动态变化特性，设计了完全备份和写变化log自适应选择的策略，可以显著减少冗余数据的写开销.故障恢复阶段，对任务故障，利用本地磁盘上保留的图数据和远程的消息数据完成恢复；而对节点故障，则利用备份在远程信息进行恢复.最后，通过在真实数据集上的大量实验,验证了提出的多级容错机制的有效性.

Abstract: The BSP-based distributed frameworks, such as Pregel, are becoming a powerful tool for handling large-scale graphs, especially for applications with iterative computing frequently. Distributed systems can guarantee a flexible processing capacity by adding computing nodes, however, they also increase the probability of failures. Therefore, an efficient fault-tolerance mechanism is essential. Existing work mainly focuses on the checkpoint policy, including backup and recovery. The former usually backups all graph data, which leads to the cost of writing redundant data since some data are static during iterations. The latter always loads backup data from remote machines to recovery iterations, ignoring the usage of data in the local disk in special scenarios, which incurs network costs. It proposes a multi-level fault tolerant mechanism, which distinguishes failures into computing task failures and node failures, and then designs different strategies for backup and recovery. For the latter, considering that the volume of data involved in computation varies with iterations, a complete backup policy and an adaptive log-based policy are presented to reduce the cost of writing redundant data. After that, at the stages of recovery, we utilize the local graph data and the remote message data to handle the recovery for task failures, but the remote data are used for node failures. Finally, extensive experiments on real datasets validate the efficiency of our solutions.

HTML全文

参考文献(0)

施引文献

资源附件(0)