• China Quality Sci-Tech Journal
  • CCF-Recommended Class A Chinese Journal
  • High-Quality Sci-Tech Journal in Computing (Tier T1)
Bi Yahui, Jiang Suyang, Wang Zhigang, Leng Fangling, Bao Yubin, Yu Ge, Qian Ling. A Multi-Level Fault Tolerance Mechanism for Disk-Resident Pregel-Like Systems[J]. Journal of Computer Research and Development, 2016, 53(11): 2530-2541. DOI: 10.7544/issn1000-1239.2016.20150619

A Multi-Level Fault Tolerance Mechanism for Disk-Resident Pregel-Like Systems

More Information
  • Published Date: October 31, 2016
  • BSP-based distributed frameworks such as Pregel are becoming a powerful tool for processing large-scale graphs, especially for applications that frequently perform iterative computation. Distributed systems can scale their processing capacity by adding computing nodes, but more nodes also increase the probability of failures, so an efficient fault-tolerance mechanism is essential. Existing work mainly focuses on checkpoint policies, which cover backup and recovery. Backup usually saves all graph data, which incurs the cost of writing redundant data, since some data remain static across iterations. Recovery always loads backup data from remote machines to resume the iterations, ignoring data that is still available on the local disk in certain scenarios, and thus incurs network costs. This paper proposes a multi-level fault-tolerance mechanism that classifies failures into computing-task failures and node failures and designs different backup and recovery strategies for each. For backup, since the volume of data involved in computation varies across iterations, a complete backup policy and an adaptive log-based policy are presented to reduce the cost of writing redundant data. For recovery, task failures are handled using the local graph data together with the remote message data, whereas node failures are handled using remote data only. Finally, extensive experiments on real datasets validate the efficiency of the proposed solutions.
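
The abstract above describes two decisions: which backup policy to use in a given iteration, and where recovery reads its data from depending on the failure type. The Python sketch below illustrates both under stated assumptions. The names `FailureType`, `SuperstepStats`, `choose_backup`, and `recovery_sources` are hypothetical. The update-ratio switching rule, the idea of logging only updated vertices, and the 0.3 threshold are also illustrative assumptions. The abstract does not state the paper's actual criterion.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class FailureType(Enum):
    TASK = "task"  # a computing task crashed; the node and its local disk survive
    NODE = "node"  # the whole node is lost, including its local disk


@dataclass
class SuperstepStats:
    total_vertices: int
    updated_vertices: int  # vertices whose values changed in this superstep


def choose_backup(stats: SuperstepStats, log_threshold: float = 0.3) -> str:
    """Hypothetical adaptive rule (an assumption, not the paper's criterion):
    log only the changed data while few vertices are updated, and take a
    complete backup once the changed fraction exceeds log_threshold."""
    if stats.total_vertices == 0:
        return "complete"
    ratio = stats.updated_vertices / stats.total_vertices
    return "log" if ratio <= log_threshold else "complete"


def recovery_sources(failure: FailureType) -> list[str]:
    """Data sources used for recovery, following the abstract's description."""
    if failure is FailureType.TASK:
        # Graph data is still on the local disk; only messages come from remote machines.
        return ["local graph data", "remote message data"]
    # Node failure: the local disk is unavailable, so recovery reads remote data only.
    return ["remote backup data"]


if __name__ == "__main__":
    print(choose_backup(SuperstepStats(total_vertices=1000, updated_vertices=50)))
    print(choose_backup(SuperstepStats(total_vertices=1000, updated_vertices=800)))
    print(recovery_sources(FailureType.TASK))
```

With these hypothetical numbers, the first call returns `"log"` because 5% of vertices changed. The second returns `"complete"` because 80% changed. A task failure reuses local graph data, which avoids re-fetching static data over the network.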
