Ying Changtian, Yu Jiong, Bian Chen, Wang Weiqing, Lu Liang, Qian Yurong. Criticality Checkpoint Management Strategy Based on RDD Characteristics in Spark[J]. Journal of Computer Research and Development, 2017, 54(12): 2858-2872. DOI: 10.7544/issn1000-1239.2017.20160717

Criticality Checkpoint Management Strategy Based on RDD Characteristics in Spark

  • Published Date: November 30, 2017
  • Spark's default fault tolerance mechanism relies on checkpoints set by the programmer. When data is lost, Spark recomputes tasks based on the RDD lineage to recover the data. In complicated applications with multiple iterations and large amounts of input data, this recovery process can cost a lot of computation time. In addition, by default the recomputation tasks consider only data locality, regardless of the computing capabilities of the nodes, which lengthens the recovery time. To reduce the recovery cost, we establish and demonstrate the Spark execution model, the checkpoint model and the RDD criticality model. Based on these models, we propose the criticality checkpoint management (CCM) strategy, which includes a checkpoint algorithm, a failure recovery algorithm and a cleaning algorithm. The checkpoint algorithm analyzes RDD characteristics and their influence on recovery time, and selects valuable RDDs as checkpoints. The failure recovery algorithm chooses appropriate nodes to recompute the lost RDDs, and the cleaning algorithm cleans checkpoints when disk space becomes insufficient. Experimental results show that the strategy reduces recovery overhead efficiently, selects valuable RDDs as checkpoints, and increases the efficiency of disk usage on the nodes, at the cost of a slight increase in execution time.
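The abstract's idea of choosing checkpoints by their effect on recovery time can be illustrated with a small sketch. This is not the paper's CCM algorithm or its criticality model, which the abstract does not spell out. It is a simplified stand-in that assumes each RDD has a known recomputation cost and on-disk size, and that reading a checkpoint is free. All names (`RDDNode`, `recovery_cost`, `select_checkpoints`) and all numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RDDNode:
    name: str
    compute_cost: float  # assumed: seconds to compute this RDD from its parents
    size: float          # assumed: disk space needed if checkpointed
    parents: list = field(default_factory=list)  # names of parent RDDs


def recovery_cost(name, dag, checkpoints, _seen=None):
    """Time to rebuild `name` by walking its lineage back to checkpoints."""
    if _seen is None:
        _seen = set()
    if name in _seen:         # a shared ancestor is recomputed only once
        return 0.0
    _seen.add(name)
    if name in checkpoints:   # assumed: reading a checkpoint costs nothing
        return 0.0
    node = dag[name]
    return node.compute_cost + sum(
        recovery_cost(p, dag, checkpoints, _seen) for p in node.parents
    )


def select_checkpoints(dag, target, disk_budget):
    """Greedily pick intermediate RDDs that save the most recovery time."""
    chosen, used = set(), 0.0
    while True:
        base = recovery_cost(target, dag, chosen)
        best, best_saved = None, 0.0
        for name, node in dag.items():
            if name == target or name in chosen:
                continue
            if used + node.size > disk_budget:
                continue      # does not fit in the remaining disk space
            saved = base - recovery_cost(target, dag, chosen | {name})
            if saved > best_saved:
                best, best_saved = name, saved
        if best is None:      # nothing fits or nothing helps any more
            return chosen
        chosen.add(best)
        used += dag[best].size


# Hypothetical linear lineage: a -> b -> c -> d
dag = {
    "a": RDDNode("a", compute_cost=10.0, size=5.0),
    "b": RDDNode("b", compute_cost=1.0, size=1.0, parents=["a"]),
    "c": RDDNode("c", compute_cost=20.0, size=4.0, parents=["b"]),
    "d": RDDNode("d", compute_cost=2.0, size=1.0, parents=["c"]),
}

no_checkpoint_cost = recovery_cost("d", dag, set())
picked = select_checkpoints(dag, "d", disk_budget=4.0)
with_checkpoint_cost = recovery_cost("d", dag, picked)
print(no_checkpoint_cost, sorted(picked), with_checkpoint_cost)
```

In this toy lineage, checkpointing an expensive RDD late in the chain removes most of the recomputation work, which is the intuition behind selecting "valuable" RDDs. Node selection during recovery and checkpoint cleaning, the other two parts of the strategy, are not modeled here.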
