Mao Anqi, Tang Xiaochun, Ding Zhao, Li Zhanhuai. Scalability for Monolithic Schedulers of Cluster Resource Management Framework[J]. Journal of Computer Research and Development, 2021, 58(3): 497-512. DOI: 10.7544/issn1000-1239.2021.20200501

Scalability for Monolithic Schedulers of Cluster Resource Management Framework

Funds: This work was supported by the National Key Research and Development Program of China (2018YFB1003400).
  • Published Date: February 28, 2021
  • Monolithic cluster resource management systems are widely used in practice because they keep the global resource state consistent and support multiple scheduling models. However, a monolithic resource manager does not perform as expected in a large cluster, because a single node maintains the global resource state. When the resource manager receives and processes periodic heartbeat messages from a large number of nodes, its load rises sharply, which creates a scalability bottleneck. To address this problem, this paper proposes the idea of “no change, no update” to replace the periodic update mechanism of the resource manager. The paper makes three contributions. First, we introduce a differential heartbeat processing model on the computing node: when the resource status of a computing node has not changed, the node does not send a message to the resource manager, which reduces both the size and the number of messages. Second, we propose a ring network monitoring model among the computing nodes, which shifts the periodic monitoring load from the resource manager to the computing nodes. Finally, we implement both models on YARN. Experiments show that, for a cluster of 10 000 nodes with a heartbeat interval of 3 s, YARN with our models improves heartbeat processing efficiency and resource update efficiency by about 40%. In addition, the improved YARN can manage a cluster more than 1.88 times the size of the one managed by the original YARN.
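
The “no change, no update” idea and the ring monitoring model described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' YARN implementation: the class `DiffHeartbeatNode`, the function `ring_monitor_map`, the resource field names, and the choice that each node watches its immediate successor in the ring are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical resource snapshot: field name -> value (e.g. free vcores).
State = Dict[str, int]


class DiffHeartbeatNode:
    """Computing node that reports only the fields that changed (assumed design)."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self._last_reported: Optional[State] = None

    def heartbeat(self, current: State) -> Optional[State]:
        """Return the changed fields, or None when there is nothing to send."""
        if self._last_reported is None:
            # First report: the resource manager knows nothing yet, send everything.
            delta = dict(current)
        else:
            # Keep only the fields whose value differs from the last report.
            delta = {
                key: value
                for key, value in current.items()
                if self._last_reported.get(key) != value
            }
        if not delta:
            return None  # "no change, no update": no message is sent
        self._last_reported = dict(current)
        return delta


def ring_monitor_map(node_ids: List[str]) -> Dict[str, str]:
    """Map each node to the node it watches: its successor in the ring.

    With a single node the node maps to itself; an empty list gives an empty map.
    """
    count = len(node_ids)
    return {node_ids[i]: node_ids[(i + 1) % count] for i in range(count)}


node = DiffHeartbeatNode("node-1")
first = node.heartbeat({"free_vcores": 8, "free_memory_mb": 16384})   # full state
second = node.heartbeat({"free_vcores": 8, "free_memory_mb": 16384})  # unchanged
third = node.heartbeat({"free_vcores": 6, "free_memory_mb": 16384})   # one field changed
ring = ring_monitor_map(["node-1", "node-2", "node-3"])
```

In this sketch the second call produces no message at all, and the third carries only the changed field, which is how both the number and the size of messages shrink. Liveness checking, which periodic heartbeats otherwise provide, is delegated to the ring: each node only has to watch one neighbor, so the resource manager no longer polls every node.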
