
Scalability for Monolithic Schedulers of Cluster Resource Management Framework

Mao Anqi, Tang Xiaochun, Ding Zhao, Li Zhanhuai

Mao Anqi, Tang Xiaochun, Ding Zhao, Li Zhanhuai. Scalability for Monolithic Schedulers of Cluster Resource Management Framework[J]. Journal of Computer Research and Development, 2021, 58(3): 497-512. DOI: 10.7544/issn1000-1239.2021.20200501
CSTR: 32373.14.issn1000-1239.2021.20200501

Funds: This work was supported by the National Key Research and Development Program of China (2018YFB1003400).
  • Chinese Library Classification (CLC) number: TP311
  • Abstract: Monolithic cluster resource management systems are widely used in production because they keep the global resource state consistent and support multiple scheduling models. However, a monolithic resource manager maintains the global resource state on a single node, so its performance in large clusters falls short of expectations: when it receives and processes large volumes of periodic heartbeat messages, its load rises sharply, its scheduling capacity drops, and the scalability of the cluster suffers. To address this problem, this paper proposes the idea of “no change, no update” to replace the periodic update mechanism of the resource manager. First, we introduce a differential heartbeat processing model on the computing nodes: a node whose resource status has not changed sends no heartbeat message to the resource manager, which reduces both the size and the number of messages. Second, for node failure detection, we propose a ring-based monitoring model in which computing nodes monitor one another for failures, so that the periodic monitoring load is transferred from the resource manager to the computing nodes. Finally, we implement both models on YARN and compare the system before and after the changes. Experiments show that with a cluster of 10 000 nodes and a heartbeat interval of 3 s, the improved YARN raises heartbeat processing efficiency and resource update efficiency by about 40% over the original YARN. In addition, the improved YARN can manage a cluster more than 1.88 times the size of the one the original YARN can manage.
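    The two models described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' YARN implementation: the class `ComputeNode`, the functions `ring_successors` and `detect_failures`, and the message fields are all hypothetical names chosen for illustration, and the ring logic assumes each node watches only its immediate successor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ComputeNode:
    """Hypothetical compute node that reports only when its state changes."""
    node_id: str
    free_memory_mb: int
    free_vcores: int
    last_reported: Optional[tuple] = None  # state last sent to the manager

    def heartbeat(self) -> Optional[dict]:
        """Return a message if the resource state changed, else None."""
        state = (self.free_memory_mb, self.free_vcores)
        if state == self.last_reported:
            return None  # "no change, no update": suppress the message
        self.last_reported = state
        return {
            "node": self.node_id,
            "free_memory_mb": state[0],
            "free_vcores": state[1],
        }


def ring_successors(node_ids: List[str]) -> Dict[str, str]:
    """Map each node to the next node in the ring, which it monitors."""
    n = len(node_ids)
    return {node_ids[i]: node_ids[(i + 1) % n] for i in range(n)}


def detect_failures(node_ids: List[str], alive: Set[str]) -> List[str]:
    """Each live node checks its successor and reports it if it is down."""
    watch = ring_successors(node_ids)
    return sorted(
        target
        for watcher, target in watch.items()
        if watcher in alive and target not in alive
    )


node = ComputeNode("n1", free_memory_mb=4096, free_vcores=4)
first = node.heartbeat()    # state is new, so a message is produced
second = node.heartbeat()   # nothing changed, so no message is sent
node.free_vcores = 3        # a container starts and consumes one vcore
third = node.heartbeat()    # state changed, so a message is produced again
down = detect_failures(["a", "b", "c"], alive={"a", "c"})
```

    In this sketch the resource manager receives messages only on state changes, and failure reports come from peer nodes rather than from a central timeout scan. One simplification to note: with a single successor per node, a failed node whose watcher has also failed goes unreported, so a practical design would need an additional mechanism for adjacent failures.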

Publication history
  • Published: 2021-02-28
