
Partial Data Shuffled First Strategy for In-Memory Computing Framework

Bian Chen, Yu Jiong, Xiu Weirong, Qian Yurong, Ying Changtian, Liao Bin

Citation: Bian Chen, Yu Jiong, Xiu Weirong, Qian Yurong, Ying Changtian, Liao Bin. Partial Data Shuffled First Strategy for In-Memory Computing Framework[J]. Journal of Computer Research and Development, 2017, 54(4): 787-803. DOI: 10.7544/issn1000-1239.2017.20160049; CSTR: 32373.14.issn1000-1239.2017.20160049


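Funding: National Natural Science Foundation of China (61262088, 61462079, 61363083, 61562086); Scientific Research Program of the Higher Education Institutions of Xinjiang Uygur Autonomous Region (XJEDU2016S106)
  • CLC number: TP311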


  • Abstract: The low latency of in-memory computing frameworks has greatly improved the computing efficiency of clusters, but the performance bottleneck of the Shuffle operation remains unavoidable. The synchronous operation required by wide dependencies forces most executors to wait for the results of the slowest worker. This synchronization not only wastes computing resources but also increases job latency, and the problem is especially pronounced in heterogeneous clusters. To address the synchronization problem of Shuffle operations in in-memory computing frameworks, we establish a resource requirement model, a job execution efficiency model, and a task allocation and scheduling model; define allocation efficiency entropy (AEE) and worker contribution degree (WCD); and state the optimization objective of the algorithm. Solving the model yields the partial data shuffled first algorithm (PDSF), which combines efficient-executor priority scheduling, a strategy that minimizes executor wait time, and moderately inclined task allocation. Scheduling efficient executors first increases the time overlap between pipelined tasks and wide-dependency tasks, which shortens the synchronization delay of the wide-dependency Shuffle and improves cluster resource utilization. Moderately inclined (skewed) task allocation keeps slow executors computing continuously while better matching each executor's assigned workload to its computing capacity, which improves job execution efficiency. An analysis of the algorithm's optimization principles proves that PDSF is Pareto optimal. Experimental results show that PDSF improves the job execution efficiency of in-memory computing frameworks and makes effective use of cluster resources.
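The abstract argues that, at a Shuffle barrier, fast executors sit idle waiting for slow ones, and that matching each executor's workload to its capacity reduces that waste. The sketch below illustrates only that general effect under strong simplifying assumptions: identical task costs, a fixed and known throughput per executor (`speeds`), and a single barrier at the end of the stage. The helper names `allocate` and `barrier_stats` and the skew parameter `alpha` are hypothetical. Blending a uniform split (`alpha = 0`) with a throughput-proportional split (`alpha = 1`) is one simple stand-in for "moderately inclined" allocation, not the AEE/WCD-based rule of PDSF, which the abstract does not specify.

```python
"""Illustrative sketch only; this is NOT the PDSF algorithm from the paper.

Assumptions: every task has the same cost, executor i processes tasks at a
fixed, known rate speeds[i] (tasks per second), and the stage ends at a
Shuffle barrier that waits for the slowest executor.
"""
from typing import List, Tuple


def allocate(num_tasks: int, speeds: List[float], alpha: float) -> List[int]:
    """Split num_tasks across executors (hypothetical helper).

    alpha = 0 gives a uniform split; alpha = 1 gives a split proportional
    to speed; values in between skew the allocation only partially.
    """
    n = len(speeds)
    total_speed = sum(speeds)
    shares = [(1 - alpha) / n + alpha * s / total_speed for s in speeds]
    exact = [num_tasks * sh for sh in shares]
    counts = [int(x) for x in exact]
    # Hand out the tasks lost to truncation by largest fractional remainder.
    leftover = num_tasks - sum(counts)
    order = sorted(range(n), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def barrier_stats(counts: List[int], speeds: List[float]) -> Tuple[float, float]:
    """Return (stage time, total idle executor-seconds at the barrier)."""
    finish = [c / s for c, s in zip(counts, speeds)]
    stage_time = max(finish)  # the barrier releases when the slowest finishes
    idle = sum(stage_time - f for f in finish)
    return stage_time, idle


if __name__ == "__main__":
    speeds = [4.0, 2.0, 1.0, 1.0]  # hypothetical heterogeneous executors
    for alpha in (0.0, 0.5, 1.0):
        counts = allocate(80, speeds, alpha)
        t, idle = barrier_stats(counts, speeds)
        print(f"alpha={alpha}: tasks={counts} stage={t:.1f}s idle={idle:.1f}s")
```

With these hypothetical speeds and 80 tasks, the uniform split takes 20 s and leaves 25 executor-seconds of idle time at the barrier, `alpha = 0.5` takes 15 s with 12.5 executor-seconds idle, and the proportional split finishes in 10 s with no idle time. In this idealized model full proportionality is always best; the paper's reasons for only moderate skew, such as keeping slow executors continuously supplied with work, lie outside what the sketch models.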

Publication history
  • Published: 2017-03-31
