
Machine Learning Inference Framework on Multi-Core Processor

Zhang Xiao, Zhi Tian

Citation: Zhang Xiao, Zhi Tian. Machine Learning Inference Framework on Multi-Core Processor[J]. Journal of Computer Research and Development, 2019, 56(9): 1977-1987. DOI: 10.7544/issn1000-1239.2019.20180786. CSTR: 32373.14.issn1000-1239.2019.20180786


  • CLC number: TP389.1


Funds: This work was supported by the National Key Research and Development Program of China (2017YFA0700900, 2017YFA0700902, 2017YFA0700901, 2017YFB1003101), the National Natural Science Foundation of China (61472396, 61432016, 61473275, 61522211, 61532016, 61521092, 61502446, 61672491, 61602441, 61602446, 61732002, 61702478, 61732020), the Beijing Natural Science Foundation (JQ18013), the National Basic Research Program of China (973 Program) (2015CB358800), the National Science and Technology Major Projects of Hegaoji (2018ZX01031102), the Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences (KFJ-HGZX-013), and the Strategic Priority Research Program of Chinese Academy of Sciences (XDB32050200).
  • Abstract: In recent years, deep neural networks have been widely applied in many domains with great success. As the size and computational workload of neural network models keep growing, GPUs and many newly designed domain-specific accelerators have been adopted to compute neural networks efficiently. Nevertheless, the general-purpose processor, as the most common and readily available computing platform, should not be ignored; exploring how to run neural network algorithms on it efficiently remains meaningful. In the training phase, a multi-core architecture can exploit data parallelism to increase throughput and speed up training. In the inference phase, however, end-to-end latency matters far more than throughput, since it determines whether the processor is usable in a given scenario, and traditional data parallelism cannot satisfy the small-batch, low-latency requirements of inference. To fully utilize the hardware resources of a multi-core architecture, the computation must instead be split inside each operator. Moreover, given the computational characteristics of the processor, a fine-grained strategy is needed to split the operators of the computation graph reasonably, so that the splitting plan does not degrade the computing efficiency of each individual core. This paper proposes a parallel framework based on operator splitting for multi-core general-purpose processors. It divides each operator in the neural network into smaller ones and executes them on multiple cores in parallel; by providing a few necessary assistant operations, the framework can be ported to potential multi-core processors at low cost. The framework also automatically generates an efficient splitting plan for a given network, taking both the network architecture and the low-level hardware into account. Experimental results show that the generated plans substantially reduce the end-to-end inference latency of various networks on a multi-core processor.
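The intra-operator splitting idea the abstract describes can be illustrated with a minimal sketch (all names below are hypothetical and not from the paper's actual framework): a fully-connected layer y = x @ W is split along its output dimension so that each core computes one slice of the result, and the slices are concatenated afterwards. Note that the batch (size 1 here) is not split, which is precisely what distinguishes this from data parallelism in the latency-critical inference case.

```python
# Hypothetical sketch of intra-operator (output-channel) splitting for a
# fully-connected layer; names and structure are illustrative only.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def fc_split_parallel(x, W, num_cores=4):
    """Split the output columns of W across `num_cores` workers.

    Each worker computes x @ W_slice for its column slice; the partial
    results are concatenated along the output dimension, so the final
    result is bitwise-equivalent to the unsplit x @ W.
    """
    col_chunks = np.array_split(W, num_cores, axis=1)
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        parts = list(pool.map(lambda Wc: x @ Wc, col_chunks))
    return np.concatenate(parts, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 256))   # batch of 1: the latency-critical case
W = rng.standard_normal((256, 128))

y_single = x @ W                    # single-core reference
y_multi = fc_split_parallel(x, W)   # split across 4 workers
assert np.allclose(y_single, y_multi)
```

A real framework must also insert the assistant operations the paper mentions (e.g. split/concat glue between operators whose splitting dimensions differ), and choose per-operator split dimensions so that each core's tile stays large enough to compute efficiently.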

Publication history
  • Published online: 2019-08-31
