
A Coprocessor for Double-Precision Floating-Point Matrix Multiplication

Jia Xun, Wu Guiming, Xie Xianghui, Wu Dong

Citation: Jia Xun, Wu Guiming, Xie Xianghui, Wu Dong. A Coprocessor for Double-Precision Floating-Point Matrix Multiplication[J]. Journal of Computer Research and Development, 2019, 56(2): 410-420. DOI: 10.7544/issn1000-1239.2019.20170908. CSTR: 32373.14.issn1000-1239.2019.20170908

Funding: National Natural Science Foundation of China (91430214, 61732018)
  • CLC number: TP302

A Coprocessor for Double-Precision Floating-Point Matrix Multiplication

  • Abstract: Matrix multiplication is widely used across application domains, especially numerical computation. However, double-precision floating-point matrix multiplication achieves limited performance and efficiency on existing computing platforms such as CPUs, GPGPUs, and FPGAs, and it often becomes the performance bottleneck of large-scale numerical applications. To address this problem, this paper studies customized acceleration of double-precision floating-point matrix multiplication, using a linear array as the basic computing structure. First, double buffering is applied to the linear array, together with a memory access schedule designed for double buffering, to improve computational efficiency. Second, the architectures of the matrix multiplication coprocessor and of the coprocessor-based accelerated computing system are proposed, a performance model of the coprocessor is built, and its design space is explored. Finally, the functional correctness of the coprocessor is verified and its hardware cost is evaluated under a mainstream technology node. Experimental results show that the proposed coprocessor achieves 3 TFLOPS of performance at 99% computational efficiency. Compared with the NVIDIA K40 GPGPU on double-precision floating-point matrix multiplication, the coprocessor delivers 1.95× the performance with only 21.05% of the area. This work explores the application of customized accelerator architectures in high-performance computing and offers a reference for improving the performance of existing computing systems.
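
The abstract credits double buffering with raising computational efficiency. The sketch below is a generic textbook timing model, not the performance model developed in the paper: the function name `efficiency`, its parameters, and the sample timings are all hypothetical. It assumes the matrices are processed as equal-sized blocks, each with a fixed load time and a fixed compute time, and compares how busy the compute units stay with and without overlapping the two.

```python
def efficiency(num_blocks, t_load, t_compute, double_buffered):
    """Fraction of total time the compute units are busy.

    Hypothetical model: every block takes t_load to fetch and
    t_compute to process; these are assumptions, not paper data.
    """
    busy = num_blocks * t_compute
    if double_buffered:
        # The first block must be loaded before compute can start.
        # After that, loading block i+1 overlaps computing block i,
        # so each step costs the slower of the two. The last block
        # is computed with nothing left to load.
        total = t_load + (num_blocks - 1) * max(t_load, t_compute) + t_compute
    else:
        # Without overlap, the compute units idle during every load.
        total = num_blocks * (t_load + t_compute)
    return busy / total


if __name__ == "__main__":
    # Illustrative timings only (arbitrary units).
    single = efficiency(100, 1.0, 4.0, double_buffered=False)
    double = efficiency(100, 1.0, 4.0, double_buffered=True)
    print(f"single-buffered: {single:.4f}")
    print(f"double-buffered: {double:.4f}")
```

Under this model, efficiency approaches 100% only when each load is no slower than the corresponding compute step, which is one reason a memory access schedule matters alongside the extra buffer.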
Publication history
  • Published: 2019-01-31
