
Implementation and Optimization of Vector Trigonometric Functions on Phytium Processors

Shen Jie, Long Biao, Jiang Hao, Huang Chun

Citation: Shen Jie, Long Biao, Jiang Hao, Huang Chun. Implementation and Optimization of Vector Trigonometric Functions on Phytium Processors[J]. Journal of Computer Research and Development, 2020, 57(12): 2610-2620. DOI: 10.7544/issn1000-1239.2020.20190721. CSTR: 32373.14.issn1000-1239.2020.20190721


  • CLC number: TP311


Funds: This work was supported by the National Science and Technology Major Projects of Hegaoji (2018ZX01029-103), the National Natural Science Foundation of China (61902407), and Hunan Provincial Natural Science Foundation of China (2018JJ3616).
  • Abstract: Thanks to SIMD (single instruction, multiple data) vectorization, the floating-point compute capability of processors has increased severalfold. However, current SIMD units and instruction sets support only basic operations such as addition, subtraction, multiplication, division, and logical operations, and provide no direct support for floating-point transcendental functions. Because transcendental functions are the most time-consuming class of functions in floating-point computing, improving their performance has become a key concern in math library optimization. Focusing on trigonometric functions, one class of transcendental functions, we propose a method that uses SIMD units to design, implement, and optimize vector trigonometric functions. Whereas most vector implementations process all floating-point inputs with a single unified algorithm, we select several optimizable branches from the scalar implementations and import them to process different ranges of floating-point inputs, and then apply a series of optimization techniques to accelerate the vectorized scalar code. By combining the piecewise computation of scalar math libraries with the vectorization of vector math libraries, the method adds and optimizes branch processing in vector trigonometric functions, which reduces redundant computation and raises the utilization of SIMD units when branches occur. Experiments on Phytium processors show that the method meets the accuracy requirement while effectively improving performance: compared with the original vector trigonometric functions, the optimized functions achieve an average speedup of 2.04x.
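The branch idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lane count, the pi/4 threshold, the Taylor kernels, and the names `vsin`, `_sin_kernel`, `_cos_kernel`, and `_sin_unified` are all assumptions made for illustration, and Python lists stand in for SIMD registers. The sketch tests the whole vector once; if every lane lies in a small range it takes a cheap path with no range reduction, and otherwise it falls back to a unified path that evaluates both kernels per lane and selects one, as a blend-based vector routine would.

```python
import math

LANES = 4  # hypothetical vector width, e.g. four lanes in a 128-bit register


def _sin_kernel(r):
    # Taylor polynomial for sin(r), adequate for |r| <= pi/4 (illustrative coefficients).
    r2 = r * r
    return r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120 + r2 * (-1.0 / 5040
               + r2 * (1.0 / 362880 + r2 * (-1.0 / 39916800))))))


def _cos_kernel(r):
    # Taylor polynomial for cos(r), adequate for |r| <= pi/4 (illustrative coefficients).
    r2 = r * r
    return 1.0 + r2 * (-0.5 + r2 * (1.0 / 24 + r2 * (-1.0 / 720
               + r2 * (1.0 / 40320 + r2 * (-1.0 / 3628800 + r2 * (1.0 / 479001600))))))


def _sin_unified(x):
    # Unified path: reduce x to r = x - k*(pi/2), then pick a kernel from k mod 4.
    k = round(x * (2.0 / math.pi))
    r = x - k * (math.pi / 2.0)
    s, c = _sin_kernel(r), _cos_kernel(r)  # both computed, as in a blend-based vector routine
    return (s, c, -s, -c)[k % 4]


def vsin(xs):
    """Return (values, path) for one vector of LANES inputs."""
    assert len(xs) == LANES
    # Whole-vector branch test: one decision for all lanes keeps them in lockstep.
    if all(abs(x) <= math.pi / 4.0 for x in xs):
        return [_sin_kernel(x) for x in xs], "fast"
    return [_sin_unified(x) for x in xs], "unified"


if __name__ == "__main__":
    print(vsin([0.1, -0.2, 0.3, 0.7]))
    print(vsin([0.1, 2.0, -5.0, 10.0]))
```

In this sketch the fast path skips both the range reduction and the unused cosine kernel, which is the kind of redundant work a range-specific branch can remove. Because the branch is taken per vector rather than per lane, a single large input sends the whole vector down the unified path.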

Publication history
  • Published: 2020-11-30
