Shen Jie, Long Biao, Jiang Hao, Huang Chun. Implementation and Optimization of Vector Trigonometric Functions on Phytium Processors[J]. Journal of Computer Research and Development, 2020, 57(12): 2610-2620. DOI: 10.7544/issn1000-1239.2020.20190721
(College of Computer, National University of Defense Technology, Changsha 410073)
Funds: This work was supported by the National Science and Technology Major Project (HeGaoJi) of China (2018ZX01029-103), the National Natural Science Foundation of China (61902407), and the Hunan Provincial Natural Science Foundation of China (2018JJ3616).
Benefiting from SIMD (single instruction multiple data) vectorization, the floating-point compute capability of processors has increased substantially. However, current SIMD units and SIMD instruction sets support only basic operations such as arithmetic operations (addition, subtraction, multiplication, and division) and logical operations, and provide no direct support for floating-point transcendental functions. Since transcendental functions are the most time-consuming functions in floating-point computing, improving their performance has become a key point in math library optimization. In this paper, we design and propose a new method that utilizes SIMD units to vectorize and optimize trigonometric functions, one class of transcendental functions. Whereas most vector implementations use a single unified algorithm to process all floating-point inputs, we select and port several optimizable branches from the scalar implementations to handle different ranges of floating-point numbers, and then apply a series of optimization techniques to accelerate the vectorized scalar code. By combining the piecewise computation of the scalar implementations with the vectorization advantage of the vector implementations, our method optimizes branch processing in vector trigonometric functions, reduces redundant computation, and increases the utilization of SIMD units. Experimental results show that our method meets the accuracy requirements and effectively improves the performance of trigonometric functions: compared with the original vector trigonometric functions, the optimized functions achieve an average speedup of 2.04x.
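To make the branch-selection idea in the abstract concrete, the following is a minimal illustrative sketch, not the paper's actual implementation. It assumes an ARMv8 target with 128-bit NEON SIMD (as found on Phytium FT-series cores); the function names (vsin_sketch, vsin_array_sketch), the tiny-argument threshold, and the truncated polynomial coefficients are all hypothetical choices for illustration only and do not reflect the library's real coefficients or accuracy bound. The sketch shows how a fast branch lifted from a scalar implementation (returning x directly when every lane is tiny) can skip redundant polynomial work inside a vectorized routine.

```c
/*
 * Illustrative sketch only: branch selection inside a vectorized sin,
 * assuming ARMv8 NEON intrinsics. Coefficients are a truncated Taylor
 * series chosen for readability, not for library-grade accuracy.
 */
#include <arm_neon.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: evaluate sin on two doubles at a time. */
static float64x2_t vsin_sketch(float64x2_t x)
{
    const float64x2_t small = vdupq_n_f64(0x1p-26);  /* |x| tiny => sin(x) ~= x */
    float64x2_t ax = vabsq_f64(x);

    /* Fast branch ported from the scalar code: if every lane is tiny,
     * return x directly and skip range reduction and the polynomial. */
    uint64x2_t tiny = vcltq_f64(ax, small);
    if (vgetq_lane_u64(tiny, 0) && vgetq_lane_u64(tiny, 1))
        return x;

    /* Generic branch: short odd polynomial sin(x) ~= x + c3*x^3 + c5*x^5.
     * Valid only for modest |x| in this sketch (no full range reduction). */
    const float64x2_t c3 = vdupq_n_f64(-1.0 / 6.0);
    const float64x2_t c5 = vdupq_n_f64( 1.0 / 120.0);
    float64x2_t x2 = vmulq_f64(x, x);
    float64x2_t p  = vfmaq_f64(c3, c5, x2);              /* c3 + c5*x^2      */
    return vfmaq_f64(x, vmulq_f64(p, x2), x);             /* x + p*x^2*x      */
}

/* Vectorized loop over an array; leftover tail elements use scalar libm. */
void vsin_array_sketch(const double *in, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2)
        vst1q_f64(out + i, vsin_sketch(vld1q_f64(in + i)));
    for (; i < n; ++i)
        out[i] = sin(in[i]);
}
```

In this sketch the fast branch is taken only when both SIMD lanes qualify, which keeps the vector pipeline full while still avoiding the redundant polynomial evaluation that a single unified algorithm would always perform; the paper's method applies the same piecewise idea to more branches and argument ranges.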