Citation: Feng Jingge, He Yeping, Tao Qiuming, Ma Hengtai. SLP Vectorization Method Based on Multiple Isomorphic Transformations[J]. Journal of Computer Research and Development, 2023, 60(12): 2907-2927. DOI: 10.7544/issn1000-1239.202220354
SLP (superword level parallelism) is an efficient auto-vectorization method that exploits data-level parallelism within basic blocks on SIMD (single instruction, multiple data) hardware, and it is widely used in mainstream compilers. SLP vectorizes code by finding multiple sequences of isomorphic instructions in the same basic block. A recent research trend is for compilers to transform sequences of non-isomorphic instructions into sequences of isomorphic instructions, extending the application scope of SLP vectorization. In this paper, we introduce SLP-M, a novel auto-vectorization method that can effectively vectorize code containing sequences of non-isomorphic instructions in the same basic block. SLP-M translates the code into isomorphic form by selecting and applying multiple transformation methods based on condition judgment and benefit evaluation. We also propose a new transformation method for binary expression replacement. SLP-M extends the application scope of SLP and improves its performance benefit. We implement SLP-M in LLVM and compare it with prior techniques on a set of applications taken from benchmarks such as SPEC CPU 2017. The experiments show that, compared with existing methods, SLP-M improves performance by 21.8% on kernel functions and by 4.1% in the overall benchmark tests.
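As a rough illustration of what "isomorphic" means in this setting, the toy Python sketch below models four scalar statements and checks whether they share one opcode. It is not SLP-M or LLVM code: the statement encoding, the names `is_isomorphic` and `replace_self_add`, and the `x + x` to `x * 2` rewrite are all assumptions chosen for illustration, and they are not the transformations proposed in the paper. A real SLP pass would also compare operand types and the shape of the operand expressions.

```python
from typing import List, Tuple

# Hypothetical encoding: (destination, opcode, left operand, right operand).
Stmt = Tuple[str, str, str, str]


def is_isomorphic(group: List[Stmt]) -> bool:
    """Toy check: a group is isomorphic if every statement uses the same opcode."""
    return len({op for _, op, _, _ in group}) == 1


def replace_self_add(stmt: Stmt) -> Stmt:
    """Rewrite x + x as x * 2 (an integer identity); leave other statements unchanged."""
    dest, op, lhs, rhs = stmt
    if op == "add" and lhs == rhs:
        return (dest, "mul", lhs, "2")
    return stmt


# Four adjacent statements from one basic block; lanes 1 and 3 break the pattern.
group = [
    ("a[0]", "mul", "b[0]", "2"),
    ("a[1]", "add", "b[1]", "b[1]"),
    ("a[2]", "mul", "b[2]", "2"),
    ("a[3]", "add", "b[3]", "b[3]"),
]

rewritten = [replace_self_add(s) for s in group]

print(is_isomorphic(group))      # False: opcodes are mul and add
print(is_isomorphic(rewritten))  # True: every lane is now a mul by 2
```

After the rewrite, all four lanes have the form `a[i] = b[i] * 2`, which is the kind of uniform group a vectorizer could pack into a single vector multiply.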