Citation: Gu Beibei, Qiu Jiyan, Wang Ning, Chen Jian, Chi Xuebin. A Performance Data Collection Method for Computing Software in Heterogeneous Systems[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440512
Supercomputing has rapidly evolved from traditional CPU clusters to heterogeneous platforms. This shift in hardware platforms poses significant challenges for optimizing computing software and evaluating its performance. The mainstream international performance analysis tools for parallel programs generally have poor compatibility with the processors of domestic heterogeneous supercomputing systems, often require code instrumentation and recompilation, and collect single-node performance data with low accuracy. To address these shortcomings, this article proposes a floating-point performance data collection method for computing software in heterogeneous systems. A prototype floating-point performance collector was developed and verified on a domestic supercomputing system verification platform. The prototype currently collects performance indicator data effectively on both single and multiple nodes and is non-invasive to the original program: it attaches to the monitored program as a plug-in and requires no changes to that program's code, which makes it highly versatile. Finally, we carried out comparative experiments with three programs, rocHPL, Cannon, and mixbench, and studied performance data collection and monitoring for ResNet (residual network), a program for AI computing. The results show that the proposed method achieves high accuracy, delivers the expected collection results in the experiments, and offers good reference value for program optimization, which verifies its effectiveness.