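• Outstanding Chinese Science and Technology Journal
  • CCF-recommended Class A Chinese-language journal
  • T1 high-quality science and technology journal in the computing field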
Gu Beibei, Qiu Jiyan, Wang Ning, Chen Jian, Chi Xuebin. A Performance Data Collection Method for Computing Software in Heterogeneous Systems[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440512

A Performance Data Collection Method for Computing Software in Heterogeneous Systems

Funds: This work was supported by the National Natural Science Foundation of China (62372428).
  • Author Bio:

    Gu Beibei: born in 1982. PhD candidate, associate professor. Member of CCF. Her main research interests include performance evaluation and analysis of high performance computing software applications

    Qiu Jiyan: born in 1998. PhD candidate. His main research interests include AI for science and high performance computing

    Wang Ning: born in 1988. Bachelor's degree. His main research interests include performance analysis and optimization of general-purpose processors

    Chen Jian: born in 1977. PhD. Vice chairman of CCF, standing committee member of CCF TCHPC, executive committee member of CCF TCAIPR. His main research interests include AI and high performance computing

    Chi Xuebin: born in 1963. PhD, professor, PhD supervisor. His main research interests include parallel computing

  • Received Date: June 16, 2023
  • Revised Date: January 14, 2025
  • Accepted Date: January 25, 2025
  • Available Online: January 25, 2025
  • Supercomputing has rapidly evolved from traditional CPU clusters to heterogeneous platforms. This shift in hardware platforms poses significant challenges for optimizing computing software and evaluating its performance. Mainstream international performance analysis tools for parallel programs generally have poor compatibility with the processors used in domestic heterogeneous supercomputing systems, often require instrumenting and recompiling the code, and collect single-node performance data with low accuracy. To address these shortcomings, this article proposes a floating-point performance data collection method for computing software on heterogeneous systems. A floating-point performance collection prototype was developed and verified on a domestic supercomputing system verification platform. The prototype currently collects performance indicator data effectively on both single and multiple nodes, and it is non-invasive: it monitors the target program in a plug-in manner without any modification to that program's code, which makes it highly versatile. We conducted comparative experiments with three programs, rocHPL, Cannon, and mixbench, and studied performance data collection and monitoring for ResNet (residual network), an AI computing program. The results show that the proposed method is highly accurate, achieves the expected collection effect, and provides a useful reference for program optimization, which verifies its effectiveness.
