Chen Yao, Zhao Yonghua, Zhao Wei, Zhao Lian. GPU-Accelerated Incomplete Cholesky Factorization Preconditioned Conjugate Gradient Method[J]. Journal of Computer Research and Development, 2015, 52(4): 843-850. DOI: 10.7544/issn1000-1239.2015.20131919

GPU-Accelerated Incomplete Cholesky Factorization Preconditioned Conjugate Gradient Method

  • Published Date: March 31, 2015
  • The incomplete Cholesky factorization preconditioned conjugate gradient (ICCG) method is effective for solving large sparse symmetric positive definite linear systems. However, ICCG requires solving two sparse triangular linear systems in each iteration. The inherently serial nature of sparse triangular solves becomes a bottleneck that prevents efficient parallelization of ICCG on GPU platforms. This paper proposes an effective method for accelerating sparse triangular solves on the GPU. To increase multi-thread parallelism, level scheduling is applied to the sparse triangular matrices generated by the incomplete Cholesky factorization. To further improve parallel performance, the approximate minimum degree (AMD) algorithm is used to reorder the coefficient matrix before level scheduling. In addition, a novel method that uses the level information to reorder the sparse triangular matrices after level scheduling is applied. The AMD reordering decreases the number of levels produced by level scheduling, and the level-based reordering optimizes the GPU memory access pattern so that the triangular solves benefit from memory coalescing. Numerical experiments indicate that, compared with an ICCG implementation based on NVIDIA CUSPARSE, applying these methods yields an average performance improvement of more than 100%.
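
The level scheduling idea mentioned in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' GPU implementation: the function names, the CSR arrays, and the 4x4 example matrix are all hypothetical, and neither the AMD reordering nor the level-based reordering is shown. The sketch only demonstrates how rows of a sparse lower triangular matrix can be grouped into levels so that all rows within one level can be solved independently (on a GPU, typically one thread or warp per row).

```python
def level_schedule(indptr, indices):
    """Group the rows of a lower triangular CSR matrix into dependency levels.

    Row i depends on row j whenever L[i, j] != 0 with j < i. The level of a
    row is one more than the largest level among the rows it depends on.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    levels = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        levels[lv].append(i)
    return levels


def solve_lower_by_levels(indptr, indices, data, b, levels):
    """Solve L x = b by forward substitution, processing one level at a time."""
    x = [0.0] * len(b)
    for rows in levels:
        # Rows in the same level do not depend on each other, so this inner
        # loop is the part that could run in parallel on a GPU.
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                elif j < i:
                    acc -= data[k] * x[j]
            x[i] = acc / diag
    return x


# Hypothetical example matrix (assumption, not taken from the paper):
#     [2 0 0 0]
# L = [0 3 0 0]
#     [1 0 4 0]
#     [0 1 1 5]
INDPTR = [0, 1, 2, 4, 7]
INDICES = [0, 1, 0, 2, 1, 2, 3]
DATA = [2.0, 3.0, 1.0, 4.0, 1.0, 1.0, 5.0]
B = [2.0, 3.0, 5.0, 7.0]  # equals L times the all-ones vector

LEVELS = level_schedule(INDPTR, INDICES)
X = solve_lower_by_levels(INDPTR, INDICES, DATA, B, LEVELS)
```

In this example rows 0 and 1 form the first level, row 2 the second, and row 3 the third, so the solve needs three sequential steps instead of four. Fewer levels mean fewer sequential steps, which is why the abstract emphasizes reorderings that reduce the level count.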