• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
Advanced Search
Tang Liang, Luo Zuying, Zhao Guoxing, and Yang Xu. SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU Computing[J]. Journal of Computer Research and Development, 2013, 50(7): 1491-1500.
Citation: Tang Liang, Luo Zuying, Zhao Guoxing, and Yang Xu. SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU Computing[J]. Journal of Computer Research and Development, 2013, 50(7): 1491-1500.

SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU Computing

More Information
  • Published Date: July 14, 2013
  • Recently some EDA researchers try using the high computing performance of the graphic processing unit (GPU) to speed up IC parameter analysis. In order to apply GPU for efficient analyses of power/ground(P/G) networks, a novel efficient parallel analysis algorithm is proposed based on the traditional successive over-relaxation (SOR) algorithm. According to the accelerating principles of GPU parallel computing, the algorithm has made the following improvements. 1) Use odd/even based red/black strategy to classify all nodes so that all neighbors of a red node are black ones and a black node has all red neighbors. Red nodes and black nodes are relaxed alternatively for data consistency of GPU computing. As for a P/G network of N nodes, one relaxation process of red or black nodes will relax N/2 nodes, which means that the red/black SOR will implement N/2 threads at one time from the viewpoint of theory. 2) Optimize the data structure to implement data coalesced access. 3) Local shared memory is used to parallel reduce relaxation ending flags and the compacted flags are zero copied to the main memory of the computer system at high speed. In turn, CPU can decide fast whether to activate the next round of relaxation kernels or end the relaxation. A large number of experiments demonstrate that compared with the single CPU thread, the speedup times of our algorithm increase linearly as GPU physical threads increase, and the algorithm can provide 242X speedup at maximum. Thus, according to our best knowledge, the algorithm provides the best GPU accelerating efficiency among all present EDA researches.
  • Related Articles

    [1]Zhang Xuguang, Chen Mingkai, Wei Xin. Ubiquitous Video Transmission Scheduling Supported by Computing Power Network[J]. Journal of Computer Research and Development, 2023, 60(4): 786-796. DOI: 10.7544/issn1000-1239.202330005
    [2]Wang Qinglin, Li Dongsheng, Mei Songzhu, Lai Zhiquan, Dou Yong. Optimizing Winograd-Based Fast Convolution Algorithm on Phytium Multi-Core CPUs[J]. Journal of Computer Research and Development, 2020, 57(6): 1140-1151. DOI: 10.7544/issn1000-1239.2020.20200107
    [3]Duan Qiong, Tian Bo, Chen Zheng, Wang Jie, He Zengyou. CUDA-TP: A GPU-Based Parallel Algorithm for Top-Down Intact Protein Identification[J]. Journal of Computer Research and Development, 2018, 55(7): 1525-1538. DOI: 10.7544/issn1000-1239.2018.20170080
    [4]Ren Gang, Deng Pan, Yang Chao, Wu Changmao. MapReduce Back Propagation Algorithm Based on Structure Parallelism[J]. Journal of Computer Research and Development, 2018, 55(6): 1308-1319. DOI: 10.7544/issn1000-1239.2018.20170024
    [5]Ma Chao, Dai Zibin, Li Wei, Nan Longmei, Jin Yu. RPRU: A Unified Architecture for Rotation and Bit-Extraction Operations in General-Propose Processor[J]. Journal of Computer Research and Development, 2018, 55(2): 426-437. DOI: 10.7544/issn1000-1239.2018.20160775
    [6]Li Zhe, Li Zhanshan, Li Ying. A Constraint Network Model and Parallel Arc Consistency Algorithms Based on GPU[J]. Journal of Computer Research and Development, 2017, 54(3): 514-528. DOI: 10.7544/issn1000-1239.2017.20150912
    [7]Zhang Jianmin, Li Tiejun, Li Sikun. An Address Cache of Interconnect Network in Parallel Computers[J]. Journal of Computer Research and Development, 2016, 53(2): 390-398. DOI: 10.7544/issn1000-1239.2016.20148039
    [8]Zhang Shuai, Li Tao, Jiao Xiaofan, Wang Yifeng, Yang Yulu. Parallel TNN Spectral Clustering Algorithm in CPU-GPU Heterogeneous Computing Environment[J]. Journal of Computer Research and Development, 2015, 52(11): 2555-2567. DOI: 10.7544/issn1000-1239.2015.20148151
    [9]Luo Zuying, Zhang Yubin, Yu Xianchuan. A Single Open-Defect Analysis Method for Power/Ground Networks[J]. Journal of Computer Research and Development, 2009, 46(7): 1234-1240.
    [10]Xu Guoshi, Lu Fakai, Xu Zhuoqun, Yu Huashan, and Ding Wenkui. A Network Parallel Computing Scheme for Alternative Splicing of Biology Genome[J]. Journal of Computer Research and Development, 2007, 44(10): 1682-1687.

Catalog

    Article views (915) PDF downloads (478) Cited by()

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return