SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU Computing

Tang Liang, Luo Zuying, Zhao Guoxing, and Yang Xu

Tang Liang, Luo Zuying, Zhao Guoxing, and Yang Xu. SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU ComputingJ. Journal of Computer Research and Development, 2013, 50(7): 1491-1500.

Citation:

Tang Liang, Luo Zuying, Zhao Guoxing, and Yang Xu. SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU ComputingJ. Journal of Computer Research and Development, 2013, 50(7): 1491-1500.

Citation:

Tang Liang, Luo Zuying, Zhao Guoxing, and Yang Xu. SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU ComputingJ. Journal of Computer Research and Development, 2013, 50(7): 1491-1500.

SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU Computing

Tang Liang, Luo Zuying, Zhao Guoxing, and Yang Xu

Graphical Abstract

Abstract

Abstract

Recently some EDA researchers try using the high computing performance of the graphic processing unit (GPU) to speed up IC parameter analysis. In order to apply GPU for efficient analyses of power/ground(P/G) networks, a novel efficient parallel analysis algorithm is proposed based on the traditional successive over-relaxation (SOR) algorithm. According to the accelerating principles of GPU parallel computing, the algorithm has made the following improvements. 1) Use odd/even based red/black strategy to classify all nodes so that all neighbors of a red node are black ones and a black node has all red neighbors. Red nodes and black nodes are relaxed alternatively for data consistency of GPU computing. As for a P/G network of N nodes, one relaxation process of red or black nodes will relax N/2 nodes, which means that the red/black SOR will implement N/2 threads at one time from the viewpoint of theory. 2) Optimize the data structure to implement data coalesced access. 3) Local shared memory is used to parallel reduce relaxation ending flags and the compacted flags are zero copied to the main memory of the computer system at high speed. In turn, CPU can decide fast whether to activate the next round of relaxation kernels or end the relaxation. A large number of experiments demonstrate that compared with the single CPU thread, the speedup times of our algorithm increase linearly as GPU physical threads increase, and the algorithm can provide 242X speedup at maximum. Thus, according to our best knowledge, the algorithm provides the best GPU accelerating efficiency among all present EDA researches.

FullText(HTML)

References (0)

Cited By

Turn off MathJax

Article Contents

SOR-Based P/G Solving Algorithm of Linear Parallelism for GPU Computing

Abstract

Catalog

Export File

Citation

Format

Content