ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2016, Vol. 53, Issue (6): 1173-1185. doi: 10.7544/issn1000-1239.2016.20150114


Two-Stage Synchronization Based Thread Block Compaction Scheduling Method of GPGPU

Zhang Jun (1,3), He Yanxiang (1,2), Shen Fanfan (1), Jiang Nan (1,4), Li Qing’an (1,2)

  1 (Computer School, Wuhan University, Wuhan 430072); 2 (State Key Laboratory of Software Engineering (Wuhan University), Wuhan 430072); 3 (School of Software, East China University of Technology, Nanchang 330013); 4 (School of Computer Science, Hubei University of Technology, Wuhan 430068)
  • Online: 2016-06-01

Abstract: General purpose graphics processing units (GPGPUs) are increasingly used in general purpose computing fields that demand high performance and high throughput. The computing power of a GPGPU comes from its single instruction multiple data (SIMD) execution model, and running computing tasks efficiently through massive numbers of highly parallel threads has become the mainstream approach. However, parallel computing capability degrades when control flow diverges at a branch, because the different branch paths are processed sequentially. In this paper, we analyze why previously proposed thread block compaction scheduling methods handle divergent branches inefficiently and, on that basis, propose TSTBC, a two-stage synchronization based thread block compaction scheduling method. TSTBC uses adequacy decision logic to evaluate the effectiveness of thread block compaction and reconstruction, and thereby reduces the number of inefficient thread block compactions. Simulation results show that, compared with other methods of the same type, TSTBC improves the effectiveness of thread block compaction and reconstruction to some extent, and effectively reduces both the damage to data locality inside thread groups and the miss rate of the on-chip level-one data cache. Overall system performance improves by 1927% over the baseline architecture.

Key words: general purpose graphics processing unit (GPGPU), thread scheduling, thread block compaction and reconstruction, two-stage synchronization, branch divergence
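
The cost of branch divergence and the notion of an "inefficient" compaction described in the abstract can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the TSTBC hardware logic: a thread block is a list of thread groups (warps), each holding one boolean per lane that says whether the thread took the branch; regrouped threads are assumed to stay in their original lane, as in earlier thread block compaction designs; and the function names, including compaction_is_worthwhile, are hypothetical.

```python
from typing import List

Block = List[List[bool]]  # one inner list per warp, one bool per lane


def issues_without_compaction(taken: Block) -> int:
    """Each warp executes every branch path it contains, one after another."""
    # len(set(warp)) is 1 for a uniform warp and 2 for a divergent one.
    return sum(len(set(warp)) for warp in taken)


def issues_with_compaction(taken: Block) -> int:
    """Regroup same-path threads across the block, keeping each thread in its lane."""
    lanes = len(taken[0])
    total = 0
    for path in (True, False):
        # A lane can hold one thread per compacted warp, so the busiest lane
        # decides how many warps this path needs (assumed lane constraint).
        per_lane = [sum(1 for warp in taken if warp[lane] == path)
                    for lane in range(lanes)]
        total += max(per_lane)
    return total


def compaction_is_worthwhile(taken: Block) -> bool:
    """Hypothetical adequacy check: compact only if it saves warp issues."""
    return issues_with_compaction(taken) < issues_without_compaction(taken)


# Complementary divergence patterns compact well: 4 issues drop to 2.
good = [[True, True, False, False],
        [False, False, True, True]]

# Identical patterns collide in the same lanes: compaction saves nothing.
bad = [[True, False, True, False],
       [True, False, True, False]]

print(issues_without_compaction(good), issues_with_compaction(good))
print(issues_without_compaction(bad), issues_with_compaction(bad))
```

In the second block every taken thread sits in lanes 0 and 2, so regrouping cannot merge the two warps. A scheduler that compacts anyway pays the synchronization cost and disturbs data locality without reducing the number of warp issues, which is the kind of case an adequacy check is meant to filter out.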
