ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (4): 750-763. doi: 10.7544/issn1000-1239.2017.20160138


The Data-Flow Block Based Spatial Instruction Scheduling Method

Liu Bingtao1,2,3, Wang Da1, Ye Xiaochun1, Fan Dongrui1,2, Zhang Zhimin1, Tang Zhimin1   

  1. State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190
  2. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049
  3. Institute of Information and Control, Hangzhou Dianzi University, Hangzhou 310018
  • Online:2017-04-01

Abstract: Clustered superscalar processors partition hardware resources to avoid the energy and cycle-time penalties incurred by large, monolithic structures. Dynamic multi-core processors fuse the hardware resources of several physical cores to provide computation capability that adapts to the application. These architectures achieve energy-efficient computation through carefully orchestrated use of spatially distributed hardware resources. Because instruction load imbalance and operand-forwarding latency between partitions can cause performance penalties, an effective spatial instruction scheduling method is needed to distribute computation among the partitions of a spatial architecture. We present a spatial instruction scheduling method based on the data-flow block (DFB). DFBs are dynamically constructed, cached, and reused schedule patterns for one or more sequentially executed instruction basic blocks. The DFB scheduling algorithm models the data-flow constraints of the dynamic instruction stream and the scheduling space defined by the hardware resources, then makes scheduling decisions according to relative criticality, which is the quantitative scheduling slack of instructions. We describe the framework and algorithm for DFB scheduling. Experiments that vary microarchitecture parameters closely related to the scheduling method, such as partition count, inter-partition latency, and schedule window capacity, show that ideal DFB scheduling performs better and more stably than round-robin and dependence-based scheduling. Finally, we show that the scheduling performance of an example DFB cache implementation is close to that of ideal DFB scheduling.
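
The notion of relative criticality as quantitative scheduling slack can be illustrated with a small sketch. This is not the paper's DFB algorithm, whose details the abstract does not give. It is a textbook earliest-start/latest-start computation over a dependence graph, and the function name `scheduling_slack`, the instruction names, and the latencies are all hypothetical. An instruction with zero slack lies on the critical path; a larger slack means the instruction can be delayed (for example, by inter-partition forwarding) without lengthening the block.

```python
from collections import defaultdict


def scheduling_slack(latency, deps):
    """Return {instr: slack in cycles} for a dependence DAG.

    latency: {instr: cycles}; the dict order must be a topological order
             (every producer appears before its consumers).
    deps:    {instr: [producer instrs]}; missing keys mean no producers.
    Illustrative assumption: unlimited issue width, no forwarding delay.
    """
    order = list(latency)

    # Invert the producer lists so each instruction knows its consumers.
    consumers = defaultdict(list)
    for instr in order:
        for producer in deps.get(instr, []):
            consumers[producer].append(instr)

    # Forward pass: earliest cycle at which all operands are available.
    earliest = {}
    for instr in order:
        earliest[instr] = max(
            (earliest[p] + latency[p] for p in deps.get(instr, [])),
            default=0,
        )

    # Length of the critical path through the whole block.
    critical_len = max(earliest[i] + latency[i] for i in order)

    # Backward pass: latest start that does not lengthen the critical path.
    latest = {}
    for instr in reversed(order):
        finish_by = min((latest[c] for c in consumers[instr]), default=critical_len)
        latest[instr] = finish_by - latency[instr]

    return {i: latest[i] - earliest[i] for i in order}


# Hypothetical five-instruction block: a load feeds two chains that join at a store.
latency = {"ld": 3, "add": 1, "mul": 3, "sub": 1, "st": 1}
deps = {"add": ["ld"], "mul": ["ld"], "sub": ["add"], "st": ["mul", "sub"]}
print(scheduling_slack(latency, deps))
# {'ld': 0, 'add': 1, 'mul': 0, 'sub': 1, 'st': 0}
```

In this example the `ld`, `mul`, `st` chain has zero slack, so a criticality-aware scheduler would try to keep it within one partition, while `add` and `sub` could each tolerate one cycle of extra latency.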

Key words: processor microarchitecture, load balancing, instruction scheduling, data-flow, critical path
