Abstract:
Clustered superscalar processors partition hardware resources to circumvent the energy and cycle-time penalties incurred by large, monolithic structures. Dynamic multi-core processors fuse the hardware resources of several physical cores to adapt computation capability to application demands. In these architectures, energy-efficient computation is achieved through a carefully orchestrated utilization of spatially distributed hardware resources. Problems such as instruction load imbalance and operand forwarding latency between partitions can cause performance penalties, so an effective spatial instruction scheduling method is needed to distribute computation among the partitions of spatial architectures. We present a data-flow block (DFB) based spatial instruction scheduling method. DFBs are dynamically constructed, cached, and reused schedule patterns for one or more sequentially executed basic blocks. The DFB scheduling algorithm models the data-flow constraints of the dynamic instruction stream and the scheduling space defined by the hardware resources, then makes scheduling decisions according to relative criticality, the quantitative scheduling slack of instructions. We present the framework and algorithms related to DFB scheduling. By experimenting with microarchitecture parameters closely related to the scheduling method, such as partition count, inter-partition latency, and scheduling window capacity, we show that ideal DFB scheduling performs better and more stably than round-robin and dependence-based scheduling. Finally, we show that the scheduling performance of an example DFB cache implementation approaches that of ideal DFB scheduling.
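To make the notion of relative criticality as quantitative scheduling slack concrete, the following Python sketch illustrates one plausible way slack could drive partition assignment. It is not the paper's DFB algorithm: the partition count, the forwarding latency, the ASAP/ALAP slack computation, and the greedy placement policy are all assumptions made purely for illustration.

```python
# Illustrative sketch (not the paper's DFB algorithm): slack-based spatial
# assignment of a basic block's instructions to clustered partitions.
from dataclasses import dataclass, field

FORWARD_LATENCY = 2   # assumed inter-partition operand forwarding latency (cycles)
NUM_PARTITIONS = 4    # assumed partition count


@dataclass
class Inst:
    name: str
    latency: int
    srcs: list = field(default_factory=list)  # names of producer instructions


def schedule_block(insts):
    """Assign each instruction to a partition using its scheduling slack.

    Slack = ALAP start - ASAP start over the block's data-flow graph.
    Low-slack (critical) instructions are co-located with a producer to
    avoid forwarding latency; the rest go to the least-loaded partition.
    """
    by_name = {i.name: i for i in insts}

    # ASAP: earliest start assuming no resource or forwarding delays.
    # Instructions are assumed to arrive in topological (program) order.
    asap = {}
    for i in insts:
        asap[i.name] = max((asap[s] + by_name[s].latency for s in i.srcs), default=0)
    length = max(asap[i.name] + i.latency for i in insts)

    # ALAP: latest start that still meets the block's critical-path length.
    alap = {i.name: length - i.latency for i in insts}
    for i in reversed(insts):
        for s in i.srcs:
            alap[s] = min(alap[s], alap[i.name] - by_name[s].latency)

    slack = {i.name: alap[i.name] - asap[i.name] for i in insts}

    load = [0] * NUM_PARTITIONS
    placement = {}
    for i in insts:
        if i.srcs and slack[i.name] <= FORWARD_LATENCY:
            # Critical: follow a producer to dodge the forwarding penalty.
            p = placement[i.srcs[0]]
        else:
            # Non-critical: balance load across partitions.
            p = min(range(NUM_PARTITIONS), key=lambda k: load[k])
        placement[i.name] = p
        load[p] += 1
    return placement, slack


if __name__ == "__main__":
    block = [
        Inst("load_a", 3),
        Inst("load_b", 3),
        Inst("mul", 4, ["load_a", "load_b"]),
        Inst("add", 1, ["mul"]),
        Inst("store", 1, ["add"]),
    ]
    placement, slack = schedule_block(block)
    for name in placement:
        print(f"{name}: partition {placement[name]}, slack {slack[name]}")
```

In this sketch the dependence chain through `mul`, `add`, and `store` has zero slack, so those instructions are kept on their producer's partition, while independent instructions are spread across partitions to balance load, mirroring the trade-off between forwarding latency and load imbalance described above.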