Abstract:
In order to exploit instruction level parallelism (ILP), multiple functional units with multi-ports register file are often used in very long instruction word (VLIW) processor. As the number of functional units rises, the number of register file ports will grow accordingly. At some point, the multiplexing logic on register ports can come to dominate the processor's cycle time. A reasonable solution is to partition the register file into independent clusters. Although clustered architectures reduce register file ports per cluster without performance degradation, they present new challenges to compiler which must assign every operation and operand to a specific cluster and coordinate data movement between clusters to achieve fine ILP. In this paper, a scheduling algorithm for clustered VLIW architectures—hyperblock-based unified cluster assignment and modulo scheduling (HBUCAMS) is proposed. Compared with basic block, hyperblock can provide more larger schedule region for exploiting ILP. Furthermore, because loop bodies with control flow can be converted into hyperblocks, there are more opportunities to apply modulo scheduling. Instead of performing clustered assignment and modulo scheduling sequentially, HBUCAMS put them into a single phase. This unified approach is more effective than phase-ordered approach, since it allows optimizing the global code generation problem instead of searching for optimal solutions to each individual step. Experiments in YHFT-DSP/700 compiler show that the proposed algorithm can obtain more optimized result than the ITSS algorithm.