Abstract:
Ensuring deadlock-free data transmission in the Network-on-Chip (NoC) is a prerequisite for reliable communication in a Multi-processor System-on-Chip (MPSoC) and directly determines the availability of the NoC and of the MPSoC as a whole. Existing general-purpose deadlock-free strategies target arbitrary topologies, which makes it difficult for them to exploit the features and advantages of a specific topology. Moreover, these strategies may increase network latency, power consumption, and hardware complexity. In addition, because routing-level and protocol-level deadlocks differ significantly in regular networks, existing solutions struggle to address both types simultaneously, which affects MPSoC reliability. This paper proposes a deadlock-free strategy with synchronous Hamiltonian rings based on the inherent Hamiltonian characteristics of the Triplet-based many-core architecture (TriBA). The strategy uses the topology's symmetric axes and Hamiltonian edges to allocate independent store-and-forward buffers for data transmission, which prevents protocol-level deadlocks and improves data transfer speed. Additionally, we design a method, based on cyclic linked lists, for determining the direction of data transmission within the same buffer. This method ensures data independence and synchronous forward transmission, eliminates routing-level deadlocks, and reduces data transfer latency. By optimizing redundant calculations in look-ahead routing algorithms, we further propose a deadlock-free routing mechanism based on the synchronous Hamiltonian ring, called Hamiltonian Shortest Path Routing (HamSPR). GEM5 simulation results show that, compared with existing solutions for TriBA, HamSPR reduces average packet latency and power consumption under synthetic traffic patterns by 18.78%~65.40% and 6.94%~34.15%, respectively, while improving throughput by 8.00%~59.17%.
In the PARSEC benchmark, HamSPR reduces application runtime by up to 16.51% and average packet latency by up to 42.75%. Moreover, compared with the 2D-Mesh, TriBA improves application performance by 1%~10% in the PARSEC benchmark.