Citation: | Li Chunfeng, Karim Soliman, Ji Weixing, Shi Feng. Deadlock-Free Strategies Based on Synchronized Hamiltonian Ring in Triplet-Based Many-Core Architecture[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202331042 |
Ensuring deadlock-free data transmission in the network-on-chip (NoC) is a prerequisite for providing reliable communication services for multi-processor system-on-chip (MPSoC), directly determining the availability of NoC and even MPSoC. Existing general-purpose deadlock-free strategies are oriented to arbitrary topologies, making it challenging to utilize the features and advantages of a specific topology. Moreover, these strategies may even increase network latency, power consumption, and hardware complexity. In addition, due to significant differences in the regular network between routing-level and protocol-level deadlocks, existing solutions struggle to simultaneously address both types of deadlock issues, affecting MPSoC reliability. We propose a deadlock-free strategy with synchronous Hamiltonian rings based on the inherent Hamiltonian characteristics of the triplet-based many-core architecture (TriBA). This method uses the topology’s symmetric axes and Hamiltonian edge to allocate independent store-and-forward buffers for data transmission, preventing protocol-level deadlocks and improving data transfer speed. Additionally, we design a directional determination method for data transmission within the same buffer using cyclic linked-list technology. This method ensures data independence and synchronous forward transmission, eliminates routing-level deadlocks, and reduces data transfer latency. Based on optimizing redundant calculations in look-ahead routing algorithms, we propose a deadlock-free routing mechanism called Hamiltonian shortest path routing (HamSPR) based on a synchronous Hamiltonian ring. GEM5 simulation results show that, compared with existing solutions in the TriBA, HamSPR reduces average packet latency and power consumption in synthetic traffic patterns by 18.78%−65.40% and 6.94%−34.15%, respectively, while improving throughput by 8.00%−59.17%. In the PARSEC benchmark, HamSPR achieves maximum reductions of 16.51% in application runtime and 42.75% in average packet latency, respectively. Moreover, compared with 2D-Mesh, TriBA demonstrates an application performance improvement of 1%−10% in PARSEC benchmark.
[1] |
Yazdanpanah F, Afsharmazayejani R. A systematic analysis of power saving techniques for wireless network-on-chip architectures[J]. Journal of Systems Architecture, 2022, 126: 102485 doi: 10.1016/j.sysarc.2022.102485
|
[2] |
Della Vecchia G, Sanges C. A recursively scalable network VLSI implementation[J]. Future Generation Computer Systems, 1988, 4(3): 235−243 doi: 10.1016/0167-739X(88)90007-6
|
[3] |
石峰,陈旭,尹飞,等. 基于 S3 变换的 TriBA-Net 最短路径路由机制[J]. 中国科学:信息科学,2018,48(1):100−114 doi: 10.1360/N112017-00065
Shi Feng, Chen Xu, Yin Fei, et al. A shortest path routing mechanism based on S3 for TriBA-Net[J]. SCIENTIA SINICA Informationis, 2018, 48(1): 100−114 (in Chinese) doi: 10.1360/N112017-00065
|
[4] |
张洋,王达,叶笑春,等. 众核处理器片上网络的层次化全局自适应路由机制[J]. 计算机研究与发展,2016,53(6):1211−1220 doi: 10.7544/issn1000-1239.2016.20150149
Zhang Yang, Wang Da, Ye Xiaochun, et al. A global hierarchical adaptive routing mechanism in many-core processor network-on-chip[J]. Journal of Computer Research and Development, 2016, 53(6): 1211−1220 (in Chinese) doi: 10.7544/issn1000-1239.2016.20150149
|
[5] |
Feng Yinxiao, Xiang Dong, Ma Kaisheng. Heterogeneous die-to-die interfaces: Enabling more flexible chiplet interconnection systems[C]//Proc of the 56th Annual IEEE/ACM Int Symp on Microarchitecture. New York: ACM, 2023: 930−943
|
[6] |
Bakhouya M, Suboh S, Gaber J, et al. Analytical performance comparison of 2D mesh, wk-recursive, and spidergon nocs [C/OL]//Proc of the 19th IEEE Int Symp on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW). Piscataway, NJ: IEEE, 2010[2024-05-22]. https://ieeexplore.ieee.org/abstract/document/5470784
|
[7] |
石峰,魏增辉,王一拙,等. 一种多/众核架构TriBA-CMPs的布局布线方案tMesh:中国,CN107526894A[P]. 2017-12-29
Shi Feng, Wei Zenghui, Wang Yizhuo, et al. A layout and routing scheme named tMesh for a multi/many-core architecture TriBA-CMPs: China, CN107526894A[P]. 2017-12-29 (in Chinese)
|
[8] |
Braham C C, Ji Weixinga, Zhang Lingyu. The design and implementation of a hierarchical network on chip[J]. Procedia Engineering, 2012, 29: 2966−2974
|
[9] |
Feng Shi, Yang Zhang. Design and evaluation of low-latency and shortest-path routing algorithm for triplet-based hierarchical interconnection network[J]. Journal of Testing and Evaluation, 2013, 41(4): 541−550 doi: 10.1520/JTE20120212
|
[10] |
Khan HUR, Shi Feng, Ji Weixing, et al. Computationally efficient locality-aware interconnection topology for multi-processor system-on-chip (MP-SoC)[J]. Chinese Science Bulletin, 2010, 55(29): 3363−3371 doi: 10.1007/s11434-010-4118-z
|
[11] |
Wang Zuo, Zuo Qi, Li Jiaxin. Triplet-based topology for on-chip networks[J]. WSEAS Transactions on Computers, 2009, 8(3): 516−525
|
[12] |
Xue Licheng, Shi Feng, Ji Weixing. 3D floorplanning of low-power and area-efficient network-on-chip architecture[J]. Microprocessors and Microsystems, 2011, 35(5): 484−495 doi: 10.1016/j.micpro.2011.04.001
|
[13] |
Zhang Yang, Shi Feng, Zuo Qi, et al. Express router microarchitecture for triplet-based hierarchical interconnection network[C]//Proc of the 14th IEEE Int Conf on High Performance Computing and Communication & 9th IEEE Int Conf on Embedded Software and Systems. Piscataway, NJ: IEEE, 2012: 295−302
|
[14] |
Wang Xiaojun, Shi Feng, Zhang Hong. KLSAT: An application mapping algorithm based on Kernighan–Lin partition and simulated annealing for a specific WK-recursive NoC architecture[C]//Proc of the 16th IFIP WG 10.3 Int Conf on Network and Parallel Computing. Berlin: Springer, 2019: 31−42
|
[15] |
Zhang Hong, Wang Xiaojun. KGT: An application mapping algorithm based on Kernighan–Lin partition and genetic algorithm for WK-recursive NoC architecture[C]//Proc of the 17th Int Conf on Intelligent Computing Theories and Application (ICIC). Berlin: Springer, 2021: 86−101
|
[16] |
Shi Feng, Li Jiaxin, Bai Ziru. Performance of triplet-based interconnection strategy for multi-core on-chip processors[C]//Proc of the 11th IEEE Int Conf on High Performance Computing and Communications. Piscataway, NJ: IEEE, 2009: 163−170
|
[17] |
Wang Xiaojun, Shi Feng, Zhang Hong. A novel speedup evaluation for multicore architecture based topology of on-chip memory[C]//Proc of the 10th Int Symp on Parallel Architectures, Algorithms and Programming. Berlin: Springer, 2019: 35−47
|
[18] |
Feng Yinxiao, Xiang Dong, Ma Kaishen. A scalable methodology for designing efficient interconnection network of chiplets [C]//Proc of the 2023 IEEE Int Symp on High-Performance Computer Architecture (HPCA). Piscataway, NJ: IEEE, 2023: 1059−1071
|
[19] |
Kasan H, Kim J. VVQ: Virtualizing virtual channel for cost-efficient protocol deadlock avoidance[C]//Proc of the 2023 IEEE Int Symp on High-Performance Computer Architecture (HPCA). Piscataway, NJ: IEEE, 2023: 1072−1084
|
[20] |
Romanov A Y, Myachin N M, Lezhnev E V, et al. Ring-Split: Deadlock-free routing algorithm for circulant networks on chip[J]. Micromachines, 2023, 14(1): 141 doi: 10.3390/mi14010141
|
[21] |
Kwauk G, Kang S, Kasan H, et al. Boomgate: Deadlock avoidance in non-minimal routing for high-radix networks[C]//Proc of the 2021 IEEE Int Sym on High-Performance Computer Architecture (HPCA). Piscataway, NJ: IEEE, 2021: 696−708
|
[22] |
Flich J, Skeie T, Mejia A, et al. A survey and evaluation of topology-agnostic deterministic routing algorithms[J]. IEEE Transactions on Parallel and Distributed Systems, 2011, 23(3): 405−425
|
[23] |
Wu Yibo, Wang Liang, Wang Xiaohang, et al. A deflection-based deadlock recovery framework to achieve high throughput for faulty NoCs[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 40(10): 2170−2183
|
[24] |
Razavi S, Sarbazi-Azad H. The triangular pyramid: Routing and topological properties[J]. Information Sciences, 2010, 180(11): 2328−2339 doi: 10.1016/j.ins.2010.02.016
|
[25] |
Wang Nengchung, Yen Chengpang, Chu Chihping. Multicast communication in wormhole-routed symmetric networks with Hamiltonian cycle model[J]. Journal of Systems Architecture, 2005, 51(3): 165−183 doi: 10.1016/j.sysarc.2004.11.001
|
[26] |
El-Obaid A. Three-dimension Hamiltonian broadcast wormhole-routing[J]. International Journal of Computer Networks & Communications, 2015, 7(3): 31−46
|
[27] |
Chen Xu, Shi Feng, Yin Fei, et al. An efficient distributed minimal routing algorithm for triplet-based WK-recursive network[C]//Proc of the 20th IEEE Int Conf on High Performance Computing and Communications; 16th IEEE Int Conf on Smart City; 4th IEEE Int Conf on Data Science and Systems (HPCC/SmartCity/DSS). Piscataway, NJ: IEEE, 2018: 547−554
|
[28] |
Fan Dongrui, Zhang Hao, Wang Da, et al. Godson-T: An efficient many-core processor exploring thread-level parallelism[J]. IEEE Micro, 2012, 32(2): 38−47 doi: 10.1109/MM.2012.32
|
[29] |
Parasar M, Farrokhbakht H, Jerger N E, et al. DRAIN: Deadlock removal for arbitrary irregular networks[C]//Proc of the 2020 IEEE Int Symp on High Performance Computer Architecture (HPCA). Piscataway, NJ: IEEE, 2020: 447−460
|
[30] |
Dally W J, Towles B P. Principles and Practices of Interconnection Networks[M]. Amsterdam: Elsevier, 2004
|
[31] |
Ramey C. Tile-GX100 manycore processor: Acceleration interfaces and architecture[C/OL]//Proc of the 23rd IEEE on Hot Chips Symp. Piscataway, NJ: IEEE, 2011[2024-05-22]. https://ieeexplore.ieee.org/abstract/document/7477491
|
[32] |
Wentzlaff D, Griffin P, Hoffmann H, et al. On-chip interconnection architecture of the tile processor[J]. IEEE Micro, 2007, 27(5): 15−31 doi: 10.1109/MM.2007.4378780
|
[33] |
Sodani A, Gramunt R, Corbal J, et al. Knights landing: Second-generation intel Xeon Phi product[J]. IEEE Micro, 2016, 36(2): 34−46 doi: 10.1109/MM.2016.25
|
[34] |
Rahman R. Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers[M]. Berlin: Springer, 2013
|
[35] |
Qiao Baojun, Shi Feng, Ji Weixing. A new hierarchical interconnection network for multi-core processor[C]//Proc of the 2nd IEEE Conf on Industrial Electronics and Applications. Piscataway, NJ: IEEE, 2007: 246−250
|
[36] |
Wang Zuo, Shi Feng. A shortest path routing algorithm in triplet-based network[J]. Transactions of Beijing Institute of Technology, 2009, 29(5): 410−414
|
[37] |
Zhang Yang, Shi Feng, Ji Weixing, et al. SPORT: A shortest path routing algorithm for triplet-based hierarchical interconnection[J]. Transactions of Beijing institute of Technology, 2013, 33(1): 57−61
|
[38] |
Chen Xu, Shi Feng, Yin Fei, et al. A novel look-ahead routing algorithm based on graph theory for triplet-based network-on-chip router[J]. IEICE Electronics Express, 2018, 15(8): 1−12
|
[39] |
石峰,阮生强,尹飞,等. 基于尖端子图分类传输数据的基三核间网络防死锁方法:中国,CN112698960B[P]. 2022-08-09
Shi Feng, Ruan Shenqiang, Yin Fei, et al. A deadlock prevention method for inter-core networks based on apex-subgraph classification for data transmission in triplet-based architecture: China, CN112698960B[P]. 2022-08-09 (in Chinese)
|
[40] |
Tutte W T. Graph Theory[M]. Cambridge, UK: Cambridge University Press, 2001
|
[41] |
Levine G N. The classification of deadlock prevention and avoidance is erroneous[J]. ACM SIGOPS Operating Systems Review, 2005, 39(2): 47−50 doi: 10.1145/1055218.1055221
|
[42] |
Binkert N, Beckmann B, Black G, et al. The GEM5 simulator[J]. ACM SIGARCH Computer Architecture News, 2011, 39(2): 1−7 doi: 10.1145/2024716.2024718
|
[43] |
Bharadwaj S, Yin J, Beckmann B, et al. Kite: A family of heterogeneous interposer topologies enabled via accurate interconnect modeling[C/OL]//Proc of the 57th ACM/IEEE Design Automation Conf (DAC). Piscataway, NJ: IEEE, 2020[2024-05-22]. https://ieeexplore.ieee.org/abstract/document/9218539
|
[44] |
Sun Cheng, Chen C H O, Kurian G, et al. DSENT―A tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling[C]//Proc of the 6th IEEE/ACM Int Symp on Networks-on-Chip. Piscataway, NJ: IEEE, 2012: 201−210
|
[45] |
Zhan Xusheng, Bao Yungang, Bienia C, et al. PARSEC3.0: A multicore benchmark suite with network stacks and SPLASH-2X[J]. ACM SIGARCH Computer Architecture News, 2017, 44(5): 1−16
|
[46] |
Parasar M, Jerger N E, Gratz P V, et al. SWAP: Synchronized weaving of adjacent packets for network deadlock resolution[C]//Proc of the 52nd Annual IEEE/ACM Int Symp on Microarchitecture. New York: ACM, 2019: 873−885
|
[47] |
Joshi B, Thakur M K. Genetic algorithm-and cuckoo search algorithm-based routing optimizations in network-on-chip[J]. Arabian Journal for Science and Engineering, 2023, 48(8): 9635−9644 doi: 10.1007/s13369-022-07272-9
|
[48] |
Luo Yaoying, Meyer M C, Jiang Xin, et al. A hotspot-pattern-aware routing algorithm for networks-on-chip[C]//Proc of the 13th IEEE Int Symp on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). Piscataway, NJ: IEEE, 2019: 229−235
|
[1] | Li Song, Cao Wenqi, Hao Xiaohong, Zhang Liping, Hao Zhongxiao. Collective Spatial Keyword Query Based on Time-Distance Constrained and Cost Aware[J]. Journal of Computer Research and Development, 2025, 62(3): 808-819. DOI: 10.7544/issn1000-1239.202330815 |
[2] | Ren Hao, Liu Baisong, Sun Jinyang, Dong Qian, Qian Jiangbo. A Time and Relation-Aware Graph Collaborative Filtering for Cross-Domain Sequential Recommendation[J]. Journal of Computer Research and Development, 2023, 60(1): 112-124. DOI: 10.7544/issn1000-1239.202110545 |
[3] | Zhang Li, Zhang Shukui, Liu Hai, Zhang Yang, Tao Ye, Long Hao, Yu Chunqing, Zhu Qiding. Task Distribution Based on User Attention and Time Supervision[J]. Journal of Computer Research and Development, 2022, 59(4): 813-825. DOI: 10.7544/issn1000-1239.20200565 |
[4] | Wang Chao, Chen Xianglan, Zhang Bo, Li Xi, Wang Chao, Zhou Xuehai. A Real-Time Processor Model with Timing Semantics[J]. Journal of Computer Research and Development, 2021, 58(6): 1176-1191. DOI: 10.7544/issn1000-1239.2021.20210157 |
[5] | Cui Yuanning, Li Jing, Shen Li, Shen Yang, Qiao Lin, Bo Jue. Duration-HyTE: A Time-Aware Knowledge Representation Learning Method Based on Duration Modeling[J]. Journal of Computer Research and Development, 2020, 57(6): 1239-1251. DOI: 10.7544/issn1000-1239.2020.20190253 |
[6] | Wang Chong, Lü Yinrun, Chen Li, Wang Xiuli, Wang Yongji. Survey on Development of Solving Methods and State-of-the-Art Applications of Satisfiability Modulo Theories[J]. Journal of Computer Research and Development, 2017, 54(7): 1405-1425. DOI: 10.7544/issn1000-1239.2017.20160303 |
[7] | Zhou Hang, Huang Zhiqiu, Zhu Yi, Xia Liang, Liu Linyuan. Real-Time Systems Contact Checking and Resolution Based on Time Petri Net[J]. Journal of Computer Research and Development, 2012, 49(2): 413-420. |
[8] | Wang Xiaoping, Luo Jun, Shen Changxiang. Theory and Algorithms on Localization in Wireless Sensor Networks[J]. Journal of Computer Research and Development, 2011, 48(3): 353-363. |
[9] | Zhou Hang, Huang Zhiqiu, Hu Jun, Zhu Yi. Real-Time System Resource Conflict Checking Based on Time Petri Nets[J]. Journal of Computer Research and Development, 2009, 46(9): 1578-1585. |
[10] | Xu Chaonong, Xu Yongjun, Li Xiaowei. New Time Synchronization Techniques for Wireless Sensor Networks[J]. Journal of Computer Research and Development, 2008, 45(1): 138-145. |