Citation: Wang Yanwei, Li Rengang, Xu Ran, Liu Junkai. Data Center Heterogeneous Acceleration Software-Hardware System-Level Platform Based on Reconfigurable Architecture[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440041
Constructing a software-hardware system-level prototype platform for accelerating data center services requires balancing high computing power, scalability, flexibility, and low cost. To enhance data center capabilities, this work studies heterogeneous computing from the perspective of software-hardware co-design, covering innovations in cloud platform architecture, hardware implementation, high-speed interconnection, and applications. A reconfigurable and composable software-hardware acceleration prototype system is designed and built. It simplifies the existing processor-centric way of constructing system-level computing platforms and enables rapid deployment and system-level prototype validation of target software-hardware designs. To achieve these objectives, methods such as decoupled reconfigurable-architecture device virtualization and remote mapping are used to exploit the potential of independent computing units. On this basis, the ISOF (independent system of FPGA) software-hardware computing platform is constructed. It goes beyond the limits of conventional server designs, expanding computing units efficiently and at low cost while letting clients use peripheral resources flexibly. To address system-level communication challenges, a communication hardware platform and an interaction mechanism between computing units are designed. To make the software-hardware system-level platform more agile, ISOF also provides a flexible, unified invocation interface. Finally, analysis and evaluation of the platform's system-level objectives verify that it meets current computing and acceleration requirements, delivering high-speed, low-latency communication together with good throughput and efficient elastic scalability.
In addition, the congestion avoidance and packet recovery mechanisms built on the high-speed communication layer have been improved, meeting the stability requirements of data-center-scale communication.
[1] | Qian Zhongsheng, Huang Heng, Zhu Hui, Liu Jinping. Multi-Perspective Graph Contrastive Learning Recommendation Method with Layer Attention Mechanism[J]. Journal of Computer Research and Development, 2025, 62(1): 160-178. DOI: 10.7544/issn1000-1239.202330804 |
[2] | Zhang Jinyu, Ma Chenxi, Li Chao, Zhao Zhongying. Towards Lightweight Cross-Domain Sequential Recommendation via Tri-Branches Graph External Attention Network[J]. Journal of Computer Research and Development, 2024, 61(8): 1930-1944. DOI: 10.7544/issn1000-1239.202440197 |
[3] | Xie Jun, Wang Yuzhu, Chen Bo, Zhang Zehua, Liu Qin. Aspect-Based Sentiment Analysis Model with Bi-Guide Attention Network[J]. Journal of Computer Research and Development, 2022, 59(12): 2831-2843. DOI: 10.7544/issn1000-1239.20210708 |
[4] | Qian Zhongsheng, Yang Jiaxiu, Li Duanming, Ye Zulai. Event Recommendation Strategy Combining User Long-Short Term Interest and vent Influence[J]. Journal of Computer Research and Development, 2022, 59(12): 2803-2815. DOI: 10.7544/issn1000-1239.20210693 |
[5] | Sun Qian, Xue Leiqi, Gao Ling, Wang Hai, Wang Yuxiang. Selection of Network Defense Strategies Based on Stochastic Game and Tabu Search[J]. Journal of Computer Research and Development, 2020, 57(4): 767-777. DOI: 10.7544/issn1000-1239.2020.20190870 |
[6] | Xu Jinghang, Zuo Wanli, Liang Shining, Wang Ying. Causal Relation Extraction Based on Graph Attention Networks[J]. Journal of Computer Research and Development, 2020, 57(1): 159-174. DOI: 10.7544/issn1000-1239.2020.20190042 |
[7] | Sun Xiaowan, Wang Ying, Wang Xin, Sun Yudong. Aspect-Based Sentiment Analysis Model Based on Dual-Attention Networks[J]. Journal of Computer Research and Development, 2019, 56(11): 2384-2395. DOI: 10.7544/issn1000-1239.2019.20180823 |
[8] | Zhang Han, Guo Yuanbo, Li Tao. Domain Named Entity Recognition Combining GAN and BiLSTM-Attention-CRF[J]. Journal of Computer Research and Development, 2019, 56(9): 1851-1858. DOI: 10.7544/issn1000-1239.2019.20180733 |
[9] | Guo Chi, Wang Lina, Guan Yiping, Zhang Xiaoying. A Network Immunization Strategy Based on Dynamic Preference Scan[J]. Journal of Computer Research and Development, 2012, 49(4): 717-724. |
[10] | Wang Bailing, Fang Binxing, Yun Xiaochun, Zhang Hongli, Chen Bo, Liu Yixuan. A New Friendly Worm Propagation Strategy Based on Diffusing Balance Tree[J]. Journal of Computer Research and Development, 2006, 43(9): 1593-1602. |