Citation: Wang Yanwei, Li Rengang, Xu Ran, Liu Junkai. Data Center Heterogeneous Acceleration Software-Hardware System-Level Platform Based on Reconfigurable Architecture[J]. Journal of Computer Research and Development, 2025, 62(4): 963-977. DOI: 10.7544/issn1000-1239.202440041
Building a system-level software-hardware prototype platform for accelerating data center services requires balancing high computing power, scalability, flexibility, and low cost. To strengthen data center capabilities, this work studies heterogeneous computing from the perspective of software-hardware co-design, covering cloud platform architecture, hardware implementation, high-speed interconnection, and applications. A reconfigurable and composable software-hardware acceleration prototype system is designed and built to simplify the existing processor-centric approach to constructing system-level computing platforms, enabling rapid deployment and system-level prototype validation of target software-hardware designs. To this end, decoupled device virtualization and remote mapping for reconfigurable architectures are used to exploit the potential of independent computing units. The resulting ISOF (independent system of FPGA) software-hardware computing platform goes beyond conventional server designs: computing units can be expanded at low cost and with high efficiency, and clients can flexibly use peripheral resources. To address system-level communication challenges, a communication hardware platform and an interaction mechanism between computing units are designed. To make the platform more agile, ISOF also provides a flexible, unified invocation interface. Finally, analysis and evaluation against the platform's system-level objectives verify that it meets current computing and acceleration requirements, delivering high-speed, low-latency communication, good throughput, and efficient elastic scalability.
In addition, the congestion avoidance and packet recovery mechanisms built on this high-speed communication are improved, meeting the stability requirements of communication at data center scale.