Citation: Lin Hanyue, Wu Jingya, Lu Wenyan, Zhong Langhui, Yan Guihai. Neptune: A Framework for Generic Network Processor Microarchitecture Modeling and Performance Simulation[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440084
Network packet processing is a fundamental function of network devices, involving tasks such as packet modification, checksum and hash computation, mirroring, filtering, and packet metering. As a domain-specific processor, the network processor (NP) provides line-rate performance and programmability for network packet processing. However, because design requirements differ, NP architectures also differ, falling into single-phase and multi-phase NPs, which poses challenges for NP designers. Existing simulation methods mainly target a single NP or a single architecture and cannot be used to explore both architectures. We propose Neptune, an analysis framework for generic network processor microarchitecture modeling and performance simulation. Based on detailed analysis, Neptune adopts the multi-phase NP architecture as its hardware model while retaining the ability to simulate the single-phase architecture. In addition, Neptune employs an event list mechanism and inter-core queues to support the simulation of different data paths and various scheduling strategies in multi-phase NPs. Furthermore, Neptune uses a bulk synchronous parallel (BSP) graph computing mechanism and combines the advantages of event-driven and time-driven simulation to ensure both accuracy and efficiency. Our experiments show that Neptune achieves over 95% accuracy in simulating both architectures and simulates network processors at 3.31 MIPS, an order-of-magnitude improvement over PFPSim. We illustrate the generality and capability of the Neptune simulation framework through three case studies. First, we evaluate multi-phase and single-phase NPs, showing that the single-phase NP can achieve up to a 1.167 times performance improvement. Second, we optimize the packet parsing module using a programmable pipeline and analyze the resulting performance differences. Finally, we use Neptune to test the performance of the network packet processing engine under different thread counts, providing insights for software and hardware multi-threading optimization.
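To make the combination of per-core event lists, inter-core queues, and bulk synchronous parallel (BSP) supersteps more concrete, the following is a minimal Python sketch. It is not Neptune's implementation: the `Core` class, the `simulate` and `build_pipeline` functions, the stage names, the constant service times, and the fixed `link_delay` are all hypothetical choices made for illustration. Each superstep covers a fixed time window (the time-driven part); inside the window every core drains only the events in its own event list (the event-driven part); packets forwarded to another core are delivered at the barrier that ends the superstep.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Core:
    """Hypothetical processing core with a local event list."""
    service_time: int                  # cycles per packet (assumed constant)
    next_core: Optional[str] = None    # downstream core name, None for the last stage
    events: list = field(default_factory=list)  # min-heap of (arrival_cycle, packet_id)
    free_at: int = 0                   # cycle at which the core becomes idle


def build_pipeline():
    # Illustrative three-stage multi-phase pipeline; names and latencies are made up.
    return {
        "parse": Core(service_time=2, next_core="match"),
        "match": Core(service_time=3, next_core="deparse"),
        "deparse": Core(service_time=1),
    }


def simulate(cores, first, arrivals, link_delay=4, window=4):
    """Return {packet_id: completion_cycle} for packets given as (packet_id, cycle)."""
    # A window no longer than the inter-core delay guarantees that a forwarded
    # packet never arrives inside the superstep that produced it.
    assert 0 < window <= link_delay
    for pkt, cycle in arrivals:
        heapq.heappush(cores[first].events, (cycle, pkt))

    done = {}
    now = 0
    while any(core.events for core in cores.values()):
        end = now + window
        outbox = []  # inter-core messages produced during this superstep
        # Compute phase: each core handles only its own events before `end`.
        for core in cores.values():
            while core.events and core.events[0][0] < end:
                arrival, pkt = heapq.heappop(core.events)
                start = max(arrival, core.free_at)
                core.free_at = start + core.service_time
                if core.next_core is None:
                    done[pkt] = core.free_at
                else:
                    outbox.append((core.next_core, core.free_at + link_delay, pkt))
        # Barrier: deliver queued messages, then advance simulated time.
        for dst, cycle, pkt in outbox:
            heapq.heappush(cores[dst].events, (cycle, pkt))
        now = end
    return done


if __name__ == "__main__":
    result = simulate(build_pipeline(), "parse", [(0, 0), (1, 1), (2, 2)])
    print(result)
```

Because cores interact only at the barrier, the compute phase of each superstep could in principle run in parallel across cores. In this toy model, a single-phase (run-to-completion) configuration can be approximated by a single `Core` with `next_core=None`.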