Lin Hanyue, Wu Jingya, Lu Wenyan, Zhong Langhui, Yan Guihai. Neptune: A Framework for Generic Network Processor Microarchitecture Modeling and Performance Simulation[J]. Journal of Computer Research and Development, 2025, 62(5): 1091-1107. DOI: 10.7544/issn1000-1239.202440084

Neptune: A Framework for Generic Network Processor Microarchitecture Modeling and Performance Simulation

Funds: This work was supported by the National Natural Science Foundation of China (62002340, 61872336, 61572470) and the Program of the Youth Innovation Promotion Association, CAS (Y201923).
  • Author Bio:

    Lin Hanyue: born in 1999. PhD candidate. Member of CCF. His main research interests include domain-specific computer architecture and network computing systems

    Wu Jingya: born in 1994. PhD. Member of CCF. Her main research interests include domain-specific computer architecture and heterogeneous computing system optimization

    Lu Wenyan: born in 1990. PhD, associate professor, master supervisor. Member of CCF. His main research interests include deep learning accelerators, database accelerators, domain-specific computer architecture, and heterogeneous computing system optimization

    Zhong Langhui: born in 1974. PhD, senior engineer. Member of CCF. His main research interests include low latency technology and securities quotation processing

    Yan Guihai: born in 1982. PhD, professor, PhD supervisor. Member of CCF. His main research interests include computer architecture, domain-specific accelerator design, and intelligent chip architecture

  • Received Date: February 01, 2024
  • Revised Date: September 02, 2024
  • Accepted Date: October 15, 2024
  • Available Online: October 21, 2024
  • Abstract: Network packet processing is a fundamental function of network devices, involving tasks such as packet modification, checksum and hash computation, mirroring, filtering, and packet metering. As a domain-specific processor, the network processor (NP) provides line-rate performance and programmability for packet processing. However, because design requirements differ, NP architectures differ as well, falling into single-phase and multi-phase designs, which poses challenges for NP designers. Existing simulation methods mainly target a single NP or a single architecture and cannot be used to explore both architectures. We propose Neptune, an analysis framework for generic network processor microarchitecture modeling and performance simulation. Based on detailed analysis, Neptune adopts the multi-phase NP architecture as its hardware model while retaining the ability to simulate the single-phase architecture. In addition, Neptune employs an event list mechanism and inter-core queues to support simulation of different data paths and various scheduling strategies in multi-phase NPs. Furthermore, Neptune uses a bulk synchronous parallel graph computing mechanism and combines event-driven and time-driven simulation to ensure both accuracy and efficiency. Our experiments show that Neptune achieves over 95% accuracy in simulating both architectures and simulates network processors at 3.31 MIPS, an order-of-magnitude improvement over PFPSim. We illustrate the generality and capability of the Neptune simulation framework through three case studies. First, we evaluate multi-phase and single-phase NPs, showing that the single-phase NP can achieve up to a 1.167 times performance improvement. Second, we optimize the packet parsing module using a programmable pipeline and analyze the resulting performance differences. Finally, we use Neptune to test the performance of the network packet processing engine under different thread counts, providing insights for software and hardware multi-threading optimization.
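
    The abstract names three simulation ingredients: per-core event lists, inter-core queues, and a bulk synchronous parallel (BSP) schedule that mixes time-driven and event-driven stepping. The sketch below is one way these ideas might fit together, and it is not Neptune's implementation. Every name in it (`Core`, `run_until`, `simulate`, `superstep`) is hypothetical, and it assumes a linear chain of cores with a fixed per-packet latency and FIFO service. The global clock advances in fixed supersteps (time-driven), each core jumps between events in its own list within a superstep (event-driven), and results are handed to the next core only at the barrier.

    ```python
    import heapq


    class Core:
        """Hypothetical core: a local event list plus a fixed per-packet latency."""

        def __init__(self, latency):
            self.latency = latency   # cycles per packet (assumed constant)
            self.events = []         # min-heap of (ready_cycle, seq, packet_id)
            self.busy_until = 0      # cycle at which the core is free again

        def run_until(self, barrier):
            """Event-driven phase: handle every event that is ready before the barrier."""
            produced = []
            while self.events and self.events[0][0] < barrier:
                ready, seq, pid = heapq.heappop(self.events)
                start = max(ready, self.busy_until)   # wait if the core is busy
                self.busy_until = start + self.latency
                produced.append((self.busy_until, seq, pid))
            return produced


    def simulate(latencies, arrivals, superstep=8):
        """Push packets through a chain of cores, one BSP superstep at a time.

        latencies: per-core processing latency in cycles.
        arrivals:  list of (cycle, packet_id) entering the first core.
        Returns {packet_id: completion_cycle}.
        """
        cores = [Core(latency) for latency in latencies]
        for seq, (cycle, pid) in enumerate(arrivals):
            heapq.heappush(cores[0].events, (cycle, seq, pid))

        done = {}
        barrier = 0
        while any(core.events for core in cores):
            barrier += superstep                          # time-driven clock advance
            outputs = [core.run_until(barrier) for core in cores]   # compute phase
            for i, messages in enumerate(outputs):        # communication at the barrier
                for finish, seq, pid in messages:
                    if i + 1 < len(cores):
                        # Inter-core queue: becomes the next core's pending event.
                        heapq.heappush(cores[i + 1].events, (finish, seq, pid))
                    else:
                        done[pid] = finish
        return done


    if __name__ == "__main__":
        print(simulate([3, 5], [(0, "a"), (1, "b"), (20, "c")]))
    ```

    Because each core keeps its own `busy_until` timestamp and serves events in ready-time order, the completion cycles in this toy model do not depend on the superstep length; the superstep only controls how often cores exchange work. A real NP model would need scheduling policies, non-linear data paths, and contention modeling, none of which are shown here.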
