
面向深度神经网络大规模分布式数据并行训练的MC2能耗模型

魏嘉, 张兴军, 王龙翔, 赵明强, 董小社

魏嘉, 张兴军, 王龙翔, 赵明强, 董小社. 面向深度神经网络大规模分布式数据并行训练的MC2能耗模型[J]. 计算机研究与发展, 2024, 61(12): 2985-3004. DOI: 10.7544/issn1000-1239.202330164. CSTR: 32373.14.issn1000-1239.202330164
Wei Jia, Zhang Xingjun, Wang Longxiang, Zhao Mingqiang, Dong Xiaoshe. MC2 Energy Consumption Model for Massively Distributed Data Parallel Training of Deep Neural Network[J]. Journal of Computer Research and Development, 2024, 61(12): 2985-3004. DOI: 10.7544/issn1000-1239.202330164. CSTR: 32373.14.issn1000-1239.202330164

面向深度神经网络大规模分布式数据并行训练的MC2能耗模型

基金项目: 国家自然科学基金项目(62172327)
详细信息
    作者简介:

    魏嘉: 1997年生. 博士研究生. 主要研究方向为计算机体系结构、高性能计算、深度学习

    张兴军: 1969年生. 博士,教授,博士生导师,CCF高级会员. 主要研究方向为计算机体系结构、高性能计算、大数据存储系统、计算机网络

    王龙翔: 1988年生. 博士. CCF会员. 主要研究方向为数据去重、大数据存储系统、人工智能

    赵明强: 1998年生. 硕士研究生. 主要研究方向为作业调度、高性能计算、强化学习

    董小社: 1963年生. 博士,教授,博士生导师,CCF会员. 主要研究方向为高性能计算系统、存储系统、云计算

    通讯作者:

    张兴军(xjzhang@xjtu.edu.cn)

  • 中图分类号: TP302

MC2 Energy Consumption Model for Massively Distributed Data Parallel Training of Deep Neural Network

Funds: This work was supported by the National Natural Science Foundation of China (62172327).
More Information
    Author Bio:

    Wei Jia: born in 1997. PhD candidate. His research interests include computer architecture, high performance computing, and deep learning

    Zhang Xingjun: born in 1969. PhD, professor, PhD supervisor. Senior member of CCF. His main research interests include computer architecture, high performance computing, big data storage system, and computer networks

    Wang Longxiang: born in 1988. PhD. Member of CCF. His main research interests include data deduplication, big data storage system, and artificial intelligence

    Zhao Mingqiang: born in 1998. Master candidate. His main research interests include job scheduling, high performance computing, and reinforcement learning

    Dong Xiaoshe: born in 1963. PhD, professor, PhD supervisor. Member of CCF. His main research interests include high performance computer system, storage system, and cloud computing

  • 摘要:

    深度神经网络(deep neural network,DNN)在许多现代人工智能(artificial intelligence,AI)任务中取得了最高的精度. 近年来,使用高性能计算平台进行大规模分布式并行训练DNN越来越普遍. 能耗模型在设计和优化DNN大规模并行训练和抑制高性能计算平台过量能耗方面起着至关重要的作用. 目前,大部分的能耗模型都是从设备的角度出发对单个设备或多个设备构成的集群进行能耗建模,由于缺乏从能耗角度对分布式并行DNN应用进行分解剖析,导致罕有针对分布式DNN应用特征进行建模的能耗模型. 针对目前最常用的DNN分布式数据并行训练模式,从DNN模型训练本质特征角度出发,提出了“数据预处理(materials preprocessing)–前向与反向传播(computing)–梯度同步与更新(communicating)”三阶段MC2能耗模型,并通过在国产E级原型机天河三号上使用最多128个MT节点和32个FT节点训练经典的VGG16和ResNet50网络以及最新的Vision Transformer网络验证了模型的有效性和可靠性. 实验结果表明,MC2与真实能耗测量结果相差仅为2.84%,相较4种线性比例能耗模型以及AR,SES,ARIMA时间预测模型准确率分别提升了69.12个百分点,69.50个百分点,34.58个百分点,13.47个百分点,5.23个百分点,22.13个百分点,10.53个百分点. 通过使用的模型可以在超算平台得到DNN模型的各阶段能耗和总体能耗结果,为评估基于能耗感知的DNN大规模分布式数据并行训练及推理各阶段任务调度、作业放置、模型分割、模型裁剪等优化策略的效能提供了基础.

    Abstract:

    Deep neural networks (DNNs) have achieved state-of-the-art accuracy in many modern artificial intelligence (AI) tasks. In recent years, it has become increasingly popular to use high performance computing (HPC) platforms for massively distributed parallel training of DNNs. Energy consumption models are crucial for designing and optimizing massively parallel DNN training and for restraining the excessive energy consumption of HPC platforms. Currently, most energy consumption models characterize a single device or a cluster of devices from the hardware perspective; because distributed parallel DNN applications have rarely been decomposed and analyzed from the energy perspective, few models capture the characteristics of distributed DNN applications themselves. Targeting the most commonly used distributed data parallel training paradigm, we propose the three-stage "materials preprocessing-computing-communicating" MC2 energy consumption model, derived from the essential features of DNN training. The model is validated by training the classical VGG16 and ResNet50 networks and the recent Vision Transformer network with up to 128 MT nodes and 32 FT nodes on Tianhe-3, a Chinese exascale prototype system. The experimental results show that MC2 deviates from the measured energy consumption by only 2.84%, and its accuracy exceeds that of four linear proportional energy consumption models and the AR, SES, and ARIMA time-series prediction models by 69.12, 69.50, 34.58, 13.47, 5.23, 22.13, and 10.53 percentage points, respectively. With the proposed model, the per-stage and overall energy consumption of a DNN model can be obtained on a supercomputing platform, which provides a basis for evaluating the effectiveness of energy-aware optimization strategies such as task scheduling, job placement, model partitioning, and model pruning at each stage of massively distributed data parallel DNN training and inference.
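
    To make the stage-wise accounting described above concrete, the following minimal sketch (our own illustration, not code from the paper; the stage durations and per-node powers in the example are placeholders) accumulates energy per MC2 stage as average node power multiplied by stage duration and node count, and sums the stages into a total:

```python
# Minimal sketch of stage-wise energy accounting in the spirit of the MC2
# decomposition (materials preprocessing / computing / communicating).
# The stage durations and per-node powers below are placeholders, not values
# taken from the paper.

STAGES = ("preprocessing", "computing", "communicating")

def stage_energy_kwh(avg_node_power_w, stage_time_s, num_nodes):
    """Energy of one stage in kWh: average node power x duration x node count."""
    return avg_node_power_w * stage_time_s * num_nodes / 3.6e6  # J -> kWh

def total_energy_kwh(stage_times_s, stage_powers_w, num_nodes):
    """Sum the three MC2 stages into the energy of one training run."""
    return sum(stage_energy_kwh(stage_powers_w[s], stage_times_s[s], num_nodes)
               for s in STAGES)

if __name__ == "__main__":
    times = {"preprocessing": 6.0, "computing": 50.0, "communicating": 2.4}      # s (hypothetical)
    powers = {"preprocessing": 95.0, "computing": 135.0, "communicating": 135.0}  # W per node (hypothetical)
    print(f"total ≈ {total_energy_kwh(times, powers, num_nodes=128):.4f} kWh")
```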

  • 图  1   高性能计算平台系统结构

    Figure  1.   System structure of HPC platform

    图  2   深度神经网络模型数据并行训练示意图

    Figure  2.   Illustration of data parallel training of DNN model

    图  3   MC2能耗模型结构

    Figure  3.   Structure of MC2 energy consumption model

    图  4   VGG16神经网络模型结构[55]

    Figure  4.   Structure of VGG16 neural network model[55]

    图  5   ResNet中的构建块[56]

    Figure  5.   Building blocks in ResNet[56]

    图  6   ResNet34与ResNet50结构[56]

    Figure  6.   Structures of ResNet34 and ResNet50[56]

    图  7   ViT模型原理图[54]

    Figure  7.   Principle diagram of ViT model[54]

    图  8   MTP 128计算、通信和数据读取时间

    Figure  8.   Time for MTP 128 calculation, communication, and data I/O

    图  9   训练过程中CPU频率随时间变化

    Figure  9.   CPU frequencies change with time during training

    表  1   相关工作中的关键变量

    Table  1   Key Variables in Related Work

    符号 含义
    $W$ 神经网络权值
    $\lambda_{lr}$ 神经网络学习率
    $n$ 批大小
    $E$ 能耗
    $P$ 功率
    $T$ 时间
    $t$ 时刻
    $P_{total}$ 总功率
    $P_{dynamic}$ 动态功率
    $P_{static}$ 静态功率
    $A$ 每个时钟周期内的开关数
    $V$ 电压
    $C$ 电容
    $I_{static}$ 漏电电流
    $T_d$ 温度
    $K_i$ 技术常数
    $\alpha_i$ 某个事件发生的次数
    $C_i$ 某个事件所花费的能耗
    $m$ 处理器核数
    $\rho$ 处理器核利用率
    $P_s$ 速度为$s$时的处理器功率
    $\lambda$ 任务到达率
    $R$ 每个任务平均执行指令数目
    $\alpha_{ai}$ 活动因子
    $f_{max_i}$ 处理器组件最大频率
    $P_{idle_{sm}}$ 空闲流处理组件功率
    $E_{DRAM \to SRAM}$ 从DRAM读写数据到SRAM的能耗
    $E_{SRAM}$ SRAM实现激活和权值缓冲区的能耗
    $E_{rest}$ TPU其余组件能耗
    $P_{DRAM}$ 动态随机存储器(DRAM)功率
    $\mu_{write}$ 写吞吐量
    $\mu_{read}$ 读吞吐量
    $E_{icache}$ 指令高速缓存能耗
    $E_{dcache}$ 数据高速缓存能耗
    $E_{buses}$ 高速缓存指令开销
    $E_{pads}$ 主存与外设间总线能耗
    $E_{mem}$ 内存能耗
    $\alpha(A)$ 算法$A$激活阶段指令周期数
    $R(A)$ 算法$A$激活阶段指令读次数
    $W(A)$ 算法$A$激活阶段指令写次数
    $P_{cke}$ 内存静态功率
    $P_{stby}$ 激活存储组件和等待指令功率
    $E_{act}$ 激活时能耗
    $P_{rdwr}$ 内存读写功率
    $P_{ser}$ 整个服务器的功率
    $P_{cpu}$ 服务器中CPU功率
    $C_{cpu}$ 每台服务器CPU的数量
    $C_{mem}$ 每个服务器中内存DIMMs的数量
    $C_d$ 每个服务器磁盘数量
    $P_d$ 每个磁盘的功率
    $(P_i, T_i)$ MapReduce初始化阶段
    $(P_m, T_m)$ MapReduce映射阶段
    $(P_{shu}, T_{shu})$ MapReduce洗牌阶段
    $(P_r, T_r)$ MapReduce规约阶段
    $E_{layer}$ 每一层的能耗
    $E_{comp}$ 计算开销
    $E_{data}$ 数据输入、数据输出以及卷积核的访存开销
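    Several symbols in Table 1 ($P_{total}$, $P_{dynamic}$, $P_{static}$, $A$, $C$, $V$, $I_{static}$) come from the classical CMOS power decomposition used throughout the cited related work. For reference, the textbook relations these symbols usually denote, which are not necessarily the exact equations of each cited model, are

    $$P_{total} = P_{dynamic} + P_{static}, \qquad P_{dynamic} = A\,C\,V^{2}f, \qquad P_{static} = V\,I_{static}, \qquad E = \int_{0}^{T} P(t)\,\mathrm{d}t \approx \bar{P}\,T,$$

    where $f$ is the clock frequency and $\bar{P}$ is the average power over the interval of length $T$.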

    表  2   天河三号原型机基本信息

    Table  2   Basic Information of Tianhe-3 Prototype

    类别 属性 FT-2000+ MT-2000+
    硬件 节点数 128 512
    硬件 单节点核数 32 32
    硬件 频率/GHz 2.4 2.0
    硬件 内存容量/GB 64 16
    硬件 网络互联带宽/Gbps 200 200
    软件 操作系统 kylin 4.0-1a OS with kernel v4.4.0
    软件 文件系统 lustre
    软件 MPI版本 mpich v3.2.1
    软件 编译器 gcc v4.9.1/v4.9.3
    软件 其他库 Boost,BLAS,openBLAS,scalapack等

    表  3   各组件功耗

    Table  3   Power Consumption of Each Component

    组件类型 功率/W
    MT-$P_{ca}$ 60
    MT-$P_{ci}$ 15
    MT-$P_{ma}$ 75
    MT-$P_{mi}$ 18.75
    FT-$P_{ca}$ 50
    FT-$P_{ci}$ 12.5
    FT-$P_{ma}$ 46
    FT-$P_{mi}$ 11.5
    注:$P_{ca}$和$P_{ci}$分别表示处理器正常工作和空闲时的功率;$P_{ma}$和$P_{mi}$分别表示内存设备正常工作和空闲时的功率.

    表  4   VGG16-MT128各阶段训练时间

    Table  4   Training Time of Each Stage in VGG16-MT128

    类型 训练时间/s
    $T_{setdataset}$ 4.5603
    $T_{loaddata}$ 0.1891
    $T_{preprocess}$ 0.7302
    $T_{inoutmodel}$ 0.5636
    $T_f$ 21.2910
    $T_b$ 28.6782
    $T_{communication}$ 2.1815
    $T_{update}$ 0.2425
    $T_{add}$ 1.8468

    表  5   ViT-MT128各阶段训练时间

    Table  5   Training Time of Each Stage in ViT-MT128

    类型 训练时间/s
    $T_{setdataset}$ 13.6556
    $T_{loaddata}$ 1.0237
    $T_{preprocess}$ 1.8221
    $T_{inoutmodel}$ 0.9213
    $T_f$ 140.0465
    $T_b$ 323.5359
    $T_{communication}$ 21.2551
    $T_{update}$ 16.7213
    $T_{add}$ 20.0640

    表  6   VGG16-FT32各阶段训练时间

    Table  6   Training Time of Each Stage in VGG16-FT32

    类型 训练时间/s
    $T_{setdataset}$ 3.2940
    $T_{loaddata}$ 0.1932
    $T_{preprocess}$ 0.4508
    $T_{inoutmodel}$ 0.2546
    $T_f$ 76.4447
    $T_b$ 100.1702
    $T_{communication}$ 6.3135
    $T_{update}$ 0.6579
    $T_{add}$ 4.7971

    表  7   VGG MT128训练中的能耗

    Table  7   Energy Consumption in VGG MT128 Training

    阶段 能耗/(kW·h)
    MT128-数据预处理 0.0205
    MT128-前向与反向传播 0.2399
    MT128-梯度同步与更新 0.0139
    MT128-总能耗 0.2743
    FT32-数据预处理 0.0023
    FT32-前向与反向传播 0.1507
    FT32-梯度同步与更新 0.0070
    FT32-总能耗 0.1600
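    As an informal cross-check linking Table 3, Table 4, Table 6, and Table 7 (a simplification of ours, not the paper's stated formulation), charging every node the full active CPU-plus-memory power from Table 3 for the forward-plus-backward durations in Table 4 and Table 6 reproduces the 前向与反向传播 entries of Table 7:

```python
# Back-of-envelope reproduction of the forward-and-backward (前向与反向传播) energies
# in Table 7, assuming every node draws the full active CPU + memory power from
# Table 3 for the whole compute stage. This is our simplifying assumption, not
# the exact MC2 formulation.

def compute_stage_energy_kwh(p_cpu_active_w, p_mem_active_w,
                             t_forward_s, t_backward_s, num_nodes):
    power_per_node_w = p_cpu_active_w + p_mem_active_w            # W per node
    energy_j = power_per_node_w * (t_forward_s + t_backward_s) * num_nodes
    return energy_j / 3.6e6                                       # J -> kWh

# VGG16 on 128 MT nodes: (60 + 75) W x (21.2910 + 28.6782) s x 128 ≈ 0.2399 kWh
print(round(compute_stage_energy_kwh(60, 75, 21.2910, 28.6782, 128), 4))
# VGG16 on 32 FT nodes: (50 + 46) W x (76.4447 + 100.1702) s x 32 ≈ 0.1507 kWh
print(round(compute_stage_energy_kwh(50, 46, 76.4447, 100.1702, 32), 4))
```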

    表  8   ViT MT128训练中的能耗

    Table  8   Energy Consumption in ViT MT128 Training

    类别 能耗/(kW·h)
    $E_{setdataset}$ 0.065546674
    $E_{loaddata}$ 0.003275884
    $E_{preprocess}$ 0.00874608
    $E_{inoutmodel}$ 0.00294816
    $E_f$ 0.672223157
    $E_b$ 1.552972412
    $E_{communication}$ 0.10202448
    $E_{update}$ 0.080256
    $E_{total}$ 2.5120

    表  9   MT128 ResNet50各阶段训练时间

    Table  9   Training Time of Each Stage in MT128 ResNet50

    阶段 训练时间/min
    $T_{setdataset}$ 40.1352
    $T_{loaddata}$ 6.0920
    $T_{preprocess}$ 4.3322
    $T_{inoutmodel}$ 2.2662
    $T_f$ 115.5126
    $T_b$ 157.6198
    $T_{communication}$ 13.0800
    $T_{update}$ 4.9034
    $T_{add}$ 17.9000

    表  10   ResNet50训练能耗

    Table  10   Energy Consumption in ResNet50 Training

    阶段 能耗/(kW·h)
    MT128-数据预处理 6.28
    MT128-前向与反向传播 64.69
    MT128-梯度同步与更新 6.47
    MT128-总能耗 77.44

    表  11   能耗模型对比

    Table  11   Comparison of Energy Consumption Model

    类别 速度 成本 部署难易程度 准确性
    测量方法 困难
    CPU估算 简单
    MC2 中等 较高
  • [1]

    Orhan A E. Robustness properties of Facebook’s ResNeXt WSL models[J]. arXiv preprint, arXiv: 1907.07640, 2019

    [2]

    Gao Yongqiang, Guan Haibing, Qi Zhengwei, et al. Quality of service aware power management for virtualized data centers[J]. Journal of Systems Architecture, 2013, 59(4/5): 245−259

    [3]

    Bilal K, Malik S U R, Khan S U, et al. Trends and challenges in cloud datacenters[J]. IEEE Cloud Computing, 2014, 1(1): 10−20 doi: 10.1109/MCC.2014.26

    [4]

    Whitehead B, Andrews D, Shah A, et al. Assessing the environmental impact of data centres part 1: Background, energy use and metrics[J]. Building and Environment, 2014, 82(4): 151−159

    [5]

    Rivoire S M. Designing Energy-Efficient Computer Systems[M]//Models and Metrics for Energy-Efficient Computer Systems. Ann Arbor, MI: ProQuest, 2008: 29−65

    [6]

    vor dem Berge M, Da Costa G, Kopecki A, et al. Modeling and simulation of data center energy-efficiency in coolemall[C]//Proc of the 1st Int Workshop on Energy Efficient Data Centers. Berlin: Springer, 2012: 25−36

    [7]

    Floratou A, Bertsch F, Patel J M, et al. Towards building wind tunnels for data center design[J]. Proceedings of the VLDB Endowment, 2014, 7(9): 781−784 doi: 10.14778/2732939.2732950

    [8]

    Dean J, Corrado G S, Monga R, et al. Large scale distributed deep networks[C]//Proc of the 26th Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2012: 1223−1231

    [9]

    Raina R, Madhavan A, Ng A Y. Large-scale deep unsupervised learning using graphics processors[C]//Proc of the 26th Int Conf on Machine Learning. New York: ACM, 2009: 873−880

    [10]

    Li Teng, Dou Yong, Jiang Jingfei, et al. Optimized deep belief networks on CUDA GPUs[C]//Proc of the 27th Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 2015: 1688−1696

    [11]

    Bottleson J, Kim S Y, Andrews J, et al. clCaffe: OpenCL accelerated Caffe for convolutional neural networks[C]//Proc of the 30th IEEE Int Parallel and Distributed Processing Symp Workshops. Piscataway, NJ: IEEE, 2016: 50−57

    [12]

    Kaler T, Stathas N, Ouyang A, et al. Accelerating training and inference of graph neural networks with fast sampling and pipelining[C]//Proc of the 5th Machine Learning and Systems. New York: ACM, 2022: 172−189

    [13]

    Tan Sijun, Knott B, Tian Yuan, et al. CryptGPU: Fast privacy-preserving machine learning on the GPU[C]//Proc of the 42nd IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2021: 1021−1038

    [14]

    Viebke A, Memeti S, Pllana S, et al. CHAOS: A parallelization scheme for training convolutional neural networks on Intel Xeon Phi[J]. The Journal of Supercomputing, 2019, 75(1): 197−227 doi: 10.1007/s11227-017-1994-x

    [15]

    Liu Junjie, Wang Haixia, Wang Dongsheng, et al. Parallelizing convolutional neural networks on Intel many integrated core architecture[C]//Proc of the 28th Int Conf on Architecture of Computing Systems. Berlin: Springer, 2015: 71−82

    [16]

    Zlateski A, Lee K, Seung H S. Scalable training of 3D convolutional networks on multi-and many-cores[J]. Journal of Parallel and Distributed Computing, 2017, 106(7): 195−204

    [17]

    Suda N, Chandra V, Dasika G, et al. Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks[C]//Proc of the 24th ACM/SIGDA Int Symp on Field-Programmable Gate Arrays. New York: ACM, 2016: 16−25

    [18]

    Zhang Jialiang, Li Jing. Improving the performance of OpenCL-based FPGA accelerator for convolutional neural network[C]//Proc of the 25th ACM/SIGDA Int Symp on Field-Programmable Gate Arrays. New York: ACM, 2017: 25−34

    [19]

    Aydonat U, O’Connell S, Capalija D, et al. An OpenCL™ deep learning accelerator on Arria 10[C]//Proc of the 25th ACM/SIGDA Int Symp on Field-Programmable Gate Arrays. New York: ACM, 2017: 55−64

    [20] 朱传家,刘鑫,方佳瑞. 基于“神威太湖之光”的Caffe分布式扩展研究[J]. 计算机应用与软件,2020,37(1):15−20

    Zhu Chuanjia, Liu Xin, Fang Jiarui. Research on distributed extension of Caffe based on “Light of Taihu Lake” in Shenwei[J]. Computer Applications and Software, 2020, 37(1): 15−20 (in Chinese)

    [21] 魏嘉,张兴军,纪泽宇,等. 天河三号原型机分布式并行深度神经网络性能评测及调优[J]. 计算机工程与科学,2021,43(5):782−791 doi: 10.3969/j.issn.1007-130X.2021.05.003

    Wei Jia, Zhang Xingjun, Ji Zeyu, et al. Performance evaluation and optimization of distributed parallel deep neural networks on the Tianhe-3 prototype[J]. Computer Engineering and Science, 2021, 43(5): 782−791 (in Chinese) doi: 10.3969/j.issn.1007-130X.2021.05.003

    [22]

    Ji Shihao, Satish N, Li Sheng, et al. Parallelizing Word2Vec in shared and distributed memory[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(9): 2090−2100 doi: 10.1109/TPDS.2019.2904058

    [23]

    Das D, Avancha S, Mudigere D, et al. Distributed deep learning using synchronous stochastic gradient descent[J]. arXiv preprint, arXiv: 1602.06709, 2016

    [24]

    Roy P, Song S L, Krishnamoorthy S, et al. Numa-caffe: Numa-aware deep learning neural networks[J]. ACM Transactions on Architecture and Code Optimization, 2018, 15(2): 1−26

    [25]

    Mittal S, Rajput P, Subramoney S. A survey of deep learning on CPUs: Opportunities and co-optimizations[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 33(10): 5095−5115

    [26]

    Awan A A, Hamidouche K, Hashmi J M, et al. S-Caffe: Co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters[C]//Proc of the 22nd ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2017: 193−205

    [27]

    Yin J, Gahlot S, Laanait N, et al. Strategies to deploy and scale deep learning on the summit supercomputer[C]//Proc of the 3rd IEEE/ACM Workshop on Deep Learning on Supercomputers. Piscataway, NJ: IEEE, 2019: 84−94

    [28]

    Duan Qingyang, Wang Zeqin, Xu Yuedong, et al. Mercury: A simple transport layer scheduler to accelerate distributed DNN training[C]//Proc of the 41st IEEE Conf on Computer Communications. Piscataway, NJ: IEEE, 2022: 350−359

    [29]

    Huang Yanping, Cheng Youlong, Bapna A, et al. Gpipe: Efficient training of giant neural networks using pipeline parallelism[J]. Advances in Neural Information Processing Systems, 2019, 32(1): 103−112

    [30]

    Narayanan D, Harlap A, Phanishayee A, et al. PipeDream: Generalized pipeline parallelism for DNN training[C]//Proc of the 27th ACM Symp on Operating Systems Principles. New York: ACM, 2019: 1−15

    [31]

    Rajbhandari S, Ruwase O, Rasley J, et al. ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning[C]//Proc of the 34th Int Conf for High Performance Computing, Networking, Storage and Analysis. Piscataway, NJ: IEEE, 2021: 826−840

    [32]

    Dayarathna M, Wen Yonggang, Fan Rui. Data center energy consumption modeling: A survey[J]. IEEE Communications Surveys & Tutorials, 2015, 18(1): 732−794

    [33]

    Ge Rong, Feng Xizhou, Cameron K W. Performance-constrained distributed dvs scheduling for scientific applications on power-aware clusters[C]//Proc of the 18th ACM/IEEE Conf on Supercomputing. Piscataway, NJ: IEEE, 2005: 34−34

    [34]

    Yeo S, Lee H H S. Peeling the Power Onion of Data Centers[M]//Energy Efficient Thermal Management of Data Centers. Berlin: Springer, 2012: 137−168

    [35]

    Gao Yongqiang, Guan Haibing, Qi Zhengwei, et al. Quality of service aware power management for virtualized data centers[J]. Journal of Systems Architecture, 2013, 59(4): 245−259

    [36]

    Shin D, Kim J, Chang N, et al. Energy-optimal dynamic thermal management for green computing[C]//Proc of the 22nd IEEE/ACM Int Conf on Computer-Aided Design-Digest of Technical Papers. Piscataway, NJ: IEEE, 2009: 652−657

    [37]

    Merkel A, Bellosa F. Balancing power consumption in multiprocessor systems[J]. ACM SIGOPS Operating Systems Review, 2006, 40(4): 403−414 doi: 10.1145/1218063.1217974

    [38]

    Bertran R, Becerra Y, Carrera D, et al. Accurate energy accounting for shared virtualized environments using pmc-based power modeling techniques[C/OL]//Proc of the 11th IEEE/ACM Int Conf on Grid Computing. Piscataway, NJ: IEEE, 2010[2023-07-20].https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5697889

    [39]

    Li Hui, Casale G, Ellahi T. SLA-driven planning and optimization of enterprise applications[C]//Proc of the 1st Joint WOSP/SIPEW Int Conf on Performance Engineering. New York: ACM, 2010: 117−128

    [40]

    Li Keqin. Optimal configuration of a multicore server processor for managing the power and performance tradeoff[J]. The Journal of Supercomputing, 2012, 61(1): 189−214 doi: 10.1007/s11227-011-0686-1

    [41]

    Kim S, Roy I, Talwar V. Evaluating integrated graphics processors for data center workloads[C]//Proc of the 46th Workshop on Power-Aware Computing and Systems. New York: ACM, 2013: 41−45

    [42]

    Jouppi N P, Young C, Patil N, et al. In-datacenter performance analysis of a tensor processing unit[C]//Proc of the 44th Annual Int Symp on Computer Architecture. New York: ACM, 2017: 1−12

    [43]

    Zhang Boyu, Davoodi A, Hu Yuhen. Exploring energy and accuracy tradeoff in structure simplification of trained deep neural networks[J]. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2018, 8(4): 836−848 doi: 10.1109/JETCAS.2018.2833383

    [44]

    Ellison B, Minas L. The problem of power consumption in servers[J]. Energy Efficiency for Information Technology, 2009, 41(2): 1−17

    [45]

    Giridhar B, Cieslak M, Duggal D, et al. Exploring DRAM organizations for energy-efficient and resilient exascale memories[C]//Proc of the 26th Int Conf on High Performance Computing, Networking, Storage and Analysis. Piscataway, NJ: IEEE, 2013: 277−289

    [46]

    Lin Jiang, Zheng Hongzhong, Zhu Zhichun, et al. Thermal modeling and management of DRAM memory systems[C]//Proc of the 34th Annual Int Symp on Computer Architecture. New York: ACM, 2007: 312−322

    [47]

    Vijaykrishnan N, Kandemir M, Irwin M J, et al. Energy-driven integrated hardware-software optimizations using SimplePower[J]. ACM SIGARCH Computer Architecture News, 2000, 28(2): 95−106 doi: 10.1145/342001.339659

    [48]

    Shiue W T, Chakrabarti C. Memory exploration for low power, embedded systems[C]//Proc of the 36th Design Automation Conf. Piscataway, NJ: IEEE, 1999: 140−145

    [49]

    Roy S, Rudra A, Verma A. An energy complexity model for algorithms[C]//Proc of the 4th Conf on Innovations in Theoretical Computer Science. New York: ACM, 2013: 283−304

    [50]

    Poess M, Othayoth Nambiar R. A power consumption analysis of decision support systems[C]//Proc of the 1st Joint WOSP/SIPEW Int Conf on Performance Engineering. New York: ACM, 2010: 147−152

    [51]

    Feng Boliang, Lu Jiaheng, Zhou Yongluan, et al. Energy efficiency for MapReduce workloads: An in-depth study[C]//Proc of the 33rd Australasian Database Conf. Canberra, Australian: ACS, 2012: 61−70

    [52]

    Yang T, Chen Y, Sze V. Designing energy-efficient convolutional neural networks using energy-aware pruning[C]//Proc of the 30th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 5687−5695

    [53]

    Wang Ruibo, Lu Kai, Chen Juan, et al. Brief introduction of TianHe exascale prototype system[J]. Tsinghua Science and Technology, 2020, 26(3): 361−369

    [54]

    Wei Jia, Zhang Xingjun, Ji Zeyu, et al. Deploying and scaling distributed parallel deep neural networks on the Tianhe-3 prototype system[J]. Scientific Reports, 2021, 11(1): 1−14

    [55]

    Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint, arXiv:1409.1556, 2014

    [56]

    He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image recognition[C]//Proc of the 29th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 770−778

    [57]

    Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[J]. arXiv preprint, arXiv: 2010.11929, 2021

    [58]

    Marcel S, Rodriguez Y. Torchvision the machine-vision package of torch[C]//Proc of the 18th ACM Int Conf on Multimedia. New York: ACM, 2010: 1485−1488

    [59]

    Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25(1): 1097−1105

    [60]

    Akaike H. Autoregressive model fitting for control[M]//Selected Papers of Hirotugu Akaike. Berlin: Springer, 1998: 153−170

    [61]

    Gardner Jr E S. Exponential smoothing: The state of the art[J]. Journal of Forecasting, 1985, 4(1): 1−28

    [62]

    Box G E P, Pierce D A. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models[J]. Journal of the American Statistical Association, 1970, 65(332): 1509−1526

    [63]

    Hanusz Z, Tarasinska J, Zielinski W. Shapiro-Wilk test with known mean[J]. REVSTAT-Statistical Journal, 2016, 14(1): 89−100

    [64]

    Henning J L. SPEC CPU2000: Measuring CPU performance in the new millennium[J]. Computer, 2000, 33(7): 28−35

    [65]

    Nishtala R, Petrucci V, Carpenter P, et al. Twig: Multi-agent task management for colocated latency-critical cloud services[C]//Proc of the 26th IEEE Int Symp on High Performance Computer Architecture. Piscataway, NJ: IEEE, 2020: 167−179

    [66]

    Jahanshahi A, Yu Nanpeng, Wong D. PowerMorph: QoS-aware server power reshaping for data center regulation service[J]. ACM Transactions on Architecture and Code Optimization, 2022, 19(3): 1−27

    [67]

    Zhao Laiping, Yang Yanan, Zhang Kaixuan, et al. Rhythm: Component-distinguishable workload deployment in datacenters[C]//Proc of the 15th European Conf on Computer Systems. New York: ACM, 2020: 153−170


出版历程
  • 收稿日期:  2023-03-16
  • 修回日期:  2023-08-14
  • 网络出版日期:  2024-03-13
  • 刊出日期:  2024-11-30
