  • A premium Chinese sci-tech journal
  • CCF Class-A recommended Chinese journal
  • T1-class high-quality sci-tech journal in the computing field

面向边缘智能的协同推理综述

王睿, 齐建鹏, 陈亮, 杨龙

王睿, 齐建鹏, 陈亮, 杨龙. 面向边缘智能的协同推理综述[J]. 计算机研究与发展, 2023, 60(2): 398-414. DOI: 10.7544/issn1000-1239.202110867. CSTR: 32373.14.issn1000-1239.202110867
Wang Rui, Qi Jianpeng, Chen Liang, Yang Long. Survey of Collaborative Inference for Edge Intelligence[J]. Journal of Computer Research and Development, 2023, 60(2): 398-414. DOI: 10.7544/issn1000-1239.202110867. CSTR: 32373.14.issn1000-1239.202110867


基金项目: 国家自然科学基金项目(62173158,72004147)
    About the authors:

    Wang Rui: born in 1975. PhD, professor. Senior member of CCF. Main research interests: Internet of things, edge intelligence, and smart healthcare.

    Qi Jianpeng: born in 1992. PhD candidate. Student member of CCF. Main research interests: edge intelligence and resource management.

    Chen Liang: born in 1997. Master's candidate. Main research interests: edge intelligence and reliability.

    Yang Long: born in 1999. Master's candidate. Main research interests: lightweight models and methods.

  • CLC number: TP391

Survey of Collaborative Inference for Edge Intelligence

Funds: This work was supported by the National Natural Science Foundation of China (62173158,72004147).
  • 摘要:

    近年来,信息技术的不断变革伴随数据量的急剧爆发,使主流的云计算解决方案面临实时性差、带宽受限、高能耗、维护费用高、隐私安全等问题. 边缘智能的出现与快速发展有效缓解了此类问题,它将用户需求处理下沉到边缘,避免了海量数据在网络中的流动,得到越来越多的关注. 由于边缘计算中资源性能普遍较低,通过资源实现协同推理正成为热点.通过对边缘智能发展的趋势分析,得出边缘协同推理目前仍处于增长期,还未进入稳定发展期. 因此,在对边缘协同推理进行充分调研的基础上,将边缘协同推理划分为智能化方法与协同推理架构2个部分,分别对其中涉及到的关键技术进行纵向归纳整理,并从动态场景角度出发,对每种关键技术进行深入分析,对不同关键技术进行横向比较以及适用场景分析.最后对动态场景下的边缘协同推理给出值得研究的若干发展方向.

    Abstract:

    At present, the continuous change of information technology along with the dramatic explosion of data quantity makes the cloud computing solutions face many problems such as high latency, limited bandwidth, high carbon footprint, high maintenance cost, and privacy concerns. In recent years, the emergence and rapid development of edge computing has effectively alleviated such dilemmas, sinking user demand processing to the edge and avoiding the flow of massive data in the network. As a typical scenario of edge computing, edge intelligence is gaining increasing attention, in which one of the most important stages is the inference phase. Due to the general low performance of resources in edge computing, collaborative inference through resources is becoming a hot topic. By analyzing the trends of edge intelligence development, we conclude that collaborative inference at the edge is still in the increasing phase and has not yet entered a stable phase. We divide edge-edge collaborative inference into two parts: Intelligent methods and collaborative inference architecture, based on a thorough investigation of edge collaborative inference. The involved key technologies are summarized vertically and organized from the perspective of dynamic scenarios. Each key technology is analyzed in more detail, and the different key technologies are compared horizontally and analyzed on the application scenarios. Finally, we propose several directions that deserve further studying in collaborative edge inference in dynamic scenarios.

  • The SM4 algorithm [1], a Chinese national cryptographic standard, is a widely used block cipher for data protection, encrypted communication, and related applications. Its common modes of operation include ECB (electronic codebook) and CBC (cipher block chaining). In ECB mode, identical plaintext blocks always produce identical ciphertext blocks, whereas in CBC mode each plaintext block is XORed with the previous ciphertext block before encryption, so even identical plaintext inputs can yield entirely different ciphertext outputs. CBC mode therefore offers stronger security and attack resistance than ECB mode and is more widely required in practice. Improving the performance of SM4 in CBC mode is crucial for using the algorithm on edge devices. However, CBC mode makes high throughput hard to achieve: the input of each block is available only after the previous block has finished, so pipelining cannot be used to raise throughput.
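To make this bottleneck concrete, here is a minimal Python sketch of CBC chaining. `toy_encrypt` is a hypothetical stand-in cipher (plain XOR with the key), not the real SM4 round structure; only the chaining dependency matters here.

```python
# Minimal sketch of CBC chaining (toy cipher, NOT real SM4): the input of
# block i depends on the ciphertext of block i-1, so block i cannot enter
# the datapath until block i-1 has finished.

BLOCK = 16  # SM4 block size: 128 b = 16 B

def toy_encrypt(block: bytes, key: bytes) -> bytes:
    # Stand-in, invertible "cipher": XOR with the key (for structure only).
    return bytes(b ^ k for b, k in zip(block, key))

def toy_decrypt(block: bytes, key: bytes) -> bytes:
    return toy_encrypt(block, key)  # XOR is its own inverse

def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for p in blocks:
        c = toy_encrypt(bytes(a ^ b for a, b in zip(p, prev)), key)
        out.append(c)
        prev = c  # serial dependency: the next block needs this ciphertext
    return out

def cbc_decrypt(blocks, key, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(bytes(a ^ b for a, b in zip(toy_decrypt(c, key), prev)))
        prev = c
    return out
```

In ECB mode the loop body would depend only on the current plaintext block, so all blocks could be processed in parallel; the `prev = c` assignment is precisely the serial dependency that rules out pipelining in CBC.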

    Reference [2] proposed an improvement that precomputes the logic outside the S-boxes and merges the precomputed results with the S-boxes into new lookup tables, thereby raising the CBC-mode throughput of SM4. Building on that work, this paper further optimizes the S-box representation and the round-function iteration, ultimately removing two XOR operations from the critical path of the round function and effectively improving the algorithm's performance.

    The proposed design targets SM4 in CBC mode and was synthesized for ASIC with Synopsys Design Compiler under both the TSMC 40 nm and SMIC 55 nm processes. Synthesis results show a CBC-mode throughput of 4.2 Gb/s and an area efficiency of 129.4 Gb·s−1·mm−2, clearly outperforming previously published comparable designs. These results indicate that the proposed simplifications have great potential for improving SM4 performance.

    The rest of this paper is organized as follows. We first introduce the SM4 algorithm and its performance bottleneck in CBC mode. We then describe the two proposed simplifications in detail and explain their roles in the round-function iteration and the S-box substitution. Next, we present the experimental setup together with result analysis and comparison. Finally, we discuss directions for further improvement and application.

    SM4 is a symmetric-key cipher widely used for data encryption and protection. It is one of China's national cryptographic standards and offers both strong security and good performance.

    SM4 follows the block-cipher paradigm: plaintext is divided into 128 b blocks, and each block is encrypted or decrypted under a key. The processing of a single block, shown in Fig. 1, consists of two parts: the key-expansion algorithm and the encryption/decryption algorithm. FK in Fig. 1 is a system constant that is XORed with the user key to form the input of the key-expansion algorithm. The encryption/decryption algorithm applies the 32 round keys rki produced by key expansion to the plaintext and outputs the result after a final reverse transform. Encryption and decryption share the same datapath; the only difference is that decryption uses the round keys in the reverse order.

    图  1  SM4算法工作流程
    Figure  1.  Workflow of SM4 algorithm

    Both the key-expansion algorithm and the encryption/decryption algorithm consist of 32 round-function iterations and adopt a 4-way parallel Feistel structure, taking 128 b as input and producing 128 b as output, with the internal logic shown in Fig. 2. The first 96 b of the output equal the last 96 b of the input, and the remaining 32 b of the output are produced by the round function.

    图  2  4路并行的Feistel结构
    Figure  2.  Four parallel Feistel structure

    The keys used in the key-expansion algorithm are fixed constants defined by the standard, denoted cki. The keys used in the encryption/decryption algorithm are the round keys expanded from the user key by the key-expansion algorithm, denoted rki.

    The structure of SM4 key expansion is shown in Fig. 3. Key expansion mainly consists of 32 rounds of a key-expansion round function, where the user key is 128 b and FK is a 128 b constant specified in the SM4 standard. Their XOR serves as the first-round input of the key-expansion round function and is iterated through a multiplexer for 32 rounds in total, producing the 32 round keys.

    图  3  SM4的密钥扩展算法结构
    Figure  3.  Key expansion algorithm structure of SM4

    Let MK be the user-supplied key; its 32 round keys are computed by Eq. (1):

    (k0, k1, k2, k3) = MK ⊕ FK,
    k_{i+4} = k_i ⊕ F(k_{i+1} ⊕ k_{i+2} ⊕ k_{i+3} ⊕ ck_i),
    rk_i = k_{i+4}, (1)

    where ck_i is a 32 b constant specified by the standard, rk_i denotes the round key of round i, and F denotes the key-expansion round function, composed of the S-box substitution τ: Z_2^32 → Z_2^32 and the linear transform L′(x) = x ⊕ (x <<< 13) ⊕ (x <<< 23), with <<< denoting circular left shift.

    The overall structure of the SM4 encryption/decryption algorithm resembles that of key expansion: both comprise 32 round-function iterations, the difference being that encryption/decryption additionally includes one reverse transform.

    The round-function iteration of the encryption/decryption algorithm is shown in Fig. 4. X1–X4 are the inputs of round 1 and X2–X5 its outputs, which in turn are the inputs of round 2; rk1 is the round key of round 1, and T denotes the round function of the encryption/decryption module. Like the key-expansion round function F, T is composed of the S-box substitution τ and a linear transform L(x) = x ⊕ (x <<< 2) ⊕ (x <<< 10) ⊕ (x <<< 18) ⊕ (x <<< 24).
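The two linear transforms can be written directly in Python as a sanity check; this is a sketch over 32 b words using only the rotation amounts given above.

```python
# The two SM4 linear transforms described above, over 32 b words.
MASK = 0xFFFFFFFF

def rotl(x: int, n: int) -> int:
    """Circular left shift (<<<) of a 32 b word."""
    return ((x << n) | (x >> (32 - n))) & MASK

def L_enc(x: int) -> int:
    """Linear transform of the encryption/decryption round function T:
    L(x) = x ^ (x<<<2) ^ (x<<<10) ^ (x<<<18) ^ (x<<<24)."""
    return x ^ rotl(x, 2) ^ rotl(x, 10) ^ rotl(x, 18) ^ rotl(x, 24)

def L_key(x: int) -> int:
    """Linear transform of the key-expansion round function F:
    L'(x) = x ^ (x<<<13) ^ (x<<<23)."""
    return x ^ rotl(x, 13) ^ rotl(x, 23)
```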

    图  4  SM4加解密模块轮函数结构
    Figure  4.  Round function structure of SM4 encryption and decryption modules

    Through these iterated rounds, SM4 achieves strong encryption and decryption. In CBC mode, however, the dependency between adjacent data blocks prevents conventional pipelining from raising throughput. To address this, this paper proposes two simplifications that remove operations from the critical path and thereby improve SM4 performance in CBC mode.

    The round-function structure of the encryption/decryption module is shown in Fig. 4. Excluding the timing delay of the T function itself, the critical path of a single round iteration contains three XOR operations. Writing the iteration relation of the encryption/decryption round function as a formula gives Eq. (2):

    X_{i+4} = X_i ⊕ T(X_{i+1} ⊕ X_{i+2} ⊕ X_{i+3} ⊕ rk_i). (2)

    Considering two adjacent round iterations gives:

    X_{i+4} = X_i ⊕ T(X_{i+1} ⊕ X_{i+2} ⊕ X_{i+3} ⊕ rk_i),
    X_{i+5} = X_{i+1} ⊕ T(X_{i+2} ⊕ X_{i+3} ⊕ X_{i+4} ⊕ rk_{i+1}). (3)

    From Eqs. (1)–(3) it is easy to see that, because SM4 adopts a 4-lane Feistel structure, 96 b of the input are identical across two adjacent round iterations; in Eq. (3), the term X_{i+2} ⊕ X_{i+3} is computed twice by the two adjacent rounds.

    A simple optimization is therefore to additionally pass the value X_{i+2} ⊕ X_{i+3} ⊕ rk_{i+1} along between rounds and reuse it in the next computation; the resulting flow is shown in Fig. 5.

    图  5  优化的轮函数结构
    Figure  5.  Optimized round function structure

    Compared with the flow of Fig. 4, the optimized round function fetches the next round's key in advance and precomputes on the data shared between the two rounds, saving the time of 32 XOR operations in total over an encryption/decryption pass.
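The equivalence of the merged form can be checked with a quick sketch. `T` below is an arbitrary stand-in mixer (an assumption for illustration, not the real τ∘L composition), since the rewrite is purely algebraic and holds for any fixed T.

```python
# Two adjacent SM4 rounds: baseline per Eq.(3) vs. the merged form that
# forwards t = X_{i+2} ^ X_{i+3} ^ rk_{i+1} between rounds.

def T(x: int) -> int:
    # Stand-in for the real round function (any fixed 32 b function works
    # for demonstrating the algebraic equivalence).
    return ((x * 0x9E3779B1) ^ (x >> 7)) & 0xFFFFFFFF

def two_rounds_baseline(X, rk0, rk1):
    X0, X1, X2, X3 = X
    X4 = X0 ^ T(X1 ^ X2 ^ X3 ^ rk0)
    X5 = X1 ^ T(X2 ^ X3 ^ X4 ^ rk1)  # recomputes X2 ^ X3
    return X4, X5

def two_rounds_merged(X, rk0, rk1):
    X0, X1, X2, X3 = X
    t = X2 ^ X3 ^ rk1                # shared term, computed once, off the critical path
    X4 = X0 ^ T(X1 ^ X2 ^ X3 ^ rk0)
    X5 = X1 ^ T(t ^ X4)              # one fewer XOR before T in round i+1
    return X4, X5
```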

    The S-box is a basic component of cryptography that implements a nonlinear transformation of data; it is used in DES, AES, SM1, SM4, and other ciphers. In SM4 it provides an 8 b to 8 b nonlinear mapping.

    In SM4 the S-box module is normally combined with the linear transform L, forming the T function of Fig. 4 and Fig. 5, which lies on the critical path of the encryption/decryption round function. Any way of shortening the critical path of T therefore reduces the delay of the whole encryption/decryption module and improves efficiency. The internal structure of T is shown in Fig. 6, where <<< denotes circular left shift of 32 b data; the critical path contains one S-box and three XOR operations. In hardware, circular shifts are implemented by wiring and add no extra path delay.

    图  6  SM4加解密模块T函数结构
    Figure  6.  T function structure of SM4 encryption and decryption modules

    The T function contains four XOR operations, so its hardware critical path holds at least three of them. One optimization is thus to widen the S-boxes to 8 b input and 32 b output [2-3] and fold the L function into the four S-boxes in advance, as shown in Fig. 7: the results are stored in encoded form, the SBox of Fig. 6 is merged with the subsequent linear transform L into an exSBox, and afterwards only the four exSBox outputs need to be XORed, eliminating one XOR operation.

    图  7  优化的T函数结构
    Figure  7.  Optimized T-function structure

    Although the modified S-box outputs more data than the original, it is still realized in hardware by the same number of multiplexers for table lookup, so neither the path delay nor the security of the S-box changes.

    Take exSBox1 in Fig. 7 as an example and use 0xff as the input to illustrate its construction. First, applying the S-box to 0xff yields 0x48. Since the input of exSBox1 corresponds to the most significant position, this is extended to the 32 b value 0x48000000. After the L function, the resulting value is 0x68492121. As shown in Table 1, the bold parts of the first five rows mark the positions of the input data and its circularly shifted copies; all other positions are constantly 0 for any input.

    表  1  exSBox1构造过程中0x48000000的循环移位结果
    Table  1.  Rotation Results of 0x48000000 in the Construction of exSBox1
    Original   01001000 00000000 00000000 00000000
    <<<2       00100000 00000000 00000000 00000001
    <<<10      00000000 00000000 00000001 00100000
    <<<18      00000000 00000001 00100000 00000000
    <<<24      00000000 01001000 00000000 00000000
    XOR sum    01101000 01001001 00100001 00100001
    Note: bold marks the input data and the positions of its rotated copies.

    From the results in Table 1 it is clear that, apart from bits 0–5 and bits 14, 15 (the bold digits of the last row), which are produced by XOR, the remaining 24 bits are merely permutations of the 8 input bits. The hardware therefore only needs an S-box with 8 b input and 16 b output. For the other three exSBoxes in Fig. 7, the outputs under the same input can be obtained by circularly shifting the data of Table 1. This conclusion holds for all four S-boxes at their different positions.

    Specifically, let p be the 8 b input and τ(p) the output of the S-box substitution τ of the SM4 standard, which maps an 8 b input to an 8 b output. Let X = (x0, x1, …, x15) be the 16 b data stored in exSBox1 and Y = (y0, y1, …, y31) the 32 b output required by the optimized T function. Then X is generated by Eq. (4):

    (x0, x1, …, x7) = τ(p),
    (x8, x9, …, x15) = τ(p) ⊕ (τ(p) <<< 2). (4)

    From Table 1, Y can be obtained from X by permutation, and the values for exSBox2, exSBox3, and exSBox4 follow from circular shifts of Y; since this involves only assignment, it is realized in circuits by wiring alone. Compared with the design of [2], this saves 1/3 of the area. The computation is given by Eq. (5):

    (y0, …, y5) = (x8, …, x13),    (y6, y7) = (x6, x7),
    (y8, …, y13) = (x0, …, x5),    (y14, y15) = (x14, x15),
    (y16, …, y21) = (x2, …, x7),   (y22, y23) = (x0, x1),
    (y24, …, y29) = (x2, …, x7),   (y30, y31) = (x0, x1). (5)
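Eqs. (4) and (5) can be verified in Python against the worked example above; only the single S-box value τ(0xff) = 0x48 given in the text is assumed, not the full table.

```python
# Check that the 16 b stored by exSBox1 (Eq.(4)) plus pure rewiring
# (Eq.(5)) reproduces the direct computation L(tau(p) << 24).

def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF

def L_enc(x: int) -> int:
    return x ^ rotl32(x, 2) ^ rotl32(x, 10) ^ rotl32(x, 18) ^ rotl32(x, 24)

def exsbox1_store(s: int):
    # Eq.(4): the 16 b actually stored: (x0..x7) = s, (x8..x15) = s ^ (s <<< 2).
    return s, s ^ rotl8(s, 2)

def exsbox1_output(s: int) -> int:
    # Eq.(5): rebuild the 32 b output from the stored 16 b by wiring only.
    hi, lo = exsbox1_store(s)
    bit = lambda v, i: (v >> (7 - i)) & 1  # x_i / y_i are numbered MSB-first
    x = [bit(hi, i) for i in range(8)] + [bit(lo, i) for i in range(8)]
    y = (x[8:14] + [x[6], x[7]]      # y0..y7
         + x[0:6] + [x[14], x[15]]   # y8..y15
         + x[2:8] + [x[0], x[1]]     # y16..y23
         + x[2:8] + [x[0], x[1]])    # y24..y31
    out = 0
    for b in y:
        out = (out << 1) | b
    return out
```

Since the wiring identity is independent of the S-box contents, it can be checked for every possible 8 b value, not just the worked example.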

    Field-programmable gate arrays (FPGA) and application-specific integrated circuits (ASIC) are the two mainstream ways of implementing cryptographic algorithms in hardware. Although FPGAs offer programmability, flexibility, and rapid design, ASICs provide higher performance, which matches the efficiency goal of this design, so an ASIC implementation was chosen.

    The overall SM4 hardware architecture is shown in Fig. 8 and comprises the key-expansion module, the encryption/decryption module, and the combinational logic adapting the CBC working mode. For a single encryption/decryption task whose plaintext is divided into n blocks, key expansion executes once and encryption/decryption n times, so optimizing the execution efficiency of the encryption/decryption algorithm is the focus of the SM4 hardware design. For every block of input, the two proposed simplifications remove the delay of 64 XOR-gate levels in total, greatly improving computational efficiency.

    图  8  SM4硬件整体架构
    Figure  8.  Overall architecture of SM4 hardware

    There are two main schemes for implementing SM4 in hardware. One is a pipeline structure, in which several encryption/decryption modules connected by registers work simultaneously to raise throughput, as in Fig. 9(a). The other is loop iteration, in which n of the 32 round functions are extracted at once and combined into one combinational circuit, called an n-in-1 circuit, as in Fig. 9(b). The advantage of the pipeline is that it fully exploits the performance of the n encryption cores and accelerates computation without affecting the overall clock frequency; for SM4, stacking pipeline stages within reason can achieve very high throughput.

    图  9  流水线结构与循环迭代结构
    Figure  9.  Pipeline architecture and loop iteration architecture

    However, the pipeline structure suits only working modes without data dependencies between blocks, such as ECB. In CBC mode the previous output must be XORed with the current input, so adjacent blocks are dependent and the pipeline cannot accelerate the computation. The pipeline structure was therefore not adopted in this design.

    Although loop iteration lowers the overall clock frequency and brings a more limited throughput gain, it supports both the ECB and CBC working modes. This design therefore finally adopts the loop-iteration scheme.

    In SM4, key expansion resembles encryption/decryption and also contains 32 rounds of iteration. The key-expansion module implements the 32 iterations by looping the single-round combinational logic circuit of Fig. 2 thirty-two times.

    At the output of the key-expansion module, registers numbered 0–31 store the round key of each round, as shown in Fig. 10. Numbering from 0 has the benefit that, during decryption, the keys are used in the reverse order: round k of encryption uses key k−1 while round k of decryption uses key 32−k, and in binary the two indices convert into each other by bitwise complement.
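The complement trick is easy to confirm with a sketch; k is the 1-indexed round number as in the text.

```python
# Round-key registers are numbered 0..31.  Encryption round k reads index
# k-1; decryption round k reads index 32-k.  In 5-bit binary the two
# indices are bitwise complements, so the address conversion is free.

def enc_index(k: int) -> int:
    return k - 1          # key index used by encryption round k (k = 1..32)

def dec_index(k: int) -> int:
    return 32 - k         # key index used by decryption round k

def complement5(i: int) -> int:
    return i ^ 0x1F       # invert all 5 address bits
```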

    图  10  轮密钥的存储与使用
    Figure  10.  Storage and usage of round keys

    To guarantee correct results, the key-expansion module also sends the encryption/decryption module a control signal indicating its working state, preventing the use of wrong round keys before the key set has been fully updated.

    The national standard document [1] gives no specific test vectors for the CBC working mode. The proposed design was therefore implemented fully in Verilog HDL and was synthesized, simulated, and verified on an FPGA platform to confirm functional correctness and analyze performance, as shown in Fig. 11. Concretely, a PCIe host sends random plaintext data to the FPGA board, the board returns the encrypted result, and the output is compared against a software implementation for functional verification. If the two outputs are identical over many verification loops, the designed SM4 circuit is deemed functionally correct.

    图  11  测试流程
    Figure  11.  Testing procedures

    Finally, the design passed on-board verification on a Zynq 7020 FPGA board, confirming functional correctness, with a maximum clock frequency of 95 MHz and a throughput of about 1.5 Gb/s.

    For ASIC, the design was evaluated under the two processes SMIC 55 nm and TSMC 40 nm, with timing and other synthesis constraints applied through the Synopsys EDA tool Design Compiler. Chip area and gate count were chosen as the evaluation metrics; the results are listed in Table 2 and Table 3. In CBC mode the design achieves an area efficiency of 129.4 Gb·s−1·mm−2 at a power of 3.97 mW, clearly surpassing comparable designs. Measured by gate count, the proposed design also clearly leads comparable designs, with a gate efficiency of 0.205×10−3 Gb·s−1·gates−1.

    表  2  SM4综合结果与面积效率对比
    Table  2.  Comparison of SM4 Synthesis Results and Area Efficiency
    Process node  Area/mm²  Throughput/(Gb·s⁻¹)  Area efficiency/(Gb·s⁻¹·mm⁻²)  Power/mW
    40 nm*     0.0335    4.34    129.40    3.97
    55 nm*     0.0877    4.41    50.30     10.88
    65 nm[2]   0.1260    5.24    41.59     —
    180 nm[4]  0.0790    0.10    1.27      5.31
    55 nm[5]   0.0870    0.40    4.59      4.35
    350 nm[6]  0.0270    0.412   15.26     —
    Note: * marks the results of this work.
    表  3  SM4综合结果与门效率对比
    Table  3.  Comparison of SM4 Synthesis Results and Gates Efficiency
    Process node  Gates      Throughput/(Gb·s⁻¹)  Gate efficiency/(Gb·s⁻¹·gates⁻¹)
    40 nm*     21.2×10³   4.34   0.205×10⁻³
    55 nm*     21.1×10³   4.41   0.209×10⁻³
    180 nm[6]  32.0×10³   0.80   0.025×10⁻³
    65 nm[7]   31.0×10³   1.23   0.040×10⁻³
    55 nm[8]   22.0×10³   0.27   0.012×10⁻³
    130 nm[9]  22.0×10³   0.80   0.036×10⁻³
    Note: * marks the results of this work.

    Synthesizing the design under different processes and voltages yields its throughput in different usage scenarios. The design was synthesized under TSMC 40 nm, SMIC 55 nm, and SMIC 130 nm with different process corners; the results are listed in Table 4.

    表  4  不同工艺角下的SM4综合结果与效率对比
    Table  4.  Comparison of SM4 Synthesis Results and Efficiency with Different Process Corners
    Process node  Corner            Area/gates  Throughput/(Gb·s⁻¹)  Power/mW
    40 nm    0.99 V/125°C/SS   21.0×10³   2.40   2.55
    40 nm    1.1 V/25°C/TT     21.2×10³   4.34   3.97
    40 nm    1.21 V/0°C/FF     20.9×10³   6.96   8.35
    55 nm    1 V/25°C/TT       20.0×10³   2.78   4.10
    55 nm    1.2 V/25°C/TT     21.1×10³   4.41   10.88
    55 nm    1.32 V/0°C/FF     17.8×10³   6.84   33.59
    130 nm   1.08 V/125°C/SS   20.8×10³   1.11   6.86
    130 nm   1.2 V/25°C/TT     21.0×10³   1.75   15.70
    130 nm   1.32 V/0°C/FF     21.8×10³   2.45   23.03

    Using the two proposed methods for simplifying the critical path and reducing the area of the SM4 encryption/decryption module, a 4-in-1 SM4 circuit was implemented and functionally verified on a Zynq 7020 board. In addition, ASIC synthesis shows that the proposed SM4 circuit achieves higher area efficiency and lower power than other designs. The optimizations of SM4 are thus effective, and they offer a useful reference for raising the CBC-mode area efficiency of other block ciphers.

    Author contributions: Hao Zeyu proposed the research scheme and wrote the paper; Dai Tian'ao, Huang Yicheng, and Duan Cenlin assisted with the verification experiments on the ASIC platform; Dong Jin, Wu Shiyong, Zhang Bo, Wang Xueyan, and Jia Xiaotao provided guidance and revised the paper; Yang Jianlei provided guidance and discussed the final draft.

  • 图  1   边缘智能发展趋势

    Figure  1.   Edge intelligence developmental trend

    图  2   协同推理关键技术出现时间

    Figure  2.   Emerging time of key techniques in collaborative inference

    图  3   边缘协同推理关键技术、过程及应用场景

    Figure  3.   Key techniques, processes and application scenarios of edge collaborative inference

    图  4   模型切割方式

    Figure  4.   Model partition methods

    图  5   早期退出模式

    Figure  5.   Early exit pattern

    图  6   模型选择模式

    Figure  6.   Model selection pattern

    图  7   主流的边缘计算协同推理架构

    Figure  7.   Mainstream of collaborative inference framework in edge computing

    表  1   模型切割方法比较

    Table  1   Comparison of Model Partition Methods

    Method | Partition executor | How the partition basis is collected | Slice dependency handling / service discovery | Slice update scheme | Optimization objective | Slices involved at runtime
    DeepThings[46-47] | Gateway | Periodic collection of node status | Unified scheduling by the gateway | Nodes hold the full model | Memory, communication | ≥2
    ADCNN[55] | Central node | Latency estimates from historical tasks | Scheduling by the central node | Nodes hold the full model | Latency, communication | ≥2
    Neurosurgeon[34] | Client | Real-time observation of current network and energy status | IP binding (fixed) | Nodes hold the full model | Energy (latency) | 2
    MoDNN[48] | Central node (group owner) | Obtained when nodes register with the central node | Scheduling by the central node | Deployed once, no updates | Latency | ≥2
    DeepX[50] | Central node (execution planner) | Real-time collection and linear-regression prediction | Scheduling by the central node | Execution plan regenerated per inference run | Energy, memory | ≥2
    AOFL[51] | Cloud or central node | Periodic collection of node status | IP binding (fixed) | Redeployment | Latency, communication | ≥2
    CRIME[52] | Any node | Real-time interaction between nodes | Set of direct neighbors | Nodes hold the full model | Latency, energy | ≥2
    DeepSlicing[53] | Master node | Latency estimates from historical tasks | Scheduling by the central node | Nodes hold the full model | Latency, memory | ≥2
    Edgent[54] | Master node (edge server) | Observed historical network data | IP binding (fixed) | Redeployment | Accuracy, latency | 2
    Ref. [45] | Central node | Real-time collection of node status | IP binding (fixed) | Nodes hold the full model | Memory | ≥2
    Cogent[49] | Central node (DDPG agent) | Periodic collection of node status | Static virtual IP binding provided by Kubernetes (fixed) | Redeployment | Accuracy, latency | 2
    Refs. [56-57] | Edge server | Trade-off analysis of the model and optimization objectives | IP binding (fixed) | Redeployment | Computation and communication latency | 2

    表  2   不同架构的比较

    Table  2   Comparison of Different Architectures

    No. | Name | Key combined techniques | Problems addressed | Applicable scenarios
    1 | Cloud-edge collaborative inference based on model partitioning | Model partitioning, data compression, quantization, matrix factorization/compression, early exit | Limited energy and compute on edge devices; energy-latency trade-off | Cloud support available; data preprocessing; privacy; real-time load adjustment
    2 | Edge-edge collaborative inference based on model partitioning | Model partitioning, data compression, quantization, matrix factorization/compression | Unreliable cloud connectivity; limited single-node resources; energy-latency trade-off | No cloud support; single node short of resources but with neighbor nodes; low communication cost
    3 | Cloud-edge collaborative inference based on model selection | Data compression, model compression, knowledge distillation | Unreliable inference accuracy on edge devices | Highly trusted inference accuracy required; relatively ample edge-node resources
    4 | Edge-edge collaborative inference based on multi-model result aggregation | Data/model fusion, data compression, asynchronous/synchronous communication | Low parallelism in collaborative inference; unreliable inference accuracy | Multi-scenario collaborative inference; ample edge-node resources; relatively relaxed latency requirements
  • [1]

    David S, David C, Nick J. Top 10 strategic technology trends for 2020[EB/OL]. (2019-10-20)[2022-02-05].https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technology-trends-for-2020

    [2]

    Carrie M, David R, Michael S. The growth in connected IoT devices is expected to generate 79.4ZB of data in 2025, according to a new IDC forecast[EB/OL]. (2019-06-18) [2022-02-15].https://www.businesswire.com/news/home/20190618005012/en/The-Growth-in-Connected-IoT-Devices-is-Expected-to-Generate-79.4ZB-of-Data-in-2025-According-to-a-New-IDC-Forecast

    [3]

    Xiao Yinhao, Jia Yizhen, Liu Chunchi, et al. Edge computing security: State of the art and challenges[J]. Proceedings of the IEEE, 2019, 107(8): 1608−1631 doi: 10.1109/JPROC.2019.2918437

    [4]

    Kevin M, Amir E. AWS customers rack up hefty bills for moving data[EB/OL]. (2019-10-21)[2022-02-15].https://www.theinformation.com/articles/aws-customers-rack-up-hefty-bills-for-moving-data

    [5]

    Jin Hai, Jia Lin, Zhou Zhi. Boosting edge intelligence with collaborative cross-edge analytics[J]. IEEE Internet of Things Journal, 2020, 8(4): 2444−2458

    [6]

    Xiang Chong, Wang Xinyu, Chen Qingrong, et al. No-jump-into-latency in China's Internet! toward last-mile hop count based IP geo-localization[C/OL] //Proc of the 19th Int Symp on Quality of Service. New York: ACM, 2019[2021-03-15].https://doi.org/10.1145/3326285.3329077

    [7]

    Jiang Xiaolin, Shokri-Ghadikolaei H, Fodor G, et al. Low-latency networking: Where latency lurks and how to tame it[J]. Proceedings of the IEEE, 2018, 107(2): 280−306

    [8] 施巍松,张星洲,王一帆,等. 边缘计算: 现状与展望[J]. 计算机研究与发展,2019,56(1):69−89

    Shi Weisong, Zhang Xingzhou, Wang Yifan, et al. Edge computing: Status quo and prospect[J]. Journal of Computer Research and Development, 2019, 56(1): 69−89 (in Chinese)

    [9]

    Zamora-Izquierdo MA, Santa J, Martínez JA, et al. Smart farming IoT platform based on edge and cloud computing[J]. Biosystems Engineering, 2019, 177(1): 4−17

    [10] 肖文华,刘必欣,刘巍,等. 面向恶劣环境的边缘计算综述[J]. 指挥与控制学报,2019,5(3):181−190

    Xiao Wenhua, Liu Bixin, Liu Wei, et al. A review of edge computing for harsh environments[J]. Journal of Command and Control, 2019, 5(3): 181−190 (in Chinese)

    [11]

    Stojkoska BLR, Trivodaliev KV. A review of Internet of things for smart home: Challenges and solutions[J]. Journal of Cleaner Production, 2017, 140(3): 1454−1464

    [12]

    Wan Shaohua, Gu Zonghua, Ni Qiang. Cognitive computing and wireless communications on the edge for healthcare service robots[J]. Computer Communications, 2020, 149(1): 99−106

    [13] 吕华章,陈丹,范斌,等. 边缘计算标准化进展与案例分析[J]. 计算机研究与发展,2018,55(3):487−511

    Lü Huazhang, Chen Dan, Fan Bin, et al. Standardization progress and case analysis of edge computing[J]. Journal of Computer Research and Development, 2018, 55(3): 487−511 (in Chinese)

    [14]

    Qi Jianpeng. Awesome edge computing[EB/OL]. (2003-06-02) [2022-03-15]. https://github.com/qijianpeng/awesome-edge-computing#engine

    [15]

    Cheol-Ho H, Blesson V. Resource management in fog/edge computing: A survey on architectures, infrastructure, and algorithms[J]. ACM Computing Surveys, 2019, 52(5): 1−37 doi: 10.1145/3342101

    [16] 曾鹏,宋纯贺. 边缘计算[J]. 中国计算机学会通讯,2020,16(1):8−10

    Zeng Peng, Song Chunhe. Edge computing[J]. Communications of China Computer Federation, 2020, 16(1): 8−10 (in Chinese)

    [17] 高晗,田育龙,许封元,等. 深度学习模型压缩与加速综述[J]. 软件学报,2021,32(1):68−92

    Gao Han, Tian Yulong, Xu Fengyuan, et al. Overview of deep learning model compression and acceleration[J]. Journal of Software, 2021, 32(1): 68−92 (in Chinese)

    [18]

    Zhou Zhi, Chen Xu, Li En, et al. Edge intelligence: Paving the last mile of artificial intelligence with edge computing[J]. Proceedings of the IEEE, 2019, 107(8): 1738−1762 doi: 10.1109/JPROC.2019.2918951

    [19] 李肯立,刘楚波. 边缘智能: 现状和展望[J]. 大数据,2019,5(3):69−75

    Li Kenli, Liu Chubo. Edge intelligence: Status quo and prospect[J]. Big Data, 2019, 5(3): 69−75 (in Chinese)

    [20] 谈海生,郭得科,张弛,等. 云边端协同智能边缘计算的发展与挑战[J]. 中国计算机学会通讯,2020,16(1):38−44

    Tan Haisheng, Guo Deke, Zhang Chi, et al. Development and challenges of cloud-edge-device collaborative intelligent edge computing[J]. Communications of China Computer Federation, 2020, 16(1): 38−44 (in Chinese)

    [21] 张星洲,鲁思迪,施巍松. 边缘智能中的协同计算技术研究[J]. 人工智能,2019,5(7):55−67

    Zhang Xingzhou, Lu Sidi, Shi Weisong. Research on collaborative computing technology in edge intelligence[J]. Artificial Intelligence, 2019, 5(7): 55−67 (in Chinese)

    [22] 王晓飞. 智慧边缘计算: 万物互联到万物赋能的桥梁[J]. 人民论坛·学术前沿,2020(9):6−17

    Wang Xiaofei. Smart edge computing: The bridge from the Internet of everything to the empowerment of everything[J]. People’s Forum·Academic Frontiers, 2020(9): 6−17 (in Chinese)

    [23]

    Fan Zhenyu, Wang Yang, Fan Wu, et al. Serving at the edge: An edge computing service architecture based on ICN[J]. ACM Transactions on Internet Technology, 2021, 22(1): 1−27

    [24]

    Jennings A , Copenhagen R V , Rusmin T. Aspects of network edge intelligence[R/OL]. 2001 [2022-03-16]. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.6997&rep=rep1&type=pdf

    [25]

    Romaniuk R S. Intelligence in optical networks[G] //Proceedings of SPIE 5125: Proc of the Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments. Bellingham, WA: SPIE, 2003: 17−31

    [26]

    Okagawa T, Nishida K, Yabusaki M. A proposed mobility management for IP-based IMT network platform[J]. IEICE Transactions on Communications, 2005, 88(7): 2726−2734

    [27]

    Liang Ye. Mobile intelligence sharing based on agents in mobile peer-to-peer environment[C] //Proc of the 3rd Int Symp on Intelligent Information Technology and Security Informatics. Piscataway, NJ: IEEE, 2010: 667−670

    [28]

    Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84−90

    [29]

    Szegedy C, Liu Wei, Jia Yangqing, et al. Going deeper with convolutions[C/OL] //Proc of the 28th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015 [2022-03-16]. https://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Szegedy_Going_Deeper_With_2015_CVPR_paper.html

    [30]

    Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[EB/OL]. (2016-11-04) [2022-03-16]. https://arxiv.org/abs/1602.07360

    [31]

    Cao Yu, Chen Songqing, Hou Peng, et al. FAST: A Fog computing assisted distributed analytics system to monitor fall for Stroke mitigation[C] //Proc of the 10th IEEE Int Conf on Networking, Architecture and Storage. Piscataway, NJ: IEEE, 2015: 2−11

    [32]

    Teerapittayanon S, McDanel B, Kung H T. Distributed deep neural networks over the cloud, the edge and end devices[C] //Proc of the 37th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2017: 328−339

    [33]

    Wang Xiaofei, Han Yiwen, Wang Chenyang, et al. In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning[J]. IEEE Network, 2019, 33(5): 156−165 doi: 10.1109/MNET.2019.1800286

    [34]

    Kang Yiping, Johann H, Gao Cao, et al. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge[J]. ACM SIGARCH Computer Architecture News, 2017, 45(1): 615−629 doi: 10.1145/3093337.3037698

    [35]

    Li En, Zhou Zhi, and Chen Xu. Edge intelligence: On-demand deep learning model co-inference with device-edge synergy[C] //Proc of the 2018 Workshop on Mobile Edge Communications. New York, ACM, 2018: 31−36

    [36] 李逸楷,张通,陈俊龙. 面向边缘计算应用的宽度孪生网络[J]. 自动化学报,2020,46(10):2060−2071

    Li Yikai, Zhang Tong, Chen Junlong. Wide twin networks for edge computing applications[J]. Acta Automatica Sinica, 2020, 46(10): 2060−2071 (in Chinese)

    [37]

    Al-Rakhami M, Alsahli M, Hassan M M, et al. Cost efficient edge intelligence framework using docker containers[C] //Proc of the 16th IEEE Int Conf on Dependable, Autonomic and Secure Computing. Piscataway, NJ: IEEE, 2018: 800−807

    [38]

    Al-Rakhami M, Gumaei A, Alsahli M, et al. A lightweight and cost effective edge intelligence architecture based on containerization technology[J]. World Wide Web, 2020, 23(2): 1341−1360 doi: 10.1007/s11280-019-00692-y

    [39]

    Verbraeken J, Wolting M, Katzy J, et al. A survey on distributed machine learning[J]. ACM Computing Surveys, 2020, 53(2): 1−33 doi: 10.1145/3389414

    [40] 杨涛,柴天佑. 分布式协同优化的研究现状与展望[J]. 中国科学:技术科学,2020,50(11):1414−1425 doi: 10.1360/SST-2020-0040

    Yang Tao, Chai Tianyou. Research status and prospects of distributed collaborative optimization[J]. Scientia Sinica Technologica, 2020, 50(11): 1414−1425 (in Chinese) doi: 10.1360/SST-2020-0040

    [41]

    Merenda M, Porcaro C, Iero D. Edge machine learning for AI-enabled IoT devices: A review[J/OL]. Sensors, 2020, 20(9) [2022-03-18]. https://doi.org/10.3390/s20092533

    [42]

    Véstias M P, Duarte R P, de Sousa J T, et al. Moving deep learning to the edge[J/OL]. Algorithms, 2020, 13(5) [2022-03-18]. https://doi.org/10.3390/a13050125

    [43]

    Chen Jiasi, Ran Xukan. Deep learning with edge computing: A review[J]. Proceedings of the IEEE, 2019, 107(8): 1655−1674 doi: 10.1109/JPROC.2019.2921977

    [44] 洪学海,汪洋. 边缘计算技术发展与对策研究[J]. 中国工程科学,2018,20(2):28−34

    Hong Xuehai, Wang Yang. Research on the development and countermeasures of edge computing technology[J]. China Engineering Science, 2018, 20(2): 28−34 (in Chinese)

    [45]

    Hadidi R, Cao Jiashen, Ryoo M S, et al. Toward collaborative inferencing of deep neural networks on Internet-of-things devices[J]. IEEE Internet of Things Journal, 2020, 7(6): 4950−4960 doi: 10.1109/JIOT.2020.2972000

    [46]

    Zhao Zhuoran, Barijough K M, Gerstlauer A. Deepthings: Distributed adaptive deep learning inference on resource-constrained IoT edge clusters[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(11): 2348−2359 doi: 10.1109/TCAD.2018.2858384

    [47]

    Pnevmatikatos D N, Pelcat M, Jung M. Embedded Computer Systems: Architectures, Modeling, and Simulation[M]. Berlin: Springer, 2019

    [48]

    Mao Jiachen, Chen Xiang, Nixon K W, et al. MoDNN: Local distributed mobile computing system for deep neural network[C] //Proc of the 24th Design, Automation Test in Europe Conf Exhibition. Piscataway, NJ: IEEE, 2017: 1396−1401

    [49]

    Shan Nanliang, Ye Zecong, Cui Xiaolong. Collaborative intelligence: Accelerating deep neural network inference via device-edge synergy[J/OL]. Security and Communication Networks, 2020 [2022-03-16]. https://doi.org/10.1155/2020/8831341

    [50]

    Lane N D, Bhattacharya S, Georgiev P, et al. DeepX: A software accelerator for low-power deep learning inference on mobile devices[C/OL] //Proc of the 15th ACM/IEEE Int Conf on Information Processing in Sensor Networks (IPSN). 2016 [2022-04-06]. https://doi.org/10.1109/IPSN.2016.7460664

    [51]

    Zhou Li, Samavatian M H, Bacha A, et al. Adaptive parallel execution of deep neural networks on heterogeneous edge devices[C] //Proc of the 4th ACM/IEEE Symp on Edge Computing. New York: ACM, 2019: 195−208

    [52]

    Jahierpagliari D, Chiaro R, Macii E, et al. CRIME: Input-dependent collaborative inference for recurrent neural networks[J]. IEEE Transactions on Computers, 2020, 70(10): 1626−1639

    [53]

    Zhang Shuai, Zhang Sheng, Qian Zhuzhong, et al. DeepSlicing: Collaborative and adaptive CNN inference with low latency[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 22(9): 2175−2187

    [54]

    Li En, Zeng Liekang, Zhou Zhi, et al. Edge AI: On-demand accelerating deep neural network inference via edge computing[J]. IEEE Transactions on Wireless Communications, 2020, 19(1): 447−457

    [55]

    Zhang Saiqian, Lin Jieyu, Zhang Qi. Adaptive distributed convolutional neural network inference at the network edge with ADCNN[C/OL] //Proc of the 49th Int Conf on Parallel Processing. 2020 [2022-03-18]. https://doi.org/10.1145/3404397.3404473

    [56]

    Shao Jiawei, Zhang Jun. BottleNet++: An end-to-end approach for feature compression in device-edge co-inference systems[C/OL] //Proc of the IEEE Int Conf on Communications Workshops. Piscataway, NJ: IEEE, 2020 [2022-03-18]. https://doi.org/10.1109/ICCWorkshops49005.2020.9145068

    [57]

    Shao Jiawei, Zhang Jun. Communication-computation trade-off in resource-constrained edge inference[J]. IEEE Communications Magazine, 2020, 58(12): 20−26 doi: 10.1109/MCOM.001.2000373

    [58]

    Avasalcai C, Tsigkanos C, Dustdar S. Resource management for latency-sensitive IoT applications with satisfiability[J/OL]. IEEE Transactions on Services Computing, 2021 [2022-03-18]. https://doi.ieeecomputersociety.org/10.1109/TSC.2021.3074188

    [59]

    Chen Min, Li Wei, Hao Yiyue, et al. Edge cognitive computing based smart healthcare system[J]. Future Generation Computer Systems, 2018, 86(9): 403−411

    [60]

    Hu Diyi, Krishnamachari B. Fast and accurate streaming cnn inference via communication compression on the edge[C] //Proc of the 5th ACM/IEEE Int Conf on Internet of Things Design and Implementation. Piscataway, NJ: IEEE, 2020: 157−163

    [61]

    Hsu K J, Choncholas J, Bhardwaj K, et al. DNS does not suffice for MEC-CDN[C] //Proc of the 19th ACM Workshop on Hot Topics in Networks. New York: ACM, 2020: 212−218

    [62]

    Campolo C, Lia G, Amadeo M, et al. Towards named AI networking: Unveiling the potential of NDN for edge AI[G] //LNCS 12338: Proc of the 19th Int Conf on Ad-Hoc Networks and Wireless. Cham: Springer, 2020: 16−22

    [63]

    Jiang A H, Wong D L K, Canel C, et al. Mainstream: Dynamic stem-sharing for multi-tenant video processing[C] //Proc of the 2018 USENIX Annual Technical Conf. New York: ACM, 2018: 29−42

    [64]

    Mhamdi E, Guerraoui R, Rouault S. On the robustness of a neural network[C] //Proc of the 36th IEEE Symp on Reliable Distributed Systems. Piscataway, NJ: IEEE, 2017: 84−93

    [65]

    Yousefpour A, Devic S, Nguyen B Q, et al. Guardians of the Deep Fog: Failure-resilient DNN inference from edge to cloud[C] //Proc of the 1st Int Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things. New York: ACM, 2019: 25−31

    [66]

    Hu Chuang, Bao Wei, Wang Dan, et al. Dynamic adaptive DNN surgery for inference acceleration on the edge[C] //Proc of the 38th IEEE Conf on Computer Communications. Piscataway, NJ: IEEE, 2019: 1423−1431

    [67]

    Song Han, Mao Huizi, Dally W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding[EB/OL]. (2016-02-15) [2022-03-18]. https://arxiv.org/abs/1510.00149

    [68]

    Masana M, van de Weijer J, Herranz L, et al. Domain-adaptive deep network compression[C] //Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 22−29

    [69]

    Courbariaux M, Bengio Y, David J P. BinaryConnect: Training deep neural networks with binary weights during propagations[C] //Proc of the 28th Int Conf on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2015: 3123−3131

    [70]

    Gholami A, Kim S, Zhen Dong, et al. A survey of quantization methods for efficient neural network inference[J]. arXiv preprint, arXiv: 2103.13630, 2021

    [71]

    Cao Qingqing, Irimiea A E, Abdelfattah M, et al. Are mobile DNN accelerators accelerating DNNs?[C] //Proc of the 5th Int Workshop on Embedded and Mobile Deep Learning. New York: ACM, 2021: 7−12

    [72]

    Guo Kaiyuan, Song Han, Song Yao, et al. Software-hardware codesign for efficient neural network acceleration[J]. IEEE Micro, 2017, 37(2): 18−25 doi: 10.1109/MM.2017.39

    [73]

    Guo Kaiyuan, Li Wenshuo, Zhong Kai, et al. Neural network accelerator comparison[EB/OL]. (2018-01-01) [2022-12-26]. https://nicsefc.ee.tsinghua.edu.cn/projects/neural-network-accelerator

    [74]

    Li Hao, Kadav A, Durdanovic I, et al. Pruning filters for efficient convnets[J]. arXiv preprint, arXiv: 1608.08710, 2017

    [75]

    Luo Jianhao, Zhang Hao, Zhou Hongyu, et al. ThiNet: Pruning cnn filters for a thinner net[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(10): 2525−2538 doi: 10.1109/TPAMI.2018.2858232

    [76]

    He Yihui, Zhang Xianyu, Sun Jian. Channel pruning for accelerating very deep neural networks[C] //Proc of the 16th IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 1398−1406

    [77]

    Hu Hengyuan, Peng Rui, Tai Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures[J]. arXiv preprint, arXiv: 1607.03250, 2016

    [78]

    Wen Wei, Wu Chunpeng, Wang Yandan, et al. Learning structured sparsity in deep neural networks[C] //Proc of the 30th Int Conf on Neural Information Processing Systems. New York: ACM, 2016: 2082−2090

    [79] Chen Hanting, Wang Yunhe, Xu Chang, et al. Data-free learning of student networks[C] //Proc of the 17th IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 3513−3521
    [80]

    Niu Wei, Ma Xiaolong, Lin Sheng, et al. PatDNN: Achieving real-time DNN execution on mobile devices with pattern-based weight pruning[C] //Proc of the 25th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2020: 907−922

    [81] Qin Haotong, Gong Ruihao, Liu Xianglong, et al. Binary neural networks: A survey[J]. Pattern Recognition, 2020, 105(9): 107281
    [82] 卢冶,龚成,李涛. 深度神经网络压缩自动化的挑战与机遇[J]. 中国计算机学会通讯,2021,17(3):41−47

    Lu Ye, Gong Cheng, Li Tao. Challenges and opportunities of deep neural network compression automation[J]. Communications of the China Computer Federation, 2021, 17(3): 41−47 (in Chinese)

    [83] Hubara I, Courbariaux M, Soudry D, et al. Binarized neural networks[C] //Proc of the 30th Int Conf on Neural Information Processing Systems. New York: ACM, 2016: 4114−4122

    [84] Li Fengfu, Liu Bin. Ternary weight networks[J]. arXiv preprint, arXiv: 1605.04711, 2016

    [85] Alemdar H, Leroy V, Prost-Boucle A, et al. Ternary neural networks for resource-efficient AI applications[C] //Proc of the 30th Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 2017: 2547−2554

    [86] Chen Yao, Zhang Kang, Gong Cheng, et al. T-DLA: An open-source deep learning accelerator for ternarized DNN models on embedded FPGA[C] //Proc of the 14th IEEE Computer Society Annual Symp on VLSI. Piscataway, NJ: IEEE, 2019: 13−18

    [87] Zhou Shuchang, Wu Yuxin, Ni Zekun, et al. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients[J]. arXiv preprint, arXiv: 1606.06160, 2018

    [88] Wang Peisong, Hu Qinghao, Zhang Yifan, et al. Two-step quantization for low-bit neural networks[C] //Proc of the 31st IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 4376−4384
    [89] Jung Sangil, Son Changyong, Lee Seohyung, et al. Learning to quantize deep networks by optimizing quantization intervals with task loss[C] //Proc of the 32nd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 4345−4354
    [90] Gong Cheng, Li Tao, Lu Ye, et al. µL2Q: An ultra-low loss quantization method for DNN compression[C/OL] //Proc of the Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 2019 [2022-04-07]. https://doi.org/10.1109/IJCNN.2019.8851699

    [91] 葛道辉,李洪升,张亮,等. 轻量级神经网络架构综述[J]. 软件学报,2020,31(9):2627−2653 doi: 10.13328/j.cnki.jos.005942

    Ge Daohui, Li Hongsheng, Zhang Liang, et al. A review of lightweight neural network architecture[J]. Journal of Software, 2020, 31(9): 2627−2653 (in Chinese) doi: 10.13328/j.cnki.jos.005942

    [92] Shi Lei, Feng Shi, Zhu Zhifang. Functional hashing for compressing neural networks[J]. arXiv preprint, arXiv: 1605.06560, 2016

    [93] Wu Junru, Wang Yue, Wu Zhenyu, et al. Deep k-means: Re-training and parameter sharing with harder cluster assignments for compressing deep convolutions[C] //Proc of the 35th Int Conf on Machine Learning. New York: ACM, 2018: 5363−5372

    [94] Xu Xiaowei, Lu Qing, Wang Tianchen, et al. Efficient hardware implementation of cellular neural networks with incremental quantization and early exit[J]. ACM Journal on Emerging Technologies in Computing Systems, 2018, 14(4): 1−20

    [95] Li Yuhong, Hao Cong, Zhang Xiaofan, et al. EDD: Efficient differentiable DNN architecture and implementation co-search for embedded AI solutions[C/OL] //Proc of the 57th ACM/IEEE Design Automation Conf. New York: ACM, 2020 [2022-04-07]. https://doi.org/10.1109/DAC18072.2020.9218749

    [96] Aimar A, Mostafa H, Calabrese E, et al. NullHop: A flexible convolutional neural network accelerator based on sparse representations of feature maps[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(3): 644−656 doi: 10.1109/TNNLS.2018.2852335

    [97] Sebastian A, Le Gallo M, Khaddam-Aljameh R, et al. Memory devices and applications for in-memory computing[J]. Nature Nanotechnology, 2020, 15(7): 529−544 doi: 10.1038/s41565-020-0655-z

    [98] Song Zhuoran, Fu Bangqi, Wu Feiyang, et al. DRQ: Dynamic region-based quantization for deep neural network acceleration[C] //Proc of the 47th ACM/IEEE Annual Int Symp on Computer Architecture. New York: ACM, 2020: 1010−1021

    [99] Yang Yixiong, Yuan Zhe, Su Fang, et al. Multi-channel precision-sparsity-adapted inter-frame differential data codec for video neural network processor[C] //Proc of the 33rd ACM/IEEE Int Symp on Low Power Electronics and Design. New York: ACM, 2020: 103−108

    [100] Tang Yibin, Wang Ying, Li Huawei, et al. MV-Net: Toward real-time deep learning on mobile GPGPU systems[J]. ACM Journal on Emerging Technologies in Computing Systems, 2019, 15(4): 1−25

    [101] Chen Shengbo, Shen Cong, Zhang Lanxue, et al. Dynamic aggregation for heterogeneous quantization in federated learning[J]. IEEE Transactions on Wireless Communications, 2021, 20(10): 6804−6819 doi: 10.1109/TWC.2021.3076613

    [102] Teerapittayanon S, McDanel B, Kung H T. BranchyNet: Fast inference via early exiting from deep neural networks[C] //Proc of the 23rd Int Conf on Pattern Recognition. Piscataway, NJ: IEEE, 2016: 2464−2469

    [103] Lo C, Su Y Y, Lee C Y, et al. A dynamic deep neural network design for efficient workload allocation in edge computing[C] //Proc of the 35th IEEE Int Conf on Computer Design. Piscataway, NJ: IEEE, 2017: 273−280

    [104] Wang Zizhao, Bao Wei, Yuan Dong, et al. SEE: Scheduling early exit for mobile DNN inference during service outage[C] //Proc of the 22nd Int ACM Conf on Modeling, Analysis and Simulation of Wireless and Mobile Systems. New York: ACM, 2019: 279−288

    [105] Wang Zizhao, Bao Wei, Yuan Dong, et al. Accelerating on-device DNN inference during service outage through scheduling early exit[J]. Computer Communications, 2020, 162(10): 69−82

    [106] Scarpiniti M, Baccarelli E, Momenzadeh A, et al. DeepFogSim: A toolbox for execution and performance evaluation of the inference phase of conditional deep neural networks with early exits atop distributed Fog platforms[J/OL]. Applied Sciences, 2021, 11(1) [2022-03-18]. https://doi.org/10.3390/app11010377

    [107] Su Xiao. EasiEI simulator[CP/OL]. [2022-03-18]. https://gitlab.com/Mirrola/ns-3-dev/-/wikis/EasiEI-Simulator

    [108] Park E, Kim D, Kim S, et al. Big/little deep neural network for ultra low power inference[C] //Proc of the Int Conf on Hardware/Software Codesign and System Synthesis. Piscataway, NJ: IEEE, 2015: 124−132

    [109] Putra T A, Leu J S. Multilevel neural network for reducing expected inference time[J]. IEEE Access, 2019, 7(11): 174129−174138

    [110] Taylor B, Marco V S, Wolff W, et al. Adaptive deep learning model selection on embedded systems[J]. ACM SIGPLAN Notices, 2018, 53(6): 31−43 doi: 10.1145/3299710.3211336

    [111] Shu Guansheng, Liu Weiqing, Zheng Xiaojie, et al. IF-CNN: Image-aware inference framework for CNN with the collaboration of mobile devices and cloud[J]. IEEE Access, 2018, 6(10): 68621−68633

    [112] Stamoulis D, Chin T W, Prakash A K, et al. Designing adaptive neural networks for energy-constrained image classification[C] //Proc of the Int Conf on Computer-Aided Design. New York: ACM, 2018: 1−8

    [113] Song Mingcong, Zhong Kan, Zhang Jiaqi, et al. In-Situ AI: Towards autonomous and incremental deep learning for IoT systems[C] //Proc of the 24th IEEE Int Symp on High Performance Computer Architecture. Piscataway, NJ: IEEE, 2018: 92−103

    [114] Zhang Li, Han Shihao, Wei Jianyu, et al. nn-Meter: Towards accurate latency prediction of deep-learning model inference on diverse edge devices[C] //Proc of the 19th Annual Int Conf on Mobile Systems, Applications, and Services. New York: ACM, 2021: 81−93

    [115] Yue Zhifeng, Zhu Zhixiang, Wang Chuang, et al. Research on big data processing model of edge-cloud collaboration in cyber-physical systems[C] //Proc of the 5th IEEE Int Conf on Big Data Analytics. Piscataway, NJ: IEEE, 2020: 140−144

    [116] Wang Huitian, Cai Guangxing, Huang Zhaowu, et al. ADDA: Adaptive distributed DNN inference acceleration in edge computing environment[C] //Proc of the 25th Int Conf on Parallel and Distributed Systems. Piscataway, NJ: IEEE, 2019: 438−445

    [117] Chen Liang, Qi Jianpeng, Su Xiao, et al. REMR: A reliability evaluation method for dynamic edge computing network under time constraints[J]. arXiv preprint, arXiv: 2112.01913, 2021

    [118] Long Saiqin, Long Weifan, Li Zhetao, et al. A game-based approach for cost-aware task assignment with QoS constraint in collaborative edge and cloud environments[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(7): 1629−1640 doi: 10.1109/TPDS.2020.3041029

    [119] Yang Bo, Cao Xuelin, Li Xiangfan, et al. Mobile-edge-computing-based hierarchical machine learning tasks distribution for IIoT[J]. IEEE Internet of Things Journal, 2020, 7(3): 2169−2180 doi: 10.1109/JIOT.2019.2959035

    [120] Fang Yihao, Jin Ziyi, Zheng Rong. TeamNet: A collaborative inference framework on the edge[C] //Proc of the 39th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2019: 1487−1496

    [121] Fang Yihao, Shalmani S M, Zheng Rong. CacheNet: A model caching framework for deep learning inference on the edge[J]. arXiv preprint, arXiv: 2007.01793, 2020

    [122] 檀超,张静宣,王铁鑫,等. 复杂软件系统的不确定性[J]. 软件学报,2021,32(7):1926−1956 doi: 10.13328/j.cnki.jos.006267

    Tan Chao, Zhang Jingxuan, Wang Tiexin, et al. Uncertainty in complex software systems[J]. Journal of Software, 2021, 32(7): 1926−1956 (in Chinese) doi: 10.13328/j.cnki.jos.006267

    [123] 宋纯贺,曾鹏,于海斌. 工业互联网智能制造边缘计算: 现状与挑战[J]. 中兴通讯技术,2019,25(3):50−57 doi: 10.12142/ZTETJ.201903008

    Song Chunhe, Zeng Peng, Yu Haibin. Industrial Internet intelligent manufacturing edge computing: Current situation and challenges[J]. ZTE Technology Journal, 2019, 25(3): 50−57 (in Chinese) doi: 10.12142/ZTETJ.201903008

    [124] Chen Chao, Zhang Daqing, Wang Yasha, et al. Enabling Smart Urban Services with GPS Trajectory Data[M]. Berlin: Springer, 2021

    [125] 黄倩怡,李志洋,谢文涛,等. 智能家居中的边缘计算[J]. 计算机研究与发展,2020,57(9):1800−1809 doi: 10.7544/issn1000-1239.2020.20200253

    Huang Qianyi, Li Zhiyang, Xie Wentao, et al. Edge computing in smart home[J]. Journal of Computer Research and Development, 2020, 57(9): 1800−1809 (in Chinese) doi: 10.7544/issn1000-1239.2020.20200253

    [126] Li Xian, Bi Suzhi, Wang Hui. Optimizing resource allocation for joint AI model training and task inference in edge intelligence systems[J]. IEEE Wireless Communications Letters, 2021, 10(3): 532−536 doi: 10.1109/LWC.2020.3036852

    [127] Trivedi A, Wang Lin, Bal H, et al. Sharing and caring of data at the edge[C/OL] //Proc of the 3rd USENIX Workshop on Hot Topics in Edge Computing. Berkeley, CA: USENIX Association, 2020 [2022-04-06]. https://www.usenix.org/conference/hotedge20/presentation/trivedi

    [128] Richins D, Doshi D, Blackmore M, et al. AI tax: The hidden cost of AI data center applications[J]. ACM Transactions on Computer Systems, 2021, 37(1−4): 1−32


Figures(7)  /  Tables(2)
Publication history
  • Received: 2021-08-25
  • Revised: 2022-04-14
  • Available online: 2023-02-10
  • Published online: 2021-08-25
  • Issue date: 2023-01-31
