
Survey of AIoT-Oriented Collaborative Intelligence

Luo Yuzhe, Li Ling, Hou Pengpeng, Yu Jiageng, Cheng Limin, Zhang Changyou, Wu Yanjun, Zhao Chen

Citation: Luo Yuzhe, Li Ling, Hou Pengpeng, Yu Jiageng, Cheng Limin, Zhang Changyou, Wu Yanjun, Zhao Chen. Survey of AIoT-Oriented Collaborative Intelligence[J]. Journal of Computer Research and Development, 2025, 62(1): 179-206. DOI: 10.7544/issn1000-1239.202330975. CSTR: 32373.14.issn1000-1239.202330975

Corresponding author: Li Ling (liling@iscas.ac.cn)

  • CLC number: TP391

Survey of AIoT-Oriented Collaborative Intelligence

Funds: This work was supported by the Key-Area Research and Development Program of Guangdong Province (2019B010154004).
More Information
    Author Bio:

    Luo Yuzhe: born in 1995. PhD candidate. His main research interest includes distributed machine learning

Li Ling: born in 1982. PhD, professor. Senior member of CCF. Her main research interest includes intelligent computing

    Hou Pengpeng: born in 1985. PhD, associate professor. His main research interest includes operating system

    Yu Jiageng: born in 1983. PhD, associate professor. His main research interests include operating system, cloud computing and intelligent software

    Cheng Limin: born in 1988. PhD candidate. Her main research interests include intelligent system software, machine learning and video analysis

    Zhang Changyou: born in 1970. PhD, professor. His main research interests include parallel distributed software and software engineering

    Wu Yanjun: born in 1979. PhD, professor. Senior member of CCF. His main research interests include operating system and system security

    Zhao Chen: born in 1967. PhD, professor. His main research interests include logic and automatic program analysis, and design and implementation of programming languages

  • Abstract:

The fusion of deep learning and the Internet of things has significantly promoted the development of the AIoT ecosystem. On the one hand, the huge amounts of multi-modal data collected by AIoT devices provide deep learning with abundant training data resources, which play an even more important role in the era of big models. On the other hand, the development of deep learning makes AIoT devices smarter, which shows great potential for promoting social development and the convenience of human life. As major supports for the usage of deep learning in AIoT, federated learning effectively makes use of the training data provided by AIoT devices to train deep learning models while protecting data privacy, and collaborative inference overcomes the obstacles to the deployment of deep learning brought by the limited computation resources of AIoT devices. We introduce the concept of AIoT-oriented collaborative intelligence. Aiming at implementing knowledge transmission and computation resource supply with high efficiency and security, we review the related works, published in the past 10 years, about the architecture, algorithms, privacy, and security of federated learning and collaborative inference, and introduce the inner connection between federated learning and collaborative inference. The algorithm part summarizes the federated learning and collaborative inference algorithms related to AIoT use cases and their optimization goals. The architecture part introduces the related works about deep learning accelerators, deep learning compilation, deep learning frameworks, communication among devices, and collaboration among devices from the view of AI computing systems. The privacy and security part introduces the privacy and security threats faced by AIoT-oriented collaborative intelligence and the defense methods against them. We also provide insights into the future development of AIoT-oriented collaborative intelligence in the aspects of equipment sharing, model sharing, collaboration of privacy and security mechanisms, and collaboration of incentive mechanisms.

  • Supercomputers have entered the exascale era, but with Moore's law approaching its end and disruptive enabling technologies still out of reach, they continue to face severe bottlenecks in memory access, communication, and power consumption [1]. MPI (message passing interface) collective communication [2] is the most important class of communication in high-performance computing and strongly affects whole-system performance, so improving MPI collective performance is a key means of breaking through the communication wall. Traditional MPI collectives are implemented on top of point-to-point messages [3-4]. As system scale keeps growing, this implementation suffers increasingly serious performance and scalability problems, whereas hardware collective communication offers high performance and good scalability and is attracting growing attention [5].

In hardware collective communication, data is reduced or broadcast along an aggregation tree, whose height and load balance are critical to the performance of collective operations. This paper studies the factors that affect hardware collective performance, proposes a cost model for hardware collectives, and builds aggregation-tree construction methods on top of it. Our main contributions are fourfold:

1) We propose a cost model for hardware collective communication that estimates the time cost of hardware collectives fairly accurately.

2) We study how to determine the breadth of an aggregation tree from the message size, balancing network transmission cost against data computation cost to reduce collective latency.

3) For type-I aggregation trees, we propose a minimum-height hierarchical k-nomial tree construction method that reduces the number of inter-group aggregation packets, and hence network congestion, compared with traditional type-I trees.

4) For type-II aggregation trees, we propose a minimum-cost construction method that reduces the number of switches an aggregation tree uses.

In-network computing has developed rapidly in recent years [6-7]. It offloads work that the processor would otherwise perform, such as data reduction, protocol processing, and encryption/decryption, into network devices, freeing the CPU to concentrate on other tasks. In high-performance computing, in-network computing can be used to accelerate MPI collectives; we call MPI collectives implemented this way hardware collective communication. Among MPI collectives, Barrier, Bcast, Reduce, and Allreduce are the most widely used; they are one-to-many or many-to-one communication patterns and are well suited to hardware implementation. Figure 1 shows how Allreduce is implemented: an aggregation tree performs data aggregation and result distribution, with leaf nodes corresponding to communicating processes and the root and intermediate nodes corresponding to switches. During a collective, data first flows from the leaves toward the root, computing the reduction result along the way; once the result reaches the root, it is broadcast back down to all leaves. Other collectives are implemented similarly.

Figure 1. Implementation phases of the Allreduce operation based on in-network computing

Many vendors have shipped hardware collective technology for high-performance computing, such as the hardware collective primitives of IBM Blue Gene/Q [8-11], Mellanox's SHARP [5, 12-14], and the hardware collectives of the Sunway interconnect [15]. Compared with collectives built on point-to-point messages, hardware collectives improve performance substantially. Zimmer et al. [16] benchmarked SHARP on the Summit system and measured at least a 2x improvement in collective message performance at 4096 nodes.

Besides special hardware, hardware collectives need close cooperation from network management software to allocate aggregation trees, configure multicast routes, and so on. Aggregation-tree allocation is the most critical of these operations. Existing hardware collective technologies still have three problems when allocating aggregation trees:

1) There is no accurate collective cost model to guide tree construction. The α-β-based cost model for software collectives [3] is oversimplified, while special-purpose models for particular networks lack generality [17]; neither suits hardware collectives.

2) Mutual interference is ignored during tree construction. When a system hosts many communicators, each should be allocated its own aggregation tree to reduce interference between communicators, but existing allocation methods do not adequately address this. For example, in SHARP all communicators of a job share a single aggregation tree [18], causing severe interference. Existing construction methods also ignore interference among messages within the same tree; trees built by traditional methods generate much inter-group traffic, so different messages of the same collective contend for links and interfere with one another.

3) The scarcity of hardware communication resources is ignored. The network resources that support hardware collectives are limited, yet existing construction methods do not account for this, so resources can run out. For example, each network node of Blue Gene/Q supports at most 12 aggregation trees, and in many cases hardware collectives cannot be used because trees run out [11].

To address these three problems, this paper studies how to build a cost model for hardware collective communication and how to construct aggregation trees efficiently based on it.

Definition 1. Interconnection network. An interconnection network is a directed graph I = (N, L), where N is the set of network nodes and L the set of communication links. Network nodes include NICs and switches. This paper assumes all NICs are leaves of I.

Definition 2. Aggregation tree. Given a target set M (M ⊆ N) over an interconnection network I = (N, L), an aggregation tree is a tree A = (N_A, L_A) with M ⊆ N_A ⊆ N, where M is the set of NICs hosting the processes of a communicator. The length of an edge (μ, υ) of A is defined as the distance from μ to υ in I, i.e., the number of network links on the path from μ to υ in I. A need not be a subtree of I; that is, L_A ⊆ L is not required. Moreover, M may contain duplicate elements, so several communicating processes may run on the same node. An aggregation tree must also satisfy the degree constraint: the number of children of any node must not exceed the hardware threshold.

Definition 3. Aggregation tree height. Let A = (N_A, L_A) be an aggregation tree for target set M over I = (N, L). The height of A is the maximum number of edges on a path from a leaf to the root. A single-node aggregation tree has height 0.

Definition 4. Aggregation tree radius. Let A = (N_A, L_A) be an aggregation tree for target set M over I = (N, L). The radius of A is the maximum, over all leaves, of the sum of edge lengths on the path from the leaf to the root. The radius is in effect the largest number of network links traversed when going from a leaf through its ancestors to the root.

Some existing hardware collective technologies can only offload collective operations to NICs, while others can also offload them to switches. According to the offload location, this paper divides aggregation trees into type I and type II.

Definition 5. Type-I aggregation tree. Let A = (N_A, L_A) be an aggregation tree for target set M over I = (N, L). If M = N_A, A is called a type-I aggregation tree. In such trees the collective is offloaded entirely to NICs, and only the NICs hosting processes are used.

Definition 6. Type-II aggregation tree. Let A = (N_A, L_A) be an aggregation tree for target set M over I = (N, L), and let T = (N_T, L_T) be a subtree of I with M ⊆ N_A ⊆ N_T ⊆ N whose leaves all belong to M. If for every u ∈ N_A, u's parent p in A is also an ancestor of u in T, then A is called a type-II aggregation tree. In type-II trees the collective is offloaded to switches.

Figure 2 shows a type-I and a type-II aggregation tree. Suppose 8 processes take part in a collective, one per NIC, and the interconnect topology is as shown in Fig. 2(a); Fig. 2(b) is a type-I aggregation tree of height 3 and radius 12; Fig. 2(c) is a type-II aggregation tree.

Figure 2. Aggregation trees of type I and type II

This paper adopts the cost model of Eqs. (1)-(4); the meaning of each symbol is listed in Table 1. Since we focus on reducing the cost of the aggregation phase, that cost is explained in detail below.

Table 1. Symbols Used in the Cost Model of Hardware-Supported Collectives

Symbol        Meaning
T_coll        total time cost of the collective operation
T_post_wr     time for the application to post one collective-operation descriptor
T_fetch_wr    time for the NIC to fetch one collective-operation descriptor
T_fetch_data  time for the NIC to read the operands from memory
T_aggregate   time cost of the aggregation phase
T_bcast       time cost of the broadcast phase
T_write_wc    time for the NIC to write the completion entry and result to memory
T_noise       cost caused by noise
S             user message size (at most 2 KB)
S′            aggregation packet size
S*            reduction data length
β             PCIe link / network link bandwidth
l             time for a message to traverse one network link
h_atree       aggregation tree height
r_atree       aggregation tree radius
P             total number of processes
k             breadth of a k-ary tree, or the radix of a k-nomial tree
θ             matching/processing time per aggregation packet
γ             reduction computation time per unit of data
δ             maximum number of hops of a unicast route between any two processes

T_aggregate is the time of the aggregation phase, comprising the transmission time of aggregation packets on network links and their processing time inside network nodes. Assume the aggregation tree is a complete k-ary tree with radius r_atree; then the total transmission time of aggregation packets on network links is r_atree × l. With a per-packet matching time θ, a store-and-forward time S′/β, and a reduction time S* × γ, the total in-node processing time is h_atree × k × (θ + S′/β + S* × γ), yielding the aggregation cost of Eq. (2).

To simplify the model further, we additionally assume that every unicast route between two processes has the same length of δ links. Then for a type-I tree, h_atree ≈ log_k P and r_atree ≤ δ × log_k P, giving the aggregation cost of Eq. (3); for a type-II tree, h_atree ≈ log_k P and r_atree ≈ 0.5δ, giving the aggregation cost of Eq. (4).

$T_{coll} = T_{post\_wr} + T_{fetch\_wr} + T_{fetch\_data} + T_{aggregate} + T_{bcast} + T_{write\_wc} + T_{noise}$, (1)
$T_{aggregate} = r_{atree} \times l + h_{atree} \times k \times (\theta + S'/\beta + S^{*} \times \gamma)$, (2)
$T_{aggregate\_I} \approx \log_{k}P \times [\delta l + k(\theta + S'/\beta + S^{*} \times \gamma)]$, (3)
$T_{aggregate\_II} \approx \frac{1}{2}\delta l + \log_{k}P \times k(\theta + S'/\beta + S^{*} \times \gamma)$. (4)
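To make the model concrete, the following C sketch evaluates Eqs. (3) and (4). It is our own illustration: the struct and function names are assumptions, and the parameter values must be fitted to the target network.

    #include <math.h>

    /* Parameters of the cost model; names mirror Table 1. */
    typedef struct {
        double l;      /* per-link transmission time */
        double theta;  /* per-packet matching time */
        double beta;   /* link bandwidth (bytes per unit time) */
        double gamma;  /* reduction time per byte */
        double delta;  /* max unicast hops between any two processes */
    } cost_params;

    /* Eq. (3): aggregation cost of a type-I tree of radix k over P processes. */
    double t_aggregate_type1(const cost_params *c, double P, double k,
                             double s_pkt, double s_red) {
        double b = c->theta + s_pkt / c->beta + s_red * c->gamma;
        return (log(P) / log(k)) * (c->delta * c->l + k * b);
    }

    /* Eq. (4): aggregation cost of a type-II tree of radix k over P processes. */
    double t_aggregate_type2(const cost_params *c, double P, double k,
                             double s_pkt, double s_red) {
        double b = c->theta + s_pkt / c->beta + s_red * c->gamma;
        return 0.5 * c->delta * c->l + (log(P) / log(k)) * k * b;
    }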

We measured the latency of various collective messages on the new-generation Sunway supercomputer and compared the measurements with the model's predictions. Figure 3 shows the experimental environment.

Figure 3. Two-level fat tree used in the test environment

The test uses one supernode containing 256 nodes. Each node has one message-processing chip, and each chip has two network ports, so the supernode has 512 network ports in total, interconnected by a two-level fat tree. The fat tree has 48 switches: 32 leaf switches and 16 spine switches. Every switch has 40 ports, some of which are unused. Each leaf switch uses 16 ports to connect the network interfaces of message-processing chips and 16 ports to connect spine switches.

We measured the latency of each collective type and message length under different process counts. For each type and length we ran 5 000 iterations; in each iteration every process records the time of the collective, the time of the slowest process is taken as that iteration's cost, and the median of the 5 000 costs is reported as the latency of that collective message.
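The statistic described above can be computed as in the following minimal C sketch (ours); gathering the per-process timings into t[][] is assumed to have happened already, and for an even iteration count the upper median is taken.

    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b) {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* t[i][p] = time of iteration i on process p.
     * Take the slowest process per iteration, then the median over iterations. */
    double collective_latency(double **t, int iters, int nprocs) {
        double *worst = malloc(iters * sizeof(double));
        for (int i = 0; i < iters; i++) {
            worst[i] = t[i][0];
            for (int p = 1; p < nprocs; p++)
                if (t[i][p] > worst[i]) worst[i] = t[i][p];
        }
        qsort(worst, iters, sizeof(double), cmp_double);
        double median = worst[iters / 2];
        free(worst);
        return median;
    }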

For space reasons we show only part of the type-I results. Figure 4 plots the predicted and measured latency of an 8 B broadcast under different process counts; Fig. 5 plots them for Bcast messages of different lengths with 512 processes. The predicted latency of type-I trees differs from the measured latency by at most 0.5 μs, indicating that the proposed cost model fits the measured latencies rather accurately.

Figure 4. Measured and estimated latency of collectives for different process counts
Figure 5. Measured and estimated latency of collectives for different message sizes with 512 processes

Given arbitrary P processes, constructing a minimum-cost type-I aggregation tree for them is very hard: besides factors such as tree breadth and radius, one must consider communication-communication overlap, communication-computation overlap, and interference between messages. Type-I trees are therefore currently built with heuristics. This section first studies how to choose the breadth k of a type-I tree from the message type and length, and then how to quickly construct a tree with as few inter-group aggregation packets as possible, minimizing interference between messages.

The aggregation cost consists of packet transmission time and processing time, and the breadth k strongly affects it. Intuitively, if k is small the tree is tall and the transmission cost of aggregation packets is large; if k is large each NIC must process many packets and the processing cost is large. We therefore need to study how k affects the aggregation cost.

First consider which k minimizes the aggregation cost of Eq. (3). Let a = δ × l and b = θ + S′/β + S* × γ, and define F(k) as in Eq. (5), whose derivative is Eq. (6).

$F(k) = \log_{k}P \times [\delta l + k(\theta + S'/\beta + S^{*}\gamma)] = \log_{k}P \times (a + kb)$, (5)
$F'(k) = \dfrac{(\ln P)[kb(\ln k - 1) - a]}{k\ln^{2}k}$. (6)

Let k* denote the k that minimizes F(k), and F* the corresponding minimum. By Eqs. (5)(6), minimizing F(k) requires F′(k) = 0, i.e., k(ln k − 1) = a/b, and the function f(k) = k(ln k − 1) is monotonically increasing. Different collective types and message lengths give different a and b, hence different minimizing k. In our experimental environment, for Barrier and Bcast, F(k) is minimized at k = 41.

In many cases other values of k make F(k) barely worse than F*. Taking Barrier and Bcast as examples, with P = 512, a = 1.12 μs, and b = 0.01 μs in our environment, the curve of F(k) in Fig. 6 shows that for 24 ≤ k ≤ 64, F(k) varies by no more than 0.1 μs. For every collective type and message length one can therefore determine an interval [k_min, k_max] such that |F(k) − F*| ≤ ε holds for all k ∈ [k_min, k_max] (ε a constant). Table 2 lists part of the k values used in our test environment. As the aggregation packet length grows, both k_min and k_max decrease. Intuitively, when packets are small, link transmission dominates the aggregation time, so a large k should be used to parallelize transmission as much as possible; when packets are large, in-NIC processing dominates, so a smaller k should be used so that more NICs take part in processing the packets.
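Since f(k) = k(ln k − 1) is monotone, k* and the interval [k_min, k_max] can simply be found by scanning integer radixes, as in the C sketch below (ours; the hardware bound K_LIMIT is an assumed placeholder).

    #include <math.h>

    /* F(k) = log_k(P) * (a + k*b), Eq. (5). */
    static double F(double k, double P, double a, double b) {
        return (log(P) / log(k)) * (a + k * b);
    }

    /* Find k* and the interval where F stays within eps of the minimum. */
    void choose_k(double P, double a, double b, double eps,
                  int *k_best, int *k_min, int *k_max) {
        const int K_LIMIT = 128;   /* assumed hardware bound on children per node */
        double fmin = INFINITY;
        for (int k = 2; k <= K_LIMIT; k++)
            if (F(k, P, a, b) < fmin) { fmin = F(k, P, a, b); *k_best = k; }
        *k_min = *k_max = *k_best;
        while (*k_min > 2       && F(*k_min - 1, P, a, b) - fmin <= eps) (*k_min)--;
        while (*k_max < K_LIMIT && F(*k_max + 1, P, a, b) - fmin <= eps) (*k_max)++;
    }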

Figure 6. Curve of F(k) for the Barrier and Bcast operations
Table 2. Type-I Aggregation Tree Breadths Used in the Test Environment

Message type      Packet length /B   k_min   k_max   Recommended k
Barrier/Bcast     0                  32      64      32
Reduce/Allreduce  32                 16      63      32
Reduce/Allreduce  64                 15      53      32
Reduce/Allreduce  128                13      40      16
Reduce/Allreduce  256                10      28      16
Reduce/Allreduce  512                8       19      16
Reduce/Allreduce  1024               6       12      8
Reduce/Allreduce  2048               5       8       8

Once k is fixed, the tree can be constructed. Eq. (3) assumes that all unicast routes between processes traverse the same number of links, but reality is far more complex: two aggregation trees built with the same k may have different radii r_atree and hence different packet transmission costs. The degree-constrained minimum-radius aggregation tree problem is NP-hard [19-21], with no efficient exact algorithm known. Our goal is not an optimal tree but a quickly constructed one that satisfies the degree constraint, has a radius as small as possible, and causes as few conflicts as possible. We first review two traditional construction methods.

Definition 7. k-ary tree. A k-ary tree is a complete tree in which the root and every intermediate node except the last one have exactly k children. A k-ary tree is therefore balanced, and every intermediate node processes the same number of aggregation packets.

In a k-ary tree, processes are numbered in breadth-first order with the root process numbered 0. If a process's rank within the communicator is my_rank, its parent's rank is (my_rank − 1)/k and its i-th child's rank is my_rank × k + i (i ≥ 1).
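In code, the parent/child rules of Definition 7 amount to the following small C sketch (ours):

    /* Parent of a rank in a complete k-ary tree numbered breadth-first,
     * root at rank 0; the root has no parent. */
    int kary_parent(int my_rank, int k) {
        return my_rank == 0 ? -1 : (my_rank - 1) / k;
    }

    /* Fills children[]; returns how many of my_rank's children exist among P ranks. */
    int kary_children(int my_rank, int k, int P, int children[]) {
        int cnt = 0;
        for (int i = 1; i <= k; i++) {
            int child = my_rank * k + i;
            if (child < P) children[cnt++] = child;
        }
        return cnt;
    }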

Because it ignores the network topology, a k-ary tree generates a large amount of inter-group traffic in networks with grouped structure such as fat trees [22-23] and dragonfly networks [24], causing congestion. Consider building an aggregation tree for 16 processes on the topology of Fig. 7(a); the 4-ary tree is shown in Fig. 7(b). Bold edges mark aggregation packets that must pass through the top-level switch; in Fig. 7(b) there are 8 such packets. If the top-level switch lacks transmission capacity, congestion easily forms there. Figure 7(c) shows another aggregation tree with the same height and radius as the tree of Fig. 7(b), but only 2 of its packets cross the top-level switch, so it performs better.

Figure 7. A k-ary aggregation tree is prone to generating much inter-group communication

Definition 8. k-nomial tree. A k-nomial tree [25] is a recursively defined tree T_n: T_0 is a single-node tree, and T_n is formed by joining k copies of T_{n−1}, attaching the roots of k − 1 of the copies directly to the root of the remaining copy. Level i of a k-nomial tree contains (k − 1)^i × C(n, i) nodes; the total node count is k^n and the height is n.

Figure 8 shows a 2-nomial and a 4-nomial tree. The 2-nomial (binomial) tree is a special k-nomial tree already in wide use in software collectives; for example, MPICH implements short Bcast and Reduce with binomial trees [3]. k-nomial trees with k > 2 are currently used far less.

Figure 8. k-nomial trees (k = 2, 4)

    k叉树不同,k项树中不同深度的节点拥有不同数量的子节点,其中根节点的子节点数最多,为(k1)×logkP;而随着深度的增加,子节点数越来越少,故k项树是非平衡树.

    由于k叉树是平衡树,k叉树中上层节点要等待下层节点处理完毕之后才开始处理聚合包,故上层节点会存在空等待现象. 而k项树是非平衡树,位于不同层的节点可以并发处理聚合包,不存在空等待现象.

    k项树中,按深度优先遍历方式为进程编号. 进程编号用k进制表示为kn1k2,k1,k0,其中最低位共有i个0(0in),则第i个位置0即可得到其父节点编号,从最低的i位中任选一位置为1,2,,k1,则得到其子节点的编号最多有i×k1)个子节点.

    每个进程调用算法1所示的build_k_nomial_tree函数计算parent_rankchildren_cnt. 与k叉树一样,k项树也会产生很多跨组通信.

Algorithm 1. build_k_nomial_tree.

Input: rank of this process my_rank, radix k, total number of processes P;

Output: parent parent_rank, child count children_cnt, child list children[].

① mask=1, children_cnt=0;
② while (mask < P)
③  if (my_rank % (k×mask)) then
④   parent_rank = my_rank/(k×mask) × (k×mask);
⑤   break;
⑥  end if
⑦  mask = mask×k;
⑧ end while
⑨ mask = mask/k;
⑩ while (mask > 0) do
⑪  for (r=1; r<k; r++)
⑫   child = my_rank + mask×r;
⑬   if (child < P) then
⑭    children[children_cnt] = child;
⑮    children_cnt += 1;
⑯   end if
⑰  end for
⑱  mask = mask/k;
⑲ end while
⑳ return parent_rank, children_cnt, and children.
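For reference, the following is a direct C transcription of Algorithm 1 (our rendering); integer division is the floor operation assumed by the listing, and the root (rank 0) gets parent −1.

    /* Compute the parent and children of my_rank in a k-nomial tree over P ranks. */
    int build_k_nomial_tree(int my_rank, int k, int P,
                            int *parent_rank, int children[]) {
        int mask = 1, children_cnt = 0;
        *parent_rank = -1;                      /* rank 0 is the root */
        while (mask < P) {
            if (my_rank % (k * mask)) {         /* lowest non-zero base-k digit found */
                *parent_rank = my_rank / (k * mask) * (k * mask);
                break;
            }
            mask *= k;
        }
        mask /= k;
        while (mask > 0) {                      /* children: set a lower zero digit to 1..k-1 */
            for (int r = 1; r < k; r++) {
                int child = my_rank + mask * r;
                if (child < P) children[children_cnt++] = child;
            }
            mask /= k;
        }
        return children_cnt;
    }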

Traditional construction methods ignore the network topology and thus tend to produce much inter-group communication and network congestion. To solve this, we propose a hierarchical k-nomial tree construction method.

Hierarchy is an important strategy for optimizing collective communication [26-29]. It exploits the fact that different layers have different data transfer performance: processes within a layer are organized into groups so that as much communication as possible completes inside a group, reducing inter-layer communication and improving collective message performance.

The hierarchical idea also lowers the aggregation cost when building type-I trees. Zhao et al. [25] proposed a simple method: first partition the processes into groups and build a separate k-nomial tree per group; then form a new group from the head processes of all groups and build a k-nomial tree over them, linking the groups into one larger k-nomial tree. During aggregation, processes within a group first aggregate to the group head, and heads aggregate to higher-level heads, greatly reducing inter-group communication. But the resulting k-nomial tree may be rather tall. Figure 9 gives an example: a communicator with 14 processes belonging to 4 groups (distinguished by pattern). With k = 2 the constructed tree, Fig. 9(a), has height 4. In fact, a group head can be attached under a non-head process of another group, making the tree more balanced and lowering its height; Fig. 9(b) is such a better tree, of height 3. Our goal is to minimize tree height while minimizing inter-group communication.

Figure 9. Aggregation trees with hierarchical structure

We call a set of processes with identical communication characteristics a process group; for example, the processes within the same node, or within the same switch, form a process group. Groups may nest, so a group can contain several sub-groups, forming a hierarchy.

A process group is defined by the data structure shown in Fig. 10. Every group corresponds to a k-nomial tree whose nodes are numbered in depth-first order. Processes are mapped onto tree nodes, and some nodes may have no mapped process. In the structure, cnt is the node count of the group's k-nomial tree and must be a power of k; ranks gives the process rank mapped to each tree node, and nodes gives the tree node id of each process; sub_groups lists the group's sub_grp_cnt sub-groups.

Figure 10. The data structure of the process group
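A C sketch of this structure might look as follows; the fixed capacities are ours for brevity, and −1 marks an empty tree slot:

    #define MAX_NODES  1024
    #define MAX_SUBGRP 64

    typedef struct process_group {
        int cnt;                    /* node count of the k-nomial tree; a power of k */
        int ranks[MAX_NODES];       /* tree node id -> process rank (-1 if empty) */
        int nodes[MAX_NODES];       /* process index -> tree node id */
        int sub_grp_cnt;            /* number of nested sub-groups */
        struct process_group *sub_groups[MAX_SUBGRP];
    } process_group;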

We now consider the minimum-height hierarchical k-nomial type-I aggregation tree problem. Given m pairwise-disjoint process groups g_0, g_1, …, g_{m−1}, let g be a process group satisfying three conditions: 1) g contains all processes of g_0, g_1, …, g_{m−1}; 2) in g's k-nomial tree, every process of each g_i except its head process has its parent inside g_i; 3) in g's k-nomial tree, no node's child count exceeds the hardware limit. The problem is to find such a group g of minimum height, i.e., to build a minimum-height hierarchical k-nomial type-I aggregation tree for g_0, g_1, …, g_{m−1}. Sub-groups nested inside each g_i (0 ≤ i < m) must satisfy condition 2 as well.

Let g_0^i be the rank of group g_i's head process. By the properties of k-nomial trees, condition 2 is equivalent to g.nodes[g_0^i] % g_i.cnt = 0 and, for every process g_j^i in g_i, g.nodes[g_j^i] / g_i.cnt = g.nodes[g_0^i] / g_i.cnt; that is, the head g_0^i is mapped to a tree node whose id is a multiple of g_i.cnt, and the remaining processes are mapped to the following consecutive nodes.

Given m process groups g_0, g_1, …, g_{m−1} whose individual k-nomial trees have been built (i.e., whose ranks maps are filled in), Algorithm 2 builds one larger k-nomial aggregation tree over them. It is greedy: it repeatedly takes the two largest groups g_0 and g_1 and searches g_0's free subtrees for a position that can host g_1. If no such position exists, g_0's tree height is increased by 1, i.e., its cnt grows by a factor of k, so that g_0 has enough free slots for g_1. The process continues until a single group remains. Line ⑤ searches a usable position for g[i]; checking the first slot of a block suffices to decide whether the whole block is free. Lines ⑫-⑭ copy g[i]'s ranks map into the new tree.

Algorithm 2. merge_groups.

Input: sub-group list g[] of length m, radix k, parent group parent;

Output: none.

① sort g[0], g[1], …, g[m−1] by cnt in descending order;
② parent.cnt = g[0].cnt;
③ for (i=0; i<m; ++i)
④  for (j=0; j<parent.cnt; j+=g[i].cnt)
⑤   if (parent.ranks[j] == −1) then
⑥    break;
⑦   end if
⑧  end for
⑨  if (j ≥ parent.cnt) then
⑩   parent.cnt = parent.cnt × k;
⑪  end if
⑫  for (x=0; x<g[i].cnt; ++x)
⑬   parent.ranks[j+x] = g[i].ranks[x];
⑭  end for
⑮ end for
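A C sketch of Algorithm 2 over the process_group type sketched earlier (ours); the explicit initialization of empty slots to −1 is implicit in the pseudocode above.

    void merge_groups(process_group *g[], int m, int k, process_group *parent) {
        /* ① sort g[0..m-1] by cnt, descending (insertion sort for brevity) */
        for (int i = 1; i < m; i++)
            for (int j = i; j > 0 && g[j]->cnt > g[j-1]->cnt; j--) {
                process_group *t = g[j]; g[j] = g[j-1]; g[j-1] = t;
            }
        for (int x = 0; x < MAX_NODES; x++) parent->ranks[x] = -1;
        parent->cnt = g[0]->cnt;                         /* ② */
        for (int i = 0; i < m; i++) {                    /* ③ */
            int j;
            for (j = 0; j < parent->cnt; j += g[i]->cnt) /* ④-⑧ aligned free block */
                if (parent->ranks[j] == -1) break;
            if (j >= parent->cnt)                        /* ⑨⑩ no room: grow the tree */
                parent->cnt *= k;                        /* slot j == old cnt is now free */
            for (int x = 0; x < g[i]->cnt; x++)          /* ⑫-⑭ copy the rank map */
                parent->ranks[j + x] = g[i]->ranks[x];
        }
    }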

A proof by mathematical induction shows that Algorithm 2 produces a minimum-height hierarchical k-nomial type-I aggregation tree; we omit the proof for space.

Algorithm 3 recursively builds the process-to-tree-node mapping at every level of the hierarchy. Algorithm 4 builds the hierarchical k-nomial tree from topology information. In the tree built by build_hierarchical_tree, all processes of a group lie within one subtree, so every process except the group head sends its aggregation packet to a process in its own group; the number of inter-group aggregation packets therefore drops compared with traditional construction methods. When using Algorithm 4, k must be chosen so that no tree node's child count exceeds the hardware limit.

Algorithm 3. build_rank_mapping.

Input: process group g;

Output: none.

① if (g.sub_grp_cnt == 0)
②  return;
③ end if
④ for (i=0; i<g.sub_grp_cnt; ++i)
⑤  build_rank_mapping(g.sub_groups[i]);
⑥ end for
⑦ merge_groups(g.sub_groups, g.sub_grp_cnt, k, g).

Algorithm 4. build_hierarchical_tree.

Input: rank of this process my_rank, radix k, total number of processes P;

Output: parent parent_rank, child count children_cnt.

① build the process-group hierarchy from the network topology: bottom-level groups have sub_grp_cnt = 0 and ranks filled with the ranks of their member processes; higher-level groups fill sub_groups; denote the top-level group by g;
② build_rank_mapping(g);
③ scan g.ranks to find the index node_id of my_rank in that array;
④ call build_k_nomial_tree(node_id, k, g.cnt) to obtain parent_node_id, the child list children, and its length cnt;
⑤ parent_rank = g.ranks[parent_node_id];
⑥ children_cnt = 0;
⑦ for (i=0; i<cnt; i++)
⑧  if (g.ranks[children[i]] ≠ −1) then
⑨   children_cnt += 1;
⑩  end if
⑪ end for
⑫ return parent_rank and children_cnt.
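Combining the pieces, a C sketch of Algorithm 4's per-process computation (ours), reusing build_k_nomial_tree and the process_group type from the earlier sketches:

    int build_hierarchical_tree(process_group *g, int my_rank, int k,
                                int *parent_rank) {
        int node_id = -1;
        for (int i = 0; i < g->cnt; i++)          /* ③ locate my_rank in g->ranks */
            if (g->ranks[i] == my_rank) { node_id = i; break; }

        int parent_node_id, children[MAX_NODES];
        int cnt = build_k_nomial_tree(node_id, k, g->cnt,
                                      &parent_node_id, children);      /* ④ */
        *parent_rank = parent_node_id < 0 ? -1 : g->ranks[parent_node_id]; /* ⑤ */

        int children_cnt = 0;                     /* ⑥-⑪ skip unmapped tree slots */
        for (int i = 0; i < cnt; i++)
            if (g->ranks[children[i]] != -1) children_cnt++;
        return children_cnt;
    }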

This section studies the construction of type-II aggregation trees, which must jointly consider the network topology, process placement, and other factors. Moreover, within each switch every type-II tree occupies one aggregation-tree entry, and the hardware supports only a limited number of entries, so entry availability must also be considered when creating a type-II tree.

Construction pursues three goals: 1) minimize the tree radius, to reduce collective latency; 2) spread different trees over different links as far as possible, to reduce interference between trees; 3) use as few aggregation-tree entries as possible, to support creating more type-II trees.

We construct a type-II tree in two steps: 1) choose a switch as the root and build a spanning tree (called the physical tree); 2) prune the physical tree, removing unneeded switches.

Definition 9. Physical tree. Let M be the set of NICs hosting a communicator's processes, and let T = (N_T, L_T) be a subtree of the interconnection network I = (N, L). T is a physical tree for M if it satisfies two conditions:

1) M ⊆ N_T;

2) apart from the NICs in M, the leaves of T contain no other NIC or switch.

Definition 10. Type-II aggregation tree derived from a physical tree. Let K be the degree-constraint function on switches, and let T = (N_T, L_T) be a physical tree for M over I = (N, L). A type-II aggregation tree derived from T is a tree A = (N_A, L_A) satisfying three conditions:

1) M ⊆ N_A ⊆ N_T;

2) for every u ∈ N_A, u's parent p in A is also an ancestor of u in T;

3) every switch sw in A has at most K(sw) children, where K(sw) exceeds sw's child count in T.

In effect, a type-II tree derived from T is obtained by promoting some nodes of T (a node may only be promoted to become a child of one of its ancestors).

Definition 11. Cost of a type-II aggregation tree. Let A = (N_A, L_A) be a type-II aggregation tree for target set M over I = (N, L). The number of switches in A is called its cost, written cost(A).

The position of the root determines the tree radius; the switch with the smallest maximum hop count to the communicator's processes should be chosen as the root.

We build physical trees with a centralized method: the centralized network management software selects the root and builds the physical tree. The software maintains network state, including the number of free aggregation-tree entries in each switch and the load on each link, and can select a root switch accordingly.

Concretely, the method first computes each switch's minimum hop count to the communicator's processes and from that finds all candidate roots. The candidates are then sorted by load, preferring lightly loaded ones. Once a root is chosen, a physical tree is built that minimizes each process's hop count to the root. The software then checks whether every switch of the tree has a free aggregation-tree entry; if so, the physical tree is returned, otherwise the next candidate root is tried.

The minimum-cost type-II tree problem derived from a physical tree: given a physical tree T = (N_T, L_T) for M over I = (N, L) and the degree-constraint function K, find the tree A of minimum cost among all type-II aggregation trees derived from T.

We create minimum-cost type-II trees with the function shown in Algorithm 5. One can prove that the tree A it produces is a minimum-cost type-II aggregation tree derived from the physical tree T; the proof is omitted for space.

Algorithm 5. build_minimum_type_II_tree.

Input: physical tree T = (N_T, L_T), degree-constraint function K;

Output: aggregation tree A = (N_A, L_A).

① let r be the root of T; if r is not a switch, return the aggregation tree containing only node r;
② otherwise, for each child s_i of r, let T_i be the subtree rooted at s_i and call build_minimum_type_II_tree(T_i, K) to produce the minimum-cost type-II tree A_i derived from T_i;
③ if r has only one child subtree T_1, return that child's minimum-cost type-II tree A_1;
④ otherwise, build a tree A = (N_A, L_A) with r as the root and the A_i as subtrees;
⑤ if r's child count in A is less than its degree constraint K(r), then
⑥  among all A_i find a smallest leaf switch L (a leaf switch is a switch all of whose children are NICs; smallest means fewest children);
⑦  pick any process p in L and make p a child of r;
⑧  if removing p leaves L with a single child q, make q a child of q's grandparent and delete L;
⑨  repeat steps ⑥-⑧ until r's child count equals K(r);
⑩ return A.
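A C sketch of Algorithm 5 on a minimal tree type (ours): K(sw) is modeled as a constant KMAX for brevity, and the guard against promoting from the root itself (which would loop forever) is our addition.

    #define FANOUT 64
    #define KMAX 5

    typedef struct node {
        int is_switch;                 /* 0 for a NIC, 1 for a switch */
        int nchild;
        struct node *child[FANOUT];
        struct node *parent;
    } node;

    static void attach(node *p, node *c) { c->parent = p; p->child[p->nchild++] = c; }

    static void detach(node *p, node *c) {
        for (int i = 0; i < p->nchild; i++)
            if (p->child[i] == c) { p->child[i] = p->child[--p->nchild]; return; }
    }

    /* Leaf switch (all children are NICs) with the fewest children under r. */
    static node *smallest_leaf_switch(node *r) {
        if (!r->is_switch) return NULL;
        node *best = NULL;
        int all_nics = 1;
        for (int i = 0; i < r->nchild; i++) {
            if (r->child[i]->is_switch) all_nics = 0;
            node *s = smallest_leaf_switch(r->child[i]);
            if (s && (!best || s->nchild < best->nchild)) best = s;
        }
        if (all_nics && r->nchild > 0 && (!best || r->nchild < best->nchild)) best = r;
        return best;
    }

    node *build_minimum_type2_tree(node *r) {
        if (!r->is_switch) return r;                  /* ① a NIC is its own tree */
        int n = r->nchild;
        node *sub[FANOUT];
        for (int i = 0; i < n; i++)                   /* ② reduce each subtree */
            sub[i] = build_minimum_type2_tree(r->child[i]);
        if (n == 1) return sub[0];                    /* ③ drop a pass-through switch */
        r->nchild = 0;                                /* ④ re-attach reduced subtrees */
        for (int i = 0; i < n; i++) attach(r, sub[i]);
        while (r->nchild < KMAX) {                    /* ⑤-⑨ absorb small leaf switches */
            node *L = smallest_leaf_switch(r);
            if (!L || L == r) break;                  /* nothing promotable remains */
            node *p = L->child[0];                    /* ⑦ any process under L */
            detach(L, p);
            attach(r, p);
            if (L->nchild == 1) {                     /* ⑧ splice out a 1-child switch */
                node *q = L->child[0], *gp = L->parent;
                detach(gp, L);
                attach(gp, q);
            }
        }
        return r;                                     /* ⑩ */
    }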

Figure 11(a) shows a physical tree with 16 processes that uses 13 switches, each with degree constraint 5; the derived minimum-cost type-II aggregation tree, shown in Fig. 11(b), uses only 4 switches.

Figure 11. A physical tree and the minimum-cost type-II aggregation tree derived from it

In performance terms, type-I and type-II aggregation trees each have their strengths. For small collectives, packet transmission dominates; type-II trees have shorter transmission paths, so their latency is lower than type-I trees'. For large collectives, reduction computation dominates; type-I trees can spread the reduction over many NICs in parallel, so their latency is smaller. When creating a communicator one must weigh these factors and choose between a type-I and a type-II tree.

To simplify the analysis, assume the type-II tree is a standard d-ary tree, and let a = δ × l and b = θ + S′/β + S* × γ; the time costs of type-I and type-II trees are then Eqs. (7)(8). For brevity we consider only the case S = S′ = S*.

$T_{aggregate\_I} \approx (a + kb)\log_{k}P$, (7)
$T_{aggregate\_II} \approx \frac{a}{2} + db\log_{d}P$. (8)

Define the function F(d, k, P, S) of Eq. (9):

$F(d,k,P,S) = T_{aggregate\_I} - T_{aggregate\_II} = (\log_{k}P - 0.5)a + (k\log_{k}P - d\log_{d}P)b$, (9)
$\dfrac{\partial F}{\partial P} = \dfrac{1}{P}\left(\dfrac{a + kb}{\ln k} - \dfrac{db}{\ln d}\right)$, (10)
$\dfrac{\partial F}{\partial S} = \left(\dfrac{1}{\beta} + \gamma\right)(k\log_{k}P - d\log_{d}P)$. (11)
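The comparison can be automated: the C sketch below evaluates Eqs. (7) and (8) for a given packet size and picks the cheaper configuration, which is essentially how a static table like Table 3 can be generated (our illustration; all parameter values are placeholders to be fitted).

    #include <math.h>

    typedef struct { int use_type2; int k; } tree_choice;

    tree_choice select_tree(double P, double d, double S,
                            double l, double delta, double theta,
                            double beta, double gamma,
                            const int k_candidates[], int nk) {
        double a = delta * l;
        double b = theta + S / beta + S * gamma;   /* S = S' = S* case */
        tree_choice best = { 0, k_candidates[0] };
        double best_cost = INFINITY;
        for (int i = 0; i < nk; i++) {             /* best type-I cost, Eq. (7) */
            double k = k_candidates[i];
            double c = (a + k * b) * log(P) / log(k);
            if (c < best_cost) { best_cost = c; best.k = (int)k; }
        }
        double c2 = a / 2 + d * b * log(P) / log(d);   /* type-II cost, Eq. (8) */
        if (c2 < best_cost) { best.use_type2 = 1; best.k = (int)d; }
        return best;
    }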

From Eqs. (9)-(11) one can analyze how F(d, k, P, S) varies and build a static lookup table; given the collective type and message length, the tree type and breadth are then determined quickly by a table lookup. Table 3 shows how our experimental environment chooses between type-I and type-II trees by aggregation packet length. When the communication resources needed to create a type-II tree cannot be obtained, we fall back to a type-I tree.

Table 3. Method to Select the Aggregation Tree

d                     Packet length /B   Tree type   k
16                    (0, 2 048]         Type II     16
64                    [0, 323]           Type II     64
                      (323, 512]         Type I      16
                      (512, 2 048]       Type I      8
When a type-II tree   [0, 64]            Type I      32
cannot be created     (64, 512]          Type I      16
                      (512, 2 048]       Type I      8

This section evaluates aggregation-tree performance on the new-generation Sunway supercomputer.

We first measure the latency of collective messages on different type-I trees. For Barrier, Bcast, and 8 B reductions, k = 32 gives better performance, so we built a 32-ary tree, a 32-nomial tree, and a hierarchical 32-nomial tree. The test uses one supernode; each node runs 2 processes, each process uses one network port, and the two ports of a node communicate without traversing network links. We test in two scenarios.

The first scenario is a two-level standard fat tree. The hierarchical 32-nomial tree uses three levels: the 2 processes within a node form a group, and their communication needs no network links; the 16 processes within a switch form a group, and communication within it crosses only 2 links; finally, all processes within the supernode form a group, and communication within it crosses 4 links.

Figure 12 shows the results for the Bcast-8B and Bcast-2KB collectives. We observe: 1) in most cases the 32-ary and 32-nomial trees have nearly identical latency; 2) in most cases the hierarchical 32-nomial tree's latency is about 0.6 μs lower than both. The reason is that it first groups the 2 processes of a node together, and their communication needs no network links, so latency is lower. Note that with few processes the hierarchical 32-nomial tree is actually slower than the other two trees, because it is then taller than they are; as the process count grows, the three trees end up with the same height.

Figure 12. Message latency of the three aggregation trees for different process counts in the standard fat-tree topology

Figure 13 shows the performance of the three construction methods for different message lengths, using 512 processes with one network port each. The results agree with Fig. 12: the 32-ary and 32-nomial trees perform similarly, while the hierarchical 32-nomial tree's latency is clearly lower.

Figure 13. Message latency of the three aggregation trees for different message sizes in the standard fat-tree topology

The second scenario uses the pruned fat tree of Fig. 14, in which different network ports of a switch share one link when sending messages to ports of other switches. We also injected network noise consuming about 80% of the link bandwidth.

Figure 14. Illustration of the two-level pruned fat tree used in the test

We measured the three construction methods' trees under different process counts and compared the results with those on the standard fat tree, as shown in Fig. 15. With network noise injected, collective message latency rises markedly for all three methods relative to the standard fat tree; the 32-ary tree rises the most and the hierarchical 32-nomial tree the least.

Figure 15. Collective message latency in the pruned fat-tree topology with interference

Figure 16 shows the number of inter-group aggregation packets in each of the three trees with 512 processes; the number on each solid line is the packet count between the two switches it connects. In the 32-ary tree, a large number of packets are sent to processes under switch 1, so each collective must carry 496 packets over the link between switches 0 and 1, creating severe link contention.

Figure 16. Inter-group aggregation packets of the three aggregation-tree construction methods in the pruned fat tree

Table 4 compares Bcast-8B latency across the three trees for different process counts. Under noise interference, the hierarchical 32-nomial tree's latency is 24%-89% lower than the traditional constructions'. This shows that, having fewer inter-group aggregation packets than the other two construction methods, the hierarchical 32-nomial tree markedly reduces collective message latency under bandwidth contention.

Table 4. Bcast-8B Latency for Different Process Counts in the Pruned Fat-Tree Topology with Interference

Processes   32-ary latency /μs   32-nomial latency /μs   Hierarchical 32-nomial latency /μs
64          19.92 (↓39%)         17.79 (↓32%)            12.11
128         25.11 (↓49%)         18.58 (↓31%)            12.75
192         35.75 (↓64%)         18.54 (↓31%)            12.73
256         42.87 (↓70%)         18.62 (↓32%)            12.75
320         71.81 (↓82%)         18.86 (↓32%)            12.83
384         97.33 (↓86%)         18.94 (↓30%)            13.33
448         115.82 (↓87%)        18.80 (↓26%)            13.94
512         136.60 (↓89%)        19.48 (↓24%)            14.89

Note: the percentage in parentheses (↓) is the latency reduction of the hierarchical 32-nomial tree relative to that column's tree.

Note that as the process count grows, the 32-nomial tree's latency rises slowly while the hierarchical 32-nomial tree's rises slightly faster (its reduction relative to the 32-nomial tree shrinks from 32% at 64 processes to 24% at 512 processes). The reason is that in the pruned fat tree, as the process count grows, the maximum number of aggregation packets per link stays at 16 for the 32-nomial tree, so its latency barely changes, whereas for the hierarchical 32-nomial tree that number grows from 1 to 31, so its latency creeps up. Even so, with still more processes the hierarchical tree's latency certainly remains below the 32-nomial tree's, because the latter must carry 16 packets between every pair of adjacent switches and is thus more exposed to network noise.

We also measured the three construction algorithms' message latency in other scenarios, e.g., with more supernodes on a three-level pruned fat tree, where the hierarchical 32-nomial tree can use more levels. The conclusions match the above: the hierarchical 32-nomial tree has the lowest latency, the 32-nomial tree comes next, and the 32-ary tree is the slowest. We omit the detailed data for space.

This section compares the performance of type-I and type-II aggregation trees, using the environment of Fig. 17. The type-I tree is a 64-ary tree; the type-II tree is rooted at the topmost switch. Figure 18 shows Bcast-8B and Bcast-2KB latency for different process counts: the type-II tree's latency is about 1 μs lower than the type-I tree's.

Figure 17. Network topology used to compare the performance of type-I and type-II aggregation trees
Figure 18. Performance comparison of type-I and type-II aggregation trees

This section evaluates the effectiveness of the minimum-cost type-II tree construction algorithm and the usage of aggregation-tree entries under different communication patterns.

MPI programs commonly split processes into 2-D or 3-D grid communicators, where every row or column of each dimension is a communicator. We test five typical patterns: two 2-D grids, two 3-D grids, and one random pattern. Of the 2-D grids, the 256×160 pattern splits communicators along the topology, emulating a topology-aware pattern: its first dimension has 160 communicators, each confined to one supernode. The 202×202 pattern has many communicators spanning supernodes, emulating a topology-oblivious pattern. The two 3-D grids are split on the same principle. In the random pattern, each process joins a random communicator.
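For concreteness, a grid pattern like 256×160 can be expressed with MPI_Comm_split, which creates one communicator per row and one per column (a minimal sketch):

    #include <mpi.h>

    /* Row and column communicators of a 2-D process grid (e.g., 256 x 160). */
    void make_grid_comms(MPI_Comm comm, int ncols,
                         MPI_Comm *row_comm, MPI_Comm *col_comm) {
        int rank;
        MPI_Comm_rank(comm, &rank);
        int row = rank / ncols, col = rank % ncols;
        MPI_Comm_split(comm, row, col, row_comm);  /* processes sharing a row id */
        MPI_Comm_split(comm, col, row, col_comm);  /* processes sharing a column id */
    }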

We demonstrate the algorithm's effectiveness in two ways: the minimum-cost type-II tree occupies far fewer aggregation-tree entries than traditionally built type-II trees, and its creation failure rate is also far lower.

1) Total aggregation-tree entries occupied

We first count the total entries occupied under each communication pattern, assuming every switch has ample aggregation-tree entries. For each pattern we build a type-II tree per communicator and total the occupied entries; Fig. 19 shows the results, where the traditionally built type-II tree refers to the physical tree built by the centralized method. In all patterns the minimum-cost tree occupies clearly fewer entries than the traditionally built trees, at least 90% fewer, demonstrating that the minimum-cost algorithm markedly reduces entry usage. In some patterns the total entry count of the minimum-cost trees equals the number of communicators, because each communicator's minimum-cost tree then uses only one switch.

Figure 19. Total aggregation-tree entries used by type-II trees under different communication patterns

2) Failure rate of tree creation

With each switch's aggregation-tree entry count set to 16, we then build a type-II tree per communicator under each pattern and record the creation failures, defining the failure rate as the number of communicators whose tree creation fails divided by the total number of communicators; Fig. 20 shows the results. The minimum-cost algorithm builds a tree successfully for every communicator, whereas the traditional type-II construction fails often, with a failure rate of up to 74.51%.

Figure 20. Failure rate of creating type-II aggregation trees under different communication patterns

With an even smaller entry count per switch, some patterns also fail to create some minimum-cost type-II trees, but the failure rate remains below that of the traditional construction; we omit the details.

In summary, the minimum-cost type-II tree algorithm significantly reduces the consumption of switch aggregation-tree entries and can thus support creating more type-II trees.

In hardware collective communication, the aggregation tree strongly affects the performance of collective operations, and its construction must jointly consider the radius, breadth, load balance, network noise, and other factors. This paper proposed a cost model for hardware collectives and, based on it, three tree-construction techniques: 1) determining the type-I tree breadth from the aggregation packet size; 2) a minimum-height hierarchical type-I tree construction method; 3) a minimum-cost type-II tree construction method. Comprehensive tests on the Sunway interconnect show that under network noise the hierarchical type-I tree reduces message latency by 24%-89% relative to traditional construction, and the minimum-cost type-II tree reduces switch entry usage by about 90%.

In future work we will evaluate the construction methods more thoroughly with real applications, and study how hardware collectives can accelerate large reductions, broadcasts, and other collectives, widening the applicability of hardware collective communication.

Author contributions: Chen Shuping proposed the research idea, designed the algorithms and experiments, analyzed the experimental data, and wrote the paper; Wei Hongmei advised on the algorithms and refined the experimental plan; Wang Fei and Li Yi implemented the algorithms and organized and analyzed the test data; He Wangquan and Qi Fengbin proposed the research topic, ensured the novelty of the work, and revised the paper.

According to IoT Analytics, both the number of AIoT devices and the market size have grown by more than 15% per year in recent years; see https://iot-analytics.com/number-connected-iot-devices/ and https://iot-analytics.com/iot-market-size/
When this survey discusses technical progress related to AIoT scenarios in federated learning and collaborative inference separately, it treats AIoT-oriented collaborative intelligence in the broad sense; when it discusses the connection between the two techniques, or new application forms in which they act jointly, it uses the narrow sense.
Attackers keep the model's convergence accuracy essentially unaffected so as to obtain valuable global model parameters while avoiding detection.
  • Figure 1. The relationship among the domains

    Figure 2. The different communication topology structures[68,137]

Table 1. A Brief Summary of Related Surveys

Coverage is compared along: AIoT (definition, architecture), big models, federated learning (heterogeneity, multimodality, FCL, FRL, P&S, optimization), and collaborative inference (definition, architecture, P&S, optimization).

Surveys compared: Ref. [1], Ref. [16], Ref. [20], Ref. [22], Ref. [25], Ref. [43], Ref. [30], Ref. [37], and this survey.

Note: P&S: privacy and security; FCL: federated continual learning; FRL: federated reinforcement learning. ◐ brief introduction; ● detailed introduction.

Table 2. Summary of Related Works on Federated Learning Algorithms

Federated learning algorithm | Applicable situation | References
Federated learning in heterogeneous environments - statistical heterogeneity | training-sample distributions differ across AIoT end devices | [61-66]
Federated learning in heterogeneous environments - device heterogeneity | AIoT end devices train local models at different speeds | [67-70]
Federated learning in heterogeneous environments - model heterogeneity | statistically or device-heterogeneous environments | [71-75]
Federated multimodal learning | data collected by AIoT end devices is multimodal | [56, 77, 79-81]
Federated continual learning | the data distribution on AIoT end devices changes over time | [58, 85-90]
Federated reinforcement learning | training decision models for AIoT devices with federated learning | [59, 82, 91, 93-96]

Table 3. Summary of Related Works on Collaborative Inference Algorithms

Main optimization goal | Related work | Model partitioning method | Task scheduling method
Performance | DeepThings[49] | convolutional-layer parallelism | work stealing
Performance | DeepSlicing[26] | traffic optimization, model parallelism | synchronization overhead optimization
Performance | IONN[97] | execution-graph generation and shortest-path search |
Performance | OFL[98] | layer-fusion-based model partitioning | dynamic programming
Performance | PICO[99] | piece-based model partitioning | dynamic programming
Performance | EdgeFlow[100] | model parallelism | linear programming
Performance | IAO[50] | | latency prediction
Performance | Neurosurgeon[107] | latency-prediction-based model partitioning |
Latency robustness | DistrEdge[102] | | reinforcement learning
Latency robustness | ICE[103] | QoS-aware, execution-graph shortest-path search |
Latency robustness | MTS[105] | | reinforcement learning
Energy | CoEdge[21] | model parallelism | linear programming
Energy | Neurosurgeon[107] | energy-estimation-based model partitioning |
Energy | AutoScale[101] | | reinforcement learning

Table 4. Summary of Related Works at Each Level of the AIoT-Oriented Collaborative Intelligence Architecture

Level | Category | Advantages | Disadvantages | Refs (federated learning) | Refs (collaborative inference)
Deep learning accelerators | GPU | high performance, mature software stack, also handles general compute tasks | large area, high power consumption | [27, 47] | [41, 97, 101]
Deep learning accelerators | deep learning processors | smaller area, high energy efficiency | relatively narrow range of tasks | [114-115] | [111, 116]
Deep learning compilation | just-in-time | can use runtime information[109] | adds startup overhead[171] | [114-115] |
Deep learning compilation | ahead-of-time | larger static search space, supports cross compilation, etc.[109] | no runtime information | [127]* | [116, 130]
Deep learning frameworks - AIoT federated learning frameworks | FedML | MPI- and MQTT-based distributed communication, multiple communication topologies, testbed support for real AIoT devices | no dedicated support or optimization for inference tasks | [73, 137] |
Deep learning frameworks - AIoT federated learning frameworks | Flower | supports large numbers of heterogeneous end devices and simulation of various network conditions | no dedicated support or optimization for inference tasks | [138] |
Deep learning frameworks - lightweight end-side frameworks | TensorFlow Lite | lightweight runtime for embedded devices | generally used only on end devices | [136] | [21, 111]
Deep learning frameworks - lightweight end-side frameworks | MNN | semi-automatic search for the best execution strategy | generally used only on end devices | | [141]
Deep learning frameworks - general end-edge-cloud frameworks | PyTorch | concise programming style, multi-process parallelism and communication optimization | hard to support resource-constrained devices such as embedded devices | [72, 77, 134] | [99-100, 103]
Deep learning frameworks - general end-edge-cloud frameworks | TensorFlow | good scalability, optimized scheduling | hard to support resource-constrained devices such as embedded devices | [85] | [50]
Deep learning frameworks - general end-edge-cloud frameworks | TensorRT | high-performance, high-throughput inference | no support for training tasks | | [139]
Inter-device communication - communication topology[30, 137] | centralized | simple structure, easy to manage | central-node communication bottleneck; may depend on third-party compute services | [44] | [97]
Inter-device communication - communication topology[30, 137] | hierarchical | relieves the central-node bottleneck | extra communication level; may depend on third-party compute services | [143] | [144]
Inter-device communication - communication topology[30, 137] | decentralized | direct P2P communication, system autonomy, Byzantine fault tolerance | consensus overhead, complex system management | [147] | [49]
Inter-device communication - communication topology[30, 137] | hybrid | combines the advantages of several topologies | more complex structure and management | [68] |
Inter-device communication | reducing the number of communication rounds | lower communication overhead | generally used only in federated learning | [143, 148, 150-151, 172] |
Inter-device communication | reducing the per-round data volume | lower communication overhead | may reduce model accuracy | [152-157] | [144, 158]
Inter-device communication | interference management | mitigates the negative effects of communication interference | needs further study for new networks such as Wi-Fi 6 | [159-161] |
Multi-device collaboration[16] | end-cloud | ample cloud compute and storage, good for long-term data storage | limited cloud bandwidth, unstable WAN, privacy and security issues | [166-167] | [41, 101]
Multi-device collaboration[16] | end-edge | lower communication latency | limited edge compute and storage, privacy and security issues | [58] | [97]
Multi-device collaboration[16] | end-edge-cloud | relieves the cloud's compute and communication burden | privacy and security issues | [143] | [144]
Multi-device collaboration[16] | local | fast and stable data transfer, privacy and security guarantees | only for closed scenarios, not open ones | | [21, 26, 49, 111]
Multi-device collaboration[16] | large-small model | the rich knowledge of large models gives high-quality service, while small models speed up response | knowledge transfer between large and small models needs further study | [72, 170] | [141]

Note: "*" indicates a potential solution.

Table 5. Summary of Attacks Faced by AIoT-Oriented Collaborative Intelligence and the Corresponding Defense Methods

Attack type | Attack | Attack surface | Scenario | References | Defense mechanisms
Data privacy - training-sample-related | model inversion | model parameters, model outputs | federated learning, collaborative inference | [52, 119, 174-175] | obfuscation[186-188], homomorphic encryption[119, 192, 195], secure multi-party computation[51, 197-198, 230-231], trusted execution environments[199, 201-203]
Data privacy - training-sample-related | membership inference | model parameters, model outputs | federated learning, collaborative inference | [53, 176-177] |
Data privacy - training-sample-related | property inference | model parameters | federated learning | [178-179] |
Data privacy - model-parameter-related | model extraction | model outputs | federated learning, collaborative inference | [52, 173-174, 181] | anomaly detection[205], altering outputs[208-210]
Data privacy - model-parameter-related | free-rider attack | model parameters | federated learning | [183-184] | anomaly detection[206-207], blockchain[68, 147]
Model security | poisoning | training data, model parameters | federated learning, collaborative inference | [214-215] | anomaly detection[120, 226]
Model security | evasion | model outputs, model parameters | federated learning, collaborative inference | [213, 217-219] | anomaly detection[227-228], adversarial learning[224], obfuscation[229]
  • [1]

    Chang Zhuoqing, Liu Shubo, Xiong Xingxing, et al. A survey of recent advances in edge-computing-powered artificial intelligence of things[J]. IEEE Internet of Things Journal, 2021, 8(18): 13849−13875 doi: 10.1109/JIOT.2021.3088875

    [2]

Wang Wenbo, Zhang Yingfeng, Gu Jinan, et al. A proactive manufacturing resources assignment method based on production performance prediction for the smart factory[J]. IEEE Transactions on Industrial Informatics, 2022, 18(1): 46−55

    [3]

    Yu Liang, Xie Weiwei, Xie Di, et al. Deep reinforcement learning for smart home energy management[J]. IEEE Internet of Things Journal, 2020, 7(4): 2751−2762 doi: 10.1109/JIOT.2019.2957289

    [4]

    Shaikh F K, Karim S, Zeadally S, et al. Recent trends in Internet-of-things-enabled sensor technologies for smart agriculture[J]. IEEE Internet of Things Journal, 2022, 9(23): 23583−23598 doi: 10.1109/JIOT.2022.3210154

    [5]

    Zhao Jianxin, Chang Xinyu, Feng Yanhao, et al. Participant selection for federated learning with heterogeneous data in intelligent transport system[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 24(1): 1106−1115

    [6]

IoT Analytics. IoT 2020 in review: The 10 most relevant IoT developments of the year [EB/OL]. (2021-01-12)[2024-07-16]. https://iot-analytics.com/iot-2020-in-review/

    [7]

IoT Analytics. IoT 2021 in review: The 10 most relevant IoT developments of the year [EB/OL]. (2022-01-11)[2024-07-16]. https://iot-analytics.com/iot-2021-in-review/

[8]

Zhang Yuqing, Zhou Wei, Peng Anni. Survey of Internet of things security[J]. Journal of Computer Research and Development, 2017, 54(10): 2130−2143 (in Chinese) doi: 10.7544/issn1000-1239.2017.20170470

    [9]

    Dong Yudi, Yao Yudong. Secure mmwave-radar-based speaker verification for IoT smart home[J]. IEEE Internet of Things Journal, 2021, 8(5): 3500−3511 doi: 10.1109/JIOT.2020.3023101

    [10]

    Liu Yangyang, Chang Shuo, Wei Zhiqing, et al. Fusing mmwave radar with camera for 3-D detection in autonomous driving[J]. IEEE Internet of Things Journal, 2022, 9(20): 20408−20421 doi: 10.1109/JIOT.2022.3175375

    [11]

Zhang Chaoyun, Patras P, Haddadi H, et al. Deep learning in mobile and wireless networking: A survey[J]. IEEE Communications Surveys & Tutorials, 2019, 21(3): 2224−2287

    [12]

    He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image recognition[C]// Proc of the 2016 IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 770−778

    [13]

    Amodei D, Ananthanarayanan S, Anubhai R, et al. Deep speech 2: End-to-end speech recognition in English and Mandarin[C]// Proc of the 33rd Int Conf on Machine Learning. New York: ACM, 2016: 173–182

    [14]

    Otter D W, Medina J R, Kalita J K. A survey of the usages of deep learning for natural language processing[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(2): 604−624 doi: 10.1109/TNNLS.2020.2979670

    [15]

Hasselt H V, Guez A, Silver D. Deep reinforcement learning with double Q-learning[C]// Proc of the 30th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2016: 2094−2100

    [16]

    Ren Weiqing, Qu Yuben, Dong Chao, et al. A survey on collaborative DNN inference for edge intelligence[J]. Machine Intelligence Research, 2023, 20(3): 370−395 doi: 10.1007/s11633-022-1391-7

    [17]

    EU. Regulation (EU) 2016/679 of the European parliament and of the council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) [EB/OL]. (2018-05-25) [2024-07-16]. https://gdpr-info.eu/

    [18]

    Li Mu, Andersen D G, Park J W, et al. Scaling distributed machine learning with the parameter server[C]// Proc of the 11th USENIX Conf on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2014: 583–598

    [19]

    Teerapittayanon S, Mcdanel B, Kung H T. Distributed deep neural networks over the cloud, the edge and end devices[C]// Proc of the 37th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2017: 328−339

    [20]

    Lim W Y B, Luong N C, Hoang D T, et al. Federated learning in mobile edge networks: A comprehensive survey[J]. IEEE Communications Surveys & Tutorials, 2019, 22: 2031−2063

    [21]

    Zeng Liekang, Chen Xu, Zhou Zhi, et al. CoEdge: Cooperative DNN inference with adaptive workload partitioning over heterogeneous edge devices[J]. IEEE/ACM Transactions on Networking, 2021, 29(2): 595−608 doi: 10.1109/TNET.2020.3042320

    [22]

    Yang Qiang, Liu Yang, Chen Tianjian, et al. Federated machine learning: Concept and applications [J]. ACM Transactions on Intelligent Systems and Technology, 2019, 10(2): Article 12

[23]

Zhu Hongrui, Yuan Guojun, Yao Chengji, et al. Survey on network of distributed deep learning training[J]. Journal of Computer Research and Development, 2021, 58(1): 98−115 (in Chinese) doi: 10.7544/issn1000-1239.2021.20190881

    [24]

    Nguyen D C, Ding Ming, Pathirana P N, et al. Federated learning for internet of things: A comprehensive survey[J]. IEEE Communications Surveys & Tutorials, 2021, 23(3): 1622−1658

    [25]

    Khan L U, Saad W, Han Zhu, et al. Federated learning for Internet of things: Recent advances, taxonomy, and open challenges[J]. IEEE Communications Surveys & Tutorials, 2021, 23(3): 1759−1799

    [26]

    Zhang Shuai, Zhang Sheng, Qian Zhuzhong, et al. DeepSlicing: Collaborative and adaptive CNN inference with low latency[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(9): 2175−2187 doi: 10.1109/TPDS.2021.3058532

    [27]

    Mao Yunlong, Hong Wenbo, Wang Heng, et al. Privacy-preserving computation offloading for parallel deep neural networks training[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(7): 1777−1788

    [28]

    Bommasani R, Hudson D, Adeli E, et al. On the opportunities and risks of foundation models [J]. arXiv preprint, arXiv: 2108.07258, 2021

    [29]

    Cao Yihan, Li Siyu, Liu Yixin, et al. A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT [J]. arXiv preprint, arXiv: 2303.04226, 2023

    [30]

    Zhou Zhi, Chen Xu, Li En, et al. Edge intelligence: Paving the last mile of artificial intelligence with edge computing[J]. Proceedings of the IEEE, 2019, 107: 1738−1762 doi: 10.1109/JPROC.2019.2918951

[31]

Chen Yunji, Li Ling, Li Wei, et al. AI Computing System[M]. Beijing: China Machine Press, 2020 (in Chinese)

    [32]

    Poirot M G, Vepakomma P, Chang Ken, et al. Split learning for collaborative deep learning in healthcare [J]. arXiv preprint, arXiv: 1912.12115, 2019

    [33]

    Zhuang Fuzhen, Qi Zhiyuan, Duan Keyu, et al. A comprehensive survey on transfer learning[J]. Proceedings of the IEEE, 2021, 109(1): 43−76 doi: 10.1109/JPROC.2020.3004555

    [34]

    Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks[C]// Proc of the 34th Int Conf on Machine Learning. New York: ACM, 2017: 1126–1135

    [35]

    Yao Jiangchao, Wang Feng, Jia Kunyang, et al. Device-cloud collaborative learning for recommendation[C]// Proc of the 27th ACM SIGKDD Conf on Knowledge Discovery & Data Mining. New York: ACM, 2021: 3865−3874

    [36]

    Chen Zeyuan, Yao Jiangchao, Wang Feng, et al. Mc2-SF: Slow-fast learning for mobile-cloud collaborative recommendation [J]. arXiv preprint, arXiv: 2109.12314, 2021

    [37]

    Yao Jiangchao, Zhang Shengyu, Yao Yang, et al. Edge-cloud polarization and collaboration: A comprehensive survey for AI[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(7): 6866−6886

    [38]

    Zhao Yuxi, Gong Xiaowen, Mao Shiwen. Truthful incentive mechanism for federated learning with crowdsourced data labeling[C]// Proc of the 2023 IEEE Conf on Computer Communications. Piscataway, NJ: IEEE, 2023: 1−10

    [39]

    Zhang Tuo, Feng Tiantian, Alam S, et al. GPT-FL: Generative pre-trained model-assisted federated learning [J]. arXiv preprint, arXiv: 2306.02210, 2023

[40]

Guo Bin, Liu Sicong, Liu Yan, et al. AIoT: The concept, architecture and key techniques[J]. Chinese Journal of Computers, 2023, 46(11): 2259−2278 (in Chinese)

    [41]

    Kang Yiping, Hauswald J, Gao Cao, et al. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge[C] //Proc of the 22nd Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2016: 615−629

    [42]

    Pan Jingyu, Chang C C, Xie Zhiyao, et al. Towards collaborative intelligence: Routability estimation based on decentralized private data[C] //Proc of the 59th ACM/IEEE Design Automation Conf. New York: ACM, 2017: 961−966

[43]

Wang Rui, Qi Jianpeng, Chen Liang, et al. Survey of collaborative inference for edge intelligence[J]. Journal of Computer Research and Development, 2023, 60(2): 398−414 (in Chinese) doi: 10.7544/issn1000-1239.202110867

    [44]

    Mcmahan H B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data[C]// Proc of the 20th Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2017: 1273−1282

    [45]

    Kairouz P, Mcmahan H B, Avent B, et al. Advances and open problems in federated learning[J]. Foundation Trends in Machine Learning, 2021, 14(1): 1−210

    [46]

    Hinton G E, Vinyals O, Dean J. Distilling the knowledge in a neural network [J]. arXiv preprint, arXiv: 1503.02531, 2015

    [47]

    Thapa C, Chamikara M A P, Camtepe S, et al. SplitFed: When federated learning meets split learning[C]// Proc of the 36th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2022: 8485−8493

    [48]

    Lu Ying, Luo Lingkun, Huang Di, et al. Knowledge transfer in vision recognition: A survey [J]. ACM Computing Surveys, 2020, 53(2): Article 37

    [49]

    Zhao Zhuoran, Barijough K M, Gerstlauer A. DeepThings: Distributed adaptive deep learning inference on resource-constrained IoT edge clusters[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(11): 2348−2359 doi: 10.1109/TCAD.2018.2858384

    [50]

    Tang Xin, Chen Xu, Zeng Liekang, et al. Joint multiuser dnn partitioning and computational resource allocation for collaborative edge intelligence[J]. IEEE Internet of Things Journal, 2021, 8(12): 9511−9522 doi: 10.1109/JIOT.2020.3010258

    [51]

    Huang P H, Tu C H, Chung S M, et al. SecureTVM: A TVM-based compiler framework for selective privacy-preserving neural inference[J]. ACM Transactions on Design Automation of Electronic Systems, 2023, 28(4): 1−28

    [52]

    He Zecheng, Zhang Tianwei, Lee R B. Model inversion attacks against collaborative inference[C]// Proc of the 35th Annual Computer Security Applications Conf. New York: ACM, 2019: 148–162

    [53]

    Chen Hanxiao, Li Hongwei, Dong Guishan, et al. Practical membership inference attack against collaborative inference in industrial IoT[J]. IEEE Transactions on Industrial Informatics, 2022, 18(1): 477−487 doi: 10.1109/TII.2020.3046648

    [54]

    Ayad A, Renner M, Schmeink A. Improving the communication and computation efficiency of split learning for IoT applications[C/OL]// Proc of the 2021 IEEE Global Communications Conf. Piscataway, NJ: IEEE, 2021[2024-08-17]. https://ieeexplore.ieee.org/document/9685493

    [55]

    Li Tian, Sahu A K, Talwalkar A, et al. Federated learning: Challenges, methods, and future directions[J]. IEEE Signal Processing Magazine, 2020, 37(3): 50−60 doi: 10.1109/MSP.2020.2975749

    [56]

    Zhao Yuchen, Barnaghi P, Haddadi H. Multimodal federated learning on IoT data[C]// Proc of the 7th IEEE/ACM Int Conf on Internet-of-Things Design and Implementation. Piscataway, NJ: IEEE, 2022: 43−54

    [57]

    Kirkpatrick J, Pascanu R, Rabinowitz N, et al. Overcoming catastrophic forgetting in neural networks[J]. Proceedings of the National Academy of Sciences, 2017, 114(13): 3521−3526 doi: 10.1073/pnas.1611835114

    [58]

    Zhang Zhouyangzi, Guo Bin, Sun Wen, et al. Cross-FCL: Toward a cross-edge federated continual learning framework in mobile edge computing systems[J]. IEEE Transactions on Mobile Computing, 2022, 23(1): 313−326

    [59]

    Zhuo H H, Feng Wenfeng, Lin Yufeng, et al. Federated deep reinforcement learning [J]. arXiv preprint, arXiv: 1901.08277, 2019

    [60]

    Kingma D P, Ba J. Adam: A method for stochastic optimization[C/OL]// Proc of the 3rd Int Conf on Learning Representations. Washington: ICLR, 2015[2024-08-16]. https://www.semanticscholar.org/reader/a6cb366736791bcccc5c8639de5a8f9636bf87e8

    [61]

    Zhang Jianyi, Li Ang, Tang Minxue, et al. Fed-CBS: A heterogeneity-aware client sampling mechanism for federated learning via class-imbalance reduction[C]// Proc of the 40th Int Conf on Machine Learning. New York: ACM, 2023: Article 1734

    [62]

    Duan Moming, Liu Duo, Chen Xianzhang, et al. Self-balancing federated learning with global imbalanced data in mobile systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 32(1): 59−71

    [63]

    Li Tian, Sahu A K, Zaheer M, et al. Federated optimization in heterogeneous networks[C/OL]// Proc of the 3rd Conf on Machine Learning and Systems. Indio, CA: MLSys. org, 2020[2024-08-16]. https://proceedings.mlsys.org/paper_files/paper/2020/hash/1f5fe83998a09396ebe6477d9475ba0c-Abstract.html

    [64]

    Karimireddy S P, Kale S, Mohri M, et al. SCAFFOLD: Stochastic controlled averaging for federated learning[C]// Proc of the 37th Int Conf on Machine Learning. New York: ACM, 2020: 5132−5143

    [65]

    Arivazhagan M G, Aggarwal V, Singh A K, et al. Federated learning with personalization layers [J]. arXiv preprint, arXiv: 1912.00818, 2019

    [66]

    Li Tian, Hu Shengyuan, Beirami A, et al. Ditto: Fair and robust federated learning through personalization[C]// Proc of the 38th Int Conf on Machine Learning. New York: ACM, 2021: 6357−6368

    [67]

    Xie Cong, Koyejo O, Gupta I. Asynchronous federated optimization [J]. arXiv preprint, arXiv: 1903.03934, 2019

    [68]

    Lu Yunlong, Huang Xiaohong, Zhang Ke, et al. Blockchain empowered asynchronous federated learning for secure data sharing in Internet of vehicles[J]. IEEE Transactions on Vehicular Technology, 2020, 69(4): 4298−4311 doi: 10.1109/TVT.2020.2973651

    [69]

    Sun Yuchang, Shao Jiawei, Mao Yuyi, et al. Semi-decentralized federated edge learning with data and device heterogeneity[J]. IEEE Transactions on Network and Service Management, 2023, 20(2): 1487−1501 doi: 10.1109/TNSM.2023.3252818

    [70]

    Zhang Feilong, Liu Xianming, Lin Shiyi, et al. No one idles: Efficient heterogeneous federated learning with parallel edge and server computation[C]// Proc of the 40th Int Conf on Machine Learning. New York: ACM, 2023: 41399−41413

    [71]

    Diao Enmao, Ding Jie, Tarokh V. HeteroFL: Computation and communication efficient federated learning for heterogeneous clients[C] // Proc of the 2021 Int Conf on Learning Representations. Washington: ICLR, 2021: 1−24

    [72]

    Alam S, Liu Luyang, Yan Ming, et al. FedRolex: Model-heterogeneous federated learning with rolling sub-model extraction[C] // Proc of the 36th Annual Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2022: 29677−29690

    [73]

    He Chaoyang, Annavaram M, Avestimehr S. Group knowledge transfer: Federated learning of large CNNs at the edge[C]// Proc of the 34th Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2020: Article 1180

    [74]

    Itahara S, Nishio T, Koda Y, et al. Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-IID private data[J]. IEEE Transactions on Mobile Computing, 2023, 22(1): 191−205 doi: 10.1109/TMC.2021.3070013

    [75]

    Lin Tao, Kong Lingjing, Stich S U, et al. Ensemble distillation for robust model fusion in federated learning[C]// Proc of the 34th Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2020: Article 198

    [76]

    Lin Yiming, Gao Yuan, Gong Maoguo, et al. Federated learning on multimodal data: A comprehensive survey[J]. Machine Intelligence Research, 2023, 20(4): 539−553 doi: 10.1007/s11633-022-1398-0

    [77]

    Xiong Baochen, Yang Xiaoshan, Qi Fan, et al. A unified framework for multi-modal federated learning [J]. Neurocomputing, 2022, 480: 110−118

    [78]

    Lu Jiasen, Yang Jianwei, Batra D, et al. Hierarchical question-image co-attention for visual question answering[C]// Proc of the 30th Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2016: 289−297

    [79]

    Liu Fenglin, Wu Xian, Ge Shen, et al. Federated learning for vision-and-language grounding problems[C]// Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 11572−11579

    [80]

    Chen Jiayi, Zhang Aidong. FedMSplit: Correlation-adaptive federated multi-task learning across multimodal split networks[C]// Proc of the 28th ACM SIGKDD Conf on Knowledge Discovery and Data Mining. New York: ACM, 2022: 87–96

    [81]

    Zhang Rongyu, Chi Xiaowei, Liu Guiliang, et al. Unimodal training-multimodal prediction: Cross-modal federated learning with hierarchical aggregation [J]. arXiv preprint, arXiv: 2303.15486, 2023

    [82]

    Liu Boyi, Wang Lujia, Liu Ming. Lifelong federated reinforcement learning: A learning architecture for navigation in cloud robotic systems[J]. IEEE Robotics and Automation Letters, 2019, 4(4): 4555−4562 doi: 10.1109/LRA.2019.2931179

    [83]

    Jiang Ziyue, Ren Yi, Lei Ming, et al. FedSpeech: Federated text-to-speech with continual learning[C]// Proc of the 30th Int Joint Conf on Artifical Intelligence. Berlin: Springer, 2021: 3829−3835

    [84]

    Hung S C Y, Tu Chenghao, Wu Chengen, et al. Compacting, picking and growing for unforgetting continual learning[C]// Proc of the 33rd Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2019: Article 1225

    [85]

    Usmanova A, Portet F, Lalanda P, et al. Federated continual learning through distillation in pervasive computing[C]// Proc of the 2022 IEEE Int Conf on Smart Computing. Piscataway, NJ: IEEE, 2022: 86−91

    [86]

    Yoon J H, Jeong W Y, Lee G W, et al. Federated continual learning with weighted inter-client transfer[C]// Proc of the 38th Int Conf on Machine Learning. New York: PMLR, 2021: 12073−12086

    [87]

    Mori J, Teranishi I, Furukawa R. Continual horizontal federated learning for heterogeneous data[C/OL]// Proc of the 2022 Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 2022[2024-08-16]. https://www.semanticscholar.org/reader/3674cbf1900f748e5d1e981f296790256989a62e

    [88]

    Hendryx S M, Kc D R, Walls B, et al. Federated reconnaissance: Efficient, distributed, class-incremental learning [J]. arXiv preprint, arXiv: 2109.00150, 2021

    [89]

    Xu Chencheng, Hong Zhiwei, Huang Minlie, et al. Acceleration of federated learning with alleviated forgetting in local training[C/OL]// Proc of the 10th Int Conf on Learning Representations. Washington: ICLR, 2022[2024-07-30]. https://openreview.net/pdf?id=541PxiEKN3F

    [90]

    Dong Jiahua, Wang Lixu, Fang Zhen, et al. Federated class-incremental learning[C]// Proc of the 2022 IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2022: 10154−10163

    [91]

    Wang Tianyu, Liang Teng, Li Jun, et al. Adaptive traffic signal control using distributed MARL and federated learning[C]// Proc of the 20th IEEE Int Conf on Communication Technology. Piscataway, NJ: IEEE, 2020: 1242−1248

    [92]

    Liu Haotian, Wu Wenchuan. Federated reinforcement learning for decentralized voltage control in distribution networks[J]. IEEE Transactions on Smart Grid, 2022, 13(5): 3840−3843 doi: 10.1109/TSG.2022.3169361

    [93]

    Rezazadeh F, Bartzoudis N. A federated DRL approach for smart micro-grid energy control with distributed energy resources[C]// Proc of the 27th IEEE Int Workshop on Computer Aided Modeling and Design of Communication Links and Networks. Piscataway, NJ: IEEE, 2022: 108−114

    [94]

    Wang Xiaofei, Wang Chenyang, Li Xiuhua, et al. Federated deep reinforcement learning for Internet of things with decentralized cooperative edge caching[J]. IEEE Internet of Things Journal, 2020, 7(10): 9441−9455 doi: 10.1109/JIOT.2020.2986803

    [95]

    Yu Shuai, Chen Xu, Zhou Zhi, et al. When deep reinforcement learning meets federated learning: Intelligent multitimescale resource management for multiaccess edge computing in 5G ultradense network[J]. IEEE Internet of Things Journal, 2021, 8(4): 2238−2251 doi: 10.1109/JIOT.2020.3026589

    [96]

    Wang Xiaoding, Hu Jia, Lin Hui, et al. QoS and privacy-aware routing for 5G-enabled industrial Internet of things: A federated reinforcement learning approach[J]. IEEE Transactions on Industrial Informatics, 2022, 18(6): 4189−4197 doi: 10.1109/TII.2021.3124848

    [97]

    Jeong H J, Lee H J, Shin C H, et al. IONN: Incremental offloading of neural network computations from mobile devices to edge servers[C] //Proc of the 2018 ACM Symp on Cloud Computing. New York: ACM, 2018: 401−411

    [98]

    Zhou Li, Samavatian M H, Bacha A, et al. Adaptive parallel execution of deep neural networks on heterogeneous edge devices[C] // Proc of the 4th ACM/IEEE Symp on Edge Computing. New York: ACM, 2019: 195−208

    [99]

    Yang Xiang, Xu Zikang, Qi Qi, et al. PICO: Pipeline inference framework for versatile CNNs on diverse mobile devices[J]. IEEE Transactions on Mobile Computing, 2023, 23(4): 2712−2730

    [100]

    Hu Chenghao, Li Baochun. Distributed inference with deep learning models across heterogeneous edge devices[C]// Proc of the 2022 IEEE Conf on Computer Communications. Piscataway, NJ: IEEE, 2022: 330−339

    [101]

    Kim Y G, Wu C J. AutoScale: Energy efficiency optimization for stochastic edge inference using reinforcement learning[C]// Proc of the 53rd Annual IEEE/ACM Int Symp on Microarchitecture. Piscataway, NJ: IEEE, 2020: 1082−1096

    [102]

    Hou Xueyu, Guan Yongjie, Han Tao, et al. DistrEdge: Speeding up convolutional neural network inference on distributed edge devices[C]// Proc of the 2022 IEEE Int Parallel and Distributed Processing Symp. Piscataway, NJ: IEEE, 2022: 1097−1107

    [103]

    Fu Kaihua, Shi Jiuchen, Chen Quan, et al. QoS-aware irregular collaborative inference for improving throughput of DNN services[C/OL] //Proc of the 2022 Int Conf for High Performance Computing, Networking, Storage and Analysis. Piscataway, NJ: IEEE, 2022[2024-08-16]. https://dl.acm.org/doi/10.5555/3571885.3571976

    [104]

    Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[C/OL]// Proc of the 4th Int Conf on Learning Representations. Washington: ICLR, 2016[2024-07-30]. https://www.semanticscholar.org/reader/024006d4c2a89f7acacc6e4438d156525b60a98f

    [105]

    Wang Lingdong, Xiang Liyao, Xu Jiayu, et al. Context-aware deep model compression for edge cloud computing[C]// Proc of the 40th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2020: 787−797

    [106]

    Molina M, Muñoz O, Pascual-Iserte A, et al. Joint scheduling of communication and computation resources in multiuser wireless application offloading[C]// Proc of the 25th IEEE Annual Int Symp on Personal, Indoor, and Mobile Radio Communication. Piscataway, NJ: IEEE, 2014: 1093−1098

    [107]

    Kang Yiping, Hauswald J, Gao Cao, et al. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge[C] // Proc of the 22nd Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2017: 615−629

    [108]

    Zhuang Weiming, Chen Chen, Lyu Lingjuan. When foundation model meets federated learning: Motivations, challenges, and future directions [J]. arXiv preprint, arXiv: 2306.15546, 2023

    [109]

    Li Mingzhen, Liu Yi, Liu Xiaoyan, et al. The deep learning compiler: A comprehensive survey[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(3): 708−727 doi: 10.1109/TPDS.2020.3030548

    [110] Zeng Qunsong, Du Yuqing, Huang Kaibin, et al. Energy-efficient resource management for federated edge learning with CPU-GPU heterogeneous computing[J]. IEEE Transactions on Wireless Communications, 2021, 20(12): 7947−7962 doi: 10.1109/TWC.2021.3088910

    [111] Han M, Hyun J, Park S, et al. MOSAIC: Heterogeneity-, communication-, and constraint-aware model slicing and execution for accurate and efficient inference[C]// Proc of the 28th Int Conf on Parallel Architectures and Compilation Techniques. Piscataway, NJ: IEEE, 2019: 165−177

    [112] Chen Yunji, Luo Tao, Liu Shaoli, et al. DaDianNao: A machine-learning supercomputer[C]// Proc of the 47th Annual IEEE/ACM Int Symp on Microarchitecture. Piscataway, NJ: IEEE, 2014: 609−622

    [113] Jouppi N P, Young C, Patil N, et al. In-datacenter performance analysis of a tensor processing unit[C/OL]// Proc of the 44th ACM/IEEE Annual Int Symp on Computer Architecture. New York: ACM, 2017[2024-08-16]. https://dl.acm.org/doi/10.1145/3079856.3080246

    [114] Ro J H, Suresh A T, Wu Ke. FedJAX: Federated learning simulation with JAX[J]. arXiv preprint, arXiv: 2108.02117, 2021

    [115] Lee J Y, Park W P, Mitchell N, et al. JaxPruner: A concise library for sparsity research[J]. arXiv preprint, arXiv: 2304.14082, 2023

    [116] Villarrubia J, Costero L, Igual F D, et al. Improving inference time in multi-TPU systems with profiled model segmentation[C]// Proc of the 31st Euromicro Int Conf on Parallel, Distributed and Network-Based Processing. Piscataway, NJ: IEEE, 2023: 84−91

    [117] Wang Zixiao, Che Biyao, Guo Liang, et al. PipeFL: Hardware/software co-design of an FPGA accelerator for federated learning[J]. IEEE Access, 2022, 10: 98649−98661 doi: 10.1109/ACCESS.2022.3206785

    [118] Li H M, Rieger P, Zeitouni S, et al. FLAIRS: FPGA-accelerated inference-resistant & secure federated learning[J]. arXiv preprint, arXiv: 2308.00553, 2023

    [119] Phong L T, Aono Y, Hayashi T, et al. Privacy-preserving deep learning via additively homomorphic encryption[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(5): 1333−1345 doi: 10.1109/TIFS.2017.2787987

    [120] Nguyen T D, Rieger P, Chen Huili, et al. FLAME: Taming backdoors in federated learning[C]// Proc of the 31st USENIX Security Symp. Berkeley, CA: USENIX Association, 2022: 1415−1432

    [121] Bao Yungang, Chang Yisong, Han Yinhe, et al. Agile design of processor chips: Issues and challenges[J]. Journal of Computer Research and Development, 2021, 58(6): 1131−1145 (in Chinese) doi: 10.7544/issn1000-1239.2021.20210232

    [122] Wang Kaifan, Xu Yinan, Yu Zihao, et al. XiangShan open-source high performance RISC-V processor design and implementation[J]. Journal of Computer Research and Development, 2023, 60(3): 476−493 (in Chinese) doi: 10.7544/issn1000-1239.202221036

    [123] Dhilleswararao P, Boppu S, Manikandan M S, et al. Efficient hardware architectures for accelerating deep neural networks: Survey[J]. IEEE Access, 2022, 10: 131788−131828 doi: 10.1109/ACCESS.2022.3229767

    [124] Zhao Yongwei, Du Zidong, Guo Qi, et al. Cambricon-F: Machine learning computers with fractal von Neumann architecture[C]// Proc of the 46th ACM/IEEE Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2019: 788−801

    [125] Chen Tianqi, Moreau T, Jiang Ziheng, et al. TVM: An automated end-to-end optimizing compiler for deep learning[C]// Proc of the 13th USENIX Conf on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2018: 579–594

    [126] PyTorch. PyTorch on XLA devices[EB/OL]. (2023-10-21)[2024-07-17]. https://pytorch.org/xla/master/

    [127] PyTorch. AOT Autograd: How to use and optimize?[EB/OL]. (2023-10-25)[2024-07-17]. https://pytorch.org/functorch/stable/notebooks/aot_autograd_optimizations.html

    [128] Coral. Edge TPU compiler[EB/OL]. (2020-05-15)[2024-07-16]. https://coral.ai/docs/edgetpu/compiler/#help

    [129] NVIDIA. Optimizing inference on large language models with NVIDIA TensorRT-LLM, now publicly available[EB/OL]. (2023-10-19)[2024-07-16]. https://developer.nvidia.com/blog/optimizing-inference-on-llms-with-tensorrt-llm-now-publicly-available/

    [130] NVIDIA. TensorRT-LLM[EB/OL]. (2023-10-24)[2024-07-17]. https://github.com/NVIDIA/TensorRT-LLM

    [131] ONNX. ONNX[EB/OL]. (2024-05-24)[2024-07-17]. https://onnx.ai/

    [132] MLIR. Multi-level intermediate representation overview[EB/OL]. (2017-07-17)[2024-07-17]. https://mlir.llvm.org/

    [133] Jin Tian, Bercea G T, Tung D L, et al. Compiling ONNX neural network models using MLIR[J]. arXiv preprint, arXiv: 2008.08272, 2020

    [134] Gao Liang, Li Li, Chen Yingwen, et al. FIFL: A fair incentive mechanism for federated learning[C]// Proc of the 50th Int Conf on Parallel Processing. New York: ACM, 2021: Article 82

    [135] TensorFlow Lite Team. On-device training in TensorFlow Lite[EB/OL]. (2021-11-09)[2024-07-17]. https://blog.tensorflow.org/2021/11/on-device-training-in-tensorflow-lite.html

    [136] TS2 Space. A comprehensive guide to TensorFlow Lite’s federated learning[EB/OL]. (2023-04-07)[2024-07-17]. https://ts2.space/en/a-comprehensive-guide-to-tensorflow-lites-federated-learning/

    [137] He Chaoyang, Li Songze, So Jinhyun, et al. FedML: A research library and benchmark for federated machine learning[J]. arXiv preprint, arXiv: 2007.13518, 2020

    [138] Beutel D J, Topal T, Mathur A, et al. Flower: A friendly federated learning research framework[J]. arXiv preprint, arXiv: 2007.14390, 2020

    [139] Jeong E J, Kim J R, Ha S H. TensorRT-based framework and optimization methodology for deep learning inference on Jetson boards[J]. ACM Transactions on Embedded Computing Systems, 2022, 21(5): Article 51

    [140] Jiang Xiaotang, Wang Huan, Chen Yiliu, et al. MNN: A universal and efficient inference engine[C/OL]// Proc of the 3rd Conf on Machine Learning and Systems. Austin, TX: MLSys.org, 2020[2024-07-30]. https://proceedings.mlsys.org/paper_files/paper/2020/file/bc19061f88f16e9ed4a18f0bbd47048a-Paper.pdf

    [141] Lv Chengfei, Niu Chaoyue, Gu Renjie, et al. Walle: An end-to-end, general-purpose, and large-scale production system for device-cloud collaborative machine learning[C/OL]// Proc of the 16th USENIX Symp on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2022[2024-08-16]. https://www.usenix.org/conference/osdi22/presentation/lv

    [142] Aminabadi R Y, Rajbhandari S, Awan A A, et al. DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale[C/OL]// Proc of the 2022 Int Conf for High Performance Computing, Networking, Storage and Analysis. Piscataway, NJ: IEEE, 2022[2024-08-16]. https://dl.acm.org/doi/abs/10.5555/3571885.3571946

    [143] Liu Lumin, Zhang Jun, Song S H, et al. Client-edge-cloud hierarchical federated learning[C/OL]// Proc of the 2020 IEEE Int Conf on Communications. Piscataway, NJ: IEEE, 2020[2024-08-17]. https://ieeexplore.ieee.org/document/9148862

    [144] Yang Shusen, Zhang Zhanhua, Zhao Cong, et al. CNNPC: End-edge-cloud collaborative CNN inference with joint model partition and compression[J]. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(12): 4039−4056 doi: 10.1109/TPDS.2022.3177782

    [145] Korkmaz C, Kocas H E, Uysal A, et al. Chain FL: Decentralized federated machine learning via blockchain[C]// Proc of the 2nd Int Conf on Blockchain Computing and Applications. Piscataway, NJ: IEEE, 2020: 140−146

    [146] Du Jiangsu, Shen Minghua, Du Yunfei. A distributed in-situ CNN inference system for IoT applications[C]// Proc of the 38th Int Conf on Computer Design. Piscataway, NJ: IEEE, 2020: 279−287

    [147] Lyu L, Yu Jiangshan, Nandakumar K, et al. Towards fair and privacy-preserving federated deep models[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31(11): 2524−2541 doi: 10.1109/TPDS.2020.2996273

    [148] Wang Luping, Wang Wei, Li Bo. CMFL: Mitigating communication overhead for federated learning[C]// Proc of the 39th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2019: 954−964

    [149] Yu Hao, Yang Sen, Zhu Shenghuo. Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning[C]// Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 5693−5700

    [150] You Chaoqun, Guo Kun, Feng Gang, et al. Automated federated learning in mobile-edge networks—Fast adaptation and convergence[J]. IEEE Internet of Things Journal, 2023, 10(15): 13571−13586 doi: 10.1109/JIOT.2023.3262664

    [151] Heinbaugh C E, Luz-Ricca E, Shao Huajie. Data-free one-shot federated learning under very high statistical heterogeneity[C/OL]// Proc of the 11th Int Conf on Learning Representations. Washington: ICLR, 2023[2024-07-30]. https://openreview.net/forum?id=_hb4vM3jspB

    [152] Sattler F, Wiedemann S, Müller K R, et al. Robust and communication-efficient federated learning from non-IID data[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(9): 3400−3413 doi: 10.1109/TNNLS.2019.2944481

    [153] Gao Hongchang, Xu An, Huang Heng. On the convergence of communication-efficient local SGD for federated learning[C]// Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 7510−7518

    [154] Hönig R, Zhao Yiren, Mullins R D. DAdaQuant: Doubly-adaptive quantization for communication-efficient federated learning[C]// Proc of the 39th Int Conf on Machine Learning. New York: ACM, 2022: 8852−8866

    [155] Nguyen M D, Lee S M, Pham Q V, et al. HCFL: A high compression approach for communication-efficient federated learning in very large scale IoT networks[J]. IEEE Transactions on Mobile Computing, 2023, 22(11): 6495−6507 doi: 10.1109/TMC.2022.3190510

    [156] Dai Rong, Shen Li, He Fengxiang, et al. DisPFL: Towards communication-efficient personalized federated learning via decentralized sparse training[C]// Proc of the 39th Int Conf on Machine Learning. New York: ACM, 2022: 4587−4604

    [157] Wen Hui, Wu Yue, Li Jingjing, et al. Communication-efficient federated data augmentation on non-IID data[C]// Proc of the 2022 IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops. Piscataway, NJ: IEEE, 2022: 3376−3385

    [158] Zhang Zhanhua, Yang Shusen, Zhao Cong, et al. RtCoInfer: Real-time collaborative CNN inference for stream analytics on ubiquitous images[J]. IEEE Journal on Selected Areas in Communications, 2023, 41(4): 1212−1226 doi: 10.1109/JSAC.2023.3242730

    [159] Chen Xu, Jiao Lei, Li Wenzhong, et al. Efficient multi-user computation offloading for mobile-edge cloud computing[J]. IEEE/ACM Transactions on Networking, 2016, 24(5): 2795−2808 doi: 10.1109/TNET.2015.2487344

    [160] Yi Changyan, Cai Jun, Su Zhou. A multi-user mobile computation offloading and transmission scheduling mechanism for delay-sensitive applications[J]. IEEE Transactions on Mobile Computing, 2020, 19(1): 29−43 doi: 10.1109/TMC.2019.2891736

    [161] Ale L H, Zhang Ning, Fang Xiaojie, et al. Delay-aware and energy-efficient computation offloading in mobile-edge computing using deep reinforcement learning[J]. IEEE Transactions on Cognitive Communications and Networking, 2021, 7(3): 881−892 doi: 10.1109/TCCN.2021.3066619

    [162] Mozaffariahrar E, Theoleyre F, Menth M. A survey of Wi-Fi 6: Technologies, advances, and challenges[J]. Future Internet, 2022, 14(10): 293−345 doi: 10.3390/fi14100293

    [163] Das A K, Roy S, Bandara E, et al. Securing age-of-information (AoI)-enabled 5G smart warehouse using access control scheme[J]. IEEE Internet of Things Journal, 2023, 10(2): 1358−1375 doi: 10.1109/JIOT.2022.3205245

    [164] Mehr H D, Polat H. Human activity recognition in smart home with deep learning approach[C]// Proc of the 7th Int Istanbul Smart Grids and Cities Congress and Fair. Piscataway, NJ: IEEE, 2019: 149−153

    [165] Qi Lianyong, Hu Chunhua, et al. Privacy-aware data fusion and prediction with spatial-temporal context for smart city industrial environment[J]. IEEE Transactions on Industrial Informatics, 2021, 17(6): 4159−4167 doi: 10.1109/TII.2020.3012157

    [166] Chen Yiqiang, Wang Jindong, Yu Chaohui, et al. FedHealth: A federated transfer learning framework for wearable healthcare[J]. IEEE Intelligent Systems, 2020, 35(4): 83−93 doi: 10.1109/MIS.2020.2988604

    [167] Lee S, Choi D H. Federated reinforcement learning for energy management of multiple smart homes with distributed energy resources[J]. IEEE Transactions on Industrial Informatics, 2022, 18(1): 488−497 doi: 10.1109/TII.2020.3035451

    [168] Wang Shuai, Li Dan. Research progress on network performance optimization of distributed machine learning system[J]. Chinese Journal of Computers, 2022, 45(7): 1384−1412 (in Chinese)

    [169] Martinez I, Hafid A S, Jarray A. Design, resource management, and evaluation of fog computing systems: A survey[J]. IEEE Internet of Things Journal, 2021, 8(4): 2494−2516 doi: 10.1109/JIOT.2020.3022699

    [170] Lu Yan, Shu Yuanchao, Tan Xu, et al. Collaborative learning between cloud and end devices: An empirical study on location prediction[C]// Proc of the 4th ACM/IEEE Symp on Edge Computing. New York: ACM, 2019: 139–151

    [171] Encora. Ahead-of-time compilation vs just-in-time compilation — Part 1 of understanding Angular[EB/OL]. (2024-07-14)[2024-07-16]. https://www.encora.com/insights/ahead-of-time-compilation-vs-just-in-time-compilation-part-1

    [172] Yu Hao, Yang Sen, Zhu Shenghuo. Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning[C]// Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: Article 698

    [173] Tramer F, Zhang Fan, Juels A, et al. Stealing machine learning models via prediction APIs[C]// Proc of the 25th USENIX Conf on Security Symp. Berkeley, CA: USENIX Association, 2016: 601–618

    [174] Yin Yupeng, Zhang Xianglong, Zhang Huanle, et al. Ginver: Generative model inversion attacks against collaborative inference[C]// Proc of the 2023 ACM Web Conf. New York: ACM, 2023: 2122–2131

    [175] Jin Xiao, Chen Pinyu, Hsu Chiayi, et al. CAFE: Catastrophic data leakage in vertical federated learning[C/OL]// Proc of the 35th Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2021[2024-07-30]. https://papers.nips.cc/paper/2021/hash/08040837089cdf46631a10aca5258e16-Abstract.html

    [176] Nguyen T D T, Lai P, Tran K, et al. Active membership inference attack under local differential privacy in federated learning[C]// Proc of the 26th Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2023: 5714−5730

    [177] Li Jiacheng, Li Ninghui, Ribeiro B. Effective passive membership inference attacks in federated learning against overparameterized models[C/OL]// Proc of the 11th Int Conf on Learning Representations. Washington: ICLR, 2023[2024-07-30]. https://openreview.net/pdf?id=QsCSLPP55Ku

    [178] Melis L, Song C, Cristofaro E D, et al. Exploiting unintended feature leakage in collaborative learning[C]// Proc of the 40th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2019: 691−706

    [179] Wang Zhibo, Huang Yuting, Song Mengkai, et al. Poisoning-assisted property inference attack against federated learning[J]. IEEE Transactions on Dependable and Secure Computing, 2023, 20(4): 3328−3340 doi: 10.1109/TDSC.2022.3196646

    [180] Kourtellis N, Katevas K, Perino D. FLaaS: Federated learning as a service[C]// Proc of the 1st Workshop on Distributed Machine Learning. New York: ACM, 2020: 7−13

    [181] Truong J B, Maini P, Walls R J, et al. Data-free model extraction[C]// Proc of the 2021 IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 4769−4778

    [182] Liu Sijia, Chen Pinyu, Kailkhura B, et al. A primer on zeroth-order optimization in signal processing and machine learning: Principals, recent advances, and applications[J]. IEEE Signal Processing Magazine, 2020, 37(5): 43−54 doi: 10.1109/MSP.2020.3003837

    [183] Fraboni Y, Vidal R, Lorenzi M. Free-rider attacks on model aggregation in federated learning[C]// Proc of the 24th Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2021: 1846−1854

    [184] Lin Jierui, Du Min, Liu Jian. Free-riders in federated learning: Attacks and defenses[J]. arXiv preprint, arXiv: 1911.12560, 2019

    [185] Abadi M, Chu Andy, Goodfellow I J, et al. Deep learning with differential privacy[C]// Proc of the 2016 ACM SIGSAC Conf on Computer and Communications Security. New York: ACM, 2016: 308–318

    [186] Wang Baocang, Chen Yange, Jiang Hang, et al. PPeFL: Privacy-preserving edge federated learning with local differential privacy[J]. IEEE Internet of Things Journal, 2023, 10(17): 15488−15500 doi: 10.1109/JIOT.2023.3264259

    [187] He Zecheng, Zhang Tianwei, Lee R B. Attacking and protecting data privacy in edge–cloud collaborative inference systems[J]. IEEE Internet of Things Journal, 2021, 8(12): 9706−9716 doi: 10.1109/JIOT.2020.3022358

    [188] Jiang Bin, Li Jianqiang, Wang Huihui, et al. Privacy-preserving federated learning for industrial edge computing via hybrid differential privacy and adaptive compression[J]. IEEE Transactions on Industrial Informatics, 2023, 19(2): 1136−1144 doi: 10.1109/TII.2021.3131175

    [189] Mironov I. Rényi differential privacy[C]// Proc of the 30th IEEE Computer Security Foundations Symp. Piscataway, NJ: IEEE, 2017: 263−275

    [190] Ryu Jihyeon, Zheng Yifeng, Gao Yansong, et al. Can differential privacy practically protect collaborative deep learning inference for IoT?[J/OL]. Wireless Networks, 2022[2024-07-30]. https://link.springer.com/article/10.1007/s11276-022-03113-7

    [191] Cheon J H, Kim A, Kim M, et al. Homomorphic encryption for arithmetic of approximate numbers[C/OL]// Proc of the 2017 Int Conf on the Theory and Application of Cryptology and Information Security. Berlin: Springer, 2017[2024-07-30]. https://link.springer.com/chapter/10.1007/978-3-319-70694-8_15

    [192] Zhang Chengliang, Li Suyi, Xia Junzhe, et al. BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning[C]// Proc of the 2020 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2020: 493−506

    [193] Zhu Yilan, Wang Xinyao, Ju Lei, et al. FxHENN: FPGA-based acceleration framework for homomorphic encrypted CNN inference[C]// Proc of the 29th IEEE Int Symp on High-Performance Computer Architecture. Piscataway, NJ: IEEE, 2023: 896−907

    [194] Yang Zhaoxiong, Hu Shuihai, Chen Kai. FPGA-based hardware accelerator of homomorphic encryption for efficient federated learning[J]. arXiv preprint, arXiv: 2007.10560, 2020

    [195] Juvekar C, Vaikuntanathan V, Chandrakasan A P. Gazelle: A low latency framework for secure neural network inference[C]// Proc of the 27th USENIX Security Symp. Berkeley, CA: USENIX Association, 2018: 1651−1668

    [196] Li Yiran, Li Hongwei, Xu Guowen, et al. Practical privacy-preserving federated learning in vehicular fog computing[J]. IEEE Transactions on Vehicular Technology, 2022, 71(5): 4692−4705 doi: 10.1109/TVT.2022.3150806

    [197] Jarin I, Eshete B. PRICURE: Privacy-preserving collaborative inference in a multi-party setting[C]// Proc of the 2021 ACM Workshop on Security and Privacy Analytics. New York: ACM, 2021: 25–35

    [198] Liu Yang, Kang Yan, Xing Chaoping, et al. A secure federated transfer learning framework[J]. IEEE Intelligent Systems, 2020, 35(4): 70−82 doi: 10.1109/MIS.2020.2988525

    [199] Tramèr F, Boneh D. Slalom: Fast, verifiable and private execution of neural networks in trusted hardware[C/OL]// Proc of the 7th Int Conf on Learning Representations. Washington: ICLR, 2019[2024-07-30]. https://openreview.net/pdf?id=rJVorjCcKQ

    [200] Intel. Innovative technology for CPU based attestation and sealing[EB/OL]. (2013-08-14)[2024-07-17]. https://www.intel.com/content/www/us/en/developer/articles/technical/innovative-technology-for-cpu-based-attestation-and-sealing.html

    [201] Kalapaaking A P, Khalil I, Rahman M S, et al. Blockchain-based federated learning with secure aggregation in trusted execution environment for Internet-of-Things[J]. IEEE Transactions on Industrial Informatics, 2023, 19(2): 1703−1714 doi: 10.1109/TII.2022.3170348

    [202] Kuznetsov E, Chen Yitao, Zhao Ming. SecureFL: Privacy preserving federated learning with SGX and TrustZone[C]// Proc of the 2021 IEEE/ACM Symp on Edge Computing. Piscataway, NJ: IEEE, 2021: 55−67

    [203] Li Yuepeng, Zeng Deze, Gu Lin, et al. Efficient and secure deep learning inference in trusted processor enabled edge clouds[J]. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(12): 4311−4325 doi: 10.1109/TPDS.2022.3187772

    [204] Law A, Leung C, Poddar R, et al. Secure collaborative training and inference for XGBoost[C]// Proc of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice. New York: ACM, 2020: 21–26

    [205] Juuti M, Szyller S, Marchal S, et al. PRADA: Protecting against DNN model stealing attacks[C]// Proc of the 2019 IEEE European Symp on Security and Privacy. Piscataway, NJ: IEEE, 2019: 512−527

    [206] Lin Jierui, Du Min, Liu Jian. Free-riders in federated learning: Attacks and defenses[J]. arXiv preprint, arXiv: 1911.12560, 2019

    [207] Xu Xinyi, Lyu Lingjuan. A reputation mechanism is all you need: Collaborative fairness and adversarial robustness in federated learning[C/OL]// Proc of the 2021 Int Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML. New York: ACM, 2020[2024-08-16]. https://www.semanticscholar.org/reader/329734fdbb35faab89e14eb9b105a665d7a5f079

    [208] Zhang Jiliang, Peng Shuang, Gao Yansong, et al. APMSA: Adversarial perturbation against model stealing attacks[J]. IEEE Transactions on Information Forensics and Security, 2023, 18: 1667−1679 doi: 10.1109/TIFS.2023.3246766

    [209] Tan Jingxuan, Zhong Nan, Qian Zhenxing, et al. Deep neural network watermarking against model extraction attack[C]// Proc of the 31st ACM Int Conf on Multimedia. New York: ACM, 2023: 1588−1597

    [210] Zhang Haitian, Hua Guang, Wang Xinya, et al. Categorical inference poisoning: Verifiable defense against black-box DNN model stealing without constraining surrogate data and query times[J]. IEEE Transactions on Information Forensics and Security, 2023, 18: 1473−1486 doi: 10.1109/TIFS.2023.3244107

    [211] Dai Hongning, Zheng Zibin, Zhang Yan. Blockchain for Internet of Things: A survey[J]. IEEE Internet of Things Journal, 2019, 6(5): 8076−8094 doi: 10.1109/JIOT.2019.2920987

    [212] Biggio B, Corona I, Maiorca D, et al. Evasion attacks against machine learning at test time[J]. arXiv preprint, arXiv: 1708.06131, 2013

    [213] Tang Pengfei, Wang Wenjie, Lou Jian, et al. Generating adversarial examples with distance constrained adversarial imitation networks[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(6): 4145−4155 doi: 10.1109/TDSC.2021.3123586

    [214] Bagdasaryan E, Veit A, Hua Yiqing, et al. How to backdoor federated learning[C]// Proc of the 23rd Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2020: 2938−2948

    [215] Zhang Jiale, Chen Bing, Cheng Xiang, et al. PoisonGAN: Generative poisoning attacks against federated learning in edge computing systems[J]. IEEE Internet of Things Journal, 2021, 8(5): 3310−3322 doi: 10.1109/JIOT.2020.3023126

    [216] Qammar A, Ding Jianguo, Ning Huansheng. Federated learning attack surface: Taxonomy, cyber defences, challenges, and future directions[J]. Artificial Intelligence Review, 2022, 55(5): 3569−3606 doi: 10.1007/s10462-021-10098-w

    [217] Kim T, Singh S, Madaan N, et al. Characterizing internal evasion attacks in federated learning[C]// Proc of the 26th Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2023: 907−921

    [218] Bao Hongyan, Han Yufei, Zhou Yujun, et al. Towards efficient and domain-agnostic evasion attack with high-dimensional categorical inputs[C]// Proc of the 37th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2023: 6753−6761

    [219] Demontis A, Melis M, Pintor M, et al. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks[C]// Proc of the 28th USENIX Conf on Security Symp. Berkeley, CA: USENIX Association, 2019: 321–338

    [220] Blanchard P, Mahdi E, Guerraoui R, et al. Machine learning with adversaries: Byzantine tolerant gradient descent[C]// Proc of the 31st Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2017: 118–128

    [221] Lugan S, Desbordes P, Brion E, et al. Secure architectures implementing trusted coalitions for blockchained distributed learning[J]. IEEE Access, 2019, 7: 181789−181799 doi: 10.1109/ACCESS.2019.2959220

    [222] Bao Hongyan, Han Yufei, Zhou Yujun, et al. Towards understanding the robustness against evasion attack on categorical data[C/OL]// Proc of the 10th Int Conf on Learning Representations. Washington: ICLR, 2022[2024-07-30]. https://openreview.net/pdf?id=BmJV7kyAmg

    [223] Cao Xiaoyu, Gong N Z. Mitigating evasion attacks to deep neural networks via region-based classification[C]// Proc of the 33rd Annual Computer Security Applications Conf. New York: ACM, 2017: 278−287

    [224] Zizzo G, Rawat A, Sinn M, et al. FAT: Federated adversarial training[J]. arXiv preprint, arXiv: 2012.01791, 2020

    [225] Kumari K, Rieger P, Fereidooni H, et al. BayBFed: Bayesian backdoor defense for federated learning[C]// Proc of the 2023 IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2023: 737−754

    [226] Cao Xiaoyu, Jia Jinyuan, Zhang Zaixi, et al. FedRecover: Recovering from poisoning attacks in federated learning using historical information[C]// Proc of the 44th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2023: 1366−1383

    [227] Wen Jing, Hui L C K, Yiu S M, et al. DCN: Detector-corrector network against evasion attacks on deep neural networks[C]// Proc of the 48th Annual IEEE/IFIP Int Conf on Dependable Systems and Networks Workshops. Piscataway, NJ: IEEE, 2018: 215−221

    [228] Debicha I, Bauwens R, Debatty T, et al. TAD: Transfer learning-based multi-adversarial detection of evasion attacks against network intrusion detection systems[J]. Future Generation Computer Systems, 2023, 138: 185−197 doi: 10.1016/j.future.2022.08.011

    [229] Lecuyer M, Atlidakis V, Geambasu R, et al. Certified robustness to adversarial examples with differential privacy[C]// Proc of the 40th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2019: 656−672

    [230] Byrd D, Polychroniadou A. Differentially private secure multi-party computation for federated learning in financial applications[C]// Proc of the 1st ACM Int Conf on AI in Finance. New York: ACM, 2021: Article 16

    [231] Rathee D, Rathee M, Kumar N, et al. CrypTFlow2: Practical 2-party secure inference[C]// Proc of the 2020 ACM SIGSAC Conf on Computer and Communications Security. New York: ACM, 2020: 325–342

    [232] He Xuanli, Lyu L, Xu Qiongkai, et al. Model extraction and adversarial transferability, your BERT is vulnerable![C]// Proc of the 2021 Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL, 2021: 2006–2012

    [233] Keskar N S, McCann B, Xiong Caiming. The thieves on Sesame Street are polyglots: Extracting multilingual models from monolingual APIs[C]// Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 6203–6207

    [234] Wu Huaming, Wolter K. Stochastic analysis of delayed mobile offloading in heterogeneous networks[J]. IEEE Transactions on Mobile Computing, 2018, 17(2): 461−474 doi: 10.1109/TMC.2017.2711014

    [235] Tu Xuezhen, Zhu Kun, Luong N C, et al. Incentive mechanisms for federated learning: From economic and game theoretic perspective[J]. IEEE Transactions on Cognitive Communications and Networking, 2022, 8(3): 1566−1593 doi: 10.1109/TCCN.2022.3177522

    [236] Liu Shaoshan, Liu Liangkai, Tang Jie, et al. Edge computing for autonomous driving: Opportunities and challenges[J]. Proceedings of the IEEE, 2019, 107(8): 1697−1716 doi: 10.1109/JPROC.2019.2915983

    [237] Li Yuanchun, Wen Hao, Wang Weijun, et al. Personal LLM agents: Insights and survey about the capability, efficiency and security[J]. arXiv preprint, arXiv: 2401.05459, 2024

    [238] Yu Sixing, Muñoz J P, Jannesari A. Federated foundation models: Privacy-preserving and collaborative learning for large models[J]. arXiv preprint, arXiv: 2305.11414, 2023


Publication history
  • Received: 2023-12-04
  • Revised: 2024-08-18
  • Accepted: 2024-09-02
  • Published online: 2024-09-11
  • Published in issue: 2024-12-31
