ZNS SSDs (Zoned Namespaces SSDs) [1-7] are a new generation of solid state drives (SSDs) introduced in 2019 by Western Digital and Samsung, and they have attracted wide attention from both industry and academia. Because flash-based SSDs can only be rewritten after a block has been fully erased, conventional SSDs accommodate this property with a flash translation layer (FTL) [8]. However, the physical constraints of flash blocks (erases operate at block granularity, while reads and writes operate at page granularity) make frequent garbage collection (GC) [5,9] necessary and also introduce problems such as over-provisioning (OP) [10] and write amplification (WA) [11]. ZNS SSDs can effectively improve read/write throughput, reduce write amplification, shrink the over-provisioned space, and eliminate the performance instability that conventional SSDs suffer from internal garbage collection once space utilization reaches a certain level [12-13]. Building database systems on ZNS SSDs is therefore an emerging trend [3].
Fig. 1 contrasts data placement on a ZNS SSD with that on a conventional SSD: on a ZNS SSD, data placement is explicitly controlled by the host software. Despite these advantages, ZNS SSDs also bring challenges [3,7]. Compared with conventional flash-based SSDs, a ZNS SSD removes the FTL and hands the space of each zone directly to the host, which must now handle garbage collection, wear leveling, and data placement itself. This enlarges the design space for user data layout: because the host decides data placement, data lifetime, and when to reclaim space, a well-designed data organization can improve system performance. At the same time, ZNS SSDs impose new requirements on host software design, e.g., each zone has a single write pointer and permits only sequential writes, different zones differ in performance, and the host must decide when to issue Zone-Reset operations [7,14].
The B+-tree is a classic database index structure, and researchers have proposed many B+-tree optimizations for emerging storage such as SSDs and persistent memory [15-26]. Existing SSD-oriented index designs, however, focus on reducing random writes to the SSD. Although the underlying medium of a ZNS SSD is still flash, zones admit only sequential writes, so random writes are no longer the primary concern when optimizing a B+-tree for ZNS SSDs. Instead, the key problem for a ZNS-aware B+-tree is how to adapt, without an FTL, to the zoned layout and the in-zone sequential-write requirement. Indexes designed for conventional SSDs do not account for these characteristics and therefore cannot run directly on ZNS SSDs. Moreover, to the best of our knowledge, no ZNS-aware index structure has been proposed so far.
This paper proposes ZB+-tree (ZNS-aware B+-tree), a new index structure designed for ZNS SSDs. We observe that although ZNS SSDs require sequential writes within a zone, an actual zoned block device (ZBD) contains, besides the sequential zones (Seq-Zones) that only allow sequential writes, a small number of conventional zones (Cov-Zones) that allow random (in-place) writes. ZB+-tree therefore combines the characteristics of these two zone types to fit ZNS SSDs. The main contributions of this paper are threefold:
1) We propose ZB+-tree, a novel ZNS-aware index structure. ZB+-tree uses the Cov-Zones inside a ZNS SSD to absorb random writes to zones, spreads index nodes across Cov-Zones and Seq-Zones, and manages the nodes in different zones with differently designed node structures, improving space efficiency while honoring the in-zone sequential-write requirement.
2) We design the operations on ZB+-tree, including Sync, Search, Insert, and Delete, reducing the number of reads and writes per operation while preserving index performance.
3) We run experiments on ZNS SSDs emulated with null_blk [27] and libzbd [28], using a modified conventional CoW B+-tree as the baseline. The results show that ZB+-tree effectively blocks cascading updates, reduces read and write counts, and outperforms the CoW B+-tree in both running time and space utilization.
1. Background and Related Work
This section reviews existing work on ZNS SSDs and on SSD-aware B+-tree indexes, and introduces the CoW B+-tree used as the baseline in our experiments.
1.1 Work on ZNS SSDs
A ZNS SSD divides its space into many zones. Within a zone, reads may be random, but writes must be sequential; when a zone becomes full, Zone-Reset and GC operations are triggered. Fig. 2 sketches the structure of a zone: each zone maintains a write pointer and strictly enforces sequential writes.
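To make the constraint concrete, the following C sketch (our own illustration; the structures and names are hypothetical and not part of libzbd) models a zone whose write pointer rejects any write that does not begin exactly at the pointer:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ZONE_SIZE (2ULL << 30)  /* 2 GB per zone, as in our experiments */

struct zone {
    uint64_t start;   /* first byte of the zone on the device */
    uint64_t wp;      /* write pointer: next byte that may be written */
    uint8_t *base;    /* backing memory for this sketch */
};

/* Append-only write: succeeds only if it begins exactly at the write pointer. */
static bool zone_append(struct zone *z, uint64_t off, const void *buf, size_t len) {
    if (off != z->wp || z->wp + len > z->start + ZONE_SIZE)
        return false;                 /* non-sequential or overflowing write */
    memcpy(z->base + (z->wp - z->start), buf, len);
    z->wp += len;                     /* advance the write pointer */
    return true;
}

/* Zone-Reset: discards all data and rewinds the write pointer. */
static void zone_reset(struct zone *z) { z->wp = z->start; }
```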
Thanks to the FTL, a conventional SSD exposes a block interface to the host [1] and delegates the various housekeeping tasks to the FTL; the interface presented to upper layers supports random reads and writes, so applications can treat the SSD like a hard disk. The FTL allows applications originally written for hard disks to migrate to SSDs almost unchanged, but it also forces the SSD to reserve over-provisioned space, perform internal garbage collection, and suffer performance jitter once space utilization reaches a certain level [1,12].
Compared with conventional block-interface SSDs, ZNS SSDs offer two main advantages. First, a ZNS SSD reduces or even removes the FTL and shifts mapping-table management, garbage collection, and the sequential-write constraint to the host, which lowers cost and removes the need for over-provisioning. Second, since the host controls data placement and the timing of garbage collection, strategies such as placing data with different characteristics into different zones and choosing sensible reclamation times can improve system performance and extend SSD lifetime.
Because writes within a zone must be sequential, many applications and systems built on the conventional SSD abstraction cannot run directly on ZNS SSDs [1-6]. Moreover, since ZNS SSDs remove device-side garbage collection, users must design their own reclamation mechanisms, which adds complexity. Stavrinos et al. [3] analyzed the advantages of ZNS SSDs and found through their survey that once industry shifts toward ZNS for its cost and performance benefits, most prior SSD work will need to be revisited; they therefore called on researchers to turn to ZNS SSDs. Shin et al. [7] ran extensive performance tests on prototype ZNS hardware, providing useful guidance for software design on ZNS SSDs. Choi et al. [5] proposed an LSM (log-structured merge) style garbage collection scheme for ZNS SSDs, advocating fine-grained garbage collection and placing fine-grained data into different zones according to hotness. Bjørling et al. [1] of Western Digital developed the ZenFS file system on ZNS SSDs to back RocksDB's file interface; it has been merged into the RocksDB community. Han et al. [2] went a step further with the ZNS+ interface, an LFS-aware (log-structured file system) extension of ZNS that offloads the copying of valid data produced by user-defined garbage collection into the device, reducing I/O and improving file-system GC efficiency.
Despite these advantages, ZNS SSDs also bring challenges. Compared with conventional SSDs, removing the FTL hands zone space directly to the host, which must handle garbage collection, wear leveling, and data placement; this enlarges the design space for data storage management but also complicates it. Overall, the sequential-write and zoned-layout constraints of ZNS SSDs pose many new challenges to existing storage management mechanisms, including storage allocation, garbage collection, and index structures.
1.2 Work on SSD-Aware B+-trees
Over the past decade, driven by the rapid development of flash technology, flash-aware B+-tree optimizations have been studied extensively. Roh et al. [15] exploited the internal parallelism of SSDs to optimize B+-tree indexes. Na et al. [16] proposed a B+-tree with dynamic in-page logging. Agrawal et al. [17] designed a lazy-update mechanism to better match SSD characteristics. Ahn et al. [18] optimized B+-trees on SSDs using the properties of the CoW B+-tree. Fang et al. [19] proposed an SSD-durability-aware B+-tree. Jin et al. [20] combined Bloom filters with per-node update caches to reduce SSD writes without degrading read performance. Lv et al. [21] turned random writes into sequential writes via logging to optimize R-tree reads and writes on SSDs. Jin et al. [22] batched small writes by exploiting SSD internal parallelism and proposed a flash-aware linear hash index. Ho and Park [23] designed an in-memory write-pattern converter that transforms random writes into sequential ones to exploit SSD characteristics. Reference [24] used IPL (in-page logging) and write caching to build LSB+-tree, improving both the time and space efficiency of the index.
In summary, SSD-aware B+-tree designs focus on reducing random writes to the SSD, because the B+-tree is inherently write-unfriendly and performs poorly under write-intensive SSD workloads. The medium of ZNS SSDs is still flash, so reducing random writes remains relevant; but since zones admit only sequential writes, random writes are no longer the first-order concern. How to adapt to zoned data placement and sequential writes is the central question in designing a ZNS-aware B+-tree. As of this writing, no existing SSD index can run directly on ZNS SSDs.
1.3 CoW B+-tree
No index structure for ZNS SSDs has yet been proposed in the literature. The existing work most relevant to ours is the CoW B+-tree, an append-only-write B+-tree [25]. Append-only writing happens to match the in-zone sequential-write requirement of ZNS SSDs, so if in-zone storage allocation is set aside, a modified CoW B+-tree can run on a ZNS SSD. Fig. 3 illustrates the update process of the CoW B+-tree.
Reference [18] optimized the CoW B+-tree for SSD characteristics to reduce write amplification, but it cannot keep B+-tree nodes the same size, so we use the conventional CoW B+-tree [25] as the baseline. Its algorithms are essentially those of a standard B+-tree, except that a leaf cannot be updated in place: the modified leaf is appended at the write pointer, and because the leaf's address changes, the pointer to it in its parent must be updated as well. This modification cascades upward until it reaches the root.
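The cost of this cascade is easy to see in code. The following C sketch (ours, with placeholder names and an in-memory zone) shows how one leaf update in a CoW B+-tree forces a rewrite of every node on the root-to-leaf path:

```c
#include <stdint.h>
#include <string.h>

/* Placeholder order: each node holds up to FANOUT keys/children. */
enum { FANOUT = 32 };

struct cow_node {
    uint64_t keys[FANOUT];
    uint64_t children[FANOUT];              /* child addresses              */
};

static uint8_t  zone_buf[1 << 20];          /* one emulated zone            */
static uint64_t wp;                         /* its write pointer            */

/* Sequential, append-only write of a node image; returns its new address. */
static uint64_t append_node(const struct cow_node *n) {
    uint64_t addr = wp;
    memcpy(zone_buf + wp, n, sizeof *n);
    wp += sizeof *n;
    return addr;
}

/* path[0] is the root, path[depth-1] the modified leaf; slot[i] is the
 * child index followed at level i. One leaf update costs `depth` writes. */
static uint64_t cow_update_path(struct cow_node **path, const int *slot,
                                int depth) {
    uint64_t child = append_node(path[depth - 1]);      /* new leaf copy    */
    for (int i = depth - 2; i >= 0; i--) {
        path[i]->children[slot[i]] = child;             /* re-point parent  */
        child = append_node(path[i]);                   /* parent moves too */
    }
    return child;                                       /* new root address */
}
```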
However, writing all CoW B+-tree nodes into a single zone would quickly exhaust that zone's space and trigger frequent Zone-Reset operations. In the comparative experiments that follow, we therefore modified the CoW B+-tree so that it uses all zones and resets less often: since the host can observe zone usage, we always write CoW B+-tree nodes into the zone with the most remaining space, choosing the target zone dynamically according to current usage rather than writing into one fixed zone. This minimizes Zone-Resets while fully utilizing the ZNS SSD's space.
2. ZB+-tree Index Structure
2.1 Design Rationale
ZNS SSDs are multi-zoned, and zones differ in performance: frequently accessed, heavily worn zones exhibit higher access latency [7], and Seq-Zones strictly require sequential writes. When a zone fills up, Zone-Reset and garbage collection are triggered; both are very expensive and cause performance jitter. ZB+-tree is therefore designed around four goals: 1) fully exploit the multi-zone nature of ZNS SSDs; 2) satisfy the strict sequential-write constraint of Seq-Zones; 3) place frequently accessed data in less-worn zones to reduce access latency; and 4) minimize GC and Zone-Reset operations at runtime to avoid performance jitter.
To meet these goals, ZB+-tree adopts a new design whose main ideas can be summarized in five points:
1) Spread index nodes across multiple zones and allow nodes to migrate between zones. Cov-Zones permit in-place updates; by fully exploiting this property, in-place updates to the index can be absorbed inside Cov-Zones. When a node becomes full, it is moved as a whole into a Seq-Zone, which keeps write amplification low while satisfying the sequential-write requirement of Seq-Zones.
2) Absorb updates to nodes residing in Seq-Zones into Cov-Zones as well. For Seq-Zone nodes, which cannot be updated in place, we design log-node structures placed in Cov-Zones so that they can be updated in place, reducing write amplification. To prevent an excess of log nodes from hurting overall read/write performance, logs must be merged with their nodes, which is realized by a newly designed Sync operation.
3) Dynamically choose different zones for different node types. Leaf nodes are updated frequently while interior nodes are updated relatively rarely, so we propose a dynamic zone-selection policy: frequently updated leaf nodes are placed in zones with lower occupancy and less wear, while interior nodes are placed in zones with relatively higher occupancy and wear. This exploits the multi-zone nature of ZNS SSDs and lowers access latency.
4) To manage the nodes scattered over different zones and their log nodes, we design head nodes for both leaf and interior nodes, recording each node's state, address, and log status. Head nodes are stored in Cov-Zones to block cascading updates.
5) Because nodes at different levels reside in different kinds of zones, merging is controlled with different policies: for levels whose nodes all reside in Cov-Zones, we control merging strictly; for levels whose nodes are spread across Cov-Zones and Seq-Zones, we merge opportunistically. These policies minimize cascading updates and write amplification while preserving index performance.
2.2 Overall Structure of ZB+-tree
ZB+-tree has four levels, top-down: the IH level, the Interior level, the LH level, and the Leaf level. The IH level consists of interior-head nodes, the LH level of leaf-head nodes, the Interior level of interior and interior-log nodes, and the Leaf level of leaf and leaf-log nodes. Since a B+-tree typically has 3 to 4 levels, the IH level we design can be viewed as the root of the B+-tree. Nodes at the IH and LH levels serve both as B+-tree interior nodes (storing keys and child addresses) and as records of the states and log status of the next level's nodes. Fig. 4 shows the logical structure of ZB+-tree, where F denotes the key First (a sentinel smaller than any attainable key) and K denotes an ordinary key.
Since the host can always observe the usage of every zone on the ZNS SSD, we propose a dynamic zone-selection method: when a node must move from a Cov-Zone into a Seq-Zone, a leaf node is placed into the zone with the most remaining capacity (called the Hot-Seq-Zone), whereas an interior node is placed into the zone with the least remaining capacity (called the Cold-Seq-Zone). The distinction reflects update frequency: placing frequently updated leaves into the zone with the most free space avoids Zone-Resets caused by a zone filling up quickly, and zones with more free capacity and less wear also read and write faster, which suits the frequent access to leaves.
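A minimal sketch of this policy, assuming an in-memory table of Seq-Zones (the structure and field names are ours, not libzbd's):

```c
#include <stdint.h>
#include <stddef.h>

struct seq_zone {
    uint64_t start;     /* starting offset of the zone      */
    uint64_t wp;        /* current write pointer            */
    uint64_t capacity;  /* total writable bytes in the zone */
};

static inline uint64_t zone_free(const struct seq_zone *z) {
    return z->capacity - (z->wp - z->start);
}

/* Leaves go to the Seq-Zone with the most free space (Hot-Seq-Zone);
 * interior nodes go to the one with the least free space (Cold-Seq-Zone). */
static struct seq_zone *pick_zone(struct seq_zone *zones, size_t n,
                                  uint64_t node_size, int is_leaf) {
    struct seq_zone *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (zone_free(&zones[i]) < node_size)
            continue;                       /* cannot hold the node */
        if (!best ||
            ( is_leaf && zone_free(&zones[i]) > zone_free(best)) ||
            (!is_leaf && zone_free(&zones[i]) < zone_free(best)))
            best = &zones[i];
    }
    return best;  /* NULL means every zone is full: trigger GC/Zone-Reset */
}
```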
Fig. 5 shows how ZB+-tree nodes are laid out on the ZNS SSD: the IH and LH levels reside entirely in the Cov-Zone, while interior and leaf nodes are distributed across the Cov-Zone and the Seq-Zones. Throughout this paper, white denotes the Cov-Zone, dark gray the Hot-Seq-Zone, light gray the Cold-Seq-Zone, and a gray gradient a node in a Seq-Zone that may be either hot or cold.
We design a state-transition mechanism for interior and leaf nodes, shown in Fig. 6. An interior or leaf node of ZB+-tree can be in one of four states:
1) change_state. A node in change_state resides in the Cov-Zone and is not full, so insertions, deletions, and updates can be performed in place. Once a change_state node becomes full, a leaf node moves as a whole into the Hot-Seq-Zone and an interior node into the Cold-Seq-Zone, which exactly satisfies the sequential-write constraint of the ZNS SSD.
2) steady_state_hot. The node resides in the Hot-Seq-Zone and is a leaf; it is necessarily full and cannot be modified in place. Inserting into a node in this state splits it into two change_state leaves; updates and deletions against it are recorded in its leaf-log node.
3) steady_state_cold. The node resides in the Cold-Seq-Zone and is an interior node; it is necessarily full and cannot be modified in place. Inserting into a node in this state splits it into two change_state interior nodes; updates and deletions against it are recorded in its interior-log node.
4) after_state. A node in this state always has a log node containing at least one deletion record, meaning the node underwent deletion while in steady_state. Inserting into such a node first triggers a Sync, merging the node with its log node before the insertion; updates and deletions against it are recorded in its log node.
A steady_state node may or may not have a log node; log nodes always reside in the Cov-Zone to absorb in-place updates to Seq-Zone nodes, and updates or deletions against a steady_state node are written directly into its log node (allocating one if the node has none). A steady_state node supports three operations:
1) Sync: merge the node with its log node; after Sync, the node remains in steady_state.
2) Delete: append a deletion record to the log node; after Delete, the node transitions to after_state, indicating that it has undergone deletion.
3) Split: split the node into two; after Split, the node becomes two change_state nodes that move into the Cov-Zone.
A node in after_state supports Sync, which merges it with its log node. Since it has necessarily undergone deletion, an after_state node logically represents a non-full node, so after merging its state becomes change_state.
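The transitions just described can be condensed into a small state machine. The following C sketch (ours; I/O, splitting, and log management are omitted) captures the transitions of Fig. 6:

```c
enum node_state {
    CHANGE_STATE,       /* in Cov-Zone, not full, updatable in place      */
    STEADY_STATE_HOT,   /* full leaf in Hot-Seq-Zone, immutable in place  */
    STEADY_STATE_COLD,  /* full interior in Cold-Seq-Zone, immutable      */
    AFTER_STATE         /* steady node whose log contains a deletion      */
};

enum node_event { EV_BECAME_FULL, EV_DELETE, EV_SYNC, EV_SPLIT };

static enum node_state next_state(enum node_state s, enum node_event e,
                                  int is_leaf) {
    switch (e) {
    case EV_BECAME_FULL:  /* full change_state node moves to a Seq-Zone   */
        return (s == CHANGE_STATE)
                   ? (is_leaf ? STEADY_STATE_HOT : STEADY_STATE_COLD) : s;
    case EV_DELETE:       /* deletion is recorded in the log node         */
        return (s == STEADY_STATE_HOT || s == STEADY_STATE_COLD)
                   ? AFTER_STATE : s;
    case EV_SYNC:         /* after_state merges into a non-full node;
                             steady_state stays steady after Sync         */
        return (s == AFTER_STATE) ? CHANGE_STATE : s;
    case EV_SPLIT:        /* a split yields two change_state nodes        */
        return CHANGE_STATE;
    }
    return s;
}
```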
2.3 Node Structure Design in ZB+-tree
A leaf node holds n keys and n values; a leaf-log node has exactly the same structure as a leaf node. The structures are identical because the Sync operation may turn a leaf-log node directly into a leaf node (see the Sync operation in Section 3.1), and entries in a leaf-log node correspond to the key-value pairs of the leaf node by position, as shown in Fig. 7.
A leaf-head node consists of n+1 keys and n+1 ptr structures, as shown in Fig. 8. A ptr structure contains three fields:
1) state: the state of the child node.
2) addr: the location of the child node.
3) log_addr: the location of the child node's log node.
The first key of a leaf-head node is always First (a sentinel smaller than any attainable key).
An interior node consists of n+1 keys and n+1 addrs (child addresses); an interior-log node is structurally identical to an interior node (again required by Sync), and an interior-head node has the same structure as a leaf-head node. In interior and interior-head nodes, the first key is likewise First. First appears in interior nodes because Sync requires a log node and its interior node to share the same structure so that the log node can be converted directly into a normal interior node; since log entries correspond to the original node's entries by position, the first slot needs a key marking it as the first child. First appears in leaf-head and interior-head nodes because every leaf or interior node needs a ptr structure recording its state, location, and log status; even a tree with only one leaf node or one interior node must have a corresponding ptr structure, so the first ptr's key is set to First, marking the first interior or leaf node of the subtree. All other keys carry the usual B+-tree meaning: the minimum key of the corresponding subtree.
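A minimal C sketch of the four node layouts just described, assuming fixed-size arrays, 64-bit keys and addresses, and an order parameter N (all placeholder choices of ours):

```c
#include <stdint.h>

#define N 31                 /* tree order parameter; a placeholder value */
typedef uint64_t key_t_;     /* key type (underscore avoids POSIX key_t)  */
typedef uint64_t addr_t;     /* on-device address of a node               */

/* Leaf and leaf-log nodes share one layout so that Sync can turn a full
 * log node directly into a leaf node (Section 3.1). */
struct leaf_node {
    key_t_   keys[N];
    uint64_t values[N];
};

/* Per-child bookkeeping kept in head nodes (Fig. 8). */
struct ptr_struct {
    uint8_t state;           /* change/steady_hot/steady_cold/after       */
    addr_t  addr;            /* where the child node lives                */
    addr_t  log_addr;        /* where its log node lives, or 0 if none    */
};

/* leaf-head and interior-head share this layout; keys[0] is First. */
struct head_node {
    key_t_ keys[N + 1];
    struct ptr_struct ptrs[N + 1];
};

/* Interior and interior-log nodes: n+1 keys and n+1 child addresses;
 * keys[0] is First. */
struct interior_node {
    key_t_ keys[N + 1];
    addr_t children[N + 1];
};
```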
3. ZB+-tree Index Operations
This section presents the Sync, Search, Insert, and Delete operations of ZB+-tree and analyzes their time performance and space cost.
3.1 Sync
In ZB+-tree, both interior-log and leaf-log nodes record only updates and deletions, and their structure matches that of the corresponding interior or leaf node. Too many log nodes in the tree occupy substantial Cov-Zone space and degrade overall lookup performance, because after reading a node one must also read its log node to reconstruct the latest version. The log node must therefore be merged with its node, producing the up-to-date node and reclaiming the log's space; we call this the Sync operation. A merge is likewise required when a log node becomes full. ZB+-tree performs the merge in two ways: when the log node is not yet full, the merge is called Normal_apply; when the log node is full, it is called Switch_apply.
1) Normal_apply. Fig. 9 illustrates Normal_apply. When the node's log is not yet full: if the node is in steady_state, it has not undergone deletion and the log contains no deletion records, so the node rebuilt from the log remains in steady_state; the new node is written back into a Seq-Zone and the log's space in the Cov-Zone is reclaimed. If the node is in after_state, it has undergone deletion and the log certainly contains deletion records; the node rebuilt from the log becomes change_state and is written into the Cov-Zone at the position of the original log node. Algorithm 1 shows the Normal_apply procedure.
Algorithm 1. Normal_apply_leaf(&leaf, &leaf_log, &leaf_ptr).
Input: the leaf node leaf, its log node leaf_log, and the leaf's ptr structure leaf_ptr;
Output: none.
① if leaf_ptr.state == after_state then
②   for each 〈key, value〉 in leaf_log
③     if key ≠ delete_key then
④       update 〈key, value〉 in leaf;
⑤     else delete 〈key, value〉 in leaf;
⑥     end if
⑦   end for
⑧   write_leaf(leaf_ptr.log_addr, leaf);
⑨   leaf_ptr.state = change_state;
⑩   leaf_ptr.addr = leaf_ptr.log_addr;
⑪   leaf_ptr.log_addr = Null;
⑫ end if
⑬ if leaf_ptr.state == steady_state_hot then
⑭   for each 〈key, value〉 in leaf_log
⑮     update 〈key, value〉 in leaf;
⑯   end for
⑰   reclaim leaf_log;
⑱   addr = select the Hot-Seq-Zone (the Seq-Zone with the most remaining capacity);
⑲   write_leaf(addr, leaf);
⑳   leaf_ptr.addr = addr;
㉑   leaf_ptr.log_addr = Null;
㉒ end if
2) Switch_apply. Fig. 10 illustrates Switch_apply, which occurs when the node's log node is already full. Unlike Normal_apply, Switch_apply does not read the original node, only its log node, because a full log implies that every key-value pair of the original node has been updated or deleted. If the original node is in steady_state, all its pairs were updated, so the log node is written into a Seq-Zone directly as the new node and the log's space in the Cov-Zone is reclaimed. If the original node is in after_state, it has seen deletions in addition to updates, so the pairs marked with delete_key are removed from the log node, which is then converted into a change_state node and written back into the Cov-Zone at the original log node's position. Algorithm 2 shows Switch_apply.
Algorithm 2. Switch_apply_leaf(&leaf_log, &leaf_ptr).
Input: the leaf's log node leaf_log and the leaf's ptr structure leaf_ptr;
Output: the rebuilt leaf.
① if leaf_ptr.state == steady_state_hot then
②   addr = select the Hot-Seq-Zone (the Seq-Zone with the most remaining capacity);
③   write_leaf(addr, leaf_log);
④   reclaim leaf_log;
⑤   leaf_ptr.addr = addr;
⑥   leaf_ptr.log_addr = Null;
⑦   return leaf_log;
⑧ end if
⑨ if leaf_ptr.state == after_state then
⑩   for each 〈key, value〉 in leaf_log
⑪     if key == delete_key then
⑫       delete 〈key, value〉 in leaf_log;
⑬     end if
⑭   end for
⑮   write_leaf(leaf_ptr.log_addr, leaf_log);
⑯   leaf_ptr.state = change_state;
⑰   leaf_ptr.addr = leaf_ptr.log_addr;
⑱   leaf_ptr.log_addr = Null;
⑲   return leaf_log;
⑳ end if
The Sync operation (Algorithm 3) works alike for interior and leaf nodes; the only difference is that an interior node holds n+1 entries while a leaf node holds n. The operation on leaf nodes is called Sync; the corresponding operation on interior nodes is in_Sync.
Algorithm 3. Sync(&leaf, &leaf_log, &leaf_ptr).
Input: the leaf node leaf, its log node leaf_log, and the leaf's ptr structure leaf_ptr;
Output: the rebuilt leaf.
① if leaf_log is full then
②   return Switch_apply_leaf(leaf_log, leaf_ptr);
③ else
④   Normal_apply_leaf(leaf, leaf_log, leaf_ptr);
⑤   return leaf;
⑥ end if
3.2 Search
Algorithm 4 shows the Search procedure on ZB+-tree. If the tree has no IH level yet, i.e., the whole tree has only two levels (LH and Leaf) and thus a single leaf-head node, the search starts directly from that leaf-head. If an IH level exists, the search key is first used to fetch the interior node's ptr structure from the IH. If the interior node has no log node, it is read directly; otherwise apply_inner_log() is triggered, which merely rebuilds the latest interior node in memory and does not modify the interior node or log node on the ZNS SSD; the point is to avoid triggering writes during Search. With the up-to-date interior node, the key locates the address of the leaf-head node, which is read, and the search continues from it via Search_leaf_head() (Algorithm 5). That procedure mirrors the IH-to-interior step; apply_leaf_log() likewise rebuilds the latest leaf only in memory, without modifying the on-device leaf or log node. With the up-to-date leaf, the key is looked up: the matching value is returned if found, otherwise Null, indicating that the key is absent.
Algorithm 4. Search(key).
Input: the key to search for;
Output: the corresponding value if found, otherwise Null.
① if IH is empty then
②   return Search_leaf_head(leaf_head_last, key);
③ else
④   inter_ptr = search IH to get the interior_ptr;
⑤   if inter_ptr.log_addr == Null then
⑥     inter = read_interior(inter_ptr.addr);
⑦   else
⑧     get the log and old_inter;
⑨     inter = apply_inner_log(old_inter, log);
⑩   end if
⑪   leaf_head_addr = search inter to get the address of the leaf_head node;
⑫   leaf_head = read_LH(leaf_head_addr);
⑬   return Search_leaf_head(leaf_head, key);
⑭ end if
Algorithm 5. Search_leaf_head(leaf_head, key).
Input: the leaf_head node to search and the key to search for;
Output: the corresponding value if found, otherwise Null.
① leaf_ptr = get the ptr structure of the leaf node from leaf_head;
② if leaf_ptr.log_addr == Null then
③   leaf = read_leaf(leaf_ptr.addr);
④ else
⑤   log = read_leaf_log(leaf_ptr.log_addr);
⑥   old_leaf = read_leaf(leaf_ptr.addr);
⑦   leaf = apply_leaf_log(old_leaf, log);
⑧ end if
⑨ for each 〈KEY, VALUE〉 in leaf do
⑩   if key == KEY then
⑪     return VALUE;
⑫   end if
⑬ end for
⑭ return Null.
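Algorithms 4 and 5 invoke apply_inner_log() and apply_leaf_log() without spelling them out. The following C sketch (ours, reusing the node layouts from the sketch in Section 2.3; the delete_key sentinel and the key == 0 empty-slot convention are assumptions) shows how apply_leaf_log() can rebuild the latest leaf purely in memory:

```c
#define DELETE_KEY UINT64_MAX   /* assumed sentinel for delete_key */

/* Rebuild the latest leaf in memory by replaying the positionally aligned
 * log entries onto the old leaf; no device writes are issued. */
static struct leaf_node apply_leaf_log(const struct leaf_node *old_leaf,
                                       const struct leaf_node *log) {
    struct leaf_node latest = *old_leaf;   /* start from the on-device leaf */
    for (int i = 0; i < N; i++) {
        if (log->keys[i] == 0)
            continue;                      /* slot i was never logged       */
        if (log->keys[i] == DELETE_KEY)
            latest.keys[i] = 0;            /* deletion: clear the slot      */
        else
            latest.values[i] = log->values[i];  /* update; key is unchanged */
    }
    return latest;
}
```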
3.3 Insert
An Insert on ZB+-tree first descends from the root to the leaf-head node of the target leaf, exactly as in Search, and obtains the leaf's ptr structure from the leaf-head. If the leaf is in change_state, the key-value pair is inserted into it directly, and if the leaf then becomes full it moves into a Seq-Zone. For a steady_state leaf without a log node, a Split is performed directly, producing two change_state nodes in the Cov-Zone; with a log node, a Sync first merges the leaf and its log node, after which the pair is inserted. Algorithm 6 gives the details. Split on ZB+-tree resembles the ordinary B+-tree split; the only point needing care is maintaining the First values in the tree.
Algorithm 6. Insert(key, value).
Input: the key-value pair 〈key, value〉 to insert;
Output: none.
① leaf_head = find the leaf_head node of the corresponding leaf node;
② leaf_ptr = get the ptr structure of the leaf node;
③ leaf = get the leaf node;
④ if leaf_ptr.state == change_state then
⑤   insert 〈key, value〉 in leaf;
⑥   if leaf is full then
⑦     move leaf to Seq-Zone;
⑧     leaf_ptr.state = steady_state_hot;
⑨     leaf_ptr.addr = new_addr;
⑩   end if
⑪ else
⑫   if leaf_ptr.log_addr == Null then
⑬     Split(leaf, key, value);
⑭   else
⑮     log = get the leaf_log node of leaf;
⑯     leaf = Sync(leaf, log, leaf_ptr);
⑰     if leaf_ptr.state == change_state then
⑱       insert 〈key, value〉 to leaf;
⑲       if leaf is full then
⑳         move leaf to Seq-Zone;
㉑         leaf_ptr.state = steady_state_hot;
㉒         leaf_ptr.addr = new_addr;
㉓       end if
㉔     end if
㉕     if leaf_ptr.state == steady_state_hot then
㉖       Split(leaf, key, value);
㉗     end if
㉘   end if
㉙ end if
3.4 Delete
Deletion on ZB+-tree differs from the classic B+-tree algorithm in several respects: because nodes at different levels reside in zones with different properties, Delete applies different merge-control policies at different levels.
For the LH level (and likewise the IH level), all head nodes reside in the Cov-Zone and can be updated in place, so when a node's occupancy falls to the lower bound (lower_bound) during deletion we control merging strictly: a leaf-head's occupancy is kept strictly between lower_bound and full (except when the tree has only one leaf-head node). For the Leaf level, we merge opportunistically: we check the sibling and merge if possible (their combined occupancy is below n), and otherwise do nothing; satisfying the merge condition implies that both nodes are in change_state, i.e., both reside in the Cov-Zone. The policies differ because Leaf-level nodes are spread across Seq-Zones and the Cov-Zone and may or may not carry log nodes: when a merge is impossible, borrowing a key-value pair from a steady_state sibling would not reduce space usage, yet it would trigger cascading modifications and force log-node handling. Under the opportunistic policy, a leaf's occupancy ranges from 1 to n.
Interior-level nodes are likewise merged opportunistically, for the same reasons as the Leaf level; an interior node's occupancy ranges from 1 to n+1, and when a node holds a single key, that key is necessarily First. Algorithm 7 shows Delete at the Leaf level.
Algorithm 7. Delete(key).
Input: the key of the key-value pair to delete;
Output: true on success, false on failure.
① leaf_head = find the leaf_head node of the corresponding leaf node;
② leaf_ptr = get the ptr structure of the leaf node;
③ leaf = get the leaf node;
④ if find_in_leaf(key, leaf_ptr) == false then
⑤   return false;
⑥ else
⑦   if leaf_ptr.state == change_state then
⑧     delete_in_leaf(leaf, key);
⑨     if leaf is empty then
⑩       LH_delete(LH, key);
⑪       reclaim leaf in Cov-Zone;
⑫       return true;
⑬     else do opportunistic merge;
⑭     end if
⑮   end if
⑯   if leaf_ptr.state == after_state then
⑰     log = get the log of leaf;
⑱     insert delete_key to log;
⑲     return true;
⑳   else
㉑     if leaf_ptr.log_addr == Null then
㉒       allocate a new log for leaf;
㉓       insert delete_key to log;
㉔       leaf_ptr.state = after_state;
㉕       leaf_ptr.log_addr = log_addr;
㉖       return true;
㉗     else
㉘       log = get the log of leaf;
㉙       insert delete_key to log;
㉚       leaf_ptr.state = after_state;
㉛       return true;
㉜     end if
㉝   end if
㉞ end if
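The opportunistic merge invoked at step ⑬ can be sketched as follows (ours, reusing the structures and state enum from the earlier sketches; sorted-order maintenance is omitted for brevity). Two leaves merge only when both are in change_state and their combined occupancy is below n:

```c
static int leaf_occupancy(const struct leaf_node *l) {
    int c = 0;
    for (int i = 0; i < N; i++)
        if (l->keys[i] != 0)   /* key == 0 marks an empty slot */
            c++;
    return c;
}

static int try_opportunistic_merge(struct leaf_node *left,
                                   struct leaf_node *right,
                                   const struct ptr_struct *lp,
                                   const struct ptr_struct *rp) {
    if (lp->state != CHANGE_STATE || rp->state != CHANGE_STATE)
        return 0;   /* a sibling in a Seq-Zone (or with a log): do nothing */
    if (leaf_occupancy(left) + leaf_occupancy(right) >= N)
        return 0;   /* merged node would overflow: do nothing              */
    /* Move the right sibling's live entries into free slots of the left. */
    for (int i = 0, j = 0; i < N; i++) {
        if (right->keys[i] == 0)
            continue;
        while (left->keys[j] != 0)
            j++;
        left->keys[j]   = right->keys[i];
        left->values[j] = right->values[i];
        right->keys[i]  = 0;
    }
    return 1;       /* caller reclaims the right leaf and fixes leaf-head  */
}
```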
3.5 Theoretical Cost Analysis
3.5.1 Time Performance
In the worst case, a lookup in a 4-level ZB+-tree needs 6 reads: the interior-head node, the interior node plus its interior-log node, the leaf-head node, and the leaf node plus its leaf-log node. If the Interior and Leaf levels carry no log nodes, 4 reads suffice, the same as in a 4-level CoW B+-tree. For insertion, the CoW B+-tree first reads the 4 nodes on the root-to-leaf path; without a split it then needs 4 writes, and in the worst case, where every level except the root splits, 7 writes. ZB+-tree likewise needs 4 to 6 reads for insertion, but without a split only 1 write (an in-place insert in the Cov-Zone); with splits the worst case is also 7 writes, yet it is rare for every level to split at once, so 1 or 2 writes usually suffice, i.e., cascading updates are blocked inside the Cov-Zone. For deletion, the CoW B+-tree again starts with 4 reads, then needs at best 4 writes if no merge occurs, and more reads and writes when merging or borrowing from a sibling occurs. ZB+-tree, after its 4 to 6 reads, needs only 1 write when no merge occurs (a change_state node deletes in place with a single write; for a steady_state node a deletion record is appended to its log node, also a single write). When merging does occur, the two opportunistically controlled levels merge less often and never borrow key-value pairs from siblings, while the LH and IH levels use strict merge control exactly as the CoW B+-tree does, so the total read/write count remains below the CoW B+-tree's. In summary, ZB+-tree matches the CoW B+-tree in lookup performance, writes less on insertion, and both reads and writes less on deletion.
3.5.2 Space Cost
Every update of the CoW B+-tree must rewrite at least the entire root-to-leaf path to the ZNS SSD, whereas ZB+-tree absorbs updates in the Cov-Zone, usually with a single in-place update there. ZB+-tree therefore consumes far less Seq-Zone space than the CoW B+-tree.
Consider a 4-level ZB+-tree of order n with node size m. In the worst case, every leaf node carries a leaf-log node and every interior node carries an interior-log node, so the total size of the live nodes can be estimated as
$[2(n+1)^3 + (n+1)^2 + 2(n+1) + 1] \times m$. (1)
The live nodes of a 4-level CoW B+-tree occupy exactly the space of an ordinary B+-tree, estimated as
$[(n+1)^3 + (n+1)^2 + (n+1) + 1] \times m$. (2)
As the index is modified, the CoW B+-tree keeps writing each modified node, together with every node on its path to the root, into the SSD. The estimates above cover live nodes only; at runtime the CoW B+-tree additionally produces a large number of stale nodes, further inflating its space usage. ZB+-tree absorbs in-place updates in the Cov-Zone; it too produces stale nodes in Seq-Zones at runtime, but far fewer than the CoW B+-tree, and even its worst-case live-node footprint is of the same order of magnitude as the CoW B+-tree's. Overall, ZB+-tree's space cost is lower.
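As a quick numerical sanity check (our own example, assuming order $n+1 = 32$ and node size $m = 4$ KB):
$[2\cdot 32^3 + 32^2 + 2\cdot 32 + 1] \times 4\,\mathrm{KB} = 66625 \times 4\,\mathrm{KB} \approx 260\,\mathrm{MB}$,
$[32^3 + 32^2 + 32 + 1] \times 4\,\mathrm{KB} = 33825 \times 4\,\mathrm{KB} \approx 132\,\mathrm{MB}$,
so the worst-case live-node footprint of ZB+-tree is roughly twice that of the CoW B+-tree, i.e., the same order of magnitude, consistent with the analysis above.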
Compared with the CoW B+-tree, ZB+-tree adds some data structures at the LH and IH levels to manage the nodes of the level below; since IH and LH nodes are interior nodes and relatively few, this extra space cost is modest. ZB+-tree's Cov-Zone usage depends mainly on the workload and is therefore evaluated experimentally.
4. Experiments and Analysis
4.1 Experimental Setup
The server runs Ubuntu 20.04.1 LTS with kernel 5.4.0 and gcc 9.4.0. Since no commercial ZNS SSD was available, we emulated a ZNS SSD with null_blk and ran the experiments through Western Digital's ZBD library libzbd. Table 1 lists the configuration.
Table 1. Experimental Configuration

  Server component   Description
  OS                 Ubuntu 20.04.1 LTS
  CPU                AMD EPYC 7742 64-Core @ 2.60 GHz
  DRAM               256 GB DDR4 (2666 MHz)
  GCC version        9.4.0
  libzbd version     2.0.3

Since no ZNS-aware index has yet been proposed, we modified the CoW B+-tree so that it can run on ZNS SSDs: its nodes are always written into the zone with the most remaining space, and the target zone is chosen dynamically from current zone usage, which minimizes Zone-Reset operations while fully utilizing the ZNS SSD's space.
The experiments use YCSB [29] as the workload. YCSB is widely used in the storage and database communities and lets users configure read/write ratios and access skew. We generated five workloads with YCSB:
1) Workload1 is write-intensive, consisting of 40% inserts, 30% deletes, and 30% lookups.
2) Workload2 is read-intensive, consisting of 10% inserts, 10% deletes, and 80% lookups.
3) Workload3 is read-write balanced, consisting of 25% inserts, 25% deletes, and 50% lookups.
4) Workload4 is write-only, consisting of 50% inserts and 50% deletes.
5) Workload5 is read-only, consisting solely of lookups.
On these five workloads we test performance under different data volumes and data distributions. The data volumes are 0.5, 1.5, and 2.5 million key-value pairs, and the distributions are the three from reference [29]: Zipfian, uniform, and latest. YCSB loads the data before a test runs; all numbers below are measured after loading completes. For the CoW B+-tree, the emulated ZBD has 40 Seq-Zones and 1 Cov-Zone (for a fair comparison, the Cov-Zone is used in the sequential-write mode of a Seq-Zone), each 2 GB; for ZB+-tree, the emulated ZBD likewise has 40 Seq-Zones and 1 Cov-Zone, each 2 GB. The block size in the emulator is uniformly 4 KB.
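For reference, the zones of the emulated device can be enumerated with libzbd roughly as follows; this sketch follows the libzbd 2.x API as we understand it (zbd_open, zbd_report_zones, zbd_close, the ZBD_RO_ALL option, and the ZBD_ZONE_TYPE_CNV zone type), and every detail should be treated as an assumption rather than authoritative usage:

```c
#include <stdio.h>
#include <fcntl.h>
#include <libzbd/zbd.h>

int main(void) {
    struct zbd_info info;
    int fd = zbd_open("/dev/nullb0", O_RDWR, &info);   /* emulated ZBD */
    if (fd < 0)
        return 1;

    struct zbd_zone zones[64];
    unsigned int nr = 64;
    if (zbd_report_zones(fd, 0, 0, ZBD_RO_ALL, zones, &nr) == 0) {
        for (unsigned int i = 0; i < nr; i++)
            printf("zone %u: start=%llu wp=%llu %s\n", i,
                   (unsigned long long)zones[i].start,
                   (unsigned long long)zones[i].wp,
                   zones[i].type == ZBD_ZONE_TYPE_CNV ? "Cov-Zone"
                                                      : "Seq-Zone");
    }
    zbd_close(fd);
    return 0;
}
```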
4.2 Analysis of Experimental Results
4.2.1 Running Time
Figs. 11-13 compare the running times of ZB+-tree and the CoW B+-tree. Across all data scales and workloads, ZB+-tree runs faster than the CoW B+-tree. Although a ZB+-tree lookup may read 1 or 2 extra log nodes, its inserts and deletes issue fewer reads and writes, and SSD reads are faster than writes [7], so ZB+-tree outperforms the CoW B+-tree overall, in line with the theoretical analysis.
4.2.2 Read and Write Counts
Figs. 14-16 and Figs. 17-19 compare the read counts and write counts of ZB+-tree and the CoW B+-tree, respectively. ZB+-tree issues about 25% fewer reads than the CoW B+-tree, because during deletion, instead of the CoW B+-tree's cascading reads and writes, ZB+-tree's two opportunistically merged levels merge less often and never borrow key-value pairs from siblings. For writes, ZB+-tree saves about 75% on average across data scales, mainly because the distinctive design of the LH and IH levels blocks cascading updates twice: both levels reside in the Cov-Zone and can be updated in place, so a modified node keeps its address and the cascade stops inside the Cov-Zone. Whereas each CoW B+-tree update takes at least 4 writes, ZB+-tree needs only 1 or 2.
4.2.3 Space Occupancy
Tables 2-6 show, under the Zipfian distribution, the average occupancy of all Seq-Zones by ZB+-tree and the CoW B+-tree for the five workloads and three data scales; results under the other distributions are similar. As the data scale grows, the CoW B+-tree rapidly consumes Seq-Zone space: at 2.5 million pairs its Seq-Zone occupancy peaks at 0.739822 (Workload4) and bottoms at 0.450943 (Workload5). Under large write-heavy workloads, the CoW B+-tree's Seq-Zones thus fill quickly, triggering costly zone garbage collection and Zone-Reset operations and causing sharp performance drops.
Table 2. Occupancy Rate of Seq-Zones Under Workload1 at Different Data Scales

  Scheme        0.5M pairs   1.5M pairs   2.5M pairs
  ZB+-tree      0.000469     0.001543     0.002439
  CoW B+-tree   0.120431     0.402918     0.678303

Table 3. Occupancy Rate of Seq-Zones Under Workload2 at Different Data Scales

  Scheme        0.5M pairs   1.5M pairs   2.5M pairs
  ZB+-tree      0.000385     0.001161     0.001882
  CoW B+-tree   0.086633     0.300859     0.513317

Table 4. Occupancy Rate of Seq-Zones Under Workload3 at Different Data Scales

  Scheme        0.5M pairs   1.5M pairs   2.5M pairs
  ZB+-tree      0.000417     0.001371     0.002141
  CoW B+-tree   0.103262     0.353144     0.599624

Table 5. Occupancy Rate of Seq-Zones Under Workload4 at Different Data Scales

  Scheme        0.5M pairs   1.5M pairs   2.5M pairs
  ZB+-tree      0.000498     0.001617     0.002613
  CoW B+-tree   0.132233     0.432777     0.739822

Table 6. Occupancy Rate of Seq-Zones Under Workload5 at Different Data Scales

  Scheme        0.5M pairs   1.5M pairs   2.5M pairs
  ZB+-tree      0.000363     0.001025     0.001729
  CoW B+-tree   0.073475     0.262106     0.450943

Moreover, ZB+-tree's average occupancy of all Seq-Zones stays below 0.0027 under all five workloads, far lower than the CoW B+-tree's, and its Seq-Zone occupancy does not shoot up as the data scale grows. Consequently, ZB+-tree rarely triggers zone garbage collection or Zone-Reset at runtime, preserving system performance and stability while saving considerable Seq-Zone space and improving space efficiency.
4.2.4 Varying the Ratio of Cov-Zones to Seq-Zones
Since real ZNS SSD devices are not yet available, the ratio of Cov-Zones to Seq-Zones may vary across future devices; we therefore vary this ratio to evaluate ZB+-tree.
In this experiment we test three Cov-Zone to Seq-Zone ratios, 1:32, 1:48, and 1:64, under the Zipfian distribution with 1 million key-value pairs, running Workload1 and Workload2; Fig. 20 shows the results.
As Fig. 20 shows, ZB+-tree runs faster than the CoW B+-tree at every ratio. Changing the Cov-Zone to Seq-Zone ratio does not affect read/write counts; it only affects Seq-Zone occupancy, detailed in Tables 7 and 8.
Table 7. Occupancy Rate of Seq-Zones Under Workload1 at Different Ratios

  Scheme        1:32       1:48       1:64
  ZB+-tree      0.001125   0.000751   0.000562
  CoW B+-tree   0.321145   0.216282   0.163043

Table 8. Occupancy Rate of Seq-Zones Under Workload2 at Different Ratios

  Scheme        1:32       1:48       1:64
  ZB+-tree      0.000959   0.000638   0.000478
  CoW B+-tree   0.240339   0.161861   0.122018

Tables 7 and 8 show that ZB+-tree's Seq-Zone occupancy is far lower than the CoW B+-tree's at every ratio. Moreover, changing the Seq-Zone to Cov-Zone ratio barely affects ZB+-tree's space efficiency, indicating that ZB+-tree adapts well to different zone configurations.
4.2.5 Zone-Reset Counts
In the experiments of Sections 4.2.1-4.2.4, both ZB+-tree and the CoW B+-tree used the dynamic zone-selection policy, which separates leaf and interior nodes to fully utilize the space of all zones on the ZNS SSD, minimizes Zone-Reset operations, and places nodes with different characteristics in different zones to lower access latency.
This experiment further examines ZB+-tree's performance without dynamic zone selection; for fairness, the CoW B+-tree also forgoes it. For completeness, we compare the two schemes by counting Zone-Reset operations.
For ZB+-tree we configure 1 Seq-Zone and 1 Cov-Zone, each 2 GB, and count Zone-Resets. For the CoW B+-tree we configure 1 Seq-Zone and 1 Cov-Zone (used in sequential-write mode), each 2 GB; when one zone fills, its live nodes are read out and written into the other zone, then a Zone-Reset is issued and counted. The data volumes are 2.5, 5, and 7.5 million key-value pairs, the distribution is Zipfian, and the workloads are Workload1 and Workload2.
The results show that under Workload1, ZB+-tree triggered no Zone-Reset at 2.5, 5, or 7.5 million pairs, while the CoW B+-tree triggered 10, 22, and 37 Zone-Resets, respectively. Under Workload2, ZB+-tree again triggered none at those data volumes, while the CoW B+-tree triggered 2, 6, and 9 Zone-Resets, respectively.
The CoW B+-tree thus triggers Zone-Resets far more readily than ZB+-tree, which avoids them even without the dynamic-selection optimization while inserting large numbers of keys. Since Zone-Reset and zone garbage collection have high latency, ZB+-tree's time-performance advantage over the CoW B+-tree grows, especially under write-intensive workloads.
4.2.6 Cov-Zone Usage
In this experiment we create 40 Seq-Zones and 1 Cov-Zone (each 2 GB) and run Workload1 and Workload2 under the Zipfian distribution on datasets of 1, 2.5, 5, 7.5, and 10 million key-value pairs, measuring ZB+-tree's Cov-Zone occupancy.
Fig. 21 shows the results. ZB+-tree's Cov-Zone occupancy grows linearly with data scale, because larger workloads update more keys and thus use more Cov-Zone space. Even at 10 million pairs, however, occupancy stays below 0.50, indicating that the Cov-Zone will not fill up and can satisfy the needs of the vast majority of applications.
5. Conclusion
This paper proposes ZB+-tree, a new ZNS-aware index structure that exploits the properties of the conventional and sequential zones in ZNS SSDs to absorb random writes to zones, block cascading updates, and reduce read/write counts. We design different node structures for nodes in the two zone types and spread nodes with different access frequencies across different zones to lower access latency while honoring the sequential-write constraint of ZNS SSDs. Experiments on an emulated ZNS SSD comparing ZB+-tree with the conventional CoW B+-tree show that ZB+-tree outperforms the CoW B+-tree in both running time and read/write counts. ZB+-tree also achieves a much lower Seq-Zone occupancy, effectively reducing garbage collection and Zone-Reset operations at runtime and improving system performance, and its Cov-Zone occupancy stays below 0.50 as the data scale grows, meeting the needs of most applications.
Future work will focus on three directions: 1) adding a well-designed GC mechanism to ZB+-tree; 2) devising better zone-selection schemes for node placement; and 3) experimenting on real ZNS SSD devices.
Author contributions: Liu Yang proposed the algorithm, conducted the experiments, and drafted the paper; Jin Peiquan provided guidance and revised the paper.