摘要:
深度学习和物联网的融合发展有力地促进了AIoT生态的繁荣. 一方面AIoT设备为深度学习提供了海量数据资源,另一方面深度学习使得AIoT设备更加智能化. 为保护用户数据隐私和克服单个AIoT设备的资源瓶颈,联邦学习和协同推理成为了深度学习在AIoT应用场景中广泛应用的重要支撑. 联邦学习能在保护隐私的前提下有效利用用户的数据资源来训练深度学习模型,协同推理能借助多个设备的计算资源来提升推理的性能. 引入了面向AIoT的协同智能的基本概念,围绕实现高效、安全的知识传递与算力供给,总结了近十年来联邦学习和协同推理在算法、架构和隐私安全3个方面的相关技术进展,介绍了联邦学习和协同推理在AIoT应用场景中的内在联系. 从设备共用、模型共用、隐私安全机制协同和激励机制协同等方面展望了面向AIoT的协同智能的未来发展.
Abstract:The fusion of deep learning and the Internet of things has significantly promoted the development of the AIoT ecosystem. On the one hand, the huge amounts of multi-modal data collected by AIoT devices provide deep learning with abundant training data resources, which plays a more important role in the era of big models. On the other hand, the development of deep learning makes AIoT devices smarter, which shows great potential for promoting social development and the convenience of human life. As major supports for the application of deep learning in AIoT, federated learning effectively makes use of the training data provided by AIoT devices to train deep learning models with data privacy protection while collaborative inference overcomes the obstacles in the deployment of deep learning brought by the limited computation resource of AIoT devices. We introduce the concept of AIoT-oriented collaborative intelligence. Aiming at implementing knowledge transmission and computation resource supply with high efficiency and security, we review the related works, published in the past 10 years, about the architecture, algorithm, privacy, and security of federated learning and collaborative inference, and introduce the inner connection of federated learning and collaborative inference. The algorithm part summarizes the federated learning and collaborative inference algorithms related to AIoT use cases and their optimization goals. The architecture part introduces the related works about deep learning accelerators, deep learning compilation, deep learning frameworks, communication among devices, and collaboration among devices from the view of AI computing systems. The privacy and security part introduces the privacy and security threats faced by AIoT-oriented collaborative intelligence and the defense methods against them. We also provide insights into the future development of AIoT-oriented collaborative intelligence in the aspects of equipment sharing, model sharing, collaboration of privacy and security mechanisms, and collaboration of incentive mechanisms.
得益于深度学习模型强大的特征提取和识别能力以及大规模、高质量文本中蕴含的丰富知识,预训练语言模型(pre-trained language models,PLMs)在多种自然语言处理(natural language processing,NLP)下游任务上表现出了卓越性能[1-7].
与有监督学习模型相比,预训练语言模型能够充分利用大规模、无标注数据来学习通用语言特征,具备一定的常识和认知泛化能力. 通过少样本(few-shot)、单样本(one-shot)甚至零样本(zero-shot)学习就能完成各种自然语言处理下游任务. 近年来,越来越多以中文为基础的预训练模型不断涌现[8-14]. 然而,中文预训练语言模型在实际应用中仍面临着许多挑战.
1)在资源有限的环境中部署大型预训练语言模型十分困难. 由于预训练语言模型的性能通常与模型规模成正比[6],因此许多中文预训练语言模型致力于使用大量的训练数据训练具有更多参数的模型. 例如,首个大型中文预训练语言模型[11]具有26亿个参数,目前最大的中文基座预训练语言模型[12]具有超过2 000亿个参数. 对于大多数研究人员或小型公司/研究机构来说,这类大模型训练和部署所需的时间和资源成本是难以承受的.
2)大多数基于自回归的预训练语言模型只能利用单向信息. 许多自然语言理解(natural language understanding,NLU)任务需要上下文信息进行决策,仅能利用单向信息的基于自回归的预训练语言模型通常在这类任务上表现欠佳[11]. 我们认为,充分利用双向信息可以使具有强大文本生成能力的自回归预训练语言模型,尤其是小型预训练语言模型,在自然语言理解任务中表现出更加优异的性能.
3)现有的大规模自回归预训练语言模型,如CPM-2[14]和PanGu-α[12],都依赖基于词语的字典来构建输入. 然而,这种策略具有以下缺点:一个问题是借助分词工具或者分词算法进行分词可能会导致分词错误[15]. 例如,对于句子“胜利海上油田产油创新高”来说,正确的分词结果为“胜利|海上|油田|产|油|创|新高”. 然而,CPM-2使用的分词工具将该句子分为“胜利|海上|油田|产|油|创新|高”,这与原始的句意并不一致. 另一个问题是,中文的词语量通常比汉字的量大得多. 对于一份仅需要几千个汉字就能覆盖99.9%数据的语料来说,要达到相同的覆盖率,则需要超过100 000个词语(见2.2节). 字典规模的增大会导致模型参数的增加,从而带来更大的训练开销. 而且,基于词语的策略容易造成数据稀疏和未知词汇(out of vocabulary,OOV)的问题,这种情况下,模型难以充分地学习到不常用字的知识. 因此,我们认为基于汉字的数据处理策略更适合中文预训练语言模型,因为汉字更符合中文的语言特性并能减少模型参数的数量.
为了解决以上挑战,我们提出并训练了一个高质量的、基于自回归的小型中文预训练语言模型——玲珑. 玲珑的名字寓意其规模虽小但是具有强大的能力. 玲珑在50 GB的高质量训练数据上充分训练,并使用基于汉字的字典构造输入. 凭借仅仅3.17亿个参数,玲珑在文本生成、问题解答和数学计算等多种下游任务中取得了出色的结果.
总体来讲,本文有4点贡献:
1)提出并训练了一个仅有3.17亿个参数的高质量中文预训练语言模型——玲珑.
2)基于通用规范汉字表构建了一个基于汉字的字典,有效地避免和减少了未知词汇和分词错误带来的负面影响.
3)训练了玲珑及其反向版本玲珑B,通过下游任务验证了基于自回归的模型可以通过利用双向信息取得更好的性能.
4)制定的模板可以将多种下游任务转换为生成任务. 实验结果表明,在多种任务上,玲珑可以达到与大型预训练语言模型相媲美甚至更好的性能.
1. 相关工作
1.1 预训练语言模型结构
语言模型(language model,LM)就是计算序列的概率,因此可以根据已有的一部分语句来预测句子的下一个元素是什么. 其标准定义为:对于语言序列 {w}_{1},{w}_{2},…,{w}_{n} ,计算该序列的概率,即 P({w}_{1},{w}_{2},…,{w}_{n})=P({w}_{1})P({w}_{2}|{w}_{1})\cdots P({w}_{n}|{w}_{1},{w}_{2},…,{w}_{n-1}).
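按照上述链式分解,自回归语言模型对一个序列的概率可以由各位置条件概率的对数累加得到. 下面给出一个简要的示意代码(假设 token_ids 以 [START] 的 ID 开头,model(input_ids) 返回形状为 [1, T, V] 的 logits,该接口仅为说明用的假设,并非玲珑的实际实现):

```python
import torch
import torch.nn.functional as F

def sequence_log_prob(model, token_ids):
    """按链式法则累加对数概率:log P(w_1,...,w_n) = Σ_i log P(w_i | w_1,...,w_{i-1}).
    假设位置 i 的 logits 用于预测第 i+1 个标记."""
    input_ids = torch.tensor([token_ids])
    with torch.no_grad():
        logits = model(input_ids)                 # [1, T, V]
    log_probs = F.log_softmax(logits, dim=-1)
    total = 0.0
    for i in range(len(token_ids) - 1):
        total += log_probs[0, i, token_ids[i + 1]].item()
    return total
```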
语言模型发展至今经历了3个阶段,分别为20世纪80年代的专家语法规则模型、2000年左右的统计语言模型,以及目前最常使用的神经网络预训练语言模型. 预训练语言模型通过自监督学习的方式,能够充分利用大规模、无标注的数据学习通用语言特征,具备一定常识和认知泛化能力.
预训练模型结构主要分为3类:基于编码-解码器的模型[5,16-17]、自编码模型[3]和自回归模型[1-2,6].
基于编码-解码器的模型通常在编码器部分采用双向自注意力机制,在解码器部分采用单向自注意力机制,编码器部分学习到的信息会传递给解码器. 这类模型适用于解决文本生成、问题回答等条件生成任务.
自编码模型使用双向自注意力机制,通常对原始文本进行遮蔽(mask)处理. 这类模型通过理解上下文信息对遮蔽部分进行补充,适用于完形填空等自然语言理解任务.
自回归模型使用单向自注意力机制,按照文本顺序依次进行学习,通常适用于文本生成任务. 然而,由于其使用单向自注意力机制,通常无法在自然语言理解相关任务上取得较好的效果.
1.2 中文预训练语言模型
目前已有较多中文预训练模型被提出. 一些较为小型的模型有MacBERT[18],NEZHA[19]等. MacBERT[18]在BERT[3]基础上进行了模型结构调整,并利用规模达54亿词的中文数据进行训练. MacBERT可以在阅读理解、句子分类等自然语言理解任务上取得较好性能. NEZHA使用10台华为云服务器(每台配备8块显存为32 GB的NVIDIA Tesla V100 GPU)进行模型训练和微调,在阅读理解、命名实体识别和情绪分类等自然语言理解任务上具有一定竞争力.
大模型有CPM[11]、GLM[13]、PLUG、PanGu-α[12]、CPM-2[14]、Claude-2、通义千问[20]等. CPM[11]从100 GB大规模中文语料库中学习通用语言模式,耗费3周时间使用64块V100 GPU进行模型训练,得到了首个基于解码器结构(自回归)的大规模中文预训练语言模型. 实验表明,CPM模型能够用于问题解答、文章摘要、对话以及各类型的生成任务. GLM[13]基于自回归模型,通过结合多种预训练目标,可以更好地兼顾自然语言生成和自然语言理解任务. 其公开发布的中文预训练模型参数量高达100亿. 2021年4月19日,阿里达摩院发布了具有270亿参数量的中文预训练语言模型PLUG,其训练使用超过1 TB高质量中文文本数据. PLUG以80.614的分数刷新了中文语言理解评测基准CLUE[21]分类榜单纪录. PanGu-α[12]模型参数量高达2 000亿,不仅占据CLUE总榜单榜首,在落地应用上也具有卓越优势. Claude-2模型由Anthropic研发并发布于2023年7月,其参数量高达1 300亿. Claude-2支持高达10万个标记(token)上下文并且训练数据更新到2023年,相对于其他模型,它能够利用更长、更准确的信息. 此外,Claude-2在代码和数学方面的能力有显著提升,可以更准确地解析和理解复杂代码和数学表达式. 2023年8月,阿里云发布具有70亿参数量的通义千问模型,模型在超过2.2万亿个标记上进行预训练,上下文长度为2 048. 在人文、社会科学和其他专业领域的52个主题上进行测试,通义千问在现有相似规模的模型中表现最佳,甚至超过了更大规模的模型.
目前已有的小规模中文预训练模型通常基于自编码结构[22-23],适用于理解任务,在生成任务上表现不佳. 而大模型由于其参数量巨大,导致训练成本十分高昂. 大模型的使用权和所有权通常掌握在对应研发机构手中,个人研究者或中小型研究机构即使了解模型结构和原理也难以承担训练和使用方面的成本.
2. 玲 珑
本文提出了一个基于自回归目标的小型中文预训练语言模型——玲珑. 玲珑基于汉字字典使用约50 GB来源广泛的高质量训练数据进行模型预训练. 此外,通过将各种类型的自然语言下游任务转变为自然语言生成任务,可以很好地利用玲珑解决问题. 结合玲珑与玲珑B可以对双向文本信息加以利用,从而进一步提升模型在下游任务上的性能.
2.1 训练数据
规模适当的高质量中文语料对中文预训练语言模型及其他中文自然语言处理模型的效果起到至关重要的作用. 研究表明,使用小型高质量语料库训练的模型比仅使用更大规模的中文维基百科训练的模型表现更好[24].
我们从公开渠道收集了总计近0.72 TB的原始数据,覆盖百科、新闻、教育和网络数据等多种类别. 为了构建一个高质量的中文数据集,我们对原始收集得到的数据进行了中文数据提取、基于规则的清理、去重和过滤处理[25],得到约50 GB,包含约1 500万个标记的数据,各类别数据训练时使用的数据量如表1所示.
表 1 训练数据统计信息
Table 1. Statistics of Training Data
(单位:GB)
类别 | 数据量
新闻 | 36.26
Common Crawl网络数据 | 7.93
百科 | 5.50
网络 | 2.35
专利 | 2.68
教育 | 1.77
小说 | 0.62

2.2 基于汉字的预处理
我们主要基于《通用规范汉字表》[26]和清洗后的高质量语料构建了基于汉字的字典. 具体来说,我们首先将《通用规范汉字表》中全部一级和二级汉字加入字典. 《通用规范汉字表》是中国政府公布的现行标准汉字表,一、二级字表合计6 500字,能够满足出版印刷、辞书编纂和信息处理等方面的一般用字需要. 然后,我们从训练语料中统计经常出现的汉字和其他非中文标记来补充字典. 虽然玲珑是一个中文预训练语言模型,但我们仍然保留了一些在中文文章或对话中经常出现的非中文标记. 最终构建得到的字典包含13 312个标记,只有其他基于中文词语的字典规模的1/4~1/2[11-12,27].
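上述字典的构建流程可以用如下草图说明(仅为示意:standard_chars 表示《通用规范汉字表》一、二级汉字的集合,min_freq 为假设的补充阈值,特殊标记集合按本文后续用到的标记假设,均非论文给出的具体实现):

```python
from collections import Counter

def build_char_vocab(standard_chars, corpus, min_freq=100,
                     specials=("[UNK]", "[START]", "[END]", "[SEP]", "[SEP2]")):
    """先加入特殊标记与规范汉字,再用语料中的高频标记补充字典."""
    vocab = {}
    for tok in specials:
        vocab.setdefault(tok, len(vocab))
    for ch in standard_chars:                 # 一、二级汉字全部加入
        vocab.setdefault(ch, len(vocab))
    counter = Counter(ch for doc in corpus for ch in doc)
    for ch, freq in counter.most_common():    # 按频次从高到低补充其他标记
        if freq < min_freq:
            break
        vocab.setdefault(ch, len(vocab))
    return vocab
```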
对于清洗后得到的高质量数据集中的每篇文章,我们在其开头和结尾分别添加特殊标记[START]和[END]. 然后,使用基于汉字的字典将文章中的标记转换为整数ID. 最后我们将不同文章进行拼接串联,并使用上下文长度大小的滑动窗口对其进行切分,这样可以使得到的每条训练数据的长度都保持一致,模型在训练过程中可以充分利用数据和算力.
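结合上述字典,训练样本的构造过程可以用如下草图说明(假设滑动窗口按上下文长度逐段切分且互不重叠,并简单地按单个字符切分文章;非中文标记与 [UNK] 的处理方式为假设,正文未给出实现细节):

```python
def build_training_samples(articles, vocab, context_len=1024):
    """为每篇文章添加 [START]/[END],转为整数 ID 后拼接,再切分为等长训练样本."""
    ids = []
    for article in articles:
        tokens = ["[START]"] + list(article) + ["[END]"]
        ids.extend(vocab.get(tok, vocab["[UNK]"]) for tok in tokens)
    # 以上下文长度为步长顺序切块,使每条训练数据长度一致
    samples = [ids[i:i + context_len]
               for i in range(0, len(ids) - context_len + 1, context_len)]
    return samples
```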
2.3 模型结构设计
玲珑是一个基于自回归结构的预训练语言模型,由1个嵌入层、多个解码器模块和1个输出层组成. 玲珑对文本生成过程进行建模,当前位置标记的生成概率取决于序列中前面的标记. 模型整体结构如图1所示.
嵌入层用于将高维特征映射到低维,并且同时考虑标记的含义(文本语义)和位置(文本位置)信息. 因此嵌入层需要同时学习2个嵌入矩阵,分别用于计算标记嵌入和位置嵌入,即
{\boldsymbol{h}}^{\left(0\right)}=t {\boldsymbol{W}}_{t}+p {\boldsymbol{W}}_{p}, (1) 其中 {\boldsymbol{h}}^{\left(0\right)} 表示数据进行嵌入后得到的低维表示, t 表示标记ID, p 表示位置索引, {\boldsymbol{W}}_{t} 和 {\boldsymbol{W}}_{p} 分别为标记嵌入矩阵和位置嵌入矩阵.
每个解码器模块包含1个单向稀疏多头自注意层和1个前馈层,公式表示为
\left\{\begin{aligned} &{\boldsymbol{a}}^{\left(l-1\right)}=SparseMultiHeadSelfAttention\left({\boldsymbol{h}}^{\left(l-1\right)}\right)\text{,}\\ & {\boldsymbol{f}}^{\left(l-1\right)}=FeedForward\left({\boldsymbol{a}}^{\left(l-1\right)}+{\boldsymbol{h}}^{\left(l-1\right)}\right)\text{,}\\ & {\boldsymbol{h}}^{\left(l\right)}={\boldsymbol{f}}^{(l-1)}+{\boldsymbol{h}}^{(l-1)}\text{,} \end{aligned}\right. (2) {\boldsymbol{h}}^{\left({l}\right)} 是第 l 个解码器模块的输出. 注意层可以学习相同标记在不同语境中可能具有不同的语义,而稀疏机制则有助于减轻计算开销.
输出层用于预测下一个标记,计算公式为
{\boldsymbol{o}}={\boldsymbol{h}}^{n} {\boldsymbol{W}}_{t}^{\mathrm{T}}\text{,} (3) \boldsymbol{o} 表示字典中每一个标记出现在下一个位置的概率.
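式(1)~(3)对应的前向计算过程可以用如下PyTorch草图表示(其中稀疏多头自注意力以稠密的因果自注意力近似,前馈层维度、激活函数等均为假设,并省略了层归一化等细节,仅用于说明整体结构,默认超参数取自表2):

```python
import torch
import torch.nn as nn

class DecoderLM(nn.Module):
    """嵌入层(式(1))+ 若干解码器模块(式(2))+ 与标记嵌入共享权重的输出层(式(3))."""
    def __init__(self, vocab_size=13312, d_model=1024, n_layers=24,
                 n_heads=16, max_len=1024):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # W_t
        self.pos_emb = nn.Embedding(max_len, d_model)      # W_p
        self.blocks = nn.ModuleList([
            nn.ModuleDict({
                "attn": nn.MultiheadAttention(d_model, n_heads, batch_first=True),
                "ffn": nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                     nn.GELU(),
                                     nn.Linear(4 * d_model, d_model)),
            }) for _ in range(n_layers)
        ])

    def forward(self, token_ids):                           # token_ids: [B, T]
        B, T = token_ids.shape
        pos = torch.arange(T, device=token_ids.device)
        h = self.tok_emb(token_ids) + self.pos_emb(pos)     # 式(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                       device=token_ids.device), diagonal=1)
        for blk in self.blocks:
            a, _ = blk["attn"](h, h, h, attn_mask=causal)   # 单向自注意力
            f = blk["ffn"](a + h)                           # 式(2)
            h = f + h
        return h @ self.tok_emb.weight.T                    # 式(3):与 W_t 共享权重
```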
模型训练过程中,我们使用交叉熵损失函数并持续更新模型参数. 表2列出了模型训练使用的重要超参数设置.
表 2 玲珑中使用的超参数
Table 2. Hyperparameters of LingLong
超参数 | 取值
字典规模 | 13 312
嵌入维度 | 1 024
隐藏层维度 | 1 024
解码模块数量 | 24
自注意头数量 | 16
稀疏自注意步长 | 128
稀疏自注意表现力 | 8
上下文窗口长度 | 1 024
可训练参数量 | 316 989 440

2.4 双向预训练
我们在模型训练阶段使用相同数据集训练了2个结构完全相同的模型——前向模型(玲珑)和反向模型(玲珑B). 玲珑以自然文本顺序获取训练标记,玲珑B则以反向文本顺序获取训练标记,2个模型使用2种不同的输入顺序来利用双向信息,同时保持自回归的模型训练目标. 例如,玲珑使用的训练标记是“[START]今天天气不好[END]”,玲珑B相应的训练标记是“[START]好不气天天今[END]”. 简单的方法往往是十分有效的,通过利用双向信息,许多下游任务的性能都得到了有效提升,3.2.2节中进行了相关验证.
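玲珑B预训练样本的构造方式可以用如下最小示意说明(假设 [START]/[END] 仍分别位于序列首尾,与上文示例一致):

```python
def make_backward_sample(text):
    """将正文字符顺序整体反转,首尾保留 [START]/[END](示意)."""
    return ["[START]"] + list(text)[::-1] + ["[END]"]

# make_backward_sample("今天天气不好")
# -> ['[START]', '好', '不', '气', '天', '天', '今', '[END]']
```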
我们使用Adam优化器进行参数更新过程,选用的超参数设置为 {\beta }_{1}=0.9 , {\beta }_{2}=0.95 , eps=1\times {10}^{-8} . 为了保持训练稳定,我们在不同阶段使用了不同学习率. 具体来讲,在预训练早期阶段,我们使用线性热身(warm-up)策略,在 6\times {10}^{8} 个训练标记中将学习率从0逐步提高到最大值 2.5\times {10}^{-4} . 在学习率达到峰值后,使用余弦衰减策略,将其缓慢降至一个较小的值,余弦衰减策略持续 1\times {10}^{10} 个训练标记. 此外,我们还会定期以比当前学习率稍高的学习率重新启动训练过程,以帮助模型摆脱局部极小值. 在实际训练时,我们使用了数据并行方案,共计使用20个NVIDIA Tesla V100S GPU完成预训练,全局批次大小(batch size)为80,训练过程中在单个GPU上最多需要32 GB GPU内存.
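上述学习率调整策略可以用如下草图近似表示(衰减下限 MIN_LR 为假设值,正文仅说明降至"一个较小的值";周期性重启未包含在内,因为正文未给出重启幅度和周期的具体数值):

```python
import math

MAX_LR = 2.5e-4        # 热身结束时的峰值学习率
WARMUP_TOKENS = 6e8    # 线性热身覆盖的训练标记数
DECAY_TOKENS = 1e10    # 余弦衰减覆盖的训练标记数
MIN_LR = 1e-5          # 假设的衰减下限

def learning_rate(tokens_seen):
    """按已处理的训练标记数计算学习率:线性热身后接余弦衰减(示意)."""
    if tokens_seen < WARMUP_TOKENS:
        return MAX_LR * tokens_seen / WARMUP_TOKENS
    progress = min((tokens_seen - WARMUP_TOKENS) / DECAY_TOKENS, 1.0)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```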
2.5 模型微调
作为自回归模型,预训练得到的玲珑主要适用于文本生成任务. 因此,在完成下游任务时我们使用将所有下游任务都转换为文本生成任务的策略,使微调和预训练目标更加接近,以更好地利用预训练模型的生成能力. 例如,对于1对给定的 \left(x,y\right) ,其中 x 为原始问题, y 为标签,我们可以通过一个转换模板将其转换为生成任务 \hat{y}=G(x) . 最简单和直观的方式就是采用自然语言编写的能够表达任务语义的模板,例如,对于玲珑来说,一个分词任务可以构造为“原始文本:回首来时的路,坚定的信念载着我们走了很远. [SEP]分词结果:回首[SEP2]来[SEP2]时[SEP2]的[SEP2]路[SEP2],[SEP2]坚定[SEP2]的[SEP2]信念[SEP2]载[SEP2]着[SEP2]我们[SEP2]走[SEP2]了[SEP2]很[SEP2]远[SEP2]. ”的格式. 而对于玲珑B来说,由于模型在训练时采用反向的文本表述顺序,因此在模型微调时也需对数据进行反向调整. 为了保持认知上的“由因推果”,我们仍然采用问题在前、答案在后的构造顺序,对于相同的分词任务,对于玲珑B构造的输入为“原始文本:. 远很了走们我着载念信的定坚,路的时来首回[SEP]分词结果:. [SEP2]远[SEP2]很[SEP2]了[SEP2]走[SEP2]们我[SEP2]着[SEP2]载[SEP2]念信[SEP2]的[SEP2]定坚[SEP2],[SEP2]路[SEP2]的[SEP2]时[SEP2]来[SEP2]首回”. 表3中展示了我们为多个下游任务设计的正向模板和反向模板.
表 3 下游任务模板
Table 3. Templates for Downstream Tasks
任务 | 数据集 | 正向模板 | 反向模板
文本摘要 | CEPSUM | 类别:[“家居用品”,“箱包”,“服装”];特征信息:格式化信息;商品描述:商品描述[SEP]商品简介:商品简介 | 类别:[“装服”,“包箱”,“品用居家”];特征信息:息信化式格;商品描述:述描品商[SEP]商品简介:介简品商
文本摘要 | LCSTS | 文本:原始文本[SEP]摘要:摘要 | 文本:本文始原[SEP]摘要:要摘
基于结构化数据的文本生成 | AdGen | 标题信息:标题;标签信息:标签;特征信息:商品特征;[SEP]商品描述:商品描述 | 标题信息:题标;标签信息:签标;特征信息:征特品商;[SEP]商品描述:述描品商
基于结构化数据的文本生成 | E-Reviews | 特征信息:特征[SEP]广告文案:文案 | 特征信息:征特[SEP]广告文案:案文
问答 | KBQA | 问题:问题[SEP]答案:实体[SEP2]关系 | 问题:题问[SEP]答案:系关[SEP2]体实
中文分词 | Weibo & MSR | 原始文本:原始文本[SEP]分词结果:分词结果 | 原始文本:本文始原[SEP]分词结果:果结词分
句子对分类 | LCQMC | “句子1”与“句子2”的意思是否相似?[SEP][“是”,“否”] | ?似相否是思意的”1子句“与”2子句“[SEP][“是”,“否”]
数学推理 | Math23K | 问题:问题[SEP]答案:计算公式 | 问题:题问[SEP]答案:式公算计
阅读理解 | CMRC | 文本:文本;问题:问题[SEP]答案:答案 | 文本:本文;问题:题问[SEP]答案:案答
注:斜体文字表示来自数据集中的数据,正体文字是提示符或特殊分隔符.

微调时使用与预训练阶段一致的交叉熵损失函数,并同样采用热身和衰减的学习率调整计划.
3. 实 验
本节首先通过消融实验验证玲珑所采用策略的有效性,然后通过一系列下游任务评估玲珑的整体性能. 玲珑的代码实现和预训练权重全部开源[28],均可在GNU GPLv3许可下使用.
3.1 实验设置
为了评估和对比玲珑以及其他以中文为主的预训练语言模型的性能,我们从CUGE[29]中精心选择了7个不同的自然语言处理下游任务,选定任务可用于全面地评估模型的自然语言生成和自然语言理解能力. 研究中具体使用的评估基准和指标概述如下:
1)文本摘要是一项对给定的一段长文本进行摘要生成的自然语言生成任务. 我们在CEPSUM 2.0[30]和LCSTS[31]数据集上评估语言模型生成摘要的能力. CEPSUM 2.0包含家居用品(home appliances)、服装(clothing)和箱包(cases and bags)相关的产品描述,分别使用3个类别的数据对模型进行微调然后进行评分,使用3个类别的平均得分作为CEPSUM 2.0的评估结果. 评价指标采用Rouge-L[32],它根据2个序列的最长公共子序列长度来衡量2个序列的相似度.
2)基于结构化数据的文本生成是一项基于结构化数据生成文本的自然语言生成任务. 我们使用AdGen[33]数据集来评估模型,使用BLEU-4[34]作为评价指标. 该数据集中的每个实例都包含输入的产品信息(表格形式的格式化信息)和预期广告文本(字符串).
3)问答(question answering,QA)任务需要模型回答用自然语言描述的问题. 我们使用NLPCC2018-KBQA[35]数据集来评估玲珑和其他基线模型的自然语言理解能力. NLPCC2018-KBQA包含一份知识图谱数据和一些针对知识图谱内容的提问及答案. 我们将NLPCC2018-KBQA中的问题转化为关系提取问题,具体来讲就是模型接收自然语言描述的问题,期望模型输出提取的实体和关系,然后利用提取的实体和关系在知识图谱中匹配答案. 我们使用准确率,也就是正确回答问题比例作为评价模型性能的指标.
4)中文分词是一项将句子分解为词序列的自然语言理解任务. 中文分词的一个重要挑战在于没有标准词库,也很难定义什么是词. 在不同语境或者不同需求下,分词的标准答案都是不同的. 我们使用微博数据集[36]和MSR数据集[37]作为评测基准, F1=2\times \dfrac{{precision}\times {recall}}{{precision}+{recall}} 作为评价指标,其中,精确度(precision)是指正确分割单词数与模型预测单词数之比,召回率(recall)是指正确分割单词数与标签中单词数之比.
5)句子对分类是一项经典的自然语言理解任务,涉及区分2个句子之间的关系,如相似性或包含关系. 我们使用问题匹配数据集LCQMC[38]作为基准,使用准确率作为评价指标.
6)数学推理是指利用算法/模型解决用自然语言描述的数学问题的任务. 经过不断尝试,我们发现玲珑的优势在于理解问题而非数值计算. 因此,我们使用模型从原始问题中提取数学公式,然后使用eval函数(Python中的函数)计算最终的数值结果(本节任务列表之后给出了一个简单的判分示意). 评测基准使用Math23K[39]数据集,评价指标使用准确率.
7)中文拼写检查(Chinese spell-checking,CSC)是一项关于纠正中文句子中拼写错误的任务. 使用SIGHAN13数据集[40]作为评价基准, F1 作为评估指标,其中精确度是指正确找到的拼写错误数量与模型预测的错误数量之比,召回率表示正确找到的错误数量与实际错误数量之比.
8)阅读理解(reading comprehension)任务是一项自然语言理解任务,旨在让模型理解文本和问题并回答关于给定文本的问题. 在阅读理解任务中,通常会提供一段文本和相关问题,要求模型根据文本内容生成合适的答案. 这个过程涉及到多个步骤,包括语义理解、推理以及答案生成等,需要模型具备较高的语言处理和推理能力. 本文采用EM(exact match)指标进行性能评估,将预测结果与真实结果进行比较,正确回答问题比例作为评价模型性能的指标. EM指标值越高,说明模型性能越好.
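针对上文第6)条所述的数学推理评测方式,下面给出一个简要的判分示意(流程为假设的最小实现,实际评测还需处理公式语法校验等细节):

```python
def math_answer_correct(model_output, gold_answer, tol=1e-4):
    """模型输出计算公式字符串,用 eval 求值后与标准答案比较(示意)."""
    try:
        value = eval(model_output)          # 例如 "(35-5)/2"
    except (SyntaxError, NameError, ZeroDivisionError):
        return False
    return abs(value - gold_answer) < tol

# math_answer_correct("(35-5)/2", 15) -> True
```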
我们为所有任务构建的模板展示在表3中,其中来自数据集中的数据使用斜体展示,其他为提示符或特殊分隔符. 例如,在LCSTS的模板中,“原始文本”应替换为相应数据的原文,“摘要”应替换为相应数据的摘要,“[SEP]”为分隔符,其他文本均为构造的提示文本.
3.2 消融实验
3.2.1 字比词更有优势
为了验证基于汉字进行数据处理策略的优越性,我们使用不同的字典在相同数据集上训练了基于汉字策略和基于词语策略的2个规模相近的模型. 字典分别采用本文构建的基于汉字的字典和CPM-2[14]中使用的基于词语的字典,CPM-2字典使用BPE(byte pair encoding)方式构造得到,2个字典互不为子集. 此外,出于快速验证的目的,我们将2个模型的解码器模块数减少到12个,隐藏层维度减少到768,每个模型大约有1亿个参数(基于词语的模型有1.06亿个参数,基于汉字的模型有0.96亿个参数).
使用不同标记化策略处理后的预训练数据集信息如表4所示. 特殊标记[UNK]表示字典中没有的标记. 在使用基于词语的标记策略时,[UNK]比例达到了0.417 7%,是基于汉字策略的近3倍,这说明了基于词语的策略更容易导致OOV问题.
表 4 使用不同标记化策略进行预训练数据集处理结果
Table 4. Summary of Our Pre-training Dataset Using Different Tokenization Strategies
数据量 | 基于汉字标记策略 | 基于词语标记策略
字典规模 | 13 312 | 26 240
标记数量 | 23 710 716 503 | 19 177 964 849
[UNK]数量 | 34 566 039 | 80 110 810
[UNK]比例/% | 0.145 8 | 0.417 7

我们还验证了这2个模型在6个下游任务上的性能,结果展示在表5中. 整体来看,基于汉字的模型能更好地完成各种下游任务. 这是因为基于词语的模型经常无法分割或者错误地分割原始文本. 此外,由于大多数中文语素(语言表达中最小的有意义成分)是单个中文汉字,因此,使用基于汉字的字典更符合中国人的表达习惯. 基于汉字的字典还可以有效减少[UNK]出现,从而增加数据中有效标记数量.
表 5 基于汉字策略和基于词语策略训练模型在下游任务数据集上的性能
Table 5. Performance of Models Trained Using Character-Based Strategy and Word-Based Strategy on Downstream Task Datasets
(单位:%)
策略 | 文本摘要(CEPSUM 2.0) | 文本摘要(LCSTS) | 基于结构化数据的文本生成(AdGen) | 问答(KBQA) | 中文分词(Weibo) | 中文分词(MSR) | 句子对分类(LCQMC) | 数学推理(Math23K)
基于词语策略模型 | 19.21 | 23.75 | 6.56 | 56.90 | 52.78 | 60.19 | 80.63 | 6.10
基于汉字策略模型 | 23.73 | 30.85 | 9.28 | 73.00 | 93.97 | 95.37 | 83.00 | 54.10
注:对于所有任务来说,数值越高越好.

3.2.2 双向比单向更有优势
我们比较了单向模型(玲珑、玲珑B)和双向模型(玲珑F+B,玲珑和玲珑B的组合)在7个下游任务上的性能. 为了保证模型性能,本文针对每个任务设计相应策略结合玲珑和玲珑B的输出. 例如,单向模型完成中文拼写检查任务时,我们使用模型计算句子中每个位置上字典中的标记出现的概率,并使用每个位置的top-k个标记作为该位置的候选字. 如果实际句子中的标记不在候选集合中,那么我们认为该位置的标记是错误的. 当使用双向模型时,我们使用玲珑和玲珑B的top-k标记共同构建候选集. 表6展示了我们为下游任务设计的双向信息使用方式. 虽然为不同任务设计不同的双向信息使用方式会带来额外开销,并损失一定泛化性,但对于小规模模型来说,以较少的资源就可以实现并行,推理时间不会显著增加,而精巧的模板构建和结果融合设计可以显著地提升模型在下游任务上的性能,因此这部分开销是必要和值得的.
表 6 玲珑和玲珑B输出结果的结合方式
Table 6. Methods for Aggregating the Outputs of LingLong and LingLongB
任务 | 方案
中文拼写检查 | 结合2个模型输出结果,将所有查找到的错字进行输出.
文本摘要 | 2个模型分别输出完整结果,将结果与原始输入计算Rouge-L分数,选取得分较高的1个.
基于结构化数据的文本生成 | 2个模型分别输出完整结果,将结果与原始输入计算Rouge-L分数,选取得分较高的1个.
问答 | 分别使用2个模型得到的结果进行查询. 若均能查询到结果,且结果不一致,则取正向模型输出结果(查询方式见3.1节).
中文分词 | 结合2个模型结果,给出1个细粒度分词方案和1个粗粒度分词方案(粗粒度:仅当玲珑和玲珑B均认为应当进行分词时才进行分词;细粒度:有任意1个模型认为应该分词时即进行分词).
句子对分类 | 选择2个模型输出中概率最高的类别.
数学推理 | 从模型输出结果中选取语法正确(函数eval能够正确解析则为语法正确)的1个作为最终结果. 若均语法正确,且结果不一致,则取正向模型输出结果.
阅读理解 | 2个模型分别输出完整结果,将结果与原始输入计算Rouge-L分数,选取得分较高的1个.

表7展示了双向模型与单向模型在下游任务上的实验结果. 双向模型相对于玲珑和玲珑B的平均改进率为6.97%和20.45%,尤其是在中文拼写检查(SIGHAN)等自然语言理解任务上大大优于单向模型. 这是因为对于自然语言理解任务,上下文信息对于做判断来说更为重要.
表 7 双向模型与单向模型在下游任务上的性能
Table 7. Performance of Bidirectional Models Versus Unidirectional Models on Downstream Tasks
(单位:%)
模型 | 中文拼写检查(SIGHAN) | 文本摘要(CEPSUM 2.0) | 文本摘要(LCSTS) | 基于结构化数据的文本生成(AdGen) | 问答(KBQA) | 中文分词(Weibo) | 中文分词(MSR) | 句子对分类(LCQMC) | 数学推理(Math23K) | 阅读理解(CMRC)
玲珑 | 49.80 | 25.23 | 41.07 | 19.24 | 84.80 | 95.41 | 96.03 | 84.46 | 68.60 | 70.09
玲珑B | 43.88 | 23.93 | 33.92 | 9.46 | 69.20 | 95.04 | 95.84 | 85.91 | 59.90 | 46.63
玲珑F+B | 63.10 | 26.04 | 45.98 | 20.57 | 84.90 | 95.20 | 96.14 | 91.60 | 75.60 | 78.20

对于中文拼写检查任务,我们没有设计模板,而是直接使用模型计算当前位置每个标记出现的概率,因此模型在预测第 i 个位置的标记时,只能利用前 i-1 个位置的信息. 而由于句子可以以任何标记开头,仅使用玲珑无法确定句子第1个标记是否正确,需要双向信息才能更好地确定句子每一个位置标记的正确性.
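下面给出一个结合玲珑与玲珑B的top-k候选进行拼写检查的简要示意(假设两个模型的接口均为 model(ids) 返回形状为 [1, T, V] 的logits,且未包含 [START]/[END] 等特殊标记的处理,仅用于说明双向候选集的构造方式,并非玲珑代码库中的实际实现):

```python
import torch

def spell_check(forward_model, backward_model, token_ids, k=10):
    """结合正向与反向模型的 top-k 候选,返回疑似错字位置的集合(示意)."""
    T = len(token_ids)
    ids = torch.tensor([token_ids])
    rev_ids = torch.tensor([token_ids[::-1]])
    with torch.no_grad():
        fwd_logits = forward_model(ids)[0]        # [T, V],位置 j 预测第 j+1 个标记
        bwd_logits = backward_model(rev_ids)[0]   # [T, V],基于反转后的序列
    suspicious = set()
    for i in range(T):
        candidates = set()
        if i >= 1:       # 玲珑:由前面的标记给出第 i 个位置(0 起始)的候选
            candidates |= set(fwd_logits[i - 1].topk(k).indices.tolist())
        if i <= T - 2:   # 玲珑B:原序列第 i 个位置的候选由它后面的标记反向预测给出
            candidates |= set(bwd_logits[T - 2 - i].topk(k).indices.tolist())
        if candidates and token_ids[i] not in candidates:
            suspicious.add(i)
    return suspicious
```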
总之,通过结合使用玲珑和玲珑B模型可以更有效地利用双向信息,对于许多下游任务来说,双向模型比单向模型更可靠. 虽然双向模型增加了一定开销,但它仍然是用户友好的,玲珑和玲珑B可以以较少的计算资源独立运行或并行推理,使用双向信息增强模型表现的同时不显著增加推理时间. 实验结果也表明,结合双向信息通常能取得明显优于单向模型的性能. 总体来说,使用双向模型是一种灵活可选的方法,可以根据特定需求进行定制. 在资源很有限的情况下,我们更推荐单独使用正向玲珑模型.
3.3 文本生成
文本生成任务使用预训练语言模型根据前文信息生成后续文本,因此可以直接使用玲珑的预训练版本. 值得注意的是,我们没有对生成结果进行任何后期编辑. 与CPM和PanGu- \mathrm{\alpha } 一样,只有当模型没有在合理的点停止生成时,我们才会截断生成文本. 在进行文本生成任务时,我们使用top- k=10 ,top- p=0.9 ,以及 {temperature}=1.0 作为超参数设置,以合理地控制模型生成文本的正确性和多样性. 表8中展示了一些生成示例,通过这些示例,可以对玲珑的能力有一个直观的了解和认识. 在检查数千个生成结果后,我们发现玲珑在人物传记生成、对联生成和故事生成等任务中表现十分出色.
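上述top-k、top-p与temperature的采样过程可以用如下草图表示(logits 为模型对下一位置输出的一维向量,形状为[字典规模];该写法是常见的核采样示意,并非玲珑代码库中的实际实现):

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, top_k=10, top_p=0.9, temperature=1.0):
    """按温度缩放后,先做 top-k 截断,再在候选内做 top-p(核)截断并随机采样."""
    logits = logits / temperature
    topk_vals, topk_idx = logits.topk(top_k)      # topk 默认按概率从高到低返回
    probs = F.softmax(topk_vals, dim=-1)
    cumulative = probs.cumsum(dim=-1)
    keep = cumulative - probs < top_p             # 至少保留概率最大的 1 个标记
    kept = probs * keep
    kept = kept / kept.sum()
    choice = torch.multinomial(kept, 1)
    return topk_idx[choice].item()
```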
表 8 玲珑文本生成示例
Table 8. Examples of Text Generated by LingLong
类别 | 提示文本 | 生成文本
人物介绍 | 齐小明,知名学者 | 文化产业研究者. 2013年,出版《文化产业的新趋势》《中国文化产业与资本市场》.
对联 | 上联:一帆风顺年年好;下联: | 风平浪静日日新
故事 | 有一天,一只小蝴蝶正在森林里飞, | 突然,它看见一只老鹰正从空中俯瞰着她.

3.4 下游任务
3.4.1 实验设置
本节对比了玲珑与其他7个大小相近的中文预训练语言模型以及5个大型中文预训练语言模型在7个下游任务上的性能. 在微调和评估阶段,我们使用模板将下游任务转换为生成任务,模板构造方式见表3. 为了保证与其他模型进行公平的比较,我们尽量复用了模型公开测评时报告的分数. 在缺少结果的数据集上对已经公开预训练权重的模型进行测评,对于规模相近的模型测评时使用与微调玲珑时一样的策略和模板,对于大模型采用不微调直接测评的方式.
3.4.2 实验结果
如表9所示,总体来说玲珑在各个任务和数据集上均表现出了强大的性能. 和规模相近的预训练模型相比,比如ERNIE 3.0 XBase和mT5-small,玲珑的性能更具优势. 虽然这些规模相似的预训练语言模型同样针对下游任务进行了微调,然而它们只能在少数任务上获得与玲珑相似的性能,而在其他任务上表现不佳. 例如,CPT系列模型(CPTg/CPTu)在文本摘要任务中表现不错,但在数学推理任务上表现非常差. 此外,玲珑在经过微调之后可以获得与其他大型预训练语言模型可比的结果. 例如,mT5 XXL和CPM-2分别具有130亿个和110亿个参数,但在所有任务上,微调后的玲珑都具有更好的性能. 这也说明了这些在大规模语料库上训练参数超过100亿的模型的零样本能力仍然有所欠缺,不能直接应用于下游任务,需要进行微调. 玲珑仅有3.17亿个参数,微调玲珑更容易也更节约资源,使得玲珑相比大模型更适合现实世界应用.
表 9 不同语言模型在7个下游任务的性能
Table 9. Performance of Different Language Models on Seven Downstream Tasks
(单位:%)
模型 | 规模 | 文本摘要(CEPSUM 2.0) | 文本摘要(LCSTS) | 基于结构化数据的文本生成(AdGen) | 问答(KBQA) | 中文分词(Weibo) | 中文分词(MSR) | 句子对分类(LCQMC) | 数学推理(Math23K) | 阅读理解(CMRC2018)
ZEN 2.0[41] | 233M* | - | - | - | - | - | 98.35 | 88.81 | - | 70.77
ERNIE 3.0 XBase[42] | 280M | 2.35† | 18.86† | 7.80† | 0† | 50.55† | 70.37† | 89.06† | 0.70† | 75.99
mT5-small[43] | 300M | 9.02† | 33.10 | 10.20 | 0† | 49.18† | 49.32† | 82.10 | 18.40 | 0.90
CPTg[44] | 393M | 26.26† | 42.80 | 10.70 | 84.38† | - | - | 90.68† | 36.90† | -
CPTu[44] | 393M | - | - | - | - | 42.00† | 98.51† | 91.29† | - | 68.80
BART[5] | 406M | 22.16† | 40.90 | 12.68† | 84.68† | 46.14† | 44.42† | 90.93† | 49.10† | 61.32
GLM | 335M | 17.27† | 34.25† | 1.30† | 99.10† | 92.70† | 89.18† | 84.61† | 51.70† | 70.74
LEBERT[45] | 7.5B* | - | - | - | - | - | 98.69 | - | - | -
ERNIE 3.0[42] | 10B | - | 48.46 | 30.16 | - | - | - | 90.38 | 75.00 | 75.30
CPM-2[14] | 11B | 0.91‡ | 35.90 | 10.60 | 0‡ | 34.72‡ | 33.22‡ | 89.16 | 69.37 | 15.66
mT5-XXL[43] | 13B | 0.17‡ | 34.80 | 9.80 | 0‡ | 42.07‡ | 43.51‡ | 88.30 | 61.60 | 25.20
Yuan 1.0[27] | 245B | - | - | - | - | - | - | - | 76.90 | 5.58‡
玲珑F+B | 317M | 26.04 | 45.98 | 20.57 | 84.90 | 95.20 | 96.14 | 91.60 | 75.60 | 78.20
注:在“规模”列中,“50M”表示该模型有5 000万(50 million)个参数,“10B”表示该模型有100亿(10 billion)个参数. “∗”表示该模型规模由估算得到;“†”表示该结果由本文使用与玲珑相同策略进行微调和评估得到;“‡”表示该结果在零样本模式(未微调)下评估得到;“-”表示由于模型无法在对应任务/基准上评估或者由于模型权重不公开,因此无法得到结果. 黑体数值表示最优结果.

接下来,我们将详细讨论玲珑在每个任务上的性能. 在文本摘要任务中,玲珑具有正确生成简洁摘要的能力. 然而,由于数据本身的限制,玲珑的能力不能得到完全地发挥和评测. 例如一些标注摘要是输入文本的标题,导致标注摘要不能完全覆盖文本表达的信息,或者标注摘要包含输入文本中不存在的额外背景信息. 在任务更加明确的情况下(即微调数据更贴合任务目标,不包含干扰信息),玲珑的能力有希望进一步得到提升. 此外,由于ERNIE 3.0具有大规模模型参数和为特定任务设计的表示模块,因此在文本摘要任务上比玲珑具有优势. 这提示我们设计任务特定的模块有助于提升模型在特定任务上的表现.
在基于结构化数据的文本生成任务中,玲珑的性能明显优于其他规模相似的模型. 由于使用了基于汉字的输入策略,玲珑能够从较短的结构化输入中提取有效信息来生成连贯、有趣的长篇广告文本.
在问答任务中,玲珑准确率达到84.90%,且所有模型使用我们设计的模板进行微调后均能取得84%以上的分数. 这充分说明模型成功地从自然语言问题中提取到了解决问题的关键信息,也说明了我们为NLPCC2018-KBQA数据集设计的关系提取策略和提示模板的有效性. GLM模型在规模相近的情况下可以取得99.10%的分数,是所有模型中效果最好的. 经过分析,我们认为这得益于GLM的训练目标与自回归语言模型不同,GLM将输入文本中的1个或多个词用1个特殊标记进行替换,然后训练模型预测被替换掉的词. 这种训练目标使得GLM具有更强的预测实体或者关系(实体与关系均为完整词语)的能力.
在中文分词任务中,玲珑取得了与其他预训练语言模型可以相比的结果,表明通过构造合适的模板,小型预训练语言模型也可以很好地执行分词任务. 然而,玲珑的性能略低于小规模的ZEN 2.0和CPTu,尤其是在MSR数据集上. 经过分析,我们认为ZEN 2.0和CPTu执行任务时使用的输入模板是研究人员通过实验或经验选择的,更适合其参数量和模型结构,因为在使用不同模板时,这些模型的性能波动很大,正如CPTu在微博数据集上的得分远远低于在MSR数据集上的得分. 此外,大模型LEBERT在MSR数据集上取得了较好的性能. 然而相对于LEBERT,玲珑仍具有推理速度快、开销低的优势,更具成本收益.
在句子对分类任务中,LCQMC数据集中每个句子对的2个句子通常在结构和措辞上非常相似. 例如,在句子对“古诗咏柳中的咏字是什么意思?”和“古诗咏柳是什么意思?”中,只有3个字不同,但是这2个句子的含义是完全不同的. 语言模型必须深入理解句子的含义才能区分它们. 整体来看,玲珑的性能略高于所有其他模型(无论是小规模还是大规模的预训练语言模型),这证明玲珑能够很好地理解文本意图.
数学推理任务中,玲珑准确率达到了75.60%,接近ERNIE 3.0和Yuan 1.0的性能,表明了即使是小规模模型也具有一定的解决数学问题的能力. 此外,mT5-small和mT5-XXL的结果表明,随着模型尺寸增加,模型性能仍有一定改进空间. 即便如此,玲珑的准确率显著高于mT5-small模型,甚至优于相同结构且具有更大参数量的mT5-XXL模型.
在阅读理解任务上,结合了双向信息的玲珑相对其他测评方法来说具有较大优势. 结合表7中的结果来看,仅使用玲珑时可以取得与其他测评方法可比较的结果,通过结合玲珑与玲珑B,模型整体性能有了显著提升,2个模型起到了较好的互补作用.
玲珑仍然有很大改进空间,特别是在需要更具体知识或更好微调策略的任务中. 尽管如此,实验结果表明,玲珑是一个很有前景的模型,即使与规模大得多的模型相比,它也可以在广泛的自然语言处理任务中获得有竞争力的性能.
4. 总 结
在本文介绍的工作中,我们训练了一个基于自回归的中文预训练语言模型——玲珑,该模型具有约3.17亿个参数. 玲珑利用经过完整清洗流程处理的高质量语料库进行训练,训练数据采用了基于汉字的标记化策略. 在预训练阶段我们还引入一个新颖的反向训练流程,得到了玲珑B. 通过将玲珑与玲珑B结合来完成下游任务,使得自回归语言模型具有了处理双向信息的能力. 大量实验结果表明,与相近规模的预训练语言模型相比,玲珑适用于更广泛的下游任务且具有更加优秀的性能;与更大的模型相比,玲珑在自然语言处理下游任务中也可以获得相当的性能,而玲珑以较少的参数量在使用时具有更小的资源需求和更少的推理时间. 总体来讲,玲珑为后续研究奠定了良好的基础.
5. 局限性分析
尽管玲珑在低资源环境中可以很好地处理各种下游任务,但它仍有进一步优化的空间.
首先,本文使用了手工制定的方法为每个任务构建模板. 如何自动构建模板或如何使用连续/自动提示(soft prompt)来帮助模型更加自适应地获得更好的结果仍然是值得研究的问题. 此外,一些大型的英文基座预训练语言模型,如GPT-3,在零样本模式下取得了优异效果,而现有中文基座预训练语言模型,无论其规模大小,仍然需要微调才能较好地适用于下游任务. 因此,积累足够的高质量训练数据或设计更好的模型结构,使中文预训练语言模型能够在零样本模式下取得优异的结果也是至关重要的. 最后,玲珑生成的文本在语法或道德上并不总是合适的,生成的文本中可能包含冒犯性词语或不恰当的短语. 如何通过控制模型学习不当知识或者通过对模型生成的结果进行一定处理,使生成的文本更容易被直接使用也是个迫在眉睫的课题.
作者贡献声明:李东闻负责算法和实验方案设计、部分实验验证以及论文撰写;钟震宇负责算法和实验方案设计、部分实验验证并修改论文;孙羽菲提供方案及论文指导;申峻宇完成部分实验验证以及数据集收集;马子智完成部分实验;于川越整理实验数据和文献;张玉志提供整体方案及论文指导.
根据IoT-Analytics的报告,近年来AIoT的设备数目和市场规模均保持年均15%以上的增长,详见https://iot-analytics.com/number-connected-iot-devices/和https://iot-analytics.com/iot-market-size/.
当本文分别论述联邦学习和协同推理这2个领域中与AIoT应用场景相关的技术进展时,是从广义的角度来介绍面向AIoT的协同智能;当本文论述这2种技术的联系或涉及两者联合起作用的新的应用形态时,则是从狭义的角度来介绍面向AIoT的协同智能.
攻击者保持模型收敛精度不受显著影响是为了获得有价值的全局模型参数同时防止被发现.
表 1 相关综述简介
Table 1 A Brief Summary of Related Surveys
相关综述 | AIoT(定义、架构、异构、多模态) | 大模型 | 联邦学习(FCL、FRL、P&S、优化) | 协同推理(定义、架构、P&S、优化)
文献[1] ● ◐ ◐ ◐ ◐
文献[16] ● ● ● ●
文献[20] ● ● ● ●
文献[22] ● ● ●
文献[25] ● ● ● ◐ ●
文献[43] ● ● ●
文献[30] ● ◐ ◐ ● ● ●
文献[37] ◐ ◐ ◐ ● ◐ ◐ ◐ ◐ ◐
本文 ● ◐ ● ● ● ● ● ◐ ● ● ● ● ● ●
注:隐私安全(privacy and security,P&S);联邦持续学习(federated continual learning,FCL);联邦强化学习(federated reinforcement learning,FRL). ◐ 简略介绍;● 详细介绍.

表 2 联邦学习的算法相关工作总结
Table 2 Summary of Related Works About the Algorithm of Federated Learning
表 3 协同推理的算法相关工作总结
Table 3 Summary of Related Works About the Algorithm of Collaborative Inference
主要优化目标 相关工作 模型切分方法 任务调度方法
性能 DeepThings[49] 卷积层并行 任务窃取
DeepSlicing[26] 通信量优化、模型并行 同步开销优化
IONN[97] 执行图生成与最短路径搜索
OFL[98] 基于层融合的模型切分 动态规划
PICO[99] 基于结束片的模型切分 动态规划
EdgeFlow[100] 模型并行 线性规划
IAO[50] 延迟预测
Neurosurgeon[107] 基于延迟预测的模型切分
延迟鲁棒性 DistrEdge[102] 强化学习
ICE[103] 服务质量感知、执行图最短路径搜索
MTS[105] 强化学习
能耗 CoEdge[21] 模型并行 线性规划
Neurosurgeon[107] 基于能耗估计的模型切分
AutoScale[101] 强化学习

表 4 面向AIoT的协同智能架构各层次相关工作总结
Table 4 Summary of Related Works at Different Levels of AIoT-Oriented Collaborative Intelligence Architecture
架构层级 分类 优势 劣势 参考文献(联邦学习 协同推理)
深度学习加速器 GPU 高性能、软件栈成熟、兼顾通用计算任务 面积大、能耗高 [27, 47] [41, 97, 101]
深度学习加速器 深度学习处理器 面积较小、能效比高 任务类型相对单一 [114−115] [111, 116]
深度学习编译 即时编译 可以获取运行时信息[109] 增加启动开销[171] [114−115]
深度学习编译 预编译 更大的静态搜索空间、支持交叉编译等[109] 无法获取运行时信息 [127]* [116, 130]
深度学习框架(AIoT联邦学习框架) FedML 基于MPI和MQTT的分布式通信协议支持、支持多种通信拓扑结构、对真实AIoT设备的实验平台支持 没有对推理任务提供专门支持和优化 [73, 137]
深度学习框架(AIoT联邦学习框架) Flower 支持大量异构端侧设备和各种通信状况的模拟 [138]
深度学习框架(轻量级端侧框架) TensorFlow Lite 支持嵌入式设备的轻量级运行时 一般只用于端侧设备 [136] [21, 111]
深度学习框架(轻量级端侧框架) MNN 基于半自动搜索的最佳执行策略搜索 [141]
深度学习框架(端边云通用框架) PyTorch 编程风格简洁、多进程并行计算和通信优化 嵌入式设备等资源受限设备难以支持 [72, 77, 134] [99−100, 103]
深度学习框架(端边云通用框架) TensorFlow 良好的可扩展性、调度策略优化 [85] [50]
深度学习框架(端边云通用框架) TensorRT 高性能、高吞吐量推理 没有对训练任务提供支持 [139]
设备间通信(通信拓扑结构[30, 137]) 中心化 结构简单、易于管理 中心节点通信瓶颈,可能依赖第三方提供的计算服务 [44] [97]
设备间通信(通信拓扑结构[30, 137]) 层次化 缓解中心节点通信瓶颈 增加额外通信层级,可能依赖第三方提供的计算服务 [143] [144]
设备间通信(通信拓扑结构[30, 137]) 去中心化 P2P直接通信、系统自治、拜占庭容错 共识开销,系统管理复杂 [147] [49]
设备间通信(通信拓扑结构[30, 137]) 混合式 可以兼具多种通信拓扑结构的优点 结构和系统管理较为复杂 [68]
设备间通信 减少通信的次数 降低通信开销 一般只用于联邦学习场景 [143, 148, 150−151, 172]
设备间通信 减少每次通信的数据量 可能降低模型精度 [152−157] [144, 158]
设备间通信 通信干涉管理 减少通信干涉的负面影响 对Wi-Fi 6等新通信网络需要进一步研究 [159−161]
多设备协同[16] 端云协同 云服务器计算、存储资源充足,有利于数据的长期存储 云服务器带宽受限、广域网不稳定、隐私安全问题 [166−167] [41, 101]
多设备协同[16] 端边协同 降低通信延迟 边缘服务器计算和存储资源受限、隐私安全问题 [58] [97]
多设备协同[16] 端边云协同 减轻云服务器计算和通信负担 隐私安全问题 [143] [144]
多设备协同[16] 本地协同 高速和稳定的数据传输、隐私安全保障 只适用于封闭场景,不适用于开放场景 [21, 26, 49, 111]
多设备协同[16] 大小模型协同 既可以使用大模型中包含的丰富知识来提供高质量的服务,又可以使用小模型来提升服务的响应速度 大小模型之间的知识传递需要进一步研究 [72, 170] [141]
注:“*”表示潜在解决方案.

表 5 面向AIoT的协同智能面临的攻击和对应防御方法总结
Table 5 Summary of the Attacks That the AIoT-Oriented Collaborative Intelligence Faces and the Corresponding Defense Methods
攻击类型 攻击面 攻击场景 参考文献 防御机制
数据隐私(训练样本相关) 模型反演攻击 模型参数、模型输出 联邦学习、协同推理 [52, 119, 174−175] 混淆[186-188]、同态加密[119, 192, 195]、多方安全计算[51, 197-198, 230-231]、可信执行环境[199, 201-203]
数据隐私(训练样本相关) 成员推断攻击 [53, 176−177]
数据隐私(训练样本相关) 性质推断攻击 模型参数 联邦学习 [178−179]
数据隐私(模型参数相关) 模型提取攻击 模型输出 联邦学习、协同推理 [52, 173−174, 181] 异常检测[205]、改变输出[208-210]
数据隐私(模型参数相关) free-rider攻击 模型参数 联邦学习 [183−184] 异常检测[206-207]、区块链[68, 147]
模型安全 投毒攻击 训练数据、模型参数 联邦学习、协同推理 [214−215] 异常检测[120, 226]
模型安全 逃逸攻击 模型输出、模型参数 [213, 217−219] 异常检测[227-228]、对抗学习[224]、混淆[229]
[1] Chang Zhuoqing, Liu Shubo, Xiong Xingxing, et al. A survey of recent advances in edge-computing-powered artificial intelligence of things[J]. IEEE Internet of Things Journal, 2021, 8(18): 13849−13875 doi: 10.1109/JIOT.2021.3088875
[2] Wang Wenbo, Zhang Yingfeng, Gu Jinan, et al. A proactive manufacturing resources assignment method based on production performance prediction for the smart factory [J]. IEEE Transactions on Industrial Informatics, 18(1): 46−55
[3] Yu Liang, Xie Weiwei, Xie Di, et al. Deep reinforcement learning for smart home energy management[J]. IEEE Internet of Things Journal, 2020, 7(4): 2751−2762 doi: 10.1109/JIOT.2019.2957289
[4] Shaikh F K, Karim S, Zeadally S, et al. Recent trends in Internet-of-things-enabled sensor technologies for smart agriculture[J]. IEEE Internet of Things Journal, 2022, 9(23): 23583−23598 doi: 10.1109/JIOT.2022.3210154
[5] Zhao Jianxin, Chang Xinyu, Feng Yanhao, et al. Participant selection for federated learning with heterogeneous data in intelligent transport system[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 24(1): 1106−1115
[6] IoT Analytics. IoT 2020 in review: The 10 most relevant IoT developments of the year [EB/OL]. (2021-01-12)[2024-07-16]. https://iot-analytics.com/iot-2020-in-review/
[7] IoT Analytics. IoT 2021 in review: The 10 most relevant IoT developments of the year [EB/OL]. (2022-01-11)[2024-07-16]. https://iot-analytics.com/iot-2021-in-review/
[8] 张玉清,周威,彭安妮. 物联网安全综述[J]. 计算机研究与发展,2017,54(10):2130−2143 doi: 10.7544/issn1000-1239.2017.20170470 Zhang Yuqing, Zhou Wei, Peng Anni. Survey of Internet of things security[J]. Journal of Computer Research and Development, 2017, 54(10): 2130−2143(in Chinese) doi: 10.7544/issn1000-1239.2017.20170470
[9] Dong Yudi, Yao Yudong. Secure mmwave-radar-based speaker verification for IoT smart home[J]. IEEE Internet of Things Journal, 2021, 8(5): 3500−3511 doi: 10.1109/JIOT.2020.3023101
[10] Liu Yangyang, Chang Shuo, Wei Zhiqing, et al. Fusing mmwave radar with camera for 3-D detection in autonomous driving[J]. IEEE Internet of Things Journal, 2022, 9(20): 20408−20421 doi: 10.1109/JIOT.2022.3175375
[11] Zhang Chaoyun, Patras P, Haddadi H, et al. Deep learning in mobile and wireless networking: A survey[J]. IEEE Communications Surveys & Tutorials, 2019, 21(3): 2224−2287
[12] He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image recognition[C]// Proc of the 2016 IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 770−778
[13] Amodei D, Ananthanarayanan S, Anubhai R, et al. Deep speech 2: End-to-end speech recognition in English and Mandarin[C]// Proc of the 33rd Int Conf on Machine Learning. New York: ACM, 2016: 173–182
[14] Otter D W, Medina J R, Kalita J K. A survey of the usages of deep learning for natural language processing[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(2): 604−624 doi: 10.1109/TNNLS.2020.2979670
[15] Hasselt H V, Guez A, Silver D. Deep reinforcement learning with double Q-learning[C]// Proc of the 30th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2016: 2094−2100
[16] Ren Weiqing, Qu Yuben, Dong Chao, et al. A survey on collaborative DNN inference for edge intelligence[J]. Machine Intelligence Research, 2023, 20(3): 370−395 doi: 10.1007/s11633-022-1391-7
[17] EU. Regulation (EU) 2016/679 of the European parliament and of the council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) [EB/OL]. (2018-05-25) [2024-07-16]. https://gdpr-info.eu/
[18] Li Mu, Andersen D G, Park J W, et al. Scaling distributed machine learning with the parameter server[C]// Proc of the 11th USENIX Conf on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2014: 583–598
[19] Teerapittayanon S, Mcdanel B, Kung H T. Distributed deep neural networks over the cloud, the edge and end devices[C]// Proc of the 37th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2017: 328−339
[20] Lim W Y B, Luong N C, Hoang D T, et al. Federated learning in mobile edge networks: A comprehensive survey[J]. IEEE Communications Surveys & Tutorials, 2019, 22: 2031−2063
[21] Zeng Liekang, Chen Xu, Zhou Zhi, et al. CoEdge: Cooperative DNN inference with adaptive workload partitioning over heterogeneous edge devices[J]. IEEE/ACM Transactions on Networking, 2021, 29(2): 595−608 doi: 10.1109/TNET.2020.3042320
[22] Yang Qiang, Liu Yang, Chen Tianjian, et al. Federated machine learning: Concept and applications [J]. ACM Transactions on Intelligent Systems and Technology, 2019, 10(2): Article 12
[23] 朱泓睿,元国军,姚成吉,等. 分布式深度学习训练网络综述[J]. 计算机研究与发展,2021,58(1):98−115 doi: 10.7544/issn1000-1239.2021.20190881 Zhu Hongrui, Yuan Guojun, Yao Chengji, et al. Survey on network of distributed deep learning training[J]. Journal of Computer Research and Development, 2021, 58(1): 98−115 (in Chinese) doi: 10.7544/issn1000-1239.2021.20190881
[24] Nguyen D C, Ding Ming, Pathirana P N, et al. Federated learning for internet of things: A comprehensive survey[J]. IEEE Communications Surveys & Tutorials, 2021, 23(3): 1622−1658
[25] Khan L U, Saad W, Han Zhu, et al. Federated learning for Internet of things: Recent advances, taxonomy, and open challenges[J]. IEEE Communications Surveys & Tutorials, 2021, 23(3): 1759−1799
[26] Zhang Shuai, Zhang Sheng, Qian Zhuzhong, et al. DeepSlicing: Collaborative and adaptive CNN inference with low latency[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(9): 2175−2187 doi: 10.1109/TPDS.2021.3058532
[27] Mao Yunlong, Hong Wenbo, Wang Heng, et al. Privacy-preserving computation offloading for parallel deep neural networks training[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(7): 1777−1788
[28] Bommasani R, Hudson D, Adeli E, et al. On the opportunities and risks of foundation models [J]. arXiv preprint, arXiv: 2108.07258, 2021
[29] Cao Yihan, Li Siyu, Liu Yixin, et al. A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT [J]. arXiv preprint, arXiv: 2303.04226, 2023
[30] Zhou Zhi, Chen Xu, Li En, et al. Edge intelligence: Paving the last mile of artificial intelligence with edge computing[J]. Proceedings of the IEEE, 2019, 107: 1738−1762 doi: 10.1109/JPROC.2019.2918951
[31] 陈云霁,李玲,李威,等. 智能计算系统[M]. 北京:机械工业出版社,2020 Chen Yunji, Li Ling, Li Wei et al. AI Computing System [M] Beijing: China Machine Press, 2020(in Chinese)
[32] Poirot M G, Vepakomma P, Chang Ken, et al. Split learning for collaborative deep learning in healthcare [J]. arXiv preprint, arXiv: 1912.12115, 2019
[33] Zhuang Fuzhen, Qi Zhiyuan, Duan Keyu, et al. A comprehensive survey on transfer learning[J]. Proceedings of the IEEE, 2021, 109(1): 43−76 doi: 10.1109/JPROC.2020.3004555
[34] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks[C]// Proc of the 34th Int Conf on Machine Learning. New York: ACM, 2017: 1126–1135
[35] Yao Jiangchao, Wang Feng, Jia Kunyang, et al. Device-cloud collaborative learning for recommendation[C]// Proc of the 27th ACM SIGKDD Conf on Knowledge Discovery & Data Mining. New York: ACM, 2021: 3865−3874
[36] Chen Zeyuan, Yao Jiangchao, Wang Feng, et al. Mc2-SF: Slow-fast learning for mobile-cloud collaborative recommendation [J]. arXiv preprint, arXiv: 2109.12314, 2021
[37] Yao Jiangchao, Zhang Shengyu, Yao Yang, et al. Edge-cloud polarization and collaboration: A comprehensive survey for AI[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(7): 6866−6886
[38] Zhao Yuxi, Gong Xiaowen, Mao Shiwen. Truthful incentive mechanism for federated learning with crowdsourced data labeling[C]// Proc of the 2023 IEEE Conf on Computer Communications. Piscataway, NJ: IEEE, 2023: 1−10
[39] Zhang Tuo, Feng Tiantian, Alam S, et al. GPT-FL: Generative pre-trained model-assisted federated learning [J]. arXiv preprint, arXiv: 2306.02210, 2023
[40] 郭斌,刘思聪,刘琰,等. 智能物联网:概念、体系架构与关键技术[J]. 计算机学报,2023,46(11): 2259−2278 Guo Bin, Liu Sicong, Liu Yan, et al. AIoT: The concept, architecture and key techniques[J]. Chinese Journal of Computers, 2023, 46(11): 2259−2278 (in Chinese)
[41] Kang Yiping, Hauswald J, Gao Cao, et al. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge[C] //Proc of the 22nd Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2017: 615−629
[42] Pan Jingyu, Chang C C, Xie Zhiyao, et al. Towards collaborative intelligence: Routability estimation based on decentralized private data[C] //Proc of the 59th ACM/IEEE Design Automation Conf. New York: ACM, 2017: 961−966
[43] 王睿,齐建鹏,陈亮,等. 面向边缘智能的协同推理综述[J]. 计算机研究与发展,2023,60(2):398−414 doi: 10.7544/issn1000-1239.202110867 Wang Rui, Qi Jianpeng, Chen Liang, et al. Survey of collaborative inference for edge intelligence[J]. Journal of Computer Research and Development, 2023, 60(2): 398−414 (in Chinese) doi: 10.7544/issn1000-1239.202110867
[44] Mcmahan H B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data[C]// Proc of the 20th Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2017: 1273−1282
[45] Kairouz P, Mcmahan H B, Avent B, et al. Advances and open problems in federated learning[J]. Foundation Trends in Machine Learning, 2021, 14(1): 1−210
[46] Hinton G E, Vinyals O, Dean J. Distilling the knowledge in a neural network [J]. arXiv preprint, arXiv: 1503.02531, 2015
[47] Thapa C, Chamikara M A P, Camtepe S, et al. SplitFed: When federated learning meets split learning[C]// Proc of the 36th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2022: 8485−8493
[48] Lu Ying, Luo Lingkun, Huang Di, et al. Knowledge transfer in vision recognition: A survey [J]. ACM Computing Surveys, 2020, 53(2): Article 37
[49] Zhao Zhuoran, Barijough K M, Gerstlauer A. DeepThings: Distributed adaptive deep learning inference on resource-constrained IoT edge clusters[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(11): 2348−2359 doi: 10.1109/TCAD.2018.2858384
[50] Tang Xin, Chen Xu, Zeng Liekang, et al. Joint multiuser dnn partitioning and computational resource allocation for collaborative edge intelligence[J]. IEEE Internet of Things Journal, 2021, 8(12): 9511−9522 doi: 10.1109/JIOT.2020.3010258
[51] Huang P H, Tu C H, Chung S M, et al. SecureTVM: A TVM-based compiler framework for selective privacy-preserving neural inference[J]. ACM Transactions on Design Automation of Electronic Systems, 2023, 28(4): 1−28
[52] He Zecheng, Zhang Tianwei, Lee R B. Model inversion attacks against collaborative inference[C]// Proc of the 35th Annual Computer Security Applications Conf. New York: ACM, 2019: 148–162
[53] Chen Hanxiao, Li Hongwei, Dong Guishan, et al. Practical membership inference attack against collaborative inference in industrial IoT[J]. IEEE Transactions on Industrial Informatics, 2022, 18(1): 477−487 doi: 10.1109/TII.2020.3046648
[54] Ayad A, Renner M, Schmeink A. Improving the communication and computation efficiency of split learning for IoT applications[C/OL]// Proc of the 2021 IEEE Global Communications Conf. Piscataway, NJ: IEEE, 2021[2024-08-17]. https://ieeexplore.ieee.org/document/9685493
[55] Li Tian, Sahu A K, Talwalkar A, et al. Federated learning: Challenges, methods, and future directions[J]. IEEE Signal Processing Magazine, 2020, 37(3): 50−60 doi: 10.1109/MSP.2020.2975749
[56] Zhao Yuchen, Barnaghi P, Haddadi H. Multimodal federated learning on IoT data[C]// Proc of the 7th IEEE/ACM Int Conf on Internet-of-Things Design and Implementation. Piscataway, NJ: IEEE, 2022: 43−54
[57] Kirkpatrick J, Pascanu R, Rabinowitz N, et al. Overcoming catastrophic forgetting in neural networks[J]. Proceedings of the National Academy of Sciences, 2017, 114(13): 3521−3526 doi: 10.1073/pnas.1611835114
[58] Zhang Zhouyangzi, Guo Bin, Sun Wen, et al. Cross-FCL: Toward a cross-edge federated continual learning framework in mobile edge computing systems[J]. IEEE Transactions on Mobile Computing, 2022, 23(1): 313−326
[59] Zhuo H H, Feng Wenfeng, Lin Yufeng, et al. Federated deep reinforcement learning [J]. arXiv preprint, arXiv: 1901.08277, 2019
[60] Kingma D P, Ba J. Adam: A method for stochastic optimization[C/OL]// Proc of the 3rd Int Conf on Learning Representations. Washington: ICLR, 2015[2024-08-16]. https://www.semanticscholar.org/reader/a6cb366736791bcccc5c8639de5a8f9636bf87e8
[61] Zhang Jianyi, Li Ang, Tang Minxue, et al. Fed-CBS: A heterogeneity-aware client sampling mechanism for federated learning via class-imbalance reduction[C]// Proc of the 40th Int Conf on Machine Learning. New York: ACM, 2023: Article 1734
[62] Duan Moming, Liu Duo, Chen Xianzhang, et al. Self-balancing federated learning with global imbalanced data in mobile systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 32(1): 59−71
[63] Li Tian, Sahu A K, Zaheer M, et al. Federated optimization in heterogeneous networks[C/OL]// Proc of the 3rd Conf on Machine Learning and Systems. Indio, CA: MLSys. org, 2020[2024-08-16]. https://proceedings.mlsys.org/paper_files/paper/2020/hash/1f5fe83998a09396ebe6477d9475ba0c-Abstract.html
[64] Karimireddy S P, Kale S, Mohri M, et al. SCAFFOLD: Stochastic controlled averaging for federated learning[C]// Proc of the 37th Int Conf on Machine Learning. New York: ACM, 2020: 5132−5143
[65] Arivazhagan M G, Aggarwal V, Singh A K, et al. Federated learning with personalization layers [J]. arXiv preprint, arXiv: 1912.00818, 2019
[66] Li Tian, Hu Shengyuan, Beirami A, et al. Ditto: Fair and robust federated learning through personalization[C]// Proc of the 38th Int Conf on Machine Learning. New York: ACM, 2021: 6357−6368
[67] Xie Cong, Koyejo O, Gupta I. Asynchronous federated optimization [J]. arXiv preprint, arXiv: 1903.03934, 2019
[68] Lu Yunlong, Huang Xiaohong, Zhang Ke, et al. Blockchain empowered asynchronous federated learning for secure data sharing in Internet of vehicles[J]. IEEE Transactions on Vehicular Technology, 2020, 69(4): 4298−4311 doi: 10.1109/TVT.2020.2973651
[69] Sun Yuchang, Shao Jiawei, Mao Yuyi, et al. Semi-decentralized federated edge learning with data and device heterogeneity[J]. IEEE Transactions on Network and Service Management, 2023, 20(2): 1487−1501 doi: 10.1109/TNSM.2023.3252818
[70] Zhang Feilong, Liu Xianming, Lin Shiyi, et al. No one idles: Efficient heterogeneous federated learning with parallel edge and server computation[C]// Proc of the 40th Int Conf on Machine Learning. New York: ACM, 2023: 41399−41413
[71] Diao Enmao, Ding Jie, Tarokh V. HeteroFL: Computation and communication efficient federated learning for heterogeneous clients[C] // Proc of the 2021 Int Conf on Learning Representations. Washington: ICLR, 2021: 1−24
[72] Alam S, Liu Luyang, Yan Ming, et al. FedRolex: Model-heterogeneous federated learning with rolling sub-model extraction[C] // Proc of the 36th Annual Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2022: 29677−29690
[73] He Chaoyang, Annavaram M, Avestimehr S. Group knowledge transfer: Federated learning of large CNNs at the edge[C]// Proc of the 34th Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2020: Article 1180
[74] Itahara S, Nishio T, Koda Y, et al. Distillation-based semi-supervised federated learning for communication-efficient collaborative training with non-IID private data[J]. IEEE Transactions on Mobile Computing, 2023, 22(1): 191−205 doi: 10.1109/TMC.2021.3070013
[75] Lin Tao, Kong Lingjing, Stich S U, et al. Ensemble distillation for robust model fusion in federated learning[C]// Proc of the 34th Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2020: Article 198
[76] Lin Yiming, Gao Yuan, Gong Maoguo, et al. Federated learning on multimodal data: A comprehensive survey[J]. Machine Intelligence Research, 2023, 20(4): 539−553 doi: 10.1007/s11633-022-1398-0
[77] Xiong Baochen, Yang Xiaoshan, Qi Fan, et al. A unified framework for multi-modal federated learning [J]. Neurocomputing, 2022, 480: 110−118
[78] Lu Jiasen, Yang Jianwei, Batra D, et al. Hierarchical question-image co-attention for visual question answering[C]// Proc of the 30th Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2016: 289−297
[79] Liu Fenglin, Wu Xian, Ge Shen, et al. Federated learning for vision-and-language grounding problems[C]// Proc of the 34th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2020: 11572−11579
[80] Chen Jiayi, Zhang Aidong. FedMSplit: Correlation-adaptive federated multi-task learning across multimodal split networks[C]// Proc of the 28th ACM SIGKDD Conf on Knowledge Discovery and Data Mining. New York: ACM, 2022: 87–96
[81] Zhang Rongyu, Chi Xiaowei, Liu Guiliang, et al. Unimodal training-multimodal prediction: Cross-modal federated learning with hierarchical aggregation [J]. arXiv preprint, arXiv: 2303.15486, 2023
[82] Liu Boyi, Wang Lujia, Liu Ming. Lifelong federated reinforcement learning: A learning architecture for navigation in cloud robotic systems[J]. IEEE Robotics and Automation Letters, 2019, 4(4): 4555−4562 doi: 10.1109/LRA.2019.2931179
[83] Jiang Ziyue, Ren Yi, Lei Ming, et al. FedSpeech: Federated text-to-speech with continual learning[C]// Proc of the 30th Int Joint Conf on Artifical Intelligence. Berlin: Springer, 2021: 3829−3835
[84] Hung S C Y, Tu Chenghao, Wu Chengen, et al. Compacting, picking and growing for unforgetting continual learning[C]// Proc of the 33rd Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2019: Article 1225
[85] Usmanova A, Portet F, Lalanda P, et al. Federated continual learning through distillation in pervasive computing[C]// Proc of the 2022 IEEE Int Conf on Smart Computing. Piscataway, NJ: IEEE, 2022: 86−91
[86] Yoon J H, Jeong W Y, Lee G W, et al. Federated continual learning with weighted inter-client transfer[C]// Proc of the 38th Int Conf on Machine Learning. New York: PMLR, 2021: 12073−12086
[87] Mori J, Teranishi I, Furukawa R. Continual horizontal federated learning for heterogeneous data[C/OL]// Proc of the 2022 Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 2022[2024-08-16]. https://www.semanticscholar.org/reader/3674cbf1900f748e5d1e981f296790256989a62e
[88] Hendryx S M, Kc D R, Walls B, et al. Federated reconnaissance: Efficient, distributed, class-incremental learning [J]. arXiv preprint, arXiv: 2109.00150, 2021
[89] Xu Chencheng, Hong Zhiwei, Huang Minlie, et al. Acceleration of federated learning with alleviated forgetting in local training[C/OL]// Proc of the 10th Int Conf on Learning Representations. Washington: ICLR, 2022[2024-07-30]. https://openreview.net/pdf?id=541PxiEKN3F
[90] Dong Jiahua, Wang Lixu, Fang Zhen, et al. Federated class-incremental learning[C]// Proc of the 2022 IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2022: 10154−10163
[91] Wang Tianyu, Liang Teng, Li Jun, et al. Adaptive traffic signal control using distributed MARL and federated learning[C]// Proc of the 20th IEEE Int Conf on Communication Technology. Piscataway, NJ: IEEE, 2020: 1242−1248
[92] Liu Haotian, Wu Wenchuan. Federated reinforcement learning for decentralized voltage control in distribution networks[J]. IEEE Transactions on Smart Grid, 2022, 13(5): 3840−3843 doi: 10.1109/TSG.2022.3169361
[93] Rezazadeh F, Bartzoudis N. A federated DRL approach for smart micro-grid energy control with distributed energy resources[C]// Proc of the 27th IEEE Int Workshop on Computer Aided Modeling and Design of Communication Links and Networks. Piscataway, NJ: IEEE, 2022: 108−114
[94] Wang Xiaofei, Wang Chenyang, Li Xiuhua, et al. Federated deep reinforcement learning for Internet of things with decentralized cooperative edge caching[J]. IEEE Internet of Things Journal, 2020, 7(10): 9441−9455 doi: 10.1109/JIOT.2020.2986803
[95] Yu Shuai, Chen Xu, Zhou Zhi, et al. When deep reinforcement learning meets federated learning: Intelligent multitimescale resource management for multiaccess edge computing in 5G ultradense network[J]. IEEE Internet of Things Journal, 2021, 8(4): 2238−2251 doi: 10.1109/JIOT.2020.3026589
[96] Wang Xiaoding, Hu Jia, Lin Hui, et al. QoS and privacy-aware routing for 5G-enabled industrial Internet of things: A federated reinforcement learning approach[J]. IEEE Transactions on Industrial Informatics, 2022, 18(6): 4189−4197 doi: 10.1109/TII.2021.3124848
[97] Jeong H J, Lee H J, Shin C H, et al. IONN: Incremental offloading of neural network computations from mobile devices to edge servers[C] //Proc of the 2018 ACM Symp on Cloud Computing. New York: ACM, 2018: 401−411
[98] Zhou Li, Samavatian M H, Bacha A, et al. Adaptive parallel execution of deep neural networks on heterogeneous edge devices[C] // Proc of the 4th ACM/IEEE Symp on Edge Computing. New York: ACM, 2019: 195−208
[99] Yang Xiang, Xu Zikang, Qi Qi, et al. PICO: Pipeline inference framework for versatile CNNs on diverse mobile devices[J]. IEEE Transactions on Mobile Computing, 2023, 23(4): 2712−2730
[100] Hu Chenghao, Li Baochun. Distributed inference with deep learning models across heterogeneous edge devices[C]// Proc of the 2022 IEEE Conf on Computer Communications. Piscataway, NJ: IEEE, 2022: 330−339
[101] Kim Y G, Wu C J. AutoScale: Energy efficiency optimization for stochastic edge inference using reinforcement learning[C]// Proc of the 53rd Annual IEEE/ACM Int Symp on Microarchitecture. Piscataway, NJ: IEEE, 2020: 1082−1096
[102] Hou Xueyu, Guan Yongjie, Han Tao, et al. DistrEdge: Speeding up convolutional neural network inference on distributed edge devices[C]// Proc of the 2022 IEEE Int Parallel and Distributed Processing Symp. Piscataway, NJ: IEEE, 2022: 1097−1107
[103] Fu Kaihua, Shi Jiuchen, Chen Quan, et al. QoS-aware irregular collaborative inference for improving throughput of DNN services[C/OL] //Proc of the 2022 Int Conf for High Performance Computing, Networking, Storage and Analysis. Piscataway, NJ: IEEE, 2022[2024-08-16]. https://dl.acm.org/doi/10.5555/3571885.3571976
[104] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[C/OL]// Proc of the 4th Int Conf on Learning Representations. Washington: ICLR, 2016[2024-07-30]. https://www.semanticscholar.org/reader/024006d4c2a89f7acacc6e4438d156525b60a98f
[105] Wang Lingdong, Xiang Liyao, Xu Jiayu, et al. Context-aware deep model compression for edge cloud computing[C]// Proc of the 40th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2020: 787−797
[106] Molina M, Muñoz O, Pascual-Iserte A, et al. Joint scheduling of communication and computation resources in multiuser wireless application offloading[C]// Proc of the 25th IEEE Annual Int Symp on Personal, Indoor, and Mobile Radio Communication. Piscataway, NJ: IEEE, 2014: 1093−1098
[107] Kang Yiping, Hauswald J, Gao Cao, et al. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge[C] // Proc of the 22nd Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2017: 615−629
[108] Zhuang Weiming, Chen Chen, Lyu Lingjuan. When foundation model meets federated learning: Motivations, challenges, and future directions [J]. arXiv preprint, arXiv: 2306.15546, 2023
[109] Li Mingzhen, Liu Yi, Liu Xiaoyan, et al. The deep learning compiler: A comprehensive survey[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(3): 708−727 doi: 10.1109/TPDS.2020.3030548
[110] Zeng Qunsong, Du Yuqing, Huang Kaibin, et al. Energy-efficient resource management for federated edge learning with CPU-GPU heterogeneous computing[J]. IEEE Transactions on Wireless Communications, 2021, 20(12): 7947−7962 doi: 10.1109/TWC.2021.3088910
[111] Han M, Hyun J, Park S, et al. MOSAIC: Heterogeneity-, communication-, and constraint-aware model slicing and execution for accurate and efficient inference[C]// Proc of the 28th Int Conf on Parallel Architectures and Compilation Techniques. Piscataway, NJ: IEEE, 2019: 165−177
[112] Chen Yunji, Luo Tao, Liu Shaoli, et al. DaDianNao: A machine-learning supercomputer[C]// Proc of the 47th Annual IEEE/ACM Int Symp on Microarchitecture. Piscataway, NJ: IEEE, 2014: 609−622
[113] Jouppi N P, Young C, Patil N, et al. In-datacenter performance analysis of a tensor processing unit[C/OL] // Proc of the 44th ACM/IEEE Annual Int Symp on Computer Architecture. New York: ACM, 2017[2024-08-16]. https://dl.acm.org/doi/10.1145/3079856.3080246
[114] Ro J H, Suresh A T, Wu Ke. FedJAX: Federated learning simulation with JAX [J]. arXiv preprint, arXiv: 2108.02117, 2021
[115] Lee J Y, Park W P, Mitchell N, et al. JaxPruner: A concise library for sparsity research [J]. arXiv preprint, arXiv: 2304.14082, 2023
[116] Villarrubia J, Costero L, Igual F D, et al. Improving inference time in multi-TPU systems with profiled model segmentation[C]// Proc of the 31st Euromicro Int Conf on Parallel, Distributed and Network-Based Processing. Piscataway, NJ: IEEE, 2023: 84−91
[117] Wang Zixiao, Che Biyao, Guo Liang, et al. PipeFL: Hardware/software co-design of an FPGA accelerator for federated learning[J]. IEEE Access, 2022, 10: 98649−98661 doi: 10.1109/ACCESS.2022.3206785
[118] Li H M, Rieger P, Zeitouni S, et al. FLAIRS: FPGA-accelerated inference-resistant & secure federated learning [J]. arXiv preprint, arXiv: 2308.00553, 2023
[119] Phong L T, Aono Y, Hayashi T, et al. Privacy-preserving deep learning via additively homomorphic encryption[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(5): 1333−1345 doi: 10.1109/TIFS.2017.2787987
[120] Nguyen T D, Rieger P, Chen Huili, et al. FLAME: Taming backdoors in federated learning[C]// Proc of the 31st USENIX Security Symp. Berkeley, CA: USENIX Association, 2022: 1415−1432
[121] 包云岗,常轶松,韩银和,等. 处理器芯片敏捷设计方法:问题与挑战[J]. 计算机研究与发展,2021,58(6):1131−1145 doi: 10.7544/issn1000-1239.2021.20210232 Bao Yungang, Chang Yisong, Han Yinhe, et al. Agile design of processor chips: Issues and challenges[J]. Journal of Computer Research and Development, 2021, 58(6): 1131−1145 (in Chinese) doi: 10.7544/issn1000-1239.2021.20210232
[122] 王凯帆,徐易难,余子濠,等. 香山开源高性能RISC-V处理器设计与实现[J]. 计算机研究与发展,2023,60(3):476−493 doi: 10.7544/issn1000-1239.202221036 Wang Kaifan, Xu Yinan, Yu Zihao, et al. XiangShan open-source high performance RISC-V processor design and implementation[J]. Journal of Computer Research and Development, 2023, 60(3): 476−493 (in Chinese) doi: 10.7544/issn1000-1239.202221036
[123] Dhilleswararao P, Boppu S, Manikandan M S, et al. Efficient hardware architectures for accelerating deep neural networks: Survey[J]. IEEE Access, 2022, 10: 131788−131828 doi: 10.1109/ACCESS.2022.3229767
[124] Zhao Yongwei, Du Zidong, Guo Qi, et al. Cambricon-F: Machine learning computers with fractal von neumann architecture[C]// Proc of the 46th ACM/IEEE Annual Int Symp on Computer Architecture. Piscataway, NJ: IEEE, 2019: 788−801
[125] Chen Tianqi, Moreau T, Jiang Ziheng, et al. TVM: An automated end-to-end optimizing compiler for deep learning[C]// Proc of the 13th USENIX Conf on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2018: 579–594
[126] Pytorch. Pytorch on XLA devices [EB/OL]. (2024-07-17)[2023-10-21]. https://pytorch.org/xla/master/
[127] Pytorch. Aot autograd —How to use and optimize?[EB/OL]. (2023-10-25)[2024-07-17]. https://pytorch.org/functorch/stable/notebooks/aot_autograd_optimizations.html
[128] Coral. Edge TPU compiler [EB/OL]. (2020-05-15)[2024-07-16]. https://coral.ai/docs/edgetpu/compiler/#help
[129] Nvidia. Optimizing inference on large language models with NVIDIA tensorrt-LLM, now publicly available [EB/OL]. (2023-10-19)[2024-07-16]. https://developer.nvidia.com/blog/optimizing-inference-on-llms-with-tensorrt-llm-now-publicly-available/
[130] NVIDIA. TensorRT-LLM [EB/OL]. (2023-10-24)[2024-07-17]. https://github.com/NVIDIA/TensorRT-LLM
[131] ONNX. ONNX [EB/OL]. (2024-05-24)[2024-07-17]. https://onnx.ai/
[132] MLIR. Multi-level intermediate representation overview [EB/OL]. (2017-07-17)[2024-07-17]. https://mlir.llvm.org/
[133] Jin Tian, Bercea G T, Tung D L, et al. Compiling ONNX neural network models using MLIR [J]. arXiv preprint, arXiv: 2008.08272, 2020
[134] Gao Liang, Li Li, Chen Yingwen, et al. FIFL: A fair incentive mechanism for federated learning[C]// Proc of the 50th Int Conf on Parallel Processing. New York: ACM, 2021: Article 82
[135] Team Tensorflow Lite. On-device training in tensorflow lite [EB/OL]. (2021-11-09)[2024-07-17]. https://blog.tensorflow.org/2021/11/on-device-training-in-tensorflow-lite.html
[136] Space Ts2. A comprehensive guide to tensorflow lite’s federated learning [EB/OL]. (2023-04-07)[2024-07-17]. https://ts2.space/en/a-comprehensive-guide-to-tensorflow-lites-federated-learning/
[137] He Chaoyang, Li Songze, So Jinhyun, et al. FedML: A research library and benchmark for federated machine learning [J]. arXiv preprint, arXiv: 2007.13518, 2020
[138] Beutel D J, Topal T, Mathur A, et al. Flower: A friendly federated learning research framework [J]. arXiv preprint, arXiv: 2007.14390, 2020
[139] Jeong E J, Kim J R, Ha S H. TensorRT-based framework and optimization methodology for deep learning inference on jetson boards [J]. ACM Transactions on Embedded Computing Systems, 2022, 21(5): Article 51
[140] Jiang Xiaotang, Wang Huan, Chen Yiliu, et al. MNN: A universal and efficient inference engine[C/OL]// Proc of the 3rd Conf on Machine Learning and Systems. Austin, Texas: MLSys. org, 2020[2024-07-30]. https://proceedings.mlsys.org/paper_files/paper/2020/file/bc19061f88f16e9ed4a18f0bbd47048a-Paper.pdf
[141] Lv Chengfei, Niu Chaoyue, Gu Renjie, et al. Walle: An end-to-end, general-purpose, and large-scale production system for device-cloud collaborative machine learning[C/OL]// Proc of the 16th USENIX Symp on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2022[2024-08-16]. https://www.usenix.org/conference/osdi22/presentation/lv
[142] Aminabadi R Y, Rajbhandari S, Awan A A, et al. DeepSpeed- inference: Enabling efficient inference of transformer models at unprecedented scale[C/OL]// Proc of the 2022 Int Conf for High Performance Computing, Networking, Storage and Analysis. Piscataway, NJ: IEEE, 2022[2024-08-16]. https://dl.acm.org/doi/abs/10.5555/3571885.3571946
[143] Liu Lumin, Zhang Jun, Song S H, et al. Client-edge-cloud hierarchical federated learning[C/OL]// Proc of the 2020 IEEE Int Conf on Communications. Piscataway, NJ: IEEE, 2020[2024-08-17]. https://ieeexplore.ieee.org/document/9148862
[144] Yang Shusen, Zhang Zhanhua, Zhao Cong, et al. CNNPC: End-edge-cloud collaborative cnn inference with joint model partition and compression[J]. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(12): 4039−4056 doi: 10.1109/TPDS.2022.3177782
[145] Korkmaz C, Kocas H E, Uysal A, et al. Chain FL: Decentralized federated machine learning via blockchain[C]// Proc of the 2nd Int Conf on Blockchain Computing and Applications. Piscataway, NJ: IEEE, 2020: 140−146
[146] Du Jiangsu, Shen Minghua, Du Yunfei. A distributed in-situ CNN inference system for IoT applications[C]// Proc of the 38th Int Conf on Computer Design. Piscataway, NJ: IEEE, 2020: 279−287
[147] Lyu L, Yu Jiangshan, Nandakumar K, et al. Towards fair and privacy-preserving federated deep models[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31(11): 2524−2541 doi: 10.1109/TPDS.2020.2996273
[148] Luping Wang, Wei Wang, Li Bo. CMFL: Mitigating communication overhead for federated learning[C]// Proc of the 39th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2019: 954−964
[149] Yu Hao, Yang Sen, Zhu Shenghuo. Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning[C] //Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 5693−5700
[150] You Chaoqun, Guo Kun, Feng Gang, et al. Automated federated learning in mobile-edge networks—Fast adaptation and convergence[J]. IEEE Internet of Things Journal, 2023, 10(15): 13571−13586 doi: 10.1109/JIOT.2023.3262664
[151] Heinbaugh C E, Luz-Ricca E, Shao Huajie. Data-free one-shot federated learning under very high statistical heterogeneity[C/OL] //Proc of the 11th Int Conf on Learning Representations. Washington: ICLR, 2023[2024-07-30]. https://openreview.net/forum?id=_hb4vM3jspB
[152] Sattler F, Wiedemann S, Müller K R, et al. Robust and communication-efficient federated learning from non-IID data[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(9): 3400−3413 doi: 10.1109/TNNLS.2019.2944481
[153] Gao Hongchang, Xu An, Huang Heng. On the convergence of communication-efficient local SGD for federated learning[C]// Proc of the 35th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2021: 7510−7518
[154] Hönig R, Zhao Yiren, Mullins R D. DAdaQuant: Doubly-adaptive quantization for communication-efficient federated learning[C] //Proc of the 39th Int Conf on Machine Learning. New York: ACM, 2022: 8852−8866
[155] Nguyen M D, Lee S M, Pham Q V, et al. HCFL: A high compression approach for communication-efficient federated learning in very large scale IoT networks[J]. IEEE Transactions on Mobile Computing, 2023, 22(11): 6495−6507 doi: 10.1109/TMC.2022.3190510
[156] Dai Rong, Shen Li, He Fengxiang, et al. DisPFL: Towards communication-efficient personalized federated learning via decentralized sparse training[C]// Proc of the 39th Int Conf on Machine Learning. New York: ACM, 2022: 4587−4604
[157] Wen Hui, Wu Yue, Li Jingjing, et al. Communication-efficient federated data augmentation on non-IID data[C]// Proc of the 2022 IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops. Piscataway, NJ: IEEE, 2022: 3376−3385
[158] Zhang Zhanhua, Yang Shusen, Zhao Cong, et al. RtCoInfer: Real-time collaborative CNN inference for stream analytics on ubiquitous images[J]. IEEE Journal on Selected Areas in Communications, 2023, 41(4): 1212−1226 doi: 10.1109/JSAC.2023.3242730
[159] Chen Xu, Jiao Lei, Li Wenzhong, et al. Efficient multi-user computation offloading for mobile-edge cloud computing[J]. IEEE/ACM Transactions on Networking, 2016, 24(5): 2795−2808 doi: 10.1109/TNET.2015.2487344
[160] Yi Changyan, Cai Jun, Su Zhou. A multi-user mobile computation offloading and transmission scheduling mechanism for delay-sensitive applications[J]. IEEE Transactions on Mobile Computing, 2020, 19(1): 29−43 doi: 10.1109/TMC.2019.2891736
[161] Ale L H, Zhang Ning, Fang Xiaojie, et al. Delay-aware and energy-efficient computation offloading in mobile-edge computing using deep reinforcement learning[J]. IEEE Transactions on Cognitive Communications and Networking, 2021, 7(3): 881−892 doi: 10.1109/TCCN.2021.3066619
[162] Mozaffariahrar E, Theoleyre F, Menth M. A survey of Wi-Fi 6: Technologies, advances, and challenges[J]. Future Internet, 2022, 14(10): 293−345 doi: 10.3390/fi14100293
[163] Das A K, Roy S, Bandara E, et al. Securing age-of-information (AoI)-enabled 5G smart warehouse using access control scheme[J]. IEEE Internet of Things Journal, 2023, 10(2): 1358−1375 doi: 10.1109/JIOT.2022.3205245
[164] Mehr H D, Polat H. Human activity recognition in smart home with deep learning approach[C]// Proc of the 7th Int Istanbul Smart Grids and Cities Congress and Fair. Piscataway, NJ: IEEE, 2019: 149−153
[165] Qi Lianyong, Hu Chunhua, et al. Privacy-aware data fusion and prediction with spatial-temporal context for smart city industrial environment[J]. IEEE Transactions on Industrial Informatics, 2021, 17(6): 4159−4167 doi: 10.1109/TII.2020.3012157
[166] Chen Yiqiang, Wang Jindong, Yu Chaohui, et al. FedHealth: A federated transfer learning framework for wearable healthcare[J]. IEEE Intelligent Systems, 2020, 35(4): 83−93 doi: 10.1109/MIS.2020.2988604
[167] Lee S, Choi D H. Federated reinforcement learning for energy management of multiple smart homes with distributed energy resources[J]. IEEE Transactions on Industrial Informatics, 2022, 18(1): 488−497 doi: 10.1109/TII.2020.3035451
[168] 王帅,李丹. 分布式机器学习系统网络性能优化研究进展[J]. 计算机学报,2022,45(7):1384−1412 Wang Shuai, Li Dan. Research progress on network performance optimization of distributed machine learning system[J]. Chinese Journal of Computers, 2022, 45(7): 1384−1412 (in Chinese)
[169] Martinez I, Hafid A S, Jarray A. Design, resource management, and evaluation of fog computing systems: A survey[J]. IEEE Internet of Things Journal, 2021, 8(4): 2494−2516 doi: 10.1109/JIOT.2020.3022699
[170] Lu Yan, Shu Yuanchao, Tan Xu, et al. Collaborative learning between cloud and end devices: An empirical study on location prediction[C]// Proc of the 4th ACM/IEEE Symp on Edge Computing. New York: ACM, 2019: 139–151
[171] Encora. Ahead-of-time compilation vs just-in-time compilation—Part 1 of understanding Angular [EB/OL]. (2024-07-14)[2024-07-16]. https://www.encora.com/insights/ahead-of-time-compilation-vs-just-in-time-compilation-part-1
[172] Yu Hao, Yang Sen, Zhu Shenghuo. Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning[C]// Proc of the 33rd AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2019: 5693−5700
[173] Tramer F, Zhang Fan, Juels A, et al. Stealing machine learning models via prediction APIs[C]// Proc of the 25th USENIX Conf on Security Symp. Berkeley, CA: USENIX Association, 2016: 601–618
[174] Yin Yupeng, Zhang Xianglong, Zhang Huanle, et al. Ginver: Generative model inversion attacks against collaborative inference[C]// Proc of the 2023 ACM Web Conf. New York: ACM, 2023: 2122–2131
[175] Jin Xiao, Chen Pinyu, Hsu Chiayi, et al. CAFE: Catastrophic data leakage in vertical federated learning[C/OL]// Proc of the 35th Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2021[2024-07-30]. https://papers.nips.cc/paper/2021/hash/08040837089cdf46631a10aca5258e16-Abstract.html
[176] Nguyen T D T, Lai P, Tran K, et al. Active membership inference attack under local differential privacy in federated learning[C]// Proc of the 26th Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2023: 5714−5730
[177] Li Jiacheng, Li Ninghui, Ribeiro B. Effective passive membership inference attacks in federated learning against overparameterized models[C/OL]// Proc of the 11th Int Conf on Learning Representations. Washington: ICLR, 2023[2024-07-30]. https://openreview.net/pdf?id=QsCSLPP55Ku
[178] Melis L, Song C, Cristofaro E D, et al. Exploiting unintended feature leakage in collaborative learning[C]// Proc of the 40th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2019: 691−706
[179] Wang Zhibo, Huang Yuting, Song Mengkai, et al. Poisoning-assisted property inference attack against federated learning[J]. IEEE Transactions on Dependable and Secure Computing, 2023, 20(4): 3328−3340 doi: 10.1109/TDSC.2022.3196646
[180] Kourtellis N, Katevas K, Perino D. FLaaS: Federated learning as a service[C]// Proc of the 1st Workshop on Distributed Machine Learning. New York: ACM, 2020: 7−13
[181] Truong J B, Maini P, Walls R J, et al. Data-free model extraction[C]// Proc of the 2021 IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2021: 4769−4778
[182] Liu Sijia, Chen Pinyu, Kailkhura B, et al. A primer on zeroth-order optimization in signal processing and machine learning: Principals, recent advances, and applications[J]. IEEE Signal Processing Magazine, 2020, 37(5): 43−54 doi: 10.1109/MSP.2020.3003837
[183] Fraboni Y, Vidal R, Lorenzi M. Free-rider attacks on model aggregation in federated learning[C]// Proc of the 24th Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2021: 1846−1854
[184] Lin Jierui, Du Min, Liu Jian. Free-riders in federated learning: Attacks and defenses [J]. arXiv preprint, arXiv: 1911.12560, 2019
[185] Abadi M, Chu Andy, Goodfellow I J, et al. Deep learning with differential privacy[C]// Proc of the 2016 ACM SIGSAC Conf on Computer and Communications Security. New York: ACM, 2016: 308–318
[186] Wang Baocang, Chen Yange, Jiang Hang, et al. PPeFL: Privacy-preserving edge federated learning with local differential privacy[J]. IEEE Internet of Things Journal, 2023, 10(17): 15488−15500 doi: 10.1109/JIOT.2023.3264259
[187] He Zecheng, Zhang Tianwei, Lee R B. Attacking and protecting data privacy in edge–cloud collaborative inference systems[J]. IEEE Internet of Things Journal, 2021, 8(12): 9706−9716 doi: 10.1109/JIOT.2020.3022358
[188] Jiang Bin, Li Jianqiang, Wang Huihui, et al. Privacy-preserving federated learning for industrial edge computing via hybrid differential privacy and adaptive compression[J]. IEEE Transactions on Industrial Informatics, 2023, 19(2): 1136−1144 doi: 10.1109/TII.2021.3131175
[189] Mironov I. Rényi differential privacy[C]// Proc of the 30th IEEE Computer Security Foundations Symp. Piscataway, NJ: IEEE, 2017: 263−275
[190] Ryu Jihyeon, Zheng Yifeng, Gao Yansong, et al. Can differential privacy practically protect collaborative deep learning inference for IoT?[J/OL]. Wireless Networks, 2022[2024-07-30]. https://link.springer.com/article/10.1007/s11276-022-03113-7
[191] Cheon J H, Kim A, Kim M, et al. Homomorphic encryption for arithmetic of approximate numbers[C/OL]// Proc of the 2017 Int Conf on the Theory and Application of Cryptology and Information Security. Berlin: Springer, 2017 [2024-07-30]. https://link.springer.com/chapter/10.1007/978-3-319-70694-8_15
[192] Zhang Chengliang, Li Suyi, Xia Junzhe, et al. BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning[C]// Proc of the 2020 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2020: 493−506
[193] Zhu Yilan, Wang Xinyao, Ju Lei, et al. FxHENN: FPGA-based acceleration framework for homomorphic encrypted CNN inference[C]// Proc of the 29th IEEE Int Symp on High-Performance Computer Architecture. Piscataway, NJ: IEEE, 2023: 896−907
[194] Yang Zhaoxiong, Hu Shuihai, Chen Kai. FPGA-based hardware accelerator of homomorphic encryption for efficient federated learning [J]. arXiv preprint, arXiv: 2007.10560, 2020
[195] Juvekar C, Vaikuntanathan V, Chandrakasan A P. Gazelle: A low latency framework for secure neural network inference[C]// Proc of the 27th USENIX Security Symp. Berkeley, CA: USENIX Association, 2018: 1651−1668
[196] Li Yiran, Li Hongwei, Xu Guowen, et al. Practical privacy-preserving federated learning in vehicular fog computing[J]. IEEE Transactions on Vehicular Technology, 2022, 71(5): 4692−4705 doi: 10.1109/TVT.2022.3150806
[197] Jarin I, Eshete B. PRICURE: Privacy-preserving collaborative inference in a multi-party setting[C]// Proc of the 2021 ACM Workshop on Security and Privacy Analytics. New York: ACM, 2021: 25–35
[198] Liu Yang, Kang Yan, Xing Chaoping, et al. A secure federated transfer learning framework[J]. IEEE Intelligent Systems, 2020, 35(4): 70−82 doi: 10.1109/MIS.2020.2988525
[199] Tramèr F, Boneh D. Slalom: Fast, verifiable and private execution of neural networks in trusted hardware[C/OL]// Proc of the 7th Int Conf on Learning Representations. Washington: ICLR, 2019[2024-07-30]. https://openreview.net/pdf?id=rJVorjCcKQ
[200] Intel. Innovative technology for CPU based attestation and sealing [EB/OL]. (2013-08-14)[2024-07-17]. https://www.intel.com/content/www/us/en/developer/articles/technical/innovative-technology-for-cpu-based-attestation-and-sealing.html
[201] Kalapaaking A P, Khalil I, Rahman M S, et al. Blockchain-based federated learning with secure aggregation in trusted execution environment for Internet-of-things[J]. IEEE Transactions on Industrial Informatics, 2023, 19(2): 1703−1714 doi: 10.1109/TII.2022.3170348
[202] Kuznetsov E, Chen Yitao, Zhao Ming. SecureFL: Privacy preserving federated learning with SGX and TrustZone[C]// Proc of the 2021 IEEE/ACM Symp on Edge Computing. Piscataway, NJ: IEEE, 2021: 55−67
[203] Li Yuepeng, Zeng Deze, Gu Lin, et al. Efficient and secure deep learning inference in trusted processor enabled edge clouds[J]. IEEE Transactions on Parallel and Distributed Systems, 2022, 33(12): 4311−4325 doi: 10.1109/TPDS.2022.3187772
[204] Law A, Leung C, Poddar R, et al. Secure collaborative training and inference for XGBoost[C]// Proc of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice. New York: ACM, 2020: 21–26
[205] Juuti M, Szyller S, Marchal S, et al. PRADA: Protecting against DNN model stealing attacks[C]// Proc of the 2019 IEEE European Symp on Security and Privacy. Piscataway, NJ: IEEE, 2019: 512−527
[206] Lin Jierui, Du Min, Liu Jian. Free-riders in federated learning: Attacks and defenses [J]. arXiv preprint, arXiv: 1911.12560, 2019
[207] Xu Xinyi, Lyu Lingjuan. A reputation mechanism is all you need: Collaborative fairness and adversarial robustness in federated learning[C/OL]// Proc of the 2021 Int Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML. New York: ACM, 2021[2024-08-16]. https://www.semanticscholar.org/reader/329734fdbb35faab89e14eb9b105a665d7a5f079
[208] Zhang Jiliang, Peng Shuang, Gao Yansong, et al. APMSA: Adversarial perturbation against model stealing attacks[J]. IEEE Transactions on Information Forensics and Security, 2023, 18: 1667−1679 doi: 10.1109/TIFS.2023.3246766
[209] Tan Jingxuan, Zhong Nan, Qian Zhenxing, et al. Deep neural network watermarking against model extraction attack[C]// Proc of the 31st ACM Int Conf on Multimedia. New York: ACM, 2023: 1588−1597
[210] Zhang Haitian, Hua Guang, Wang Xinya, et al. Categorical inference poisoning: Verifiable defense against black-box DNN model stealing without constraining surrogate data and query times[J]. IEEE Transactions on Information Forensics and Security, 2023, 18: 1473−1486 doi: 10.1109/TIFS.2023.3244107
[211] Dai Hongning, Zheng Zibin, Zhang Yan. Blockchain for Internet of things: A survey[J]. IEEE Internet of Things Journal, 2019, 6(5): 8076−8094 doi: 10.1109/JIOT.2019.2920987
[212] Biggio B, Corona I, Maiorca D, et al. Evasion attacks against machine learning at test time [J]. arXiv preprint, arXiv: 1708.06131, 2013
[213] Tang Pengfei, Wang Wenjie, Lou Jian, et al. Generating adversarial examples with distance constrained adversarial imitation networks[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19(6): 4145−4155 doi: 10.1109/TDSC.2021.3123586
[214] Bagdasaryan E, Veit A, Hua Yiqing, et al. How to backdoor federated learning[C]// Proc of the 23rd Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2020: 2938−2948
[215] Zhang Jiale, Chen Bing, Cheng Xiang, et al. PoisonGAN: Generative poisoning attacks against federated learning in edge computing systems[J]. IEEE Internet of Things Journal, 2021, 8(5): 3310−3322 doi: 10.1109/JIOT.2020.3023126
[216] Qammar A, Ding Jianguo, Ning Huansheng. Federated learning attack surface: Taxonomy, cyber defences, challenges, and future directions[J]. Artificial Intelligence Review, 2022, 55(5): 3569−3606 doi: 10.1007/s10462-021-10098-w
[217] Kim T, Singh S, Madaan N, et al. Characterizing internal evasion attacks in federated learning[C]// Proc of the 26th Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2023: 907−921
[218] Bao Hongyan, Han Yufei, Zhou Yujun, et al. Towards efficient and domain-agnostic evasion attack with high-dimensional categorical inputs[C]// Proc of the 37th AAAI Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2023: 6753−6761
[219] Demontis A, Melis M, Pintor M, et al. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks[C]// Proc of the 28th USENIX Conf on Security Symp. Berkeley, CA: USENIX Association, 2019: 321–338
[220] Blanchard P, El Mhamdi E M, Guerraoui R, et al. Machine learning with adversaries: Byzantine tolerant gradient descent[C]// Proc of the 31st Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2017: 118–128
[221] Lugan S, Desbordes P, Brion E, et al. Secure architectures implementing trusted coalitions for blockchained distributed learning[J]. IEEE Access, 2019, 7: 181789−181799 doi: 10.1109/ACCESS.2019.2959220
[222] Bao Hongyan, Han Yufei, Zhou Yujun, et al. Towards understanding the robustness against evasion attack on categorical data[C/OL]// Proc of the 10th Int Conf on Learning Representations. Washington: ICLR, 2022[2024-07-30]. https://openreview.net/pdf?id=BmJV7kyAmg
[223] Cao Xiaoyu, Gong N Z. Mitigating evasion attacks to deep neural networks via region-based classification[C]// Proc of the 33rd Annual Computer Security Applications Conf. New York: ACM, 2017: 278−287
[224] Zizzo G, Rawat A, Sinn M, et al. FAT: Federated adversarial training [J]. arXiv preprint, arXiv: 2012.01791, 2020
[225] Kumari K, Rieger P, Fereidooni H, et al. BayBFed: Bayesian backdoor defense for federated learning[C]// Proc of the 44th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2023: 737−754
[226] Cao Xiaoyu, Jia Jinyuan, Zhang Zaixi, et al. FedRecover: Recovering from poisoning attacks in federated learning using historical information[C]// Proc of the 44th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2023: 1366−1383
[227] Wen Jing, Hui L C K, Yiu S M, et al. DCN: Detector-corrector network against evasion attacks on deep neural networks[C]// Proc of the 48th Annual IEEE/IFIP Int Conf on Dependable Systems and Networks Workshops. Piscataway, NJ: IEEE, 2018: 215−221
[228] Debicha I, Bauwens R, Debatty T, et al. TAD: Transfer learning-based multi-adversarial detection of evasion attacks against network intrusion detection systems[J]. Future Generation Computer Systems, 2023, 138: 185−197 doi: 10.1016/j.future.2022.08.011
[229] Lecuyer M, Atlidakis V, Geambasu R, et al. Certified robustness to adversarial examples with differential privacy[C]// Proc of the 40th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2019: 656−672
[230] Byrd D, Polychroniadou A. Differentially private secure multi-party computation for federated learning in financial applications[C]// Proc of the 1st ACM Int Conf on AI in Finance. New York: ACM, 2021: Article 16
[231] Rathee D, Rathee M, Kumar N, et al. CrypTFlow2: Practical 2-party secure inference[C]// Proc of the 2020 ACM SIGSAC Conf on Computer and Communications Security. New York: ACM, 2020: 325–342
[232] He Xuanli, Lyu L, Xu Qiongkai, et al. Model extraction and adversarial transferability, your BERT is vulnerable![C]// Proc of the 2021 Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL, 2021: 2006–2012
[233] Keskar N S, McCann B, Xiong Caiming. The thieves on Sesame Street are polyglots—Extracting multilingual models from monolingual APIs[C]// Proc of the 2020 Conf on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 6203–6207
[234] Wu Huaming, Wolter K. Stochastic analysis of delayed mobile offloading in heterogeneous networks[J]. IEEE Transactions on Mobile Computing, 2018, 17(2): 461−474 doi: 10.1109/TMC.2017.2711014
[235] Tu Xuezhen, Zhu Kun, Luong N C, et al. Incentive mechanisms for federated learning: From economic and game theoretic perspective[J]. IEEE Transactions on Cognitive Communications and Networking, 2022, 8(3): 1566−1593 doi: 10.1109/TCCN.2022.3177522
[236] Liu Shaoshan, Liu Liangkai, Tang Jie, et al. Edge computing for autonomous driving: Opportunities and challenges[J]. Proceedings of the IEEE, 2019, 107(8): 1697−1716 doi: 10.1109/JPROC.2019.2915983
[237] Li Yuanchun, Wen Hao, Wang Weijun, et al. Personal LLM agents: Insights and survey about the capability, efficiency and security [J]. arXiv preprint, arXiv: 2401.05459, 2024
[238] Yu Sixing, Muñoz J P, Jannesari A. Federated foundation models: Privacy-preserving and collaborative learning for large models [J]. arXiv preprint, arXiv: 2305.11414, 2023