
Survey of Collaborative Inference for Edge Intelligence

Wang Rui, Qi Jianpeng, Chen Liang, Yang Long

Wang Rui, Qi Jianpeng, Chen Liang, Yang Long. Survey of Collaborative Inference for Edge Intelligence[J]. Journal of Computer Research and Development, 2023, 60(2): 398-414. DOI: 10.7544/issn1000-1239.202110867
Citation: Wang Rui, Qi Jianpeng, Chen Liang, Yang Long. Survey of Collaborative Inference for Edge Intelligence[J]. Journal of Computer Research and Development, 2023, 60(2): 398-414. CSTR: 32373.14.issn1000-1239.202110867

Article information
    Author biographies:

    Wang Rui: born in 1975. PhD, professor. Senior member of CCF. Main research interests include the Internet of things, edge intelligence, and smart healthcare.

    Qi Jianpeng: born in 1992. PhD candidate. Student member of CCF. Main research interests include edge intelligence and resource management.

    Chen Liang: born in 1997. Master's candidate. Main research interests include edge intelligence and reliability.

    Yang Long: born in 1999. Master's candidate. Main research interests include lightweight models and methods.

  • CLC number: TP391

Survey of Collaborative Inference for Edge Intelligence

Funds: This work was supported by the National Natural Science Foundation of China (62173158,72004147).
  • Abstract:

    At present, the continuous change of information technology along with the dramatic explosion of data quantity makes cloud computing solutions face many problems such as high latency, limited bandwidth, high energy consumption, high maintenance cost, and privacy concerns. In recent years, the emergence and rapid development of edge computing has effectively alleviated such dilemmas, sinking user demand processing to the edge and avoiding the flow of massive data in the network. As a typical scenario of edge computing, edge intelligence is gaining increasing attention, and one of its most important stages is inference. Because resources in edge computing generally offer low performance, collaborative inference across resources is becoming a hot topic. By analyzing the trends of edge intelligence development, we conclude that edge collaborative inference is still in a growth phase and has not yet entered a stable phase. Based on a thorough investigation of edge collaborative inference, we divide it into two parts: intelligent methods and collaborative inference architectures. The key technologies involved are summarized vertically and organized from the perspective of dynamic scenarios. Each key technology is analyzed in detail, and the different key technologies are compared horizontally together with their applicable scenarios. Finally, we propose several directions that deserve further study in collaborative edge inference under dynamic scenarios.

  • Visual object tracking is an important research direction in computer vision [1]. Given an arbitrary target in the first frame of a video sequence, the task is to continuously predict the target's location in subsequent frames. Object tracking is widely applied in autonomous driving, intelligent video surveillance, human-computer interaction, and other fields [2]. Designing a simple and efficient general-purpose visual tracking method remains a pressing challenge. In real-world complex scenes in particular, the target's appearance undergoes continuous, drastic changes caused by illumination variation, scale change, and severe occlusion, which degrade tracking results.

    In recent years, object tracking based on convolutional neural networks (CNNs) has attracted wide attention. However, limited by the size of their receptive fields, CNNs mostly model local features of the target in the temporal or spatial domain and fail to effectively capture long-range dependencies among target features [3]. Mainstream CNN-based tracking frameworks fall into two categories: trackers based on Siamese networks [4-7] and trackers based on online discriminative learning [8-10]. These methods excel at extracting local image features, but in complex scenes, for example when the target is frequently occluded or deforms drastically, they cannot adequately model the global context of features.

    In addition, Transformer-based tracking schemes capture long-range dependencies among features by introducing a global self-attention mechanism [11-13]. Most of these trackers employ a CNN backbone for feature extraction and then build Transformer encoders and decoders on top. References [14-15] focus on simplifying the tracking pipeline, using the Transformer itself as the feature extractor and directly outputting the predicted location. However, the self-attention mechanism at the core of these schemes has O(N²) computational complexity, so the computation grows sharply with image size and directly limits tracking efficiency.

    To reduce computation, some studies build the backbone with vision multi-layer perceptrons (MLPs) [16-17]. They replace the self-attention layers in the Transformer with MLP layers and perform information interaction among tokens in the time domain, where a token is one of the non-overlapping image patches into which the target and search-region samples are split. Token-level interaction further simplifies the fusion of temporal information. Although introducing MLPs lowers the computational complexity, the MLP computation still grows markedly as the number of tokens increases during training and testing, which likewise hurts tracking efficiency.

    Inspired by the design of the global filter network [3], this paper proposes using the fast Fourier transform (FFT) to fuse tokens efficiently, reducing the computational overhead that vision MLP models incur as the number of tokens grows. First, the FFT converts token features from the time domain to the frequency domain. Then, long-range dependencies between the target's current and historical information and the search-region information are captured in the frequency domain. Finally, the inverse FFT (IFFT) converts the frequency-domain features back to the time domain. These FFT and IFFT steps allow the proposed tracker to quickly learn the target's spatio-temporal interactions in the frequency domain at low, logarithmic complexity. Furthermore, to better adapt to appearance changes of the target during tracking, a quality-assessment-based memory mechanism for target templates is proposed. Based on the current tracking state, it dynamically updates stable historical target information in the memory store, which is used to learn an appearance model that adapts to target changes and helps match the target accurately within the search region.

    The main contributions of this paper are threefold:

    1) A tracking algorithm that quickly captures long-range dependencies among tokens. Feature extraction and fusion are learned end to end, while the interactions between target tokens and search-region tokens are modeled in the frequency domain at lower computational complexity.

    2) A quality-assessment-based memory mechanism for target templates, which dynamically and adaptively captures the target's stable appearance evolution over the video sequence and provides high-quality, long-term historical target information.

    3) The proposed tracker achieves excellent results on three public tracking datasets: LaSOT [18], OTB100 [19], and UAV123 [20].

    A tracking framework can usually be divided into three parts: 1) a backbone network that extracts image features; 2) a module that fuses target and search-region features; and 3) a module that generates the predicted location. Most trackers [5-9] adopt a CNN backbone. Among them, Siamese-network-based frameworks are trained end to end. SiamFC [7] extracts target features with a fully convolutional Siamese network and performs no template update during tracking. DSiam [21] dynamically updates the target template with a regularized linear regression model on top of a Siamese network. SiamRPN [6] extracts target and search-region features with a Siamese network and localizes the target precisely by incorporating the region proposal network from object detection research. SiamRPN++ [5] mitigates the positional bias of CNN training with a position-balancing strategy. In addition, discriminative trackers based on online learning have also achieved excellent performance. DiMP [8] learns the difference between target and background offline with an end-to-end network while updating the target template online. PrDiMP [9], built on DiMP [8], applies probabilistic regression to end-to-end training and, at test time, captures the target by generating a conditional probability density of the target state over the search region. CNN-based trackers, however, are prone to inductive bias during training.

    Transformer networks are now widely used in vision tasks such as classification and detection [22-24]. The self-attention mechanism in the Transformer computes the correlation between each input element and all others. In visual tracking, TrDiMP [13] uses a Transformer to enhance target context: the encoder reinforces target template features through self-attention, and the decoder fuses contextual templates for localization. TransT [12] proposes a feature fusion network based on multi-head attention, feeding the fused features into a target classifier and a bounding-box regressor. Stark [11] uses ResNet [25] as the backbone to extract target features and trains a Transformer encoder-decoder end to end. SwinTrack [14], following Swin Transformer [22], performs both feature extraction and fusion with a fully attention-based Transformer. ToMP [26] is likewise a fully Transformer-based tracker that regresses target boundaries with a parallel two-stage tracker. Mixformer [15] proposes a mixed attention module that performs feature extraction and fusion simultaneously. Transformer-based trackers achieve outstanding performance, but their computational cost grows quadratically as the search region enlarges, which limits tracking efficiency.

    MLP-mixer [16] replaces self-attention in the Transformer with MLPs for token mixing. ResMLP [17], based on MLP-mixer, speeds up processing by replacing normalization with affine transformations. gMLP [27] reweights tokens along the spatial dimension with a spatial gating unit. These MLP token-mixing studies suffer from the same computational growth: their time complexity increases quadratically with the number of input tokens, and the fixed spatial weights of MLPs are hard to extend to high-resolution images.

    Figure 1 shows the proposed end-to-end tracking framework with efficient FFT-based token mixing between the target and the search region. First, the initial target template size is set to Ht×Wt×3, the number of historical target template frames kept in the memory store is set to T, and the search region size is Hs×Ws×3. Then all target samples in the memory store and the search-region image sample are split into non-overlapping patches of size τ×τ×3, called tokens. These tokens are concatenated into a 1-D token sequence that contains both the target and the search-region information. Offline training of the model then proceeds in two steps.
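The patching step can be sketched as follows; a minimal pure-Python illustration (a single-channel image and toy sizes are assumptions for brevity) of splitting an image into non-overlapping τ×τ patches and concatenating them into a token sequence:

```python
def image_to_tokens(img, tau):
    # img: H x W nested list; split into non-overlapping tau x tau patches,
    # flatten each patch into one token vector, and concatenate the tokens
    H, W = len(img), len(img[0])
    assert H % tau == 0 and W % tau == 0, "image must tile exactly"
    tokens = []
    for r in range(0, H, tau):
        for c in range(0, W, tau):
            patch = [img[r + i][c + j] for i in range(tau) for j in range(tau)]
            tokens.append(patch)
    return tokens  # N = (H/tau) * (W/tau) tokens, each of length tau*tau

img = [[r * 8 + c for c in range(8)] for r in range(8)]
tokens = image_to_tokens(img, 4)
assert len(tokens) == 4 and len(tokens[0]) == 16
```

In the paper each patch has τ×τ×3 values (three color channels) and the memory-store templates are tokenized the same way before concatenation.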

    图  1  本文所提跟踪算法框架
    Figure  1.  The tracking algorithm framework proposed in our paper

    1) Training the bounding-box prediction branch. To efficiently learn long-range dependencies between target and search-region tokens, a three-stage network performs token mixing. In stage 1, a linear embedding layer projects the raw tokens into token features of dimension C, which are then fed into a linear embedding layer and a token-mixing network layer containing two FFT blocks. In stage 2, to enlarge the model's receptive field, a linear merging layer reduces the number of tokens and the output feature dimension is set to 2C; this stage consists of a linear merging layer and a token-mixing network layer with three FFT blocks. In stage 3, linear merging continues and the features pass through six FFT token-mixing layers, with the output feature dimension set to 4C. The fused token information obtained in the frequency domain then undergoes an IFFT to convert it back to time-domain features, which are fed into a prediction head of three Conv-BN-ReLU blocks to estimate the target location.
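To make the three-stage shapes concrete, the sketch below tracks token count and channel width through the stages. The 2×2 token-merging factor is an assumption (the text only states that merging reduces the number of tokens while channels go C → 2C → 4C):

```python
def stage_shapes(n_tokens, C):
    # stage 1: linear embedding to C channels (2 FFT mixing layers follow)
    shapes = [(n_tokens, C)]
    # stages 2 and 3: token merging (2x2 grouping assumed), channels doubled
    for _ in range(2):
        n_tokens //= 4
        C *= 2
        shapes.append((n_tokens, C))
    return shapes

# e.g. 9216 input tokens embedded at C = 96 channels
assert stage_shapes(9216, 96) == [(9216, 96), (2304, 192), (576, 384)]
```

The channel progression C, 2C, 4C matches the description above; the exact token counts depend on the merging scheme actually used.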

    2) Offline training of the tracking-quality assessment branch. Inspired by Stark [11], this branch consists of a 3-layer MLP that evaluates the current tracking quality to decide whether the current tracking result should be written into the memory store.

    The FFT-based token-mixing network and the quality-assessment-based dynamic memory mechanism for target templates are detailed below.

    As shown in Figure 1, the proposed FFT token-mixing network layer integrates feature extraction and fusion. Specifically, a patching operation first converts the original 2-D target templates and search-region samples into N non-overlapping tokens of size τ×τ×3. After cropping and other preprocessing, a set of feature vectors {\boldsymbol P}=\left({\boldsymbol{p}}_{0},{\boldsymbol{p}}_{1},…,{\boldsymbol{p}}_{N-1}\right), {\boldsymbol{p}}_{i}\in {\mathbb{R}}^{3{\tau }^{2}}, i\in [0,N-1] is obtained. {\boldsymbol P} is then fed into the FFT token-mixing network to quickly obtain, in the frequency domain, multi-scale interactions among target features and effective interactions between the search region and the target. The structure of the FFT token-fusion network layer is shown in Figure 2; the i-th token is first mapped to a C-dimensional vector:

    图  2  FFT令牌融合网络结构图
    Figure  2.  Structure diagram of FFT tokens fusion network
    {\boldsymbol{x}}_{i}={\boldsymbol{\omega}}_{0}{\boldsymbol{p}}_{i}+{\boldsymbol{b}}_{0},\; i\in \left[0,N-1\right] \text{,} (1)

    where {\boldsymbol{\omega}}_{0}\in {\mathbb{R}}^{3{\tau }^{2}\times C} is the learnable first-layer weight applied to each token, {\boldsymbol{b}}_{0} is the first-layer bias vector, and N is the number of input tokens.

    The input feature of the FFT token-fusion network layer is \boldsymbol{X}={(\boldsymbol{x}}_{0},{\boldsymbol{x}}_{1},…,{\boldsymbol{x}}_{N-1})\in {\mathbb{R}}^{C\times N}, where C is the number of output channels. Eq. (2) then converts the input time-domain features into frequency-domain features {\boldsymbol{X}}':

    {\boldsymbol{X}}'=F\left(\boldsymbol{X}\right)\in {\mathbb{C}}^{H\times W\times N} \text{,} (2)

    where F\left(\cdot\right) is the FFT, used to obtain the frequency-domain representation of the input features, and W and H are the width and height of the input image.

    The FFT token-mixing network layer uses a learnable filter \boldsymbol{K}\in {\mathbb{C}}^{H\times W\times N} to learn the frequency-domain features {\boldsymbol{X}}'' of {\boldsymbol{X}}':

    {\boldsymbol{X}}^{''}=\boldsymbol{K}\odot{\boldsymbol{X}}^{'} \text{,} (3)

    where \odot denotes multiplication of each element of \boldsymbol{K} with the element at the corresponding position of {\boldsymbol{X}}' [3].

    Finally, Eq. (4) converts the frequency-domain features {\boldsymbol{X}}'' into time-domain features {\boldsymbol{X}}^{{*}}, and the updated tokens enter the next feature-fusion module.

    {\boldsymbol{X}}^{{*}}={F}^{-1}\left({\boldsymbol{X}}^{''}\right) , (4)

    where {F}^{-1}\left(\cdot\right) is the IFFT, which converts frequency-domain features back into the time domain.
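Eqs. (2)-(4) amount to element-wise filtering in the frequency domain. A minimal numeric sketch with a naive 1-D DFT (the paper's transform is 2-D and K is learned; here K is a hand-fixed all-pass filter) shows the round trip:

```python
import cmath

def dft(x):
    # naive discrete Fourier transform, Eq. (2) analogue for a 1-D sequence
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # inverse transform, Eq. (4) analogue
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def filter_tokens(x, K):
    # Eq. (3): element-wise product with filter K in the frequency domain
    X = dft(x)
    X2 = [k * xf for k, xf in zip(K, X)]
    return [v.real for v in idft(X2)]

x = [1.0, 2.0, 3.0, 4.0]
identity = [1.0] * 4  # all-pass filter leaves the tokens unchanged
assert all(abs(a - b) < 1e-9 for a, b in zip(filter_tokens(x, identity), x))
```

By the convolution theorem, this element-wise product is equivalent to a circular convolution in the time domain, which is why a learned K can mix all tokens globally at FFT cost.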

    Following Stark [11], a 3-layer Conv-BN-ReLU prediction head estimates the target location. Specifically, estimation is modeled as predicting probability maps for the top-left and bottom-right corners of the bounding box, and the final coordinates of the predicted target are obtained by regressing over the distributions of the probability maps. Unlike the prediction head of Stark, which depends heavily on the encoder and decoder, the proposed head consists of three simple fully convolutional networks. The loss {L}_{\mathrm{loc}} for offline training of the localization branch consists of an {L}_{1} loss and an {L}_{\mathrm{giou}} loss, defined as:

    {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}=\alpha {L}_{1}\left({B}_{i},{B}_{\mathrm{p}\mathrm{r}\mathrm{e}\mathrm{d}}\right)+\beta {L}_{\mathrm{g}\mathrm{i}\mathrm{o}\mathrm{u}}\left({B}_{i},{B}_{\mathrm{p}\mathrm{r}\mathrm{e}\mathrm{d}}\right) \text{,} (5)

    where \alpha is the weight of the {L}_{1} loss, set to \alpha=5; \beta is the weight of the {L}_{\mathrm{giou}} loss, set to \beta=2; {B}_{i} is the ground-truth label of the search region in frame i; and {B}_{\mathrm{pred}} is the target location predicted by the head network.
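A minimal sketch of the loss in Eq. (5) for axis-aligned boxes given as (x1, y1, x2, y2); the mean-absolute form of the L1 term and the 1 − GIoU form of the GIoU term are common conventions assumed here:

```python
def l1_loss(b, g):
    # mean absolute error over the 4 box coordinates
    return sum(abs(p - q) for p, q in zip(b, g)) / 4.0

def giou(b, g):
    # boxes as (x1, y1, x2, y2); returns the generalized IoU in [-1, 1]
    iw = max(0.0, min(b[2], g[2]) - max(b[0], g[0]))
    ih = max(0.0, min(b[3], g[3]) - max(b[1], g[1]))
    inter = iw * ih
    union = ((b[2] - b[0]) * (b[3] - b[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    # smallest enclosing box C penalizes non-overlapping predictions
    cw = max(b[2], g[2]) - min(b[0], g[0])
    ch = max(b[3], g[3]) - min(b[1], g[1])
    c = cw * ch
    return inter / union - (c - union) / c

def loc_loss(pred, gt, alpha=5.0, beta=2.0):
    # Eq. (5): alpha * L1 + beta * (1 - GIoU)
    return alpha * l1_loss(pred, gt) + beta * (1.0 - giou(pred, gt))

assert loc_loss((0, 0, 2, 2), (0, 0, 2, 2)) == 0.0  # perfect prediction
```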

    To increase tracking speed while avoiding the accumulated errors introduced during tracking, most trackers match against the first-frame target template only. However, the target's appearance usually changes drastically during tracking, and fixed-template trackers are prone to drift. Some algorithms predict the current tracking quality from statistics of the response map, such as the peak-to-sidelobe ratio [28] or the average peak-to-correlation energy [29]. But after long periods of unstable tracking, quality judgments based on such statistics tend to produce inaccurate scores.

    If the tracker can assess the current tracking quality in advance and store only high-quality results in the memory, it can effectively capture the target's stable appearance changes over time and provide a reliable basis for token mixing between the target and the search region.

    Therefore, a branch that predicts the current tracking quality is added to the prediction head. Its input is the time-domain token features finally output by the token-fusion network layers, and its output is two softmax-normalized values {S}_{i0} and {S}_{i1}, where {S}_{i0} indicates that the location predicted in frame i is not the target and {S}_{i1} indicates that it is. When {S}_{i1} > {S}_{i0}, the current tracking quality is good and the result may be written into the memory store, so {\epsilon}_{i}=1; when {S}_{i1}\le {S}_{i0}, the quality is poor and the result should not be stored, so {\epsilon}_{i}=0. {\epsilon}_{i} denotes the predicted tracking-quality assessment. The quality branch is trained offline with a binary cross-entropy loss, defined as:

    {L}_{\mathrm{CE}}=-\left[{l}_{i}\,\mathrm{lb}\left({\epsilon}_{i}\right)+\left(1-{l}_{i}\right)\mathrm{lb}\left(1-{\epsilon}_{i}\right)\right], (6)

    where {l}_{i} is the ground-truth label of frame i: {l}_{i}=1 means the current search region contains the true target, and {l}_{i}=0 means it does not.
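A minimal sketch of Eq. (6) with the base-2 logarithm ("lb"). Since lb of a hard 0/1 value is degenerate, the loss is applied here to a predicted soft probability (assumed to be the softmax score S_i1), with the conventional leading minus sign:

```python
import math

def bce_lb(label, score, eps=1e-12):
    # binary cross-entropy with base-2 logarithm ("lb"); `score` is the
    # predicted target probability (assumed here to be the softmax score S_i1)
    score = min(max(score, eps), 1.0 - eps)  # clamp away from 0 and 1
    return -(label * math.log2(score) + (1 - label) * math.log2(1 - score))

assert bce_lb(1, 0.5) == 1.0  # log2(0.5) = -1
```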

    The memory store M is defined as a queue of length T, with update interval {T}_{\mathrm{INR}}. The corresponding update strategy is given in Algorithm 1: when the quality assessment of frame i indicates good tracking, i.e., {\epsilon}_{i}=1, and the extraction interval is met, the current tracking result is appended to the memory queue M. If the length of M exceeds T, the first element {M}_{0} of the queue is removed. When tracking fails or its quality is low, the proposed quality-assessment-based memory mechanism effectively mitigates the negative impact of erroneous target templates.

    Figure 3 visualizes the mechanism. The initial target given in frame 1 is stored in the memory. With the memory length T set to 5, reliable target templates are stored in M dynamically according to the quality assessment. At frame 200 the target is fully occluded, the quality assessment is poor, and no update is performed; at that point the templates in M come from the results of frames 90, 100, 110, 120, and 130. At frame 260 the target reappears with a good quality assessment, so the templates stored in M become the results of frames 120, 130, 240, 250, and 260.

    图  3  基于质量评估的模板记忆存储算法的可视化
    Figure  3.  Visualization of template memory storage algorithm based on quality assessment

    Algorithm 1. Quality-assessment-based memory storage of target templates.

    Input: {I}_{0}, the target template region of frame 1;
    {I}_{i}, the tracking result region of frame i;
    {\epsilon}_{i}, the tracking-quality assessment of frame i;
    M, the memory queue;
    T, the memory length;
    {L}_{\mathrm{VID}}, the video sequence length;
    {T}_{\mathrm{INR}}, the update interval;

    Output: M, the updated memory.

    ① M=\varnothing;
    ② while len(M) < T
    ③   M\leftarrow M\cup \left\{{I}_{0}\right\};
    ④ for i=1,2,\cdots,{L}_{\mathrm{VID}}
    ⑤   if ({\epsilon}_{i}==1) and (mod(i,{T}_{\mathrm{INR}})==0)
    ⑥     M\leftarrow M\cup \left\{{I}_{i}\right\};
    ⑦   end if
    ⑧   if len(M) > T
    ⑨     M\leftarrow M\setminus \left\{{M}_{0}\right\};
    ⑩   end if
    ⑪ end for
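Algorithm 1 can be sketched in Python as a bounded queue gated by the quality flag (the dict-based inputs are an illustration, not the paper's interface):

```python
from collections import deque

def update_memory(results, quality, T=5, T_inr=10):
    """Quality-gated template memory (Algorithm 1 sketch).

    results: dict frame -> template (frame 0 holds the initial target I0)
    quality: dict frame -> epsilon_i (1 = reliable, 0 = unreliable)
    """
    M = deque()
    while len(M) < T:              # steps 2-3: fill with the initial template
        M.append(results[0])
    for i in range(1, max(results) + 1):
        if quality.get(i, 0) == 1 and i % T_inr == 0:
            M.append(results[i])   # step 6: store a reliable result
        if len(M) > T:
            M.popleft()            # steps 8-9: evict the oldest template M0
    return list(M)

res = {i: i for i in range(61)}        # template of frame i labeled i
q = {i: 1 for i in range(1, 61)}       # every frame assessed as reliable
assert update_memory(res, q) == [20, 30, 40, 50, 60]
```

With every tenth frame accepted, the queue ends up holding the five most recent reliable templates, matching the sliding behavior described for Figure 3.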

    The target template size is {{H}_{\mathrm{t}}\times W}_{\mathrm{t}}\times 3 and the search region size is {{H}_{\mathrm{s}}\times W}_{\mathrm{s}}\times 3, with {H}_{\mathrm{t}} and {W}_{\mathrm{t}} both set to 128, and {H}_{\mathrm{s}} and {W}_{\mathrm{s}} both set to 384. The memory length is T=5, the template update interval is {T}_{\mathrm{INR}}=10, and the patch size is \tau =4. The training datasets are LaSOT [18], GOT-10k [30], and TrackingNet [31].

    Since joint learning of localization and classification may lead to suboptimal solutions for the two tasks [11], the feature-fusion model is trained in two steps, following the training schemes of Stark [11] and Mixformer [15]. First, the target-location prediction branch is trained for 300 epochs with the Adam optimizer [32] at a learning rate of 1E-4; then the tracking-quality prediction branch is trained for 100 epochs at a learning rate of 1E-5. The software environment is Ubuntu 20.04, Python 3.6, Torch 1.10.3, and CUDA 11.3; the hardware is an NVIDIA RTX 3090 with 24 GB of memory.

    The effectiveness of the method is validated on the LaSOT [18], OTB100 [19], and UAV123 [20] datasets. The evaluation metrics are the success ratio and the precision plot, with the area under the success curve (AUC) used for ranking.

    The LaSOT [18] dataset contains 1400 video sequences covering 70 target classes, with 1120 videos for training and 280 for testing; the average sequence length is 2400 frames, and the dataset includes 14 challenges such as out-of-view. Figure 4 compares our algorithm with eight strong trackers: TrDiMP [13], TransT [12], Alpha-Refine [33], SiamR-CNN [34], PrDiMP [9], DiMP [8], SiamGAT [35], and SiamBAN [36]. The results show that our algorithm leads in both success ratio and precision plot, exceeding TransT by 3.3% in precision and Alpha-Refine by 0.8% in success ratio. Figure 5 shows results under different challenges against five state-of-the-art algorithms; ours performs best in most challenges.

    图  4  本文算法与其他最先进算法在LaSOT数据集上的成功率指标与精度图比较
    Figure  4.  Comparison of success ratio and precision plot in our algorithm and other state-of-the-art algorithms on LaSOT dataset
    图  5  LaSOT数据集上不同挑战的成功率指标和精度图指标得分比较
    Figure  5.  Score comparison of the indictors in success ratio and precision plot for different challenges on LaSOT dataset

    The OTB100 [19] dataset contains 100 video sequences with 11 challenges such as fast motion. Figure 6 compares our algorithm with TransT [12], SiamRPN++ [5], SiamBAN [36], PrDiMP [9], DiMP [8], ECO [37], MDNet [38], and ATOM [10]. Our method achieves the highest success ratio and precision, exceeding SiamRPN++ by 0.2% and 0.5%, respectively.

    图  6  本文算法与其他最先进算法在OTB100数据集上的成功率与精度图比较
    Figure  6.  Comparison of the success ratio and precision plot in our algorithm and other state-of-the-art algorithms on OTB100 dataset

    The UAV123 [20] dataset consists of 123 low-altitude UAV video sequences; small targets and frequent occlusion are its distinctive challenges. Table 1 compares our algorithm with TrDiMP [13], TransT [12], SiamR-CNN [34], SiamGAT [35], SiamBAN [36], PrDiMP [9], DiMP [8], and SiamRPN++ [5]. Our algorithm ranks first in both success ratio and precision.

    表  1  本文算法与其他先进算法在UAV123数据集上的比较
    Table  1.  Comparison of Our Algorithm and Other State-of-the-art Algorithms on UAV123 Dataset
    Algorithm | AUC | Precision
    Ours | 0.702 | 0.877
    TransT | 0.691 | 0.864
    PrDiMP | 0.690 | 0.867
    TrDiMP | 0.680 | 0.852
    DiMP | 0.662 | 0.838
    SiamBAN | 0.650 | 0.820
    SiamR-CNN | 0.649 | 0.834
    SiamGAT | 0.646 | 0.843
    SiamRPN++ | 0.610 | 0.803

    This section visualizes the performance of our algorithm and six strong algorithms under challenges such as rotation, fast motion, scale variation, and occlusion.

    Figure 7 shows tracking results on the bird-17 sequence of the LaSOT [18] dataset, which features fast motion and out-of-view challenges. The target moves quickly out of view to the left between frames 148 and 156, causing Alpha-Refine [33] and TrDiMP [13] to drift. When the target returns to view in frame 184, only our algorithm tracks it accurately; the other algorithms all fail because the target simultaneously undergoes fast motion, motion blur, and rotation. With stable target templates in its memory store, our tracker strengthens its adaptivity to the target's appearance and can quickly compute the match between the target templates and the search region, thus tracking the target efficiently and robustly.

    图  7  LaSOT数据集中bird-17视频序列中的跟踪结果
    Figure  7.  Tracking results of bird-17 video sequence in LaSOT dataset

    Figure 8 shows tracking results on the bicycle-18 sequence of the LaSOT [18] dataset, where the target is affected by occlusion and rotation. From frames 344 to 400 the target is occluded by rocks, causing TransT [12] and SiamGAT [35] to lose it. From frames 437 to 517 the target rotates drastically; SiamGAT, TransT, and PrDiMP [9] cannot cope with the abrupt appearance change caused by the rotation and drift away. Our algorithm, relying on the token-mixing scheme to rapidly exchange information between the target and the search region, obtains more robust spatio-temporal features and tracks the target successfully.

    图  8  LaSOT数据集中bicycle-18视频序列中的跟踪结果
    Figure  8.  Tracking results of bicycle-18 video sequence in LaSOT dataset

    This section validates the effectiveness of the FFT-based token-mixing network and the quality-assessment-based dynamic memory mechanism for target templates. Table 2 reports the success ratio and precision of different variants on the LaSOT [18] test set.

    表  2  在LaSOT数据集上本文算法的消融实验结果
    Table  2.  Results of the Ablation Experiments of Our Proposed algorithm on LaSOT Dataset
    Variant | CNN-based fusion | FFT-based fusion | Memory mechanism | Success ratio | Precision | Avg. speed (fps)
    Variant 1 | √ |  |  | 0.648 | 0.684 | 23
    Variant 2 |  | √ |  | 0.661 | 0.709 | 41
    Variant 3 |  | √ | √ | 0.667 | 0.723 | 34
    Note: √ indicates the method used.

    First, the effectiveness of the FFT-based token-mixing network is examined. Variant 1 in Table 2 fuses target and search-region tokens with a CNN and uses only the initial target region of frame 1 as the template. Variant 2 uses the FFT-based fusion, likewise matching only against the initial template. The results show that the FFT-based fusion outperforms the CNN-based fusion by 1.3% in success ratio and 2.5% in precision. Conventional CNN-based fusion learns only local dependencies among features during training, cannot capture global long-range dependencies, and carries a large inductive bias. To fuse target and search-region information more thoroughly while building long-range dependencies between them, this paper fuses tokens efficiently with the FFT. Variant 2 also nearly doubles the average tracking speed of variant 1, confirming the effectiveness of the FFT token-mixing network.

    Second, variant 3 adds the quality-assessment-based dynamic memory mechanism to variant 2 to obtain stably updated template information and adapt to appearance changes of the target. Because the memory mechanism increases the number of target templates, it affects the average tracking speed: variant 3 runs 7 fps slower than variant 2 at test time, but exceeds it by 0.6% in success ratio and 1.4% in precision. The results show that the quality-assessment-based dynamic memory mechanism is effective.

    Furthermore, to verify the efficiency of the proposed feature extraction and fusion, our method is compared on the LaSOT dataset with the one-stage-trained Mixformer [15] and the two-stage-trained TrDiMP [13]; the results are given in Table 3. Compared with TrDiMP [13], our method improves the success ratio and precision by 2.7% and 5.7% while running 8 fps faster. Compared with Mixformer [15], our success ratio and precision are 2.5% and 2.4% lower, but our inference speed is 9 fps higher. The results indicate that our method strikes a better balance between accuracy and inference speed, and its average tracking speed of 34 fps meets the real-time tracking requirement (>30 fps) [11].

    表  3  LaSOT数据集上推理速度的对比实验结果
    Table  3.  Comparative Experimental Results of Reasoning Speed on LaSOT Dataset
    Method | Venue | Success ratio | Precision | Avg. speed (fps)
    TrDiMP | CVPR21 | 0.640 | 0.666 | 26
    Mixformer | CVPR22 | 0.692 | 0.747 | 25
    Ours | - | 0.667 | 0.723 | 34

    This paper proposes an end-to-end visual tracking method that efficiently mixes target and search-region tokens via the Fourier transform. The method integrates feature extraction and fusion, converting time-domain token features into the frequency domain to quickly learn long-range dependencies between the search region and the target template. To capture the target's appearance changes over time, a quality-assessment-based dynamic memory mechanism for target templates is proposed, ensuring that updates to the target appearance template remain sound. Extensive experimental results verify the effectiveness of the proposed method.

    Author contributions: Xue Wanli proposed the overall idea and wrote and revised the paper; Zhang Zhibin designed the algorithm, conducted the experiments, and wrote the paper; Pei Shenglei designed the algorithm and reviewed the paper; Zhang Kaihua revised the paper; Chen Shengyong participated in discussing and reviewing the ideas.

  • 图  1   边缘智能发展趋势

    Figure  1.   Edge intelligence developmental trend

    图  2   协同推理关键技术出现时间

    Figure  2.   Emerging time of key techniques in collaborative inference

    图  3   边缘协同推理关键技术、过程及应用场景

    Figure  3.   Key techniques, processes and application scenarios of edge collaborative inference

    图  4   模型切割方式

    Figure  4.   Model partition methods

    图  5   早期退出模式

    Figure  5.   Early exit pattern

    图  6   模型选择模式

    Figure  6.   Model selection pattern

    图  7   主流的边缘计算协同推理架构

    Figure  7.   Mainstream of collaborative inference framework in edge computing

    表  1   模型切割方法比较

    Table  1   Comparison of Model Partition Methods

    Method | Partition executor | How partition decisions are informed | Slice dependency handling / service discovery | Slice update scheme | Optimization targets | Slices at runtime
    DeepThings [46-47] | gateway | periodic collection of node status | unified scheduling by the gateway | nodes hold the full model | memory, communication | ≥2
    ADCNN [55] | central node | latency estimated from historical tasks | central-node scheduling | nodes hold the full model | latency, communication | ≥2
    Neurosurgeon [34] | client | real-time observation of current network and energy status | IP binding (fixed) | nodes hold the full model | energy (latency) | 2
    MoDNN [48] | central node (group owner) | obtained when nodes register with the central node | central-node scheduling | deployed once, no update | latency | ≥2
    DeepX [50] | central node (execution planner) | real-time collection plus linear-regression prediction | central-node scheduling | execution plan regenerated for every inference run | energy, memory | ≥2
    AOFL [51] | cloud or central node | periodic collection of node status | IP binding (fixed) | redeployment | latency, communication | ≥2
    CRIME [52] | any node | real-time interaction among nodes | set of direct neighbors | nodes hold the full model | latency, energy | ≥2
    DeepSlicing [53] | master node | latency estimated from historical tasks | central-node scheduling | nodes hold the full model | latency, memory | ≥2
    Edgent [54] | master node (edge server) | observed historical network data | IP binding (fixed) | redeployment | accuracy, latency | 2
    Ref. [45] | central node | real-time collection of node status | IP binding (fixed) | nodes hold the full model | memory | ≥2
    Cogent [49] | central node (DDPG agent) | periodic collection of node status | static virtual IP binding provided by Kubernetes (fixed) | redeployment | accuracy, latency | 2
    Refs. [56-57] | edge server | trade-off analysis of the model and optimization targets | IP binding (fixed) | redeployment | computation and communication latency | 2

    表  2   不同架构的比较

    Table  2   Comparison of Different Architectures

    No. | Name | Key combined techniques | Problems addressed | Applicable scenarios
    1 | Cloud-edge collaborative inference based on model partition | model partition, data compression, quantization, matrix factorization/compression, early exit | limited energy and compute on edge devices; energy-latency trade-off | cloud support available; data preprocessing; privacy; real-time load adjustment
    2 | Edge-edge collaborative inference based on model partition | model partition, data compression, quantization, matrix factorization/compression | unreliable link to the cloud; limited single-node resources; energy-latency trade-off | no cloud support; a single node lacks resources but neighbor nodes exist; low communication cost
    3 | Cloud-edge collaborative inference based on model selection | data compression, model compression, knowledge distillation | unreliable inference accuracy on edge devices | highly trusted inference accuracy required; relatively abundant edge-node resources
    4 | Edge-edge collaborative inference based on multi-model result aggregation | data/model fusion, data compression, asynchronous/synchronous communication | low parallelism of collaborative inference; unreliable inference accuracy | multi-scenario collaborative inference; abundant edge-node resources; relatively loose latency requirements
  • [1]

    David S, David C, Nick J. Top 10 strategic technology trends for 2020[EB/OL]. (2019-10-20)[2022-02-05].https://www.gartner.com/smarterwithgartner/gartner-top-10-strategic-technology-trends-for-2020

    [2]

    Carrie M, David R, Michael S. The growth in connected IoT devices is expected to generate 79.4ZB of data in 2025, according to a new IDC forecast[EB/OL]. (2019-06-18) [2022-02-15].https://www.businesswire.com/news/home/20190618005012/en/The-Growth-in-Connected-IoT-Devices-is-Expected-to-Generate-79.4ZB-of-Data-in-2025-According-to-a-New-IDC-Forecast

    [3]

    Xiao Yinhao, Jia Yizhen, Liu Chunchi, et al. Edge computing security: State of the art and challenges[J]. Proceedings of the IEEE, 2019, 107(8): 1608−1631 doi: 10.1109/JPROC.2019.2918437

    [4]

    Kevin M, Amir E. AWS customers rack up hefty bills for moving data[EB/OL]. (2019-10-21)[2022-02-15].https://www.theinformation.com/articles/aws-customers-rack-up-hefty-bills-for-moving-data

    [5]

    Jin Hai, Jia Lin, Zhou Zhi. Boosting edge intelligence with collaborative cross-edge analytics[J]. IEEE Internet of Things Journal, 2020, 8(4): 2444−2458

    [6]

    Xiang Chong, Wang Xinyu, Chen Qingrong, et al. No-jump-into-latency in China's Internet! toward last-mile hop count based IP geo-localization[C/OL] //Proc of the 19th Int Symp on Quality of Service. New York: ACM, 2019[2021-03-15].https://doi.org/10.1145/3326285.3329077

    [7]

    Jiang Xiaolin, Shokri-Ghadikolaei H, Fodor G, et al. Low-latency networking: Where latency lurks and how to tame it[J]. Proceedings of the IEEE, 2018, 107(2): 280−306

    [8] 施巍松,张星洲,王一帆,等. 边缘计算: 现状与展望[J]. 计算机研究与发展,2019,56(1):69−89

    Shi Weisong, Zhang Xingzhou, Wang Yifan, et al. Edge computing: Status quo and prospect[J]. Journal of Computer Research and Development, 2019, 56(1): 69−89 (in Chinese)

    [9]

    Zamora-Izquierdo MA, Santa J, Martínez JA, et al. Smart farming IoT platform based on edge and cloud computing[J]. Biosystems Engineering, 2019, 177(1): 4−17

    [10] 肖文华,刘必欣,刘巍,等. 面向恶劣环境的边缘计算综述[J]. 指挥与控制学报,2019,5(3):181−190

    Xiao Wenhua, Liu Bixin, Liu Wei, et al. A review of edge computing for harsh environments[J]. Journal of Command and Control, 2019, 5(3): 181−190 (in Chinese)

    [11]

    Stojkoska BLR, Trivodaliev KV. A review of Internet of things for smart home: Challenges and solutions[J]. Journal of Cleaner Production, 2017, 140(3): 1454−1464

    [12]

    Wan Shaohua, Gu Zonghua, Ni Qiang. Cognitive computing and wireless communications on the edge for healthcare service robots[J]. Computer Communications, 2020, 149(1): 99−106

    [13] 吕华章,陈丹,范斌,等. 边缘计算标准化进展与案例分析[J]. 计算机研究与发展,2018,55(3):487−511

    Lü Huazhang, Chen Dan, Fan Bin, et al. Standardization progress and case analysis of edge computing[J]. Journal of Computer Research and Development, 2018, 55(3): 487−511 (in Chinese)

    [14]

    Qi Jianpeng. Awesome edge computing[EB/OL]. (2003-06-02) [2022-03-15]. https://github.com/qijianpeng/awesome-edge-computing#engine

    [15]

    Cheol-Ho H, Blesson V. Resource management in fog/edge computing: A survey on architectures, infrastructure, and algorithms[J]. ACM Computing Surveys, 2019, 52(5): 1−37 doi: 10.1145/3342101

    [16] 曾鹏,宋纯贺. 边缘计算[J]. 中国计算机学会通讯,2020,16(1):8−10

    Zeng Peng, Song Chunhe. Edge computing[J]. Communications of China Computer Federation, 2020, 16(1): 8−10 (in Chinese)

    [17] 高晗,田育龙,许封元,等. 深度学习模型压缩与加速综述[J]. 软件学报,2021,32(1):68−92

    Gao Han, Tian Yulong, Xu Fengyuan, et al. Overview of deep learning model compression and acceleration[J]. Journal of Software, 2021, 32(1): 68−92 (in Chinese)

    [18]

    Zhou Zhi, Chen Xu, Li En, et al. Edge intelligence: Paving the last mile of artificial intelligence with edge computing[J]. Proceedings of the IEEE, 2019, 107(8): 1738−1762 doi: 10.1109/JPROC.2019.2918951

    [19] 李肯立,刘楚波. 边缘智能: 现状和展望[J]. 大数据,2019,5(3):69−75

    Li Kenli, Liu Chubo. Edge intelligence: Status quo and prospect[J]. Big Data, 2019, 5(3): 69−75 (in Chinese)

    [20] 谈海生,郭得科,张弛,等. 云边端协同智能边缘计算的发展与挑战[J]. 中国计算机学会通讯,2020,16(1):38−44

    Tan Haisheng, Guo Deke, Zhang Chi, et al. Development and challenges of cloud-edge-device collaborative intelligent edge computing[J]. Communications of China Computer Federation, 2020, 16(1): 38−44 (in Chinese)

    [21] 张星洲,鲁思迪,施巍松. 边缘智能中的协同计算技术研究[J]. 人工智能,2019,5(7):55−67

    Zhang Xingzhou, Lu Sidi, Shi Weisong. Research on collaborative computing technology in edge intelligence[J]. Artificial Intelligence, 2019, 5(7): 55−67 (in Chinese)

    [22] 王晓飞. 智慧边缘计算: 万物互联到万物赋能的桥梁[J]. 人民论坛·学术前沿,2020(9):6−17

    Wang Xiaofei. Smart edge computing: The bridge from the Internet of everything to the empowerment of everything[J]. People’s Forum·Academic Frontiers, 2020(9): 6−17 (in Chinese)

    [23]

    Fan Zhenyu, Wang Yang, Fan Wu, et al. Serving at the edge: An edge computing service architecture based on ICN[J]. ACM Transactions on Internet Technology, 2021, 22(1): 1−27

    [24]

    Jennings A, Copenhagen R V, Rusmin T. Aspects of network edge intelligence[R/OL]. 2001 [2022-03-16]. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.6997&rep=rep1&type=pdf

    [25]

    Romaniuk R S. Intelligence in optical networks[G] //Proceedings of SPIE 5125: Proc of the Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments. Bellingham, WA: SPIE, 2003: 17−31

    [26]

    Okagawa T, Nishida K, Yabusaki M. A proposed mobility management for IP-based IMT network platform[J]. IEICE Transactions on Communications, 2005, 88(7): 2726−2734

    [27]

    Liang Ye. Mobile intelligence sharing based on agents in mobile peer-to-peer environment[C] //Proc of the 3rd Int Symp on Intelligent Information Technology and Security Informatics. Piscataway, NJ: IEEE, 2010: 667−670

    [28]

    Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84−90

    [29]

    Szegedy C, Liu Wei, Jia Yangqing, et al. Going deeper with convolutions[C/OL] //Proc of the 28th IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015 [2022-03-16]. https://www.cv-foundation.org/openaccess/content_cvpr_2015/html/Szegedy_Going_Deeper_With_2015_CVPR_paper.html

    [30]

    Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[EB/OL]. (2016-11-04) [2022-03-16]. https://arxiv.org/abs/1602.07360

    [31]

    Cao Yu, Chen Songqing, Hou Peng, et al. FAST: A Fog computing assisted distributed analytics system to monitor fall for Stroke mitigation[C] //Proc of the 10th IEEE Int Conf on Networking, Architecture and Storage. Piscataway, NJ: IEEE, 2015: 2−11

    [32]

    Teerapittayanon S, McDanel B, Kung H T. Distributed deep neural networks over the cloud, the edge and end devices[C] //Proc of the 37th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2017: 328−339

    [33]

    Wang Xiaofei, Han Yiwen, Wang Chenyang, et al. In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning[J]. IEEE Network, 2019, 33(5): 156−165 doi: 10.1109/MNET.2019.1800286

    [34]

    Kang Yiping, Johann H, Gao Cao, et al. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge[J]. ACM SIGARCH Computer Architecture News, 2017, 45(1): 615−629 doi: 10.1145/3093337.3037698

    [35]

    Li En, Zhou Zhi, Chen Xu. Edge intelligence: On-demand deep learning model co-inference with device-edge synergy[C] //Proc of the 2018 Workshop on Mobile Edge Communications. New York: ACM, 2018: 31−36

    [36] 李逸楷,张通,陈俊龙. 面向边缘计算应用的宽度孪生网络[J]. 自动化学报,2020,46(10):2060−2071

    Li Yikai, Zhang Tong, Chen Junlong. Wide twin networks for edge computing applications[J]. Acta Automatica Sinica, 2020, 46(10): 2060−2071 (in Chinese)

    [37]

    Al-Rakhami M, Alsahli M, Hassan M M, et al. Cost efficient edge intelligence framework using docker containers[C] //Proc of the 16th IEEE Int Conf on Dependable, Autonomic and Secure Computing. Piscataway, NJ: IEEE, 2018: 800−807

    [38]

    Al-Rakhami M, Gumaei A, Alsahli M, et al. A lightweight and cost effective edge intelligence architecture based on containerization technology[J]. World Wide Web, 2020, 23(2): 1341−1360 doi: 10.1007/s11280-019-00692-y

    [39]

    Verbraeken J, Wolting M, Katzy J, et al. A survey on distributed machine learning[J]. ACM Computing Surveys, 2020, 53(2): 1−33 doi: 10.1145/3389414

    [40] 杨涛,柴天佑. 分布式协同优化的研究现状与展望[J]. 中国科学:技术科学,2020,50(11):1414−1425 doi: 10.1360/SST-2020-0040

    Yang Tao, Chai Tianyou. Research status and prospects of distributed collaborative optimization[J]. Scientia Sinica Technologica, 2020, 50(11): 1414−1425 (in Chinese) doi: 10.1360/SST-2020-0040

    [41]

    Merenda M, Porcaro C, Iero D. Edge machine learning for AI-enabled IoT devices: A review[J/OL]. Sensors, 2020, 20(9) [2022-03-18]. https://doi.org/10.3390/s20092533

    [42]

    Véstias M P, Duarte R P, de Sousa J T, et al. Moving deep learning to the edge[J/OL]. Algorithms, 2020, 13(5) [2022-03-18]. https://doi.org/10.3390/a13050125

    [43]

    Chen Jiasi, Ran Xukan. Deep learning with edge computing: A review[J]. Proceedings of the IEEE, 2019, 107(8): 1655−1674 doi: 10.1109/JPROC.2019.2921977

    [44] 洪学海,汪洋. 边缘计算技术发展与对策研究[J]. 中国工程科学,2018,20(2):28−34

    Hong Xuehai, Wang Yang. Research on the development and countermeasures of edge computing technology[J]. China Engineering Science, 2018, 20(2): 28−34 (in Chinese)

    [45]

    Hadidi R, Cao Jiashen, Ryoo M S, et al. Toward collaborative inferencing of deep neural networks on Internet-of-things devices[J]. IEEE Internet of Things Journal, 2020, 7(6): 4950−4960 doi: 10.1109/JIOT.2020.2972000

    [46]

    Zhao Zhuoran, Barijough K M, Gerstlauer A. Deepthings: Distributed adaptive deep learning inference on resource-constrained IoT edge clusters[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(11): 2348−2359 doi: 10.1109/TCAD.2018.2858384

    [47]

    Pnevmatikatos D N, Pelcat M, Jung M. Embedded Computer Systems: Architectures, Modeling, and Simulation[M]. Berlin: Springer, 2019

    [48]

    Mao Jiachen, Chen Xiang, Nixon K W, et al. MoDNN: Local distributed mobile computing system for deep neural network[C] //Proc of the 24th Design, Automation Test in Europe Conf Exhibition. Piscataway, NJ: IEEE, 2017: 1396−1401

    [49]

    Shan Nanliang, Ye Zecong, Cui Xiaolong. Collaborative intelligence: Accelerating deep neural network inference via device-edge synergy[J/OL]. Security and Communication Networks, 2020 [2022-03-16]. https://doi.org/10.1155/2020/8831341

    [50]

    Lane N D, Bhattacharya S, Georgiev P, et al. DeepX: A software accelerator for low-power deep learning inference on mobile devices[C/OL] //Proc of the 15th ACM/IEEE Int Conf on Information Processing in Sensor Networks (IPSN). 2016 [2022-04-06]. https://doi.org/10.1109/IPSN.2016.7460664

    [51]

    Zhou Li, Samavatian M H, Bacha A, et al. Adaptive parallel execution of deep neural networks on heterogeneous edge devices[C] //Proc of the 4th ACM/IEEE Symp on Edge Computing. New York: ACM, 2019: 195−208

    [52]

    Jahierpagliari D, Chiaro R, Macii E, et al. CRIME: Input-dependent collaborative inference for recurrent neural networks[J]. IEEE Transactions on Computers, 2020, 70(10): 1626−1639

    [53]

    Zhang Shuai, Zhang Sheng, Qian Zhuzhong, et al. DeepSlicing: Collaborative and adaptive CNN inference with low latency[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(9): 2175−2187

    [54]

    Li En, Zeng Liekang, Zhou Zhi, et al. Edge AI: On-demand accelerating deep neural network inference via edge computing[J]. IEEE Transactions on Wireless Communications, 2020, 19(1): 447−457

    [55]

    Zhang Saiqian, Lin Jieyu, Zhang Qi. Adaptive distributed convolutional neural network inference at the network edge with ADCNN[C/OL] //Proc of the 49th Int Conf on Parallel Processing. 2020 [2022-03-18]. https://doi.org/10.1145/3404397.3404473

    [56]

    Shao Jiawei, Zhang Jun. BottleNet++: An end-to-end approach for feature compression in device-edge co-inference systems[C/OL] //Proc of the IEEE Int Conf on Communications Workshops. Piscataway, NJ: IEEE, 2020 [2022-03-18]. https://doi.org/10.1109/ICCWorkshops49005.2020.9145068

    [57]

    Shao Jiawei, Zhang Jun. Communication-computation trade-off in resource-constrained edge inference[J]. IEEE Communications Magazine, 2020, 58(12): 20−26 doi: 10.1109/MCOM.001.2000373

    [58]

    Avasalcai C, Tsigkanos C, Dustdar S. Resource management for latency-sensitive IoT applications with satisfiability[J/OL]. IEEE Transactions on Services Computing, 2021 [2022-03-18]. https://doi.ieeecomputersociety.org/10.1109/TSC.2021.3074188

    [59]

    Chen Min, Li Wei, Hao Yiyue, et al. Edge cognitive computing based smart healthcare system[J]. Future Generation Computer Systems, 2018, 86(9): 403−411

    [60]

    Hu Diyi, Krishnamachari B. Fast and accurate streaming CNN inference via communication compression on the edge[C] //Proc of the 5th ACM/IEEE Int Conf on Internet of Things Design and Implementation. Piscataway, NJ: IEEE, 2020: 157−163

    [61]

    Hsu K J, Choncholas J, Bhardwaj K, et al. DNS does not suffice for MEC-CDN[C] //Proc of the 19th ACM Workshop on Hot Topics in Networks. New York: ACM, 2020: 212−218

    [62]

    Campolo C, Lia G, Amadeo M, et al. Towards named AI networking: Unveiling the potential of NDN for edge AI[G] //LNCS 12338: Proc of the 19th Int Conf on Ad-Hoc Networks and Wireless. Cham: Springer, 2020: 16−22

    [63]

    Jiang A H, Wong D L K, Canel C, et al. Mainstream: Dynamic stem-sharing for multi-tenant video processing[C] //Proc of the 2018 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2018: 29−42

    [64]

    Mhamdi E, Guerraoui R, Rouault S. On the robustness of a neural network[C] //Proc of the 36th IEEE Symp on Reliable Distributed Systems. Piscataway, NJ: IEEE, 2017: 84−93

    [65]

    Yousefpour A, Devic S, Nguyen B Q, et al. Guardians of the Deep Fog: Failure-resilient DNN inference from edge to cloud[C] //Proc of the 1st Int Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things. New York: ACM, 2019: 25−31

    [66]

    Hu Chuang, Bao Wei, Wang Dan, et al. Dynamic adaptive DNN surgery for inference acceleration on the edge[C] //Proc of the 38th IEEE Conf on Computer Communications. Piscataway, NJ: IEEE, 2019: 1423−1431

    [67]

    Song Han, Mao Huizi, Dally W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding[EB/OL]. (2016-02-15) [2022-03-18]. https://arxiv.org/abs/1510.00149

    [68]

    Masana M, van de Weijer J, Herranz L, et al. Domain-adaptive deep network compression[C] //Proc of the IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 22−29

    [69]

    Courbariaux M, Bengio Y, David J P. BinaryConnect: Training deep neural networks with binary weights during propagations[C] //Proc of the 28th Int Conf on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2015: 3123−3131

    [70]

    Gholami A, Kim S, Zhen Dong, et al. A survey of quantization methods for efficient neural network inference[J]. arXiv preprint, arXiv: 2103.13630, 2021

    [71]

    Cao Qingqing, Irimiea A E, Abdelfattah M, et al. Are mobile DNN accelerators accelerating DNNs?[C] //Proc of the 5th Int Workshop on Embedded and Mobile Deep Learning. New York: ACM, 2021: 7−12

    [72]

    Guo Kaiyuan, Song Han, Song Yao, et al. Software-hardware codesign for efficient neural network acceleration[J]. IEEE Micro, 2017, 37(2): 18−25 doi: 10.1109/MM.2017.39

    [73]

    Guo Kaiyuan, Li Wenshuo, Zhong Kai, et al. Neural network accelerator comparison[EB/OL]. (2018-01-01) [2022-12-26]. https://nicsefc.ee.tsinghua.edu.cn/projects/neural-network-accelerator

    [74]

    Li Hao, Kadav A, Durdanovic I, et al. Pruning filters for efficient convnets[J]. arXiv preprint, arXiv: 1608.08710, 2017

    [75]

    Luo Jianhao, Zhang Hao, Zhou Hongyu, et al. ThiNet: Pruning CNN filters for a thinner net[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(10): 2525−2538 doi: 10.1109/TPAMI.2018.2858232

    [76]

    He Yihui, Zhang Xiangyu, Sun Jian. Channel pruning for accelerating very deep neural networks[C] //Proc of the 16th IEEE Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2017: 1398−1406

    [77]

    Hu Hengyuan, Peng Rui, Tai Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures[J]. arXiv preprint, arXiv: 1607.03250, 2016

    [78]

    Wen Wei, Wu Chunpeng, Wang Yandan, et al. Learning structured sparsity in deep neural networks[C] //Proc of the 30th Int Conf on Neural Information Processing Systems. New York: ACM, 2016: 2082−2090

    [79]

    Chen Hanting, Wang Yunhe, Xu Chang, et al. Data-free learning of student networks[C] //Proc of the 17th IEEE/CVF Int Conf on Computer Vision. Piscataway, NJ: IEEE, 2019: 3513−3521
    [80]

    Niu Wei, Ma Xiaolong, Lin Sheng, et al. PatDNN: Achieving real-time DNN execution on mobile devices with pattern-based weight pruning[C] //Proc of the 25th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2020: 907−922

    [81]

    Qin Haotong, Gong Ruihao, Liu Xianglong, et al. Binary neural networks: A survey[J]. Pattern Recognition, 2020, 105(9): 107281
    [82] 卢冶,龚成,李涛. 深度神经网络压缩自动化的挑战与机遇[J]. 中国计算机学会通讯,2021,17(3):41−47

    Lu Ye, Gong Cheng, Li Tao. Challenges and opportunities of deep neural network compression automation[J]. Communications of the CCF, 2021, 17(3): 41−47 (in Chinese)

    [83]

    Hubara I, Courbariaux M, Soudry D, et al. Binarized neural networks[C] //Proc of the 30th Int Conf on Neural Information Processing Systems. New York: ACM, 2016: 4114−4122

    [84]

    Li Fengfu, Liu Bin. Ternary weight networks[J]. arXiv preprint, arXiv: 1605.04711, 2016

    [85]

    Alemdar H, Leroy V, Prost-Boucle A, et al. Ternary neural networks for resource-efficient AI applications[C] //Proc of the 30th Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 2017: 2547−2554

    [86]

    Chen Yao, Zhang Kang, Gong Cheng, et al. T-DLA: An open-source deep learning accelerator for ternarized DNN models on embedded FPGA[C] //Proc of the 14th IEEE Computer Society Annual Symp on VLSI. Piscataway, NJ: IEEE, 2019: 13−18

    [87]

    Zhou Shuchang, Wu Yuxin, Ni Zekun, et al. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients[J]. arXiv preprint, arXiv: 1606.06160, 2018

    [88]

    Wang Peisong, Hu Qinghao, Zhang Yifan, et al. Two-step quantization for low-bit neural networks[C] //Proc of the 31st IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 4376−4384

    [89]

    Jung Sangil, Son Changyong, Lee Seohyung, et al. Learning to quantize deep networks by optimizing quantization intervals with task loss[C] //Proc of the 32nd IEEE/CVF Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2019: 4345−4354
    [90]

    Gong Cheng, Li Tao, Lu Ye, et al. µL2Q: An ultra-low loss quantization method for DNN compression[C/OL] //Proc of the Int Joint Conf on Neural Networks. Piscataway, NJ: IEEE, 2019 [2022-04-07]. https://doi.org/10.1109/IJCNN.2019.8851699

    [91] 葛道辉,李洪升,张亮,等. 轻量级神经网络架构综述[J]. 软件学报,2020,31(9):2627−2653 doi: 10.13328/j.cnki.jos.005942

    Ge Daohui, Li Hongsheng, Zhang Liang, et al. A review of lightweight neural network architecture[J]. Journal of Software, 2020, 31(9): 2627−2653 (in Chinese) doi: 10.13328/j.cnki.jos.005942

    [92]

    Shi Lei, Feng Shi, Zhu Zhifang. Functional hashing for compressing neural networks[J]. arXiv preprint, arXiv: 1605.06560, 2016

    [93]

    Wu Junru, Wang Yue, Wu Zhenyu, et al. Deep k-means: Re-training and parameter sharing with harder cluster assignments for compressing deep convolutions[C] //Proc of the 35th Int Conf on Machine Learning. PMLR, 2018: 5363−5372

    [94]

    Xu Xiaowei, Lu Qing, Wang Tianchen, et al. Efficient hardware implementation of cellular neural networks with incremental quantization and early exit[J]. ACM Journal on Emerging Technologies in Computing Systems, 2018, 14(4): 1−20

    [95]

    Li Yuhong, Hao Cong, Zhang Xiaofan, et al. EDD: Efficient differentiable DNN architecture and implementation co-search for embedded AI solutions[C/OL] //Proc of the 57th ACM/IEEE Design Automation Conf. New York: ACM, 2020 [2022-04-07]. https://doi.org/10.1109/DAC18072.2020.9218749

    [96]

    Aimar A, Mostafa H, Calabrese E, et al. NullHop: A flexible convolutional neural network accelerator based on sparse representations of feature maps[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(3): 644−656 doi: 10.1109/TNNLS.2018.2852335

    [97]

    Sebastian A, Le Gallo M, Khaddam-Aljameh R, et al. Memory devices and applications for in-memory computing[J]. Nature Nanotechnology, 2020, 15(7): 529−544 doi: 10.1038/s41565-020-0655-z

    [98]

    Song Zhuoran, Fu Bangqi, Wu Feiyang, et al. DRQ: Dynamic region-based quantization for deep neural network acceleration[C] //Proc of the 47th ACM/IEEE Annual Int Symp on Computer Architecture. New York: ACM, 2020: 1010−1021

    [99]

    Yang Yixiong, Yuan Zhe, Su Fang, et al. Multi-channel precision-sparsity-adapted inter-frame differential data codec for video neural network processor[C] //Proc of the 33rd ACM/IEEE Int Symp on Low Power Electronics and Design. New York: ACM, 2020: 103−108

    [100]

    Tang Yibin, Wang Ying, Li Huawei, et al. MV-Net: Toward real-time deep learning on mobile GPGPU systems[J]. ACM Journal on Emerging Technologies in Computing Systems, 2019, 15(4): 1−25

    [101]

    Chen Shengbo, Shen Cong, Zhang Lanxue, et al. Dynamic aggregation for heterogeneous quantization in federated learning[J]. IEEE Transactions on Wireless Communications, 2021, 20(10): 6804−6819 doi: 10.1109/TWC.2021.3076613

    [102]

    Teerapittayanon S, McDanel B, Kung H T. BranchyNet: Fast inference via early exiting from deep neural networks[C] //Proc of the 23rd Int Conf on Pattern Recognition. Piscataway, NJ: IEEE, 2016: 2464−2469

    [103]

    Lo C, Su Y Y, Lee C Y, et al. A dynamic deep neural network design for efficient workload allocation in edge computing[C] //Proc of the 35th IEEE Int Conf on Computer Design. Piscataway, NJ: IEEE, 2017: 273−280

    [104]

    Wang Zizhao, Bao Wei, Yuan Dong, et al. SEE: Scheduling early exit for mobile DNN inference during service outage[C] //Proc of the 22nd Int ACM Conf on Modeling, Analysis and Simulation of Wireless and Mobile Systems. New York: ACM, 2019: 279−288

    [105]

    Wang Zizhao, Bao Wei, Yuan Dong, et al. Accelerating on-device DNN inference during service outage through scheduling early exit[J]. Computer Communications, 2020, 162(10): 69−82

    [106]

    Scarpiniti M, Baccarelli E, Momenzadeh A, et al. DeepFogSim: A toolbox for execution and performance evaluation of the inference phase of conditional deep neural networks with early exits atop distributed Fog platforms[J/OL]. Applied Sciences, 2021, 11(1)[2022-03-18]. https://doi.org/10.3390/app11010377

    [107]

    Su Xiao. EasiEI simulator[CP/OL]. [2022-03-18]. https://gitlab.com/Mirrola/ns-3-dev/-/wikis/EasiEI-Simulator

    [108]

    Park E, Kim D, Kim S, et al. Big/little deep neural network for ultra low power inference[C] //Proc of the Int Conf on Hardware/Software Codesign and System Synthesis. Piscataway, NJ: IEEE, 2015: 124−132

    [109]

    Putra T A, Leu J S. Multilevel neural network for reducing expected inference time[J]. IEEE Access, 2019, 7(11): 174129−174138

    [110]

    Taylor B, Marco V S, Wolff W, et al. Adaptive deep learning model selection on embedded systems[J]. ACM SIGPLAN Notices, 2018, 53(6): 31−43 doi: 10.1145/3299710.3211336

    [111]

    Shu Guansheng, Liu Weiqing, Zheng Xiaojie, et al. IF-CNN: Image-aware inference framework for CNN with the collaboration of mobile devices and cloud[J]. IEEE Access, 2018, 6(10): 68621−68633

    [112]

    Stamoulis D, Chin T W, Prakash A K, et al. Designing adaptive neural networks for energy-constrained image classification[C] //Proc of the Int Conf on Computer-Aided Design. New York: ACM, 2018: 1−8

    [113]

    Song Mingcong, Zhong Kan, Zhang Jiaqi, et al. In-Situ AI: Towards autonomous and incremental deep learning for IoT systems[C] //Proc of the 24th IEEE Int Symp on High Performance Computer Architecture. Piscataway, NJ: IEEE, 2018: 92−103

    [114]

    Zhang Li, Han Shihao, Wei Jianyu, et al. nn-Meter: Towards accurate latency prediction of deep-learning model inference on diverse edge devices[C] //Proc of the 19th Annual Int Conf on Mobile Systems, Applications, and Services. New York: ACM, 2021: 81−93

    [115]

    Yue Zhifeng, Zhu Zhixiang, Wang Chuang, et al. Research on big data processing model of edge-cloud collaboration in cyber-physical systems[C] //Proc of the 5th IEEE Int Conf on Big Data Analytics. Piscataway, NJ: IEEE, 2020: 140−144

    [116]

    Wang Huitian, Cai Guangxing, Huang Zhaowu, et al. ADDA: Adaptive distributed DNN inference acceleration in edge computing environment[C] //Proc of the 25th Int Conf on Parallel and Distributed Systems. Piscataway, NJ: IEEE, 2019: 438−445

    [117]

    Chen Liang, Qi Jianpeng, Su Xiao, et al. REMR: A reliability evaluation method for dynamic edge computing network under time constraints[J]. arXiv preprint, arXiv: 2112.01913, 2021

    [118]

    Long Saiqin, Long Weifan, Li Zhetao, et al. A game-based approach for cost-aware task assignment with QoS constraint in collaborative edge and cloud environments[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(7): 1629−1640 doi: 10.1109/TPDS.2020.3041029

    [119]

    Yang Bo, Cao Xuelin, Li Xiangfan, et al. Mobile-edge-computing-based hierarchical machine learning tasks distribution for IIoT[J]. IEEE Internet of Things Journal, 2020, 7(3): 2169−2180 doi: 10.1109/JIOT.2019.2959035

    [120]

    Fang Yihao, Jin Ziyi, Zheng Rong. TeamNet: A collaborative inference framework on the edge[C] //Proc of the 39th IEEE Int Conf on Distributed Computing Systems. Piscataway, NJ: IEEE, 2019: 1487−1496

    [121]

    Fang Yihao, Shalmani S M, Zheng Rong. CacheNet: A model caching framework for deep learning inference on the edge[J]. arXiv preprint, arXiv: 2007.01793, 2020

    [122] 檀超,张静宣,王铁鑫,等. 复杂软件系统的不确定性[J]. 软件学报,2021,32(7):1926−1956 doi: 10.13328/j.cnki.jos.006267

    Tan Chao, Zhang Jingxuan, Wang Tiexin, et al. Uncertainty in complex software systems[J]. Journal of Software, 2021, 32(7): 1926−1956 (in Chinese) doi: 10.13328/j.cnki.jos.006267

    [123] 宋纯贺,曾鹏,于海斌. 工业互联网智能制造边缘计算: 现状与挑战[J]. 中兴通讯技术,2019,25(3):50−57 doi: 10.12142/ZTETJ.201903008

    Song Chunhe, Zeng Peng, Yu Haibin. Industrial Internet intelligent manufacturing edge computing: Current situation and challenges[J]. ZTE Technology Journal, 2019, 25(3): 50−57 (in Chinese) doi: 10.12142/ZTETJ.201903008

    [124]

    Chen Chao, Zhang Daqing, Wang Yasha, et al. Enabling Smart Urban Services with GPS Trajectory Data[M]. Berlin: Springer, 2021

    [125] 黄倩怡,李志洋,谢文涛,等. 智能家居中的边缘计算[J]. 计算机研究与发展,2020,57(9):1800−1809 doi: 10.7544/issn1000-1239.2020.20200253

    Huang Qianyi, Li Zhiyang, Xie Wentao, et al. Edge computing in smart home[J]. Journal of Computer Research and Development, 2020, 57(9): 1800−1809 (in Chinese) doi: 10.7544/issn1000-1239.2020.20200253

    [126]

    Li Xian, Bi Suzhi, Wang Hui. Optimizing resource allocation for joint AI model training and task inference in edge intelligence systems[J]. IEEE Wireless Communications Letters, 2021, 10(3): 532−536 doi: 10.1109/LWC.2020.3036852

    [127]

    Trivedi A, Wang Lin, Bal H, et al. Sharing and caring of data at the edge[C/OL] //Proc of the 3rd USENIX Workshop on Hot Topics in Edge Computing. Berkeley, CA: USENIX Association, 2020 [2022-04-06]. https://www.usenix.org/conference/hotedge20/presentation/trivedi

    [128]

    Richins D, Doshi D, Blackmore M, et al. AI tax: The hidden cost of AI data center applications[J]. ACM Transactions on Computer Systems, 2021, 37(1−4): 1−32


出版历程
  • 收稿日期(Received):2021-08-25
  • 修回日期(Revised):2022-04-14
  • 网络出版日期(Available online):2023-02-10
  • 刊出日期(Issue date):2023-01-31
