DNN Inference Acceleration via Heterogeneous IoT Devices Collaboration
Abstract: Deep neural networks (DNNs) are widely deployed in intelligent applications such as image and video recognition. However, because DNN inference is computationally heavy, resource-constrained Internet of Things (IoT) devices struggle to execute DNN inference tasks locally on their own. Existing cloud-assisted approaches suffer from unpredictable communication latency and unstable remote-server performance. A promising alternative is to have IoT devices collaborate to achieve distributed, scalable DNN inference. However, existing work considers only static partitioning across homogeneous IoT devices. There is therefore an urgent need for a framework that adaptively partitions DNN tasks and orchestrates collaborative inference among heterogeneous, resource-constrained IoT devices. This raises two main challenges: 1) the multi-layer inference latency of a DNN is difficult to predict accurately; 2) it is difficult to adjust the collaborative inference strategy intelligently and in real time in a heterogeneous, dynamic multi-device environment. To this end, we first propose a fine-grained, interpretable multi-layer latency prediction model that abstracts complex layer parameters. We then use evolutionary reinforcement learning (ERL) to adaptively determine a near-optimal partitioning strategy for DNN inference tasks. Real-world experiments on Raspberry Pi devices show that the proposed method significantly accelerates DNN inference in dynamic, heterogeneous environments.
Keywords:
- DNN inference acceleration
- heterogeneous device collaboration
- evolutionary reinforcement learning
- multi-layer prediction model
- partitioning strategy