
    Sample Dispatching Mechanism for Accelerating Recommendation Model Training in Edge Intelligent Computing Systems

    • Abstract: Training deep learning recommendation models (DLRMs) on edge workers in an edge intelligent computing system brings several benefits, particularly in terms of data privacy protection, low latency, and personalized recommendation. However, because embedding tables are huge, typical DLRM training frameworks adopt one or more parameter servers to maintain the global embedding tables, while several edge workers each cache a portion of them. Under this architecture, embeddings must be transmitted between edge workers and parameter servers to keep the embedding data consistent, and this transmission cost usually dominates the training cycle. This paper investigates how to dispatch input embedding samples to appropriate edge workers so as to minimize the total embedding transmission cost under edge-specific challenges such as heterogeneous networks and limited resources. To this end, we propose ESD, an embedding sample dispatching mechanism based on expected embedding transmission cost. Within ESD, we design HybridDis, a dispatch decision method that combines a resource-intensive optimal algorithm with a heuristic algorithm to balance decision quality against resource consumption. We implement a prototype of ESD in C++ and Python and compare it with state-of-the-art mechanisms on real-world workloads. Extensive experimental results show that ESD reduces the embedding transmission cost by up to 36.76% and achieves up to 1.74x speedup in end-to-end DLRM training.
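
    To make the mechanism concrete, below is a minimal sketch of cost-aware sample dispatching in Python (one of the two languages the ESD prototype uses). This is not the paper's HybridDis method: the greedy policy and all names (expected_cost, dispatch, link_costs, capacity) are illustrative assumptions. It only captures the core idea stated in the abstract, namely that each input sample should be dispatched to the worker whose expected embedding transmission cost is lowest.

        # Illustrative sketch only; not the paper's HybridDis algorithm.
        from typing import Dict, List, Set

        def expected_cost(sample: Set[int], cache: Set[int], link_cost: float) -> float:
            """Expected transmission cost of training `sample` on one worker:
            every embedding ID absent from the worker's local cache must be
            fetched from the parameter server over that worker's link."""
            return len(sample - cache) * link_cost

        def dispatch(samples: List[Set[int]],
                     caches: Dict[str, Set[int]],
                     link_costs: Dict[str, float],
                     capacity: Dict[str, int]) -> Dict[str, List[Set[int]]]:
            """Greedily assign each sample (a set of embedding IDs) to the
            feasible worker with the lowest expected transmission cost,
            respecting a per-worker sample budget. Assumes total capacity
            is sufficient for the batch."""
            plan: Dict[str, List[Set[int]]] = {w: [] for w in caches}
            load = {w: 0 for w in caches}
            for s in samples:
                feasible = [w for w in caches if load[w] < capacity[w]]
                best = min(feasible,
                           key=lambda w: expected_cost(s, caches[w], link_costs[w]))
                plan[best].append(s)
                load[best] += 1
                caches[best] |= s  # fetched embeddings are now cached locally
            return plan

        if __name__ == "__main__":
            caches = {"w1": {1, 2, 3}, "w2": {4, 5}}
            link_costs = {"w1": 1.0, "w2": 2.5}  # heterogeneous edge links
            capacity = {"w1": 2, "w2": 2}
            batch = [{1, 2}, {4, 6}, {1, 5}]
            print(dispatch(batch, caches, link_costs, capacity))

    Per the abstract, HybridDis goes further than such a heuristic alone: it combines a resource-intensive optimal algorithm with a heuristic one, trading decision quality against resource consumption on resource-limited edge nodes.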

       
