Task Allocation Model Based on Deep Reinforcement Learning Considering Privacy Protection

Yang Mingchuan (杨明川); Zhu Jinghua (朱敬华); Li Yuanjing (李元婧); Xi Heran (奚赫然)

doi:10.7544/issn1000-1239.202220647


Abstract

Mobile crowdsensing (MCS) is a new paradigm that uses large numbers of mobile smart devices for data collection, data mining, and intelligent decision-making, and efficient task allocation is the key to its performance. Traditional greedy or ant colony algorithms assume that workers and tasks are fixed, so they do not suit scenarios in which the positions, numbers, and times of workers and tasks change dynamically. Moreover, existing task allocation methods usually rely on a central server that collects worker and task information to make decisions, which can easily leak workers' private information. We therefore propose a privacy-preserving deep reinforcement learning (DRL) model for obtaining an optimized task allocation strategy. First, task allocation is modeled as a multi-objective dynamic programming problem that aims to maximize the two-way benefits of workers and the platform and to reach a Nash equilibrium. Second, a DRL model based on proximal policy optimization (PPO) is proposed for training and learning the model parameters. Finally, local differential privacy is used to add random noise to sensitive worker information such as location, and the central server then trains the whole model to obtain the optimal allocation strategy. Convergence time, maximum revenue, and task coverage rate are evaluated experimentally. Results on simulated datasets show that, compared with traditional methods and other DRL-based methods, the proposed method improves clearly on each evaluation metric while protecting workers' privacy.
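The local differential privacy step can be illustrated with a minimal sketch. The abstract does not say which noise mechanism the authors use, so the Laplace mechanism below is an assumption, as are the function names `laplace_noise` and `perturb_location`, the normalized coordinate range `[0, bound]`, and the equal split of the privacy budget `epsilon` across the two coordinates. The idea is that each worker perturbs its own location on the device, so the central server only ever sees the noisy value.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_location(x, y, epsilon, bound=1.0, rng=None):
    """Return a noisy copy of (x, y), computed on the worker's side.

    Assumptions (not taken from the paper): coordinates are normalized to
    [0, bound], and the budget epsilon is split equally between x and y.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng if rng is not None else random.Random()

    # Clip so that any two possible inputs differ by at most `bound`
    # in each coordinate; this bounds the sensitivity of the report.
    x = min(max(x, 0.0), bound)
    y = min(max(y, 0.0), bound)

    # Each coordinate gets budget epsilon / 2, so its Laplace scale is
    # bound / (epsilon / 2) = 2 * bound / epsilon.
    scale = 2.0 * bound / epsilon
    return x + laplace_noise(scale, rng), y + laplace_noise(scale, rng)


# Hypothetical usage: a worker at (0.3, 0.7) reports a perturbed location.
noisy_x, noisy_y = perturb_location(0.3, 0.7, epsilon=1.0)
```

A smaller `epsilon` gives stronger privacy but noisier locations, which is the trade-off any allocation policy trained on the reported data would have to absorb.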
