Efficient and Secure Federated Learning Based on Secret Sharing and Gradients Selection

Dong Ye^1,2; Hou Wei^1,2; Chen Xiaojun^1; Zeng Shuai^1

doi:10.7544/issn1000-1239.2020.20200463

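¹(Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100195)
²(School of Cyber Security, University of Chinese Academy of Sciences, Beijing 101408) (dongye@iie.ac.cn)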

CLC number: TP391; TP181
Publication history
- Published: 2020-09-30

Abstract: In recent years, federated learning (FL) has emerged as a collaborative machine learning method in which distributed users can train various models by sharing only gradients. Because gradients can still leak users' private information, secure multi-party computation (MPC) has recently been considered a promising safeguard. Meanwhile, some researchers have proposed the Top-K gradient selection algorithm to reduce the traffic needed to synchronize gradients among distributed users. However, few existing works combine the advantages of these two areas. We combine secret sharing with Top-K gradient selection to design efficient and secure federated learning protocols that cut communication overhead and improve training efficiency while guaranteeing user privacy and data security. We also propose an efficient method for constructing a message authentication code (MAC) that verifies the validity of the aggregated results returned by the servers; the communication overhead introduced by the MAC is small and independent of the number of shared gradients. Finally, we implement a prototype system. Experimental results show that, compared with plaintext training under the same conditions, our secure techniques introduce only small additional communication and computation overheads, while achieving the same level of model accuracy as plaintext training.
Keywords:
- security
- privacy
- secret sharing
- gradients selection
- federated learning
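The abstract does not detail how secret sharing and Top-K selection are combined. As a minimal sketch of the general idea only, assume two non-colluding servers, additive shares over Z_{2^32}, a 16-bit fixed-point encoding, and that each user's selected indices are sent in the clear (a simplification the actual protocol may avoid). Each user keeps its K largest-magnitude gradient entries, splits each value into two random shares, and sends one share to each server. Neither server alone learns the values, yet the sum of the two servers' aggregates equals the aggregate gradient. All names (`top_k`, `encode`, `decode`, `share`, `server_aggregate`) are hypothetical.

```python
import numpy as np

MOD = 2**32    # additive shares live in the ring Z_{2^32} (assumption)
SCALE = 2**16  # fixed-point scale for encoding real gradients (assumption)

def top_k(grad, k):
    """Indices (sorted) and values of the k largest-magnitude gradient entries."""
    idx = np.sort(np.argpartition(np.abs(grad), -k)[-k:])
    return idx, grad[idx]

def encode(x):
    """Fixed-point encode floats into Z_{2^32}; negatives wrap around."""
    return np.round(np.asarray(x) * SCALE).astype(np.int64) % MOD

def decode(v):
    """Map ring elements back to signed floats."""
    v = np.asarray(v, dtype=np.int64) % MOD
    return np.where(v >= MOD // 2, v - MOD, v) / SCALE

def share(values, rng):
    """Split encoded values into two additive shares: s0 + s1 = values (mod 2^32)."""
    s0 = rng.integers(0, MOD, size=values.shape, dtype=np.int64)
    return s0, (values - s0) % MOD

def server_aggregate(dim, uploads):
    """One server's view: sum the sparse shares it received, index by index."""
    acc = np.zeros(dim, dtype=np.int64)
    for idx, sh in uploads:
        np.add.at(acc, idx, sh)
        acc %= MOD
    return acc

# Demo: 4 users, 10-dimensional gradients, each user uploads only its Top-3 entries.
rng = np.random.default_rng(0)
dim, k = 10, 3
uploads0, uploads1 = [], []      # what server 0 and server 1 receive
plain = np.zeros(dim)            # plaintext sparse sum, for checking only
for _ in range(4):
    g = rng.normal(size=dim)
    idx, vals = top_k(g, k)
    s0, s1 = share(encode(vals), rng)
    uploads0.append((idx, s0))   # indices travel in the clear in this sketch
    uploads1.append((idx, s1))
    plain[idx] += vals

# Users add the two servers' aggregates to recover the sparse sum.
agg = decode((server_aggregate(dim, uploads0) + server_aggregate(dim, uploads1)) % MOD)
print(np.allclose(agg, plain, atol=1e-3))  # True
```

Each user uploads K values instead of the full gradient, and the only deviation from the plaintext sum is fixed-point rounding.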
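The construction of the authors' MAC is likewise not given here. One generic way to get a tag whose size does not depend on the number of gradients is a linear MAC that evaluates the sparse gradient vector as a polynomial at a secret point. Tags then add up exactly like gradients, so servers can aggregate tag shares alongside gradient shares, and users check the reconstructed result against the reconstructed tag. The sketch below illustrates that technique and is not the paper's scheme. It assumes that all users share a key the servers never see, that gradients are already encoded as elements of the field modulo 2^61 - 1, and it uses hypothetical helper names (`mac`, `share2`, `aggregate`, `reconstruct`, `verify`).

```python
import secrets

P = 2**61 - 1  # prime modulus of the field (assumption)

def mac(key, vec):
    """Linear tag over a sparse {index: value} vector: sum_j key^(j+1) * v_j mod P."""
    return sum(pow(key, j + 1, P) * v for j, v in vec.items()) % P

def share2(x):
    """Two additive shares of a field element."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def aggregate(uploads):
    """One server: add up value shares index-wise, and add up the tag shares."""
    acc, tag = {}, 0
    for vec, t in uploads:
        for j, v in vec.items():
            acc[j] = (acc.get(j, 0) + v) % P
        tag = (tag + t) % P
    return acc, tag

def reconstruct(a, b):
    """Users combine the two servers' sparse results."""
    return {j: (a.get(j, 0) + b.get(j, 0)) % P for j in set(a) | set(b)}

def verify(key, vec, tag):
    """Accept the aggregate only if its tag matches the aggregated tag."""
    return mac(key, vec) == tag

key = 1 + secrets.randbelow(P - 1)   # nonzero key shared by users, hidden from servers
users = [{0: 5, 3: 7}, {3: 2, 4: 9}, {0: 1, 4: 4}]  # encoded Top-K gradients
up_a, up_b = [], []
for g in users:
    sa, sb = {}, {}
    for j, v in g.items():
        sa[j], sb[j] = share2(v)
    t_a, t_b = share2(mac(key, g))   # one extra field element per server, whatever len(g) is
    up_a.append((sa, t_a))
    up_b.append((sb, t_b))

(va, ta), (vb, tb) = aggregate(up_a), aggregate(up_b)
G, T = reconstruct(va, vb), (ta + tb) % P
print(G == {0: 6, 3: 9, 4: 13}, verify(key, G, T))  # True True
va[3] = (va[3] + 1) % P              # server A tampers with one aggregated entry
print(verify(key, reconstruct(va, vb), T))           # False
```

Each user sends one extra field element per server regardless of K. In this simplified setting, a server that shifts the aggregate by a nonzero vector must guess the matching tag shift without knowing the key, which succeeds with probability at most (largest index + 1)/(P - 1). A real protocol would also have to handle key distribution and users colluding with a server.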
