ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (10): 2243-2249. doi: 10.7544/issn1000-1239.2019.20190414

Special topic collection: 2019 Cryptography and Intelligent Security Research

• Information Security •

Privacy-Preserving Logistic Regression on Vertically Partitioned Data

Song Lei1, Ma Chunguang2, Duan Guanghan1, Yuan Qi3

  1 (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001)
  2 (College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, Shandong 266590)
  3 (College of Telecommunication and Electronic Engineering, Qiqihar University, Qiqihar, Heilongjiang 161006)
  (songl@hrbeu.edu.cn)
  • Published: 2019-10-16
  • Online: 2019-10-16
  • Supported by:
    National Natural Science Foundation of China (61472097); Natural Science Foundation of Heilongjiang Province (JJ2019LH1770)

Abstract: Logistic regression is one of the important algorithms in machine learning. Traditional training requires collecting the training data centrally, which raises privacy concerns. To address this problem, this paper proposes a privacy-preserving logistic regression scheme. The scheme applies when the data are partitioned by feature dimension and distributed vertically between two parties, who train collaboratively to learn a shared model. Each party trains on its local private data set and exchanges only intermediate results, never exposing its private data directly. An additively homomorphic encryption scheme lets the computation run on ciphertexts, so that during training neither party can obtain the other's sensitive information, including its model parameters and training data. The paper also provides a privacy-preserving prediction method, which ensures that the server hosting the model cannot obtain the querier's private data. Analysis and experiments show that, with almost no loss of accuracy, the scheme provides privacy protection when both parties are semi-honest.
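The abstract says only that an additively homomorphic scheme is used and does not name one. As an illustration of the property it relies on, the following minimal sketch assumes a toy Paillier cryptosystem as a stand-in. The primes, the function names (`keygen`, `encrypt`, `decrypt`, `add`, `mul_const`) and the integer "partial scores" are all hypothetical choices for this example, not the authors' protocol, and the key size is far too small for real use.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key pair. The default primes are an assumption and insecure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                # common simplification for the generator
    mu = pow(lam, -1, n)     # modular inverse (Python 3.8+); valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (((x - 1) // n) * mu) % n


def add(pub, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def mul_const(pub, c, k):
    """Homomorphic scaling: raising a ciphertext to k multiplies the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)


if __name__ == "__main__":
    pub, priv = keygen()
    # Hypothetical integer partial scores, one computed by each party on its own features.
    score_a, score_b = 12, 30
    c_sum = add(pub, encrypt(pub, score_a), encrypt(pub, score_b))
    print(decrypt(pub, priv, c_sum))  # the sum, recovered only by the key holder
```

In a vertically partitioned setting, operations of this kind let one party combine the other's encrypted intermediate value with its own contribution without seeing the underlying plaintext. Real-valued scores and gradients would additionally need a fixed-point encoding, which this sketch omits.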

Key words: logistic regression, privacy-preserving, homomorphic encryption, collaborative training, vertically partitioned data

CLC Number: