ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (2): 346-362. doi: 10.7544/issn1000-1239.2020.20190455


Survey on Privacy-Preserving Machine Learning

Liu Junxu and Meng Xiaofeng   

  1. (College of Information, Renmin University of China, Beijing 100872)
  • Online: 2020-02-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (91646203, 61532010, 91846204, 61532016, 61762082) and the National Key Research and Development Program of China (2016YFB1000602, 2016YFB1000603).

Abstract: Large-scale data collection has greatly improved the performance of machine learning and delivered both economic and social benefits, but it also exposes personal privacy to new and greater risks. In this paper, we summarize the privacy issues in machine learning and the existing work on privacy-preserving machine learning. We discuss two settings of the model training process: centralized learning and federated learning. The former requires collecting all user data before training; although this setting is easy to deploy, it still carries serious privacy and security risks. The latter allows a massive number of devices to collaboratively train a global model while keeping their data local. Because research on federated learning is still at an early stage, many problems remain to be solved. Existing privacy-preserving techniques fall into two main lines: encryption methods, including homomorphic encryption and secure multi-party computation, and perturbation methods, represented by differential privacy. Each has its advantages and disadvantages. We first focus on the design of differentially private machine learning algorithms, especially in the centralized setting, and discuss the differences between traditional machine learning models and deep learning models. Then, we summarize the open problems in current federated learning research. Finally, we present the main challenges for future work and point out the connections among privacy protection, model interpretability, and data transparency.

Key words: privacy-preserving, differential privacy, machine learning, deep learning, federated learning
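
As a concrete illustration of the perturbation approach named in the abstract, the following is a minimal sketch of the Laplace mechanism applied to a bounded mean. It is not taken from the paper. The function names (`private_mean`, `laplace_noise`, `clip`) are hypothetical, and the sketch assumes that the number of records is public and that neighboring datasets differ by replacing one record, so the L1 sensitivity of the clipped mean is (hi - lo) / n.

```python
import random


def clip(x, lo, hi):
    """Force x into [lo, hi] so a single record has bounded influence."""
    return max(lo, min(hi, x))


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d.
    exponential variables, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lo, hi, epsilon, rng=None):
    """Release the mean of `values` with Laplace noise.

    Assumption (illustrative, not from the paper): n = len(values) is
    public and neighbors differ by replacing one record, so the L1
    sensitivity of the clipped mean is (hi - lo) / n.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must be non-empty")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    rng = rng or random.Random()
    clipped = [clip(v, lo, hi) for v in values]
    n = len(clipped)
    sensitivity = (hi - lo) / n
    true_mean = sum(clipped) / n
    # Noise scale = sensitivity / epsilon: smaller epsilon, more noise.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 38]
    print(private_mean(ages, lo=0, hi=100, epsilon=0.5))
```

A smaller `epsilon` gives stronger privacy but a noisier answer, which is the utility trade-off that perturbation methods accept in exchange for avoiding the computational cost of encryption-based methods.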
