Abstract:
With the rapid development of data-driven intelligent technologies, large-scale data collection has become a main application scenario of data governance and privacy-preserving. Local differential privacy technology as a mainstream technology has been widely used in companies, such as Google, Apple, and Microsoft. However, this technology has a fatal drawback, which is its poor data utility caused by accumulative noises added to users’ data. To juggle the data privacy and utility, the ESA (encode-shuffle-analyze) framework is proposed. This framework tries adding noises as little as possible while maintaining the same degree of data privacy, which ensures that any user’s sensitive information can be used effectively but cannot be recognized from collected data. Considering the elegant and strict definition of differential privacy in math, the major implementation of the ESA framework is based on differential privacy, named shuffle differential privacy. In the case of the same privacy loss, the data utility of shuffled differential privacy method is O(n\+1/2) higher than that of local differential privacy, closing to the central differential privacy but does not rely on a trusted third party. This paper is a survey about this novel privacy-preserving framework. Based on the popular shuffle differential privacy technology, it analyzes this framework, summarizes the theoretical and technical foundations, and compares different privacy-preserving mechanisms under different statistical issues theoretically and experimentally. Finally, this work proposes the challenges of the ESA, and prospects the implementation of non-differential privacy methods over this framework.