Xu Jingnan, Wang Leixia, Meng Xiaofeng. Research on Privacy Auditing in Data Governance[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202540530

Research on Privacy Auditing in Data Governance

Funds: This work was supported by the National Natural Science Foundation of China (62172423).
  • Author Bio:

    Xu Jingnan: born in 1997. PhD candidate. Her main research interests include differential privacy and privacy auditing

    Wang Leixia: born in 1994. PhD candidate. Her main research interests include secure data collection, differential privacy, and its applications

    Meng Xiaofeng: born in 1964. Professor and PhD supervisor. Fellow of CCF. His main research interests include cloud data management, web data management, and privacy preservation (xfmeng@ruc.edu.cn)

  • Received Date: June 16, 2024
  • Revised Date: February 10, 2025
  • Accepted Date: March 02, 2025
  • Available Online: March 02, 2025
  • Privacy auditing is a crucial issue in data governance, aiming to detect whether data privacy has been protected effectively. Typically, personal data are protected by perturbing them or adding noise so that the released results satisfy differential privacy guarantees. In machine learning scenarios especially, a growing number of differential privacy algorithms have emerged, each claiming a relatively stringent level of privacy protection. Although rigorous mathematical privacy proofs are given before an algorithm's release, its actual privacy in practice is hard to assure: because the theory of differential privacy is complex, the correctness of these proofs may not be thoroughly examined, and imperceptible errors may creep in during implementation. Either can weaken the protection below the claimed level, leaking additional privacy. To tackle this issue, privacy auditing for differential privacy algorithms has emerged. It measures the degree of privacy protection a differential privacy algorithm actually provides, facilitating the discovery of mistakes and the improvement of existing algorithms. This paper surveys the scenarios and methods of privacy auditing, summarizes the methods from three aspects: data construction, data measurement, and result quantification, and evaluates them through experiments. Finally, this work presents the challenges of privacy auditing and its future directions.
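The auditing loop the abstract outlines (construct adjacent inputs, measure the mechanism's outputs, quantify the result as an empirical privacy bound) can be sketched for a correctly implemented Laplace mechanism. This is a minimal illustration, not any specific method from the survey: all function names are hypothetical, and the single threshold test stands in for the optimized output sets used in the surveyed auditing methods.

```python
import math
import random

def sample_laplace(mu, b):
    """Draw one Laplace(mu, b) sample via inverse-CDF sampling."""
    u = random.random() - 0.5
    return mu - b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count, epsilon):
    """A counting query (sensitivity 1) released under epsilon-DP."""
    return sample_laplace(true_count, 1.0 / epsilon)

def audit_epsilon(mechanism, d, d_prime, threshold, trials=200_000):
    """Empirical lower bound on epsilon: distinguish the mechanism's
    outputs on two adjacent inputs using the output set
    S = {y : y >= threshold} and return ln(Pr[M(d) in S] / Pr[M(d') in S])."""
    hits_d = sum(mechanism(d) >= threshold for _ in range(trials))
    hits_dp = sum(mechanism(d_prime) >= threshold for _ in range(trials))
    p = max(hits_d, 1) / trials   # clamp to avoid log(0) in rare cases
    q = max(hits_dp, 1) / trials
    return math.log(p / q)

random.seed(0)
claimed_epsilon = 1.0
# Adjacent datasets: the true counts differ by exactly one record.
est = audit_epsilon(lambda c: noisy_count(c, claimed_epsilon),
                    d=10, d_prime=9, threshold=9.5)
print(f"claimed epsilon = {claimed_epsilon:.2f}, audited lower bound ~ {est:.2f}")
```

For a correct mechanism the audited bound should stay below the claimed ε (up to sampling error); an estimate exceeding the claim would flag a privacy violation of the kind privacy auditing is designed to catch.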
