• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
Advanced Search
Wang Xiujun, Mo Lei, Zheng Xiao, Wei Linna, Dong Jun, Liu Zhi, Guo Longkun. Sampling Based Fast Publishing Algorithm with Differential Privacy for Data Stream[J]. Journal of Computer Research and Development, 2024, 61(10): 2433-2447. DOI: 10.7544/issn1000-1239.202440481
Citation: Wang Xiujun, Mo Lei, Zheng Xiao, Wei Linna, Dong Jun, Liu Zhi, Guo Longkun. Sampling Based Fast Publishing Algorithm with Differential Privacy for Data Stream[J]. Journal of Computer Research and Development, 2024, 61(10): 2433-2447. DOI: 10.7544/issn1000-1239.202440481

Sampling Based Fast Publishing Algorithm with Differential Privacy for Data Stream

Funds: This work was supported by the National Natural Science Foundation of China (62172003, 12271098, 61772005), the Natural Science Foundation of Anhui Province (2108085MF218, 2108085MF217), the Natural Science Research Project of Anhui Educational Committee (2022AH040052), and the Science and Technology Innovation Program of Ma'anshan (2021a120009).
More Information
  • Author Bio:

    Wang Xiujun: born in 1983. PhD, associate professor. His main research interests include data streams and RFID systems

    Mo Lei: born in 1995. PhD candidate. His main research interest includes data stream differential privacy

    Zheng Xiao: born in 1975. PhD, professor. Senior member of CCF. His main research interests include computer network, industrial Internet, cloud computing and service computing, machine learning, and privacy protection

    Wei Linna: born in 1984. PhD. Her main research interests include computer networks and wireless networks

    Dong Jun: born in 1973. PhD, associate research fellow. Member of CCF. His main research interests include machine vision, information networks, and applications of the agricultural Internet of things

    Liu Zhi: born in 1986. PhD, associate professor. Senior member of IEEE. His main research interests include video streaming, mobile edge computing, and wireless networking

    Guo Longkun: born in 1983. PhD, professor. His main research interests include data science and computer networks

  • Received Date: May 30, 2024
  • Revised Date: July 21, 2024
  • Available Online: September 13, 2024
  • Many cloud native database applications need to handle massive data streams. To analyze group trend information in these data streams in real time without compromising individual user privacy, these applications require the capability to quickly create differentially private histograms for the most recent dataset at any given moment. However, existing histogram publishing methods lack efficient data structures, making it difficult to rapidly extract key information to ensure real-time data usability. To address this issue, we deeply analyze the relationship between data sampling and privacy protection, and propose a sampling based fast publishing algorithm with differential privacy for data stream (SPF). SPF introduces an efficient data stream sampling sketch structure (EDS) for the first time, which samples and statistically estimates data within a sliding window and filters out unreasonable data, enabling rapid extraction of key information. Then, we demonstrate that the approximations output by the EDS structure are theoretically equivalent to adding differential privacy noise to the true values. Finally, to meet the privacy protection strength provided by the user while reflecting the true situation of the original data stream, an adaptive noise addition algorithm based on efficient data stream sampling is proposed. According to the relationship between the user-provided privacy protection strength and the privacy protection strength provided by the EDS structure, the algorithm adaptively generates the final publishable histogram through privacy allocation. Experiments show that compared with existing algorithms, SPF significantly reduces time and space overhead while maintaining the same data usability.

  • [1]
    董昊文,张超,李国良,等. 云原生数据库综述[J]. 软件学报,2023,35(2):899−926

    Dong Haowen, Zhang Chao, Li Guoliang, et al. Survey on cloudnative databases[J]. Journal of Software, 2023, 35(2): 899−926 (in Chinese)
    [2]
    Zhao Zhanhao, Pan Hexiang, Chen Gang, et al. VeriTxn: Verifiable transactions for cloud-native databases with storage disaggregation[J]. Proceedings of the ACM on Management of Data, 2023, 1(4): 1−27
    [3]
    Papadogiannaki E, Ioannidis S. A survey on encrypted network traffic analysis applications, techniques, and countermeasures[J]. ACM Computing Surveys, 2021, 54(6): 1−35
    [4]
    Shahraki A, Taherkordi A, Haugen Ø. TONTA: Trend-based online network traffic analysis in ad-hoc IoT networks[J]. Computer Networks, 2021, 194: 108125 doi: 10.1016/j.comnet.2021.108125
    [5]
    Shahid M R, Blanc G, Zhang Z, et al. IoT devices recognition through network traffic analysis[C]//Proc of 2018 IEEE Int Conf on Big Data (Big Data). Piscataway, NJ: IEEE, 2018: 5187−5192
    [6]
    Butilă E V, Boboc R G. Urban traffic monitoring and analysis using unmanned aerial vehicles (UAVs): A systematic literature review[J]. Remote Sensing, 2022, 14(3): 620 doi: 10.3390/rs14030620
    [7]
    Jain N K, Saini R K, Mittal P. A review on traffic monitoring system techniques. Soft Computing: Theories and Applications[C]//Proc of SOCTA 2017, 2019: 569−577
    [8]
    Figueiras P, Herga Z, Guerreiro G, et al. Real-time monitoring of road traffic using data stream mining[C]//Proc of 2018 IEEE Int Conf on Engineering, Technology and Innovation (ICE/ITMC). Piscataway, NJ: IEEE, 2018: 1−8
    [9]
    Fang B, Zhang P. Big data in finance[J]. Big Data Concepts, Theories, and Applications, 2016: 391−412
    [10]
    Fikri N, Rida M, Abghour N, et al. An adaptive and real-time based architecture for financial data integration[J]. Journal of Big Data, 2019, 6: 1−25 doi: 10.1186/s40537-018-0162-3
    [11]
    Thennakoon A, Bhagyani C, Premadasa S, et al. Real-time credit card fraud detection using machine learning[C]//Proc of 2019 9th Int Conf on Cloud Computing, Data Science & Engineering (Confluence). Piscataway, NJ: IEEE, 2019: 488−493
    [12]
    Upadhyay J. Sublinear space private algorithms under the sliding window model[C]//Proc of Int Conf on Machine Learning. New York: PMLR, 2019: 6363−6372
    [13]
    Bassily R, Nissim K, Stemmer U, et al. Practical locally private heavy hitters[J]. The Journal of Machine Learning Research, 2020, 21(1): 535−576
    [14]
    Li F. Cloud-native database systems at Alibaba: Opportunities and challenges[J]. Proceedings of the VLDB Endowment, 2019, 12(12): 2263−2272 doi: 10.14778/3352063.3352141
    [15]
    李海翔,李晓燕,刘畅,等. 数据库管理系统中数据异常体系化定义与分类. 软件学报,2022,33(3):909–930

    Li Haixiang, Li Xiaoyan, Liu Chang, et al. Systematic definition and classification of data anomalies in data base management systems. Journal of Software, 2022, 33(3): 909−930 (in Chinese)
    [16]
    赵泓尧,赵展浩,杨皖晴,等. 内存数据库并发控制算法的实验研究. 软件学报,2022,33(3):867–890

    Zhao Hongyao, Zhao Zhanhao, Yang Wanqing, et al. Experimental study on concurrency control algorithms in in-memory databases. Journal of Software, 2022, 33(3): 867−890 (in Chinese)
    [17]
    Spillner J, Toffetti G, López M R. Cloud-native databases: An application perspective[C]//Advances in Service-Oriented and Cloud Computing: Workshops of ESOCC 2017. Berlin: Springer, 2018: 102−116
    [18]
    Dwork C. Differential privacy[C]//Proc of Int Colloquium on Automata, Languages, and Programming. Berlin: Springer, 2006: 1−12
    [19]
    Shahin V, Zhang Xinyao, Qiu Dongyu. Analysis and optimization of big-data stream processing[C]//Proc of 2016 IEEE Global Communications Conf (GLOBECOM). Piscataway, NJ: IEEE, 2016. Doi: 10.1109/GLOCOM.2016.7841598
    [20]
    Bar-Yossef Z, Jayram T S, Kumar R, et al. Counting distinct elements in a data stream[C]//Proc of Int Workshop on Randomization and Approximation Techniques in Computer Science. Berlin: Springer, 2002: 1−10
    [21]
    Danassis P, Triastcyn A, Faltings B. A distributed differentially private algorithm for resource allocation in unboundedly large settings[J]. arXiv preprint, arXiv: 2011.07934, 2020
    [22]
    Mazmudar M, Humphries T, Liu J, et al. Cache me if you can: Accuracy-aware inference engine for differentially private data exploration[J]. arXiv preprint, arXiv: 2211.15732, 2022
    [23]
    Chen Y, Al-Rubaye S, Tsourdos A, et al. Differentially-private federated intrusion detection via knowledge distillation in third-party IoT systems of smart airports[C]//Proc of IEEE Int Conf on Communications (ICC 2023). Piscataway, NJ: IEEE, 2023: 603−608
    [24]
    Xu Jia, Zhang Zhejie, Xiao Xiaokui, et al. Differentially private histogram publication[J]. The VLDB Journal, 2013, 22(6): 797−822 doi: 10.1007/s00778-013-0309-y
    [25]
    Zhang X, Chen R, Xu J, et al. Towards accurate histogram publication under differential privacy[C]//Proc of the 2014 SIAM Int Conf on Data Mining. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2014: 587−595
    [26]
    张啸剑,邵超,孟小峰. 差分隐私下一种精确直方图发布方法[J]. 计算机研究与发展,2016,53(5):1106−1117 doi: 10.7544/issn1000-1239.2016.20150304

    Zhang Xiaojian, Shao Chao, Meng Xiaofeng. Accurate histogram release under differential privacy[J]. Journal of Computer Research and Development, 2016, 53(5): 1106−1117 (in Chinese) doi: 10.7544/issn1000-1239.2016.20150304
    [27]
    唐海霞,杨庚,白云璐. 自适应差分隐私预算分配策略的直方图发布算法[J]. 计算机应用研究,2020,53(7):1952−1957, 1963

    Tang Haixia, Yang Geng, Bai Yunlu. Histogram publishing algorithm based on adaptive privacy budget allocation strategy under differential privacy[J]. Application Research of Computers, 2020, 53(7): 1952−1957, 1963 (in Chinese)
    [28]
    林富鹏,吴英杰,王一蕾,等. 差分隐私二维数据流统计发布[J]. 计算机应用,2015,35(1):88−92 doi: 10.11772/j.issn.1001-9081.2015.01.0088

    Lin Fupeng, Wu Yingjie, Wang Yilei, et al. Differentially private statistical publication for two-dimensional data stream[J]. Journal of Computer Applications, 2015, 35(1): 88−92 (in Chinese) doi: 10.11772/j.issn.1001-9081.2015.01.0088
    [29]
    张啸剑,孟小峰. 基于差分隐私的流式直方图发布方法[J]. 软件学报,2016,27(2):381−393

    Zhang Xiaojian, Meng Xiaofeng. Streaming histogram publication method with differential privacy[J]. Journal of Software, 2016, 27(2): 381−393 (in Chinese)
    [30]
    Sun L, Ge C, Huang X, et al. Differentially private real-time streaming data publication based on sliding window under exponential decay[J]. Computers, Materials and Continua, 2019, 58(1): 61−78 doi: 10.32604/cmc.2019.03744
    [31]
    Wu X, Tong N, Ye Z, et al. Histogram publishing algorithm based on sampling sorting and greedy clustering[C]//Proc of Int Conf on Blockchain and Trustworthy Systems. Berlin: Springer, 2020: 81−91
    [32]
    Liu X, Liu H. Data publication based on differential privacy In V2G network[J]. International Journal of Electronics Engineering and Applications, 2021, 9(2): 34−44
    [33]
    Cardoso A R, Rogers R. Differentially private histograms under continual observation: Streaming selection into the unknown[C]//Proc of Int Conf on Artificial Intelligence and Statistics. New York: PMLR, 2022: 2397−2419
    [34]
    Cao X, Cao Y, Pappachan P, et al. Differentially private streaming data release under temporal correlations via post-processing[C]//Proc of IFIP Annual Conf on Data and Applications Security and Privacy. Cham: Springer, 2023: 184−200
    [35]
    Lebeda C J, Tetek J. Better differentially private approximate histograms and heavy hitters using the Misra-Gries sketch[C]//Proc of the 42nd ACM SIGMOD-SIGACT-SIGAI Symp on Principles of Database Systems. New York: ACM, 2023: 79−88
    [36]
    Streeter M, Golovin D. An online algorithm for maximizing submodular functions[J]. Advances in Neural Information Processing Systems, 2008, 21
    [37]
    Luo Ziyue, Wu Chuan. An online algorithm for VNF service chain scaling in datacenters[J]. IEEE/ACM Transactions on Networking, 2020, 28(3): 1061−1073 doi: 10.1109/TNET.2020.2979263
    [38]
    Misra J, Gries D. Finding repeated elements[J]. Science of Computer Programming, 1982, 2(2): 143−152 doi: 10.1016/0167-6423(82)90012-0
    [39]
    Chan T H H, Li M, Shi E, et al. Differentially private continual monitoring of heavy hitters from distributed streams[C]//Proc of the 12th Int Symp on Privacy Enhancing Technologies (PETS 2012), Berlin: Springer, 2012: 140−159
  • Related Articles

    [1]Cao Yiran, Zhu Youwen, He Xingyu, Zhang Yue. Utility-Optimized Local Differential Privacy Set-Valued Data Frequency Estimation Mechanism[J]. Journal of Computer Research and Development, 2022, 59(10): 2261-2274. DOI: 10.7544/issn1000-1239.20220504
    [2]Hong Jinxin, Wu Yingjie, Cai Jianping, Sun Lan. Differentially Private High-Dimensional Binary Data Publication via Attribute Segmentation[J]. Journal of Computer Research and Development, 2022, 59(1): 182-196. DOI: 10.7544/issn1000-1239.20200701
    [3]Wu Wanqing, Zhao Yongxin, Wang Qiao, Di Chaofan. A Safe Storage and Release Method of Trajectory Data Satisfying Differential Privacy[J]. Journal of Computer Research and Development, 2021, 58(11): 2430-2443. DOI: 10.7544/issn1000-1239.2021.20210589
    [4]Zhang Yuxuan, Wei Jianghong, Li Ji, Liu Wenfen, Hu Xuexian. Graph Degree Histogram Publication Method with Node-Differential Privacy[J]. Journal of Computer Research and Development, 2019, 56(3): 508-520. DOI: 10.7544/issn1000-1239.2019.20170886
    [5]Zhu Weijun, You Qingguang, Yang Weidong, Zhou Qinglei. Trajectory Privacy Preserving Based on Statistical Differential Privacy[J]. Journal of Computer Research and Development, 2017, 54(12): 2825-2832. DOI: 10.7544/issn1000-1239.2017.20160647
    [6]Wu Yingjie, Zhang Liqun, Kang Jian, Wang Yilei. An Algorithm for Differential Privacy Streaming Data Adaptive Publication[J]. Journal of Computer Research and Development, 2017, 54(12): 2805-2817. DOI: 10.7544/issn1000-1239.2017.20160555
    [7]Wang Liang, Wang Weiping, Meng Dan. Privacy Preserving Data Publishing via Weighted Bayesian Networks[J]. Journal of Computer Research and Development, 2016, 53(10): 2343-2353. DOI: 10.7544/issn1000-1239.2016.20160465
    [8]Lu Guoqing, Zhang Xiaojian, Ding Liping, Li Yanfeng, Liao Xin. Frequent Sequential Pattern Mining under Differential Privacy[J]. Journal of Computer Research and Development, 2015, 52(12): 2789-2801. DOI: 10.7544/issn1000-1239.2015.20140516
    [9]Ouyang Jia, Yin Jian, Liu Shaopeng, Liu Yubao. An Effective Differential Privacy Transaction Data Publication Strategy[J]. Journal of Computer Research and Development, 2014, 51(10): 2195-2205. DOI: 10.7544/issn1000-1239.2014.20130824
    [10]Ni Weiwei, Chen Geng, Chong Zhihong, Wu Yingjie. Privacy-Preserving Data Publication for Clustering[J]. Journal of Computer Research and Development, 2012, 49(5): 1095-1104.
  • Cited by

    Periodical cited type(5)

    1. 张涵,于航,周继威,白云开,赵路坦. 面向隐私计算的可信执行环境综述. 计算机应用. 2025(02): 467-481 .
    2. 付裕,林璟锵,冯登国. 虚拟化与密码技术应用:现状与未来. 密码学报(中英文). 2024(01): 3-21 .
    3. 徐传康,李忠月,刘天宇,种统洪,杨发雪. 基于可信执行环境的汽车域控系统安全研究. 汽车实用技术. 2024(15): 18-25+73 .
    4. 徐文嘉,岑孟杰,陈亮. 隐私保护下单细胞RNA测序数据细胞分类研究. 医学信息学杂志. 2024(10): 86-89 .
    5. 孙钰,熊高剑,刘潇,李燕. 基于可信执行环境的安全推理研究进展. 信息网络安全. 2024(12): 1799-1818 .

    Other cited types(4)

Catalog

    Article views (181) PDF downloads (84) Cited by(9)

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return