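  • China's Outstanding Science and Technology Journal
  • CCF-recommended Class A Chinese-language journal
  • T1-class high-quality science and technology journal in the computing field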
Pan Xudong, Zhang Mi, Yan Yifan, Lu Yifan, Yang Min. Evaluating Privacy Risks of Deep Learning Based General-Purpose Language Models[J]. Journal of Computer Research and Development, 2021, 58(5): 1092-1105. DOI: 10.7544/issn1000-1239.2021.20200908

Evaluating Privacy Risks of Deep Learning Based General-Purpose Language Models

Funds: This work was supported by the National Natural Science Foundation of China (61972099, U1636204, U1836213, U1836210, U1736208) and the Natural Science Foundation of Shanghai (19ZR1404800).
  • Published Date: April 30, 2021
  • Abstract: Recently, a variety of Transformer-based general-purpose language models (GPLMs), including Google’s BERT (Bidirectional Encoder Representations from Transformers), have been proposed in natural language processing (NLP). GPLMs achieve state-of-the-art performance on a wide range of NLP tasks and are deployed in industrial applications. Despite their generality and promising performance, a recent study first showed that an attacker with access to the textual embeddings produced by GPLMs can infer, with high accuracy, whether the original text contains a specific keyword. That work, however, has the following limitations. First, it treats only the occurrence of a single sensitive word as the information to steal, which is still far from a threatening privacy violation. Second, its attack rests on several rather strict assumptions about the attacker’s capability, e.g., that the attacker knows which GPLM produced the victim’s textual embeddings. Third, it considers only GPLMs designed for English text. To address these limitations and complement that work, this paper proposes a more comprehensive privacy theft chain designed to explore whether general-purpose language models carry even greater privacy risks. Through experiments on 13 commercial GPLMs, we empirically show that an attacker can, step by step, infer the GPLM type behind a textual embedding with nearly 100% accuracy, then infer the text length with over 70% accuracy on average, and finally probe sensitive words that may occur in the original text, which gives the attacker useful information for ultimately reconstructing the sensitive semantics. This paper also evaluates the privacy risks of three typical Chinese general-purpose language models. The results confirm that privacy risks also exist in Chinese GPLMs, which calls for future studies on mitigation.
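The keyword-probing step described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: `toy_embed` (a mean of fixed random word vectors) stands in for a real GPLM, the vocabulary and the probed word `KEYWORD` are hypothetical, and a nearest-centroid classifier stands in for whatever attack model the authors actually use. The point is only that an attacker who can embed texts of their own choosing can fit a classifier that detects a keyword from the embedding alone.

```python
import random

DIM = 256                      # embedding width of the toy model (assumption)
rng = random.Random(0)         # fixed seed so the sketch is repeatable
VOCAB = ["the", "patient", "visited", "clinic", "today",
         "for", "a", "checkup", "report", "flu"]
KEYWORD = "cancer"             # hypothetical sensitive word the attacker probes for
WORD_VECS = {w: [rng.gauss(0, 1) for _ in range(DIM)] for w in VOCAB + [KEYWORD]}


def toy_embed(words):
    """Toy stand-in for a GPLM: the mean of fixed random word vectors."""
    return [sum(WORD_VECS[w][i] for w in words) / len(words) for i in range(DIM)]


def sample_text(with_keyword, length=6):
    """Draw a random word sequence, optionally planting the keyword in it."""
    words = [rng.choice(VOCAB) for _ in range(length)]
    if with_keyword:
        words[rng.randrange(length)] = KEYWORD
    return words


def make_dataset(n):
    """Return (embedding, label) pairs; label 1 means the keyword is present."""
    return [(toy_embed(sample_text(i % 2 == 1)), i % 2) for i in range(n)]


def mean_vec(vecs):
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def train_attack(shadow):
    """Nearest-centroid attack model fitted on the attacker's own shadow data."""
    pos = mean_vec([e for e, y in shadow if y == 1])
    neg = mean_vec([e for e, y in shadow if y == 0])
    direction = [p - q for p, q in zip(pos, neg)]
    midpoint = [(p + q) / 2 for p, q in zip(pos, neg)]
    bias = sum(d * m for d, m in zip(direction, midpoint))
    return direction, bias


def predict(model, emb):
    """Return 1 if the embedding lies closer to the 'keyword present' centroid."""
    direction, bias = model
    return int(sum(d * x for d, x in zip(direction, emb)) > bias)


model = train_attack(make_dataset(200))   # attacker-side shadow embeddings
victim = make_dataset(200)                # embeddings the attacker only observes
accuracy = sum(predict(model, e) == y for e, y in victim) / len(victim)
print(f"keyword inference accuracy on toy embeddings: {accuracy:.2f}")
```

A mean-of-vectors embedder leaks a keyword far more directly than a Transformer would, so the accuracy printed here says nothing about the attack rates reported for real GPLMs.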