    A Corruption-Resistant Data Identification Technology Based on Dataset Honeypoints


      Abstract: Data identification is a prerequisite for precise data supervision and helps keep data elements secure as they are transferred across domains. Identifier-generation methods already exist for individual data items. However, as data volumes keep growing, item-level identifiers cannot be applied directly at the dataset level, and attempting to do so leaves identifiers "easily corrupted" and "difficult to embed". To address these problems, we adopt the network-honeypoint design concept from the "guardian" model proposed by Academician Fang Binxing. Drawing on the idea of deception defense, we propose a corruption-resistant data identification technology based on dataset honeypoints for cross-domain data transfer, together with a complete method for generating and embedding dataset honeypoints. First, we design dataset honeypoints for the cross-domain transfer scenario and solve the "easily corrupted" problem by enhancing their concealment and increasing their redundancy. Second, we solve the "difficult to embed" problem by giving the honeypoints a form that is inseparable from the real data. Finally, experiments on two data modalities, images and encrypted text, show that dataset honeypoints are highly corruption-resistant and robust while incurring low performance overhead.
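The abstract does not say how dataset honeypoints are generated, embedded, or detected. The following minimal Python sketch illustrates the two properties it names, redundancy and inseparability from real data, for the encrypted-text case only. It rests on stated assumptions: records are fixed-length ciphertexts that look like uniformly random bytes, honeypoints are derived from a secret key with HMAC-SHA256, and a dataset is attributed when at least 30% of its honeypoints survive. The function names (`make_honeypoints`, `embed`, `detect`), the derivation, and the threshold are hypothetical and are not the authors' method.

```python
import hashlib
import hmac
import os
import random


def make_honeypoints(key: bytes, dataset_id: str, count: int, record_len: int) -> list[bytes]:
    """Derive `count` decoy records of `record_len` bytes from a secret key.

    Hypothetical scheme: HMAC-SHA256 in counter mode, so the decoys are
    reproducible by the key holder but look like random bytes (i.e. like
    ciphertext, under the stated assumption) to anyone without the key.
    """
    points = []
    for i in range(count):
        label = f"{dataset_id}:{i}".encode()
        out = b""
        block = 0
        while len(out) < record_len:
            out += hmac.new(key, label + block.to_bytes(4, "big"), hashlib.sha256).digest()
            block += 1
        points.append(out[:record_len])
    return points


def embed(records: list[bytes], honeypoints: list[bytes], key: bytes) -> list[bytes]:
    """Insert every honeypoint at a key-derived pseudorandom position.

    Positions only aid concealment; detect() does not depend on them.
    """
    seed = hmac.new(key, b"positions", hashlib.sha256).digest()
    rng = random.Random(seed)  # deterministic for a given key; not a CSPRNG
    out = list(records)
    for hp in honeypoints:
        out.insert(rng.randrange(len(out) + 1), hp)
    return out


def detect(records: list[bytes], key: bytes, dataset_id: str, count: int,
           record_len: int, threshold: float = 0.3) -> tuple[bool, int]:
    """Report whether enough honeypoints survive to attribute the dataset.

    Redundancy: attribution still succeeds after part of the data is lost,
    as long as at least `threshold` of the `count` honeypoints remain.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    expected = make_honeypoints(key, dataset_id, count, record_len)
    present = set(records)
    hits = sum(1 for hp in expected if hp in present)
    return hits >= threshold * count, hits


if __name__ == "__main__":
    key = os.urandom(32)
    real = [os.urandom(48) for _ in range(1000)]  # stand-in for ciphertexts
    marked = embed(real, make_honeypoints(key, "dataset-A", 20, 48), key)
    damaged = marked[: len(marked) // 2]  # simulate losing half the records
    print(detect(damaged, key, "dataset-A", 20, 48))
```

Because each honeypoint is an independent keyed record, deleting part of the dataset removes only some of them, so the attribution decision degrades gradually instead of failing at once; this is the intuition behind using redundancy against corruption. The exact set-membership check shown here would not tolerate in-place modification of honeypoint bytes, which a practical scheme would also need to handle.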
