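• China's Outstanding Sci-Tech Journal
  • CCF-Recommended Class A Chinese Journal
  • T1-Class High-Quality Sci-Tech Journal in the Computing Field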
Xiong Ping, Zhu Tianqing. A Data Anonymization Approach Based on Impurity Gain and Hierarchical Clustering[J]. Journal of Computer Research and Development, 2012, 49(7): 1545-1552.

A Data Anonymization Approach Based on Impurity Gain and Hierarchical Clustering

  • Published Date: July 14, 2012
  • Data anonymization is an important technique for preserving privacy in data publishing. This paper introduces the basic concepts and application models of data anonymization and discusses the requirements that an anonymized dataset should meet. To resist background knowledge attacks, it proposes a new data anonymization approach based on impurity gain and hierarchical clustering. The impurity of a cluster measures the randomness of its sensitive attribute values, and the cluster merging process is governed by two constraints: the information loss caused by generalization should be minimized, and the impurity gain should be maximized. With this method, the anonymized dataset satisfies both the k-anonymity and l-diversity models, while information loss is minimized and the sensitive attribute values in each cluster follow a uniform distribution. The experiments section provides an evaluation method that compares the anonymized dataset with the original one and assesses quality by computing the average information loss and impurity. The experimental results validate the effectiveness of the method.
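The merging procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption: impurity is taken to be Gini impurity, information loss is the normalized interval width of each numeric quasi-identifier multiplied by cluster size, and the two goals are combined as a ratio of added loss to impurity gain. The names `gini_impurity`, `info_loss`, `satisfies`, `anonymize` and the parameter `eps` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def gini_impurity(values):
    """Gini impurity of a list of sensitive values (0 when all are equal)."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def impurity(cluster):
    """Impurity of the sensitive values held by a cluster of (qi, sensitive) records."""
    return gini_impurity([s for _, s in cluster])


def info_loss(cluster, ranges):
    """Assumed loss: normalized interval width per attribute, times cluster size."""
    loss = 0.0
    for j, (lo, hi) in enumerate(ranges):
        vals = [qi[j] for qi, _ in cluster]
        if hi > lo:
            loss += (max(vals) - min(vals)) / (hi - lo)
    return loss * len(cluster)


def satisfies(cluster, k, l):
    """k-anonymity (size >= k) and distinct l-diversity (>= l sensitive values)."""
    return len(cluster) >= k and len({s for _, s in cluster}) >= l


def anonymize(records, k, l, eps=1e-6):
    """Bottom-up merging; returns clusters of (qi_tuple, sensitive) records.

    If the dataset as a whole cannot satisfy k and l, a single
    non-conforming cluster is returned.
    """
    dims = len(records[0][0])
    ranges = [(min(qi[j] for qi, _ in records), max(qi[j] for qi, _ in records))
              for j in range(dims)]
    clusters = [[r] for r in records]  # start with one cluster per record
    while len(clusters) > 1 and not all(satisfies(c, k, l) for c in clusters):
        best_score, best_pair = None, None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            if satisfies(ca, k, l) and satisfies(cb, k, l):
                continue  # merging two finished clusters only adds loss
            merged = ca + cb
            added_loss = (info_loss(merged, ranges)
                          - info_loss(ca, ranges) - info_loss(cb, ranges))
            before = (len(ca) * impurity(ca) + len(cb) * impurity(cb)) / len(merged)
            gain = impurity(merged) - before
            # Assumed trade-off: small added loss and large impurity gain win.
            score = (added_loss + eps) / (gain + eps)
            if best_score is None or score < best_score:
                best_score, best_pair = score, (a, b)
        a, b = best_pair
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return clusters


# Toy data: (age, zip-code prefix) as quasi-identifiers, diagnosis as sensitive value.
records = [
    ((25, 10), "flu"), ((26, 11), "cold"), ((27, 10), "flu"),
    ((50, 40), "cancer"), ((52, 41), "flu"), ((51, 42), "cold"),
]
clusters = anonymize(records, k=2, l=2)
for c in clusters:
    print(len(c), sorted({s for _, s in c}))
```

The pairwise search is cubic in the number of records, which is acceptable for a toy illustration but not for real datasets. The ratio score is only one way to balance the two criteria; a weighted difference would be an equally plausible reading of the abstract.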
