A Data Anonymization Approach Based on Impurity Gain and Hierarchical Clustering

Xiong Ping; Zhu Tianqing

Xiong Ping, Zhu Tianqing. A Data Anonymization Approach Based on Impurity Gain and Hierarchical Clustering[J]. Journal of Computer Research and Development, 2012, 49(7): 1545-1552.



Abstract


Data anonymization is an important approach to preserving privacy in data publishing. This paper introduces the basic concepts and application models of data anonymization and discusses the requirements that an anonymized dataset should meet. To resist background knowledge attacks, a new data anonymization approach based on impurity gain and hierarchical clustering is proposed. The impurity of a cluster is used to measure the randomness of its sensitive attribute values, and the merging of clusters is governed by two criteria: the information loss caused by generalization should be minimized, and the impurity gain should be maximized. With this method, the anonymized dataset meets the requirements of both the k-anonymity model and the l-diversity model; at the same time, the information loss is minimized and the values of the sensitive attributes in each cluster follow a uniform distribution. An evaluation method is provided in the experiment section, which compares the anonymized dataset with the original one and assesses its quality by calculating the average information loss and impurity. The experimental results validate the effectiveness of the method.
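The merging criterion described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: Gini impurity stands in for the impurity measure, information loss is taken as the average normalized range of numeric quasi-identifiers, the two criteria are combined as `loss - gain`, and the function names (`gini_impurity`, `info_loss`, `impurity_gain`, `anonymize`) are hypothetical. It is not the authors' algorithm, only one plausible reading of it.

```python
from collections import Counter


def gini_impurity(sensitive_values):
    """Gini impurity of the sensitive values (assumed impurity measure)."""
    n = len(sensitive_values)
    counts = Counter(sensitive_values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def info_loss(cluster, domain_ranges):
    """Average normalized range of each numeric quasi-identifier (assumed loss)."""
    total = 0.0
    for j, width in enumerate(domain_ranges):
        values = [qi[j] for qi, _ in cluster]
        total += (max(values) - min(values)) / width
    return total / len(domain_ranges)


def impurity_gain(a, b):
    """Impurity of the merged cluster minus the size-weighted impurity of its parts."""
    sa = [s for _, s in a]
    sb = [s for _, s in b]
    before = (len(sa) * gini_impurity(sa) + len(sb) * gini_impurity(sb)) / (len(sa) + len(sb))
    return gini_impurity(sa + sb) - before


def satisfies(cluster, k, l):
    """k-anonymity (size >= k) and distinct l-diversity (>= l sensitive values)."""
    return len(cluster) >= k and len({s for _, s in cluster}) >= l


def anonymize(records, k, l):
    """Greedy bottom-up merging; records are (quasi_identifier_tuple, sensitive_value)."""
    dims = len(records[0][0])
    # Domain width per attribute; fall back to 1.0 when an attribute is constant.
    domain_ranges = [
        (max(r[0][j] for r in records) - min(r[0][j] for r in records)) or 1.0
        for j in range(dims)
    ]
    clusters = [[r] for r in records]  # start with one cluster per record
    while len(clusters) > 1 and not all(satisfies(c, k, l) for c in clusters):
        # Take the first cluster that violates the constraints.
        i = next(idx for idx, c in enumerate(clusters) if not satisfies(c, k, l))

        def score(j):
            # Low information loss and high impurity gain both lower the score.
            merged = clusters[i] + clusters[j]
            return info_loss(merged, domain_ranges) - impurity_gain(clusters[i], clusters[j])

        j = min((j for j in range(len(clusters)) if j != i), key=score)
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    data = [((25,), "flu"), ((27,), "cancer"), ((26,), "flu"),
            ((60,), "flu"), ((62,), "hiv"), ((61,), "cancer")]
    for cluster in anonymize(data, k=2, l=2):
        ages = [qi[0] for qi, _ in cluster]
        print(f"age in [{min(ages)}, {max(ages)}]:", sorted(s for _, s in cluster))
```

Each resulting cluster would then be published with its quasi-identifiers generalized to the cluster's value range. If the whole dataset cannot satisfy the constraints, this sketch simply stops with a single cluster rather than reporting failure.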
