Abstract:
Privacy-preserving microdata publishing is an active topic in data privacy research. Data perturbation is one approach to this problem: it modifies the original data values at the cost of a small loss in mining accuracy. The key challenge is balancing privacy protection against mining accuracy, two goals that conflict to some extent. For privacy-preserving clustering, a novel data perturbation algorithm, NETPA, is proposed. The potential relation between a data object and its neighborhood is analyzed. Drawing on the concept of entropy from information theory, the notions of the neighborhood entropy of an attribute and of neighboring main attributes are defined. NETPA perturbs the original dataset by replacing, for each data object, the values of its neighboring main attributes with the average values of those attributes over the data objects in its k-nearest neighborhood. Theoretical analysis shows that this perturbation strategy preserves the k-nearest-neighbor relations of the original data well while effectively preventing privacy leakage. In the experiments, the clustering algorithms DBSCAN and k-LDCHD are run on both the original datasets and the datasets perturbed by NETPA. Experimental results on both real and synthetic datasets show that NETPA effectively protects the privacy of the original data and preserves its clustering model well.
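The replacement step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: distances are Euclidean and computed on the unperturbed data, an object is not counted among its own k nearest neighbors, and the set of neighboring main attributes for each object is supplied by the caller through the hypothetical `main_attrs` argument, because the entropy-based selection rule is not specified here. The function names `knn_indices` and `perturb` are likewise hypothetical.

```python
import math


def knn_indices(data, i, k):
    """Return the indices of the k objects nearest to data[i].

    Assumption: Euclidean distance, and object i itself is excluded.
    Ties are broken by the smaller index.
    """
    dists = []
    for j, row in enumerate(data):
        if j != i:
            dists.append((math.dist(data[i], row), j))
    dists.sort()
    return [j for _, j in dists[:k]]


def perturb(data, k, main_attrs):
    """Replace each object's main-attribute values with kNN averages.

    main_attrs[i] is a hypothetical, caller-supplied list of attribute
    indices treated as object i's neighboring main attributes; the
    entropy-based selection of those attributes is not reproduced here.
    Neighbors are always found on the original (unperturbed) data.
    """
    perturbed = [list(row) for row in data]  # copy; leave the input untouched
    for i in range(len(data)):
        neighbors = knn_indices(data, i, k)
        for a in main_attrs[i]:
            # Average of attribute a over the k nearest neighbors of object i
            perturbed[i][a] = sum(data[j][a] for j in neighbors) / len(neighbors)
    return perturbed
```

For example, with `data = [[0, 0], [1, 0], [0, 1], [10, 10]]`, `k = 2`, and `main_attrs = [[0], [], [], [1]]`, object 0 takes the mean of attribute 0 over objects 1 and 2, and object 3 takes the mean of attribute 1 over the same two neighbors; objects 1 and 2 are left unchanged.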