Abstract:
Privacy-preserving micro-data publishing for clustering is an important problem in data mining research. Its goal is to protect the privacy of individual records while preserving enough clustering utility in the published data. Unlike traditional distance-preserving and distribution-preserving solutions, this paper proposes RWSGA (roulette wheel selection genetic algorithm), a data perturbation method designed to keep the neighborhood relations of the dataset stable during obfuscation. RWSGA obfuscates data with a roulette-wheel-selection genetic method, mapping the crossover and mutation operators onto data perturbation. First, for a given data point, the method uses roulette wheel selection to randomly choose a pair of data points from its k-nearest neighborhood. Then, tailored crossover or mutation operations are applied to the selected pair to protect the micro-data values from disclosure while keeping the corresponding k-neighborhood stable. Furthermore, to avoid excessively large changes caused by mutation, an optimization refines the choice of mutation domain by specifying centers of k-nearest neighborhoods in higher-density regions of the data space. Theoretical analysis and experimental results show that RWSGA substantially alters the published micro-data values relative to their originals while keeping the clustering difference between the original and published datasets small.
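The selection and perturbation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the RWSGA implementation: the abstract does not specify the roulette-wheel weights, the crossover rule, or the mutation domain, so the inverse-distance weights, arithmetic (blend) crossover, and neighborhood-bounded mutation below are hypothetical choices, as are all function names (`k_nearest`, `roulette_select`, `select_pair`, `arithmetic_crossover`, `bounded_mutation`). Points are tuples of numeric attributes, and distance is Euclidean.

```python
import math
import random


def k_nearest(data, i, k):
    """Indices of the k nearest neighbours of data[i] (Euclidean), excluding i."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: math.dist(data[i], data[j]))
    return others[:k]


def roulette_select(candidates, weights, rng):
    """Roulette wheel: pick one candidate with probability proportional to its weight."""
    r = rng.uniform(0.0, sum(weights))
    acc = 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return cand
    return candidates[-1]  # guard against floating-point round-off


def select_pair(data, i, k, rng):
    """Choose two distinct neighbours of data[i] by roulette wheel selection.

    Assumption (not from the paper): weight = 1 / (distance + eps),
    so closer neighbours are more likely to be chosen.
    """
    nbrs = k_nearest(data, i, k)
    if len(nbrs) < 2:
        raise ValueError("need at least two neighbours to form a pair")
    weights = [1.0 / (math.dist(data[i], data[j]) + 1e-9) for j in nbrs]
    a = roulette_select(nbrs, weights, rng)
    idx = nbrs.index(a)
    # Spin again without the first winner so the pair is distinct.
    b = roulette_select(nbrs[:idx] + nbrs[idx + 1:], weights[:idx] + weights[idx + 1:], rng)
    return a, b


def arithmetic_crossover(p, q, alpha):
    """Hypothetical crossover: blend two points; both children lie between the parents."""
    c1 = [alpha * x + (1 - alpha) * y for x, y in zip(p, q)]
    c2 = [(1 - alpha) * x + alpha * y for x, y in zip(p, q)]
    return c1, c2


def bounded_mutation(data, i, k, rng):
    """Hypothetical mutation: resample one attribute of data[i] uniformly within the
    range that attribute spans over data[i]'s k nearest neighbours, keeping the change local."""
    nbrs = k_nearest(data, i, k)
    d = rng.randrange(len(data[i]))
    lo = min(data[j][d] for j in nbrs)
    hi = max(data[j][d] for j in nbrs)
    point = list(data[i])
    point[d] = rng.uniform(lo, hi)
    return point


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (1.0, 1.0)]
    a, b = select_pair(data, 0, 3, rng)
    c1, c2 = arithmetic_crossover(data[a], data[b], alpha=0.3)
    data[a], data[b] = tuple(c1), tuple(c2)
    data[0] = tuple(bounded_mutation(data, 0, 3, rng))
    print(data)
```

Blend crossover keeps both children on the segment between the parents and preserves their attribute-wise sum, which is one simple way to change individual values while keeping them inside the original neighborhood. The paper's density-based choice of mutation centers is not reproduced here; the bounded mutation only illustrates the idea of limiting the mutation domain.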