
    A Data-Hiding Publishing Method Based on a Roulette-Wheel-Selection Genetic Algorithm

    A Privacy-Preserving Data Publishing Method Based on Genetic Algorithm with Roulette Wheel

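    • Abstract: Privacy-preserving micro-data publishing for cluster mining is an emerging research focus in data mining. Its goal is to protect the privacy of micro-data by modifying the data values while ensuring that the hidden data yield the same (or similar) clustering results as the original dataset. From the perspective of keeping data neighborhood relations stable, a roulette-wheel genetic perturbation method, RWSGA, is proposed; it hides data by using a roulette wheel operator to randomly select two data points within the k-neighborhood of a data point and applying crossover or mutation to them. Further, an optimization strategy is proposed that selects k-neighborhood centers from high-density regions to improve the selection domain of the mutation operation, addressing the problem that mutation may perturb the data too much. Theoretical analysis and experimental results show that the method perturbs the original data well enough to keep private data from leaking while keeping the difference between the clustering results before and after publication small.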

       

      Abstract: Privacy-preserving micro-data publishing for clustering is an important issue in data mining research. It aims to protect the privacy of individual records while preserving enough clustering utility in the published data. Unlike traditional distance-preserving and distribution-preserving solutions, this paper proposes a data perturbation method, RWSGA (roulette wheel selection genetic algorithm), that works by keeping the neighborhood relations of the dataset stable during obfuscation. The method maps the genetic operations of crossover and mutation onto data perturbation. First, it uses a roulette wheel strategy to randomly choose a pair of data points from the k-neighborhood of a data point. Then, tailored crossover or mutation operations are applied to the selected pair to protect micro-data values from leakage while keeping the corresponding k-neighborhood stable. Furthermore, to avoid excessively large changes caused by mutation, an optimization restricts the mutation domain by selecting k-nearest-neighborhood centers from higher-density regions of the data space. Theoretical analysis and experimental results show that RWSGA can substantially change published micro-data values relative to their originals while keeping the clustering difference between the original dataset and the published dataset small.
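The perturbation loop described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's RWSGA: the abstract does not give the fitness function, the crossover and mutation operators, or any parameter values. The inverse-distance wheel weights, single-point crossover, bounding-range mutation, the function names, and the `k`, `p_cross`, and `seed` parameters are all hypothetical choices, and the density-based choice of neighborhood centers is omitted.

```python
import math
import random


def k_neighborhood(points, center_idx, k):
    """Indices of the k points nearest to points[center_idx], excluding itself."""
    center = points[center_idx]
    others = [i for i in range(len(points)) if i != center_idx]
    others.sort(key=lambda i: math.dist(center, points[i]))
    return others[:k]


def roulette_pick_pair(points, center_idx, neighbors, rng):
    """Pick two distinct neighbors with a roulette wheel.

    Assumption: a neighbor's slice is proportional to its inverse distance
    from the center; the paper's actual fitness is not given in the abstract.
    """
    center = points[center_idx]
    weights = [1.0 / (1e-9 + math.dist(center, points[i])) for i in neighbors]
    first = rng.choices(neighbors, weights=weights, k=1)[0]
    rest = [(i, w) for i, w in zip(neighbors, weights) if i != first]
    second = rng.choices([i for i, _ in rest], weights=[w for _, w in rest], k=1)[0]
    return first, second


def crossover(p, q, rng):
    """Single-point crossover: swap attribute values after a random cut (needs >= 2 attributes)."""
    cut = rng.randrange(1, len(p))
    return p[:cut] + q[cut:], q[:cut] + p[cut:]


def mutate(p, points, neighbors, rng):
    """Redraw one attribute uniformly within the neighborhood's range for that attribute."""
    d = rng.randrange(len(p))
    lo = min(points[i][d] for i in neighbors)
    hi = max(points[i][d] for i in neighbors)
    out = list(p)
    out[d] = rng.uniform(lo, hi)
    return tuple(out)


def perturb(points, k=3, p_cross=0.7, seed=0):
    """Visit every point, pick a pair from its k-neighborhood, then cross or mutate.

    Requires k >= 2 and len(points) > k. Neighborhoods are computed on the
    original points; edits accumulate in the output copy.
    """
    rng = random.Random(seed)
    out = list(points)
    for c in range(len(points)):
        nbrs = k_neighborhood(points, c, k)
        a, b = roulette_pick_pair(points, c, nbrs, rng)
        if rng.random() < p_cross:
            out[a], out[b] = crossover(out[a], out[b], rng)
        else:
            out[a] = mutate(out[a], points, nbrs, rng)
    return out
```

Calling `perturb` on a list of equal-length tuples returns a perturbed list of the same size. Because both operators in this sketch only exchange or redraw values among nearby points, each published value stays within the per-attribute range of the original data, which is one simple way to limit how far a point can drift from its neighborhood.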

       
