Abstract:
Privacy has become an increasingly serious concern in applications involving micro-data, and privacy-preserving data publishing has recently attracted much research. Most existing methods focus on publishing categorical data, with aggregate querying, frequent pattern mining, and classification as the main target applications. To address the problem of publishing numerical data for clustering analysis, the notions of individual data record and common data record are introduced through density analysis within the neighborhood of a given record; these notions describe the effect of each data record on maintaining clustering usability. Furthermore, a positive neighborhood and a negative neighborhood are defined for each individual data record. Based on these definitions, a data obfuscation method, NeSDO, is proposed, which achieves privacy-preserving data publishing by substituting original micro-data values with synthetic statistical values computed from a suitable data subset. For an individual data record, the corresponding items are replaced with the average value of the records in its negative neighborhood (or positive neighborhood). For a common data record, they are replaced with the average value of the records in its k-nearest neighborhood. Theoretical analysis and experimental results indicate that NeSDO is effective: it preserves the privacy of sensitive data well while maintaining better clustering usability.
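The substitution step described above can be illustrated with a minimal sketch. It is not the NeSDO algorithm itself, because the abstract does not give the density criterion or the construction of the positive and negative neighborhoods. The sketch assumes that a record is common when at least `min_pts` other records lie within distance `eps` of it, and individual otherwise. The function names, the parameters `eps` and `min_pts`, and the `individual_neighborhood` hook are all hypothetical. When no hook is supplied, the k nearest neighbors serve as a placeholder for the negative (or positive) neighborhood.

```python
import math
from statistics import fmean


def k_nearest(records, i, k):
    """Indices of the k records closest to records[i], excluding i itself."""
    others = [j for j in range(len(records)) if j != i]
    others.sort(key=lambda j: math.dist(records[i], records[j]))
    return others[:k]


def mean_record(records, indices):
    """Attribute-wise average of the selected records."""
    columns = zip(*(records[j] for j in indices))
    return tuple(fmean(col) for col in columns)


def obfuscate(records, k, eps, min_pts, individual_neighborhood=None):
    """Publish neighborhood averages instead of the original records.

    ASSUMPTION: the density test (eps, min_pts) is a stand-in for the
    density analysis mentioned in the abstract, not the paper's criterion.
    ASSUMPTION: individual_neighborhood(records, i) is a hypothetical hook
    returning the indices of the negative (or positive) neighborhood.
    """
    published = []
    for i, rec in enumerate(records):
        # Count the other records that fall within distance eps of this one.
        density = sum(
            1
            for j, other in enumerate(records)
            if j != i and math.dist(rec, other) <= eps
        )
        if density >= min_pts or individual_neighborhood is None:
            # Common record, or placeholder for an individual record:
            # average over the k nearest neighbors.
            subset = k_nearest(records, i, k)
        else:
            # Individual record: average over the caller-supplied neighborhood.
            subset = individual_neighborhood(records, i)
        published.append(mean_record(records, subset))
    return published
```

For example, `obfuscate(records, k=2, eps=2.5, min_pts=2)` returns one synthetic record per input record, so the published data set keeps its size while no original value is released directly.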