Privacy-Preserving Classification Mining

Abstract: Privacy-preserving classification mining has become one of the most active research topics in data mining in recent years. The central problem is how to perturb the original data and then build a decision tree from the perturbed data set. This paper proposes a novel privacy-preserving classification mining algorithm based on a transition probability matrix. The algorithm handles non-character data (Boolean, categorical, and numeric types), works when the original data are not uniformly distributed, and can also perturb the label attribute. Experimental results show that the decision tree built by this algorithm on the perturbed data achieves classification accuracy comparable to that of a decision tree built by a conventional, non-privacy-preserving algorithm on the original data.
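To make the idea of perturbing data with a transition probability matrix concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a single categorical attribute, a hypothetical 3x3 matrix `P` in which `P[i][j]` is the probability that original category `i` is reported as category `j`, and a simple reconstruction step that recovers estimated original counts by solving a linear system. The function names, the matrix values, and the sample data are all illustrative assumptions.

```python
import random


def perturb(values, categories, matrix, rng):
    """Replace each value with a category drawn from its row of the matrix."""
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=matrix[index[v]])[0] for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_original_counts(perturbed, categories, matrix):
    """Estimate the original category counts from the perturbed values.

    The expected perturbed count of category j is sum_i x[i] * matrix[i][j],
    so the estimate x solves (matrix transposed) x = observed counts.
    """
    observed = [perturbed.count(c) for c in categories]
    transposed = [list(col) for col in zip(*matrix)]
    return solve(transposed, observed)


# Hypothetical example: a non-uniformly distributed categorical attribute.
categories = ["low", "mid", "high"]
P = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
rng = random.Random(0)  # fixed seed so the run is repeatable
original = ["low"] * 6000 + ["mid"] * 3000 + ["high"] * 1000
perturbed = perturb(original, categories, P, rng)
estimated = estimate_original_counts(perturbed, categories, P)
print([round(x) for x in estimated])
```

In a decision-tree setting, counts estimated this way (per attribute value and class label) could stand in for the true counts when evaluating a split criterion such as information gain. Whether the paper reconstructs distributions in exactly this manner is not stated in the abstract, so this should be read only as one plausible illustration.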