Abstract:
Privacy-preserving classification mining is one of the fast-growing sub-areas of data mining. The key research challenge is how to perturb the original data and then build a decision tree from the perturbed data. A novel privacy-preserving classification mining algorithm based on a transition probability matrix is proposed. The algorithm handles non-character data (Boolean, categorical, and numeric types), accommodates non-uniform probability distributions in the original data, and can also perturb the label attribute. Experimental results demonstrate that a decision tree built with this algorithm on perturbed data achieves classification accuracy comparable to that of a decision tree built with a non-privacy-preserving algorithm on the original data.
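To illustrate the general idea of perturbation with a transition probability matrix, the following minimal sketch replaces each categorical value with one drawn from the corresponding matrix row, and then recovers the original value distribution from the perturbed one by solving a linear system. It is not the algorithm proposed here: the abstract does not specify the matrix, the reconstruction step, or the tree-building procedure, so the function names (`perturb`, `expected_perturbed_dist`, `reconstruct_dist`), the 3x3 matrix, and the example distribution are all assumptions chosen for illustration.

```python
import random

def perturb(values, categories, matrix, rng):
    """Replace each value with one sampled from its row of the transition matrix.

    matrix[i][j] is the (assumed) probability that categories[i] is
    reported as categories[j]; each row must sum to 1.
    """
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=matrix[index[v]])[0] for v in values]

def expected_perturbed_dist(original_dist, matrix):
    """Distribution after perturbation: q[j] = sum_i p[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(original_dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def reconstruct_dist(perturbed_dist, matrix):
    """Estimate the original distribution p from q by solving q = M^T p
    with Gauss-Jordan elimination (requires an invertible matrix)."""
    n = len(matrix)
    # Augmented system: row j holds column j of the matrix, then q[j].
    a = [[matrix[i][j] for i in range(n)] + [perturbed_dist[j]] for j in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("transition matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[r][n] for r in range(n)]

# Hypothetical example: a non-uniform original distribution over three categories.
categories = ["low", "mid", "high"]
matrix = [[0.8, 0.1, 0.1],
          [0.1, 0.8, 0.1],
          [0.1, 0.1, 0.8]]
original_dist = [0.6, 0.3, 0.1]

q = expected_perturbed_dist(original_dist, matrix)
estimate = reconstruct_dist(q, matrix)
noisy = perturb(["low", "low", "mid", "high"], categories, matrix, random.Random(0))
```

In this sketch a data miner would see only `noisy` records and the matrix; per-record values remain uncertain, while aggregate distributions of the kind a decision tree learner needs for its split statistics can still be estimated. With real data, `q` would be estimated from observed frequencies, so the reconstruction would be approximate rather than exact.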