Abstract:
Privacy preservation in data publishing has recently attracted wide attention in the database community, and various anonymity models have been proposed to preserve privacy. The l-diversity model is an effective way to protect individual privacy when data are published. However, l-diversity is suited to categorical sensitive attributes rather than numerical ones, so it cannot effectively thwart homogeneity attacks and background knowledge attacks on numerical sensitive attributes. To address this problem, a multi-level l-diversity model based on level distance is proposed specifically for numerical sensitive attributes. The main idea of the model is to first divide the numerical sensitive values into several levels and then enforce l-diversity of the sensitive attribute based on these levels and the level distance. Instantiations of the model, such as multi-level distinct l-diversity, multi-level l-entropy diversity and multi-level recursive (c,l)-diversity, are introduced, and the properties of the model are analyzed. Based on these properties, an l-incognito algorithm is designed to realize multi-level l-diversity. Experiments compare the proposed model with the existing l-diversity model in terms of the diversity of the anonymized tables. The results show that the anonymized data generated by the l-incognito algorithm under the multi-level l-diversity model have higher sensitive-attribute diversity than those generated under the mono-level l-diversity model, and can therefore effectively resist homogeneity attacks and background knowledge attacks.
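As a rough illustration of the main idea (levels first, then diversity over levels), the following minimal sketch checks one group of numerical sensitive values. It is not the paper's definition: the equal-width level scheme, the interpretation of level distance as a minimum gap between level indices (min_gap), and all function names are assumptions made only for this example.

```python
from collections import Counter
from math import log


def to_level(value, lower, width):
    """Map a numerical sensitive value to a level index.

    Assumption: equal-width levels starting at `lower`.
    """
    return int((value - lower) // width)


def separated_level_count(levels, min_gap=1):
    """Largest number of levels that are pairwise at least `min_gap` apart.

    Assumption: "level distance" is the difference between level indices.
    A greedy scan over the sorted distinct levels is optimal in one dimension.
    """
    count, last = 0, None
    for lv in sorted(set(levels)):
        if last is None or lv - last >= min_gap:
            count += 1
            last = lv
    return count


def is_multilevel_distinct_l_diverse(values, l, lower, width, min_gap=1):
    """Hypothetical check: the group covers at least l well-separated levels."""
    levels = [to_level(v, lower, width) for v in values]
    return separated_level_count(levels, min_gap) >= l


def level_entropy(values, lower, width):
    """Shannon entropy (natural log) of the level distribution in the group."""
    counts = Counter(to_level(v, lower, width) for v in values)
    n = sum(counts.values())
    return -sum((c / n) * log(c / n) for c in counts.values())


# Three distinct salaries that all fall into the same level: the group has
# three distinct values, yet only one level, so it is not level-diverse.
close_group = [3000, 3100, 3200]
# Salaries spread over levels 3, 5 and 9 (level width 1000).
spread_group = [3000, 5200, 9100]
```

With a level width of 1000, close_group has three distinct values but a single level, which is the kind of near-homogeneous group a level-based check is meant to reject, while spread_group covers three levels that are at least two indices apart.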