Abstract:
Emerging big data technologies enable many organizations to collect massive amounts of information about individuals. Sharing such a wealth of information presents enormous opportunities for data mining applications, but data privacy has been a major barrier.
Clustering-based k-anonymity is one of the most important techniques for preventing privacy disclosure in data sharing, and it can withstand background-knowledge attacks and linking attacks. Existing anonymity methods balance privacy and utility requirements by seeking the optimal k-equivalence set. However, viewed as a whole, the resulting k-equivalence set is not necessarily the optimal solution satisfying k-anonymity, so optimal utility is not guaranteed. In this paper, we address this problem with an optimal clustering approach: we propose a greedy clustering-anonymity method that combines a greedy algorithm with a dichotomy clustering algorithm. In addition, we formulate the optimal data release problem, which minimizes information loss subject to a privacy constraint. We also establish the functional relationship between data distance and information loss to capture the privacy/accuracy trade-off in an online way. Finally, we evaluate the mechanism through theoretical analysis and experimental verification. Evaluations on real datasets show that the proposed method can minimize information loss and is efficient in terms of running time.
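
For readers unfamiliar with the setting, the general idea of clustering-based k-anonymity can be illustrated with a minimal sketch. It is not the method proposed in this paper: the Manhattan distance, the seed-and-nearest-neighbours grouping rule, the range-based loss measure, and all function names (`distance`, `greedy_k_clusters`, `information_loss`) are assumptions chosen only for illustration, and the dichotomy clustering step is not reproduced.

```python
from typing import List, Tuple

Record = Tuple[float, ...]


def distance(a: Record, b: Record) -> float:
    """Manhattan distance between two numeric records (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def greedy_k_clusters(records: List[Record], k: int) -> List[List[Record]]:
    """Greedily group records so that every cluster has at least k members."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(records)
    clusters: List[List[Record]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Take the k-1 remaining records closest to the seed.
        neighbours = sorted(remaining, key=lambda r: distance(seed, r))[: k - 1]
        for n in neighbours:
            remaining.remove(n)
        clusters.append([seed] + neighbours)
    # Fewer than k records are left: attach each to the nearest cluster seed.
    for r in remaining:
        nearest = min(clusters, key=lambda c: distance(c[0], r))
        nearest.append(r)
    return clusters


def information_loss(clusters: List[List[Record]], records: List[Record]) -> float:
    """Size-weighted average of each cluster's attribute range, normalized
    by the range of that attribute over the whole dataset (assumed measure)."""
    dims = len(records[0])
    spans = [max(r[d] for r in records) - min(r[d] for r in records)
             for d in range(dims)]
    total = 0.0
    for c in clusters:
        for d in range(dims):
            if spans[d] > 0:
                width = max(r[d] for r in c) - min(r[d] for r in c)
                total += len(c) * width / spans[d]
    return total / (len(records) * dims)


if __name__ == "__main__":
    # Hypothetical (age, score) quasi-identifiers.
    data = [(25, 50), (27, 52), (26, 51), (60, 90), (62, 95), (61, 92), (40, 70)]
    groups = greedy_k_clusters(data, k=3)
    print(groups)
    print(information_loss(groups, data))
```

Under this assumed loss measure, generalizing each cluster to its attribute ranges costs more as records within a cluster lie farther apart, which is the kind of distance/information-loss relationship the abstract refers to.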