Abstract:
In many areas, a lot of data have been modeled by graphs which are subject to uncertainties, such as molecular compounds and protein interaction networks. While many real applications, for example, collaborative filtering, fraud detection, and link prediction in social networks etc, rely on efficiently answering k-nearest neighbor queries (kNN), which is the problem of computing the most “similar” k nodes to a given query node. To solve the problem, in this paper a novel method based on measurement of SimRank is proposed. However, because graphs evolve over time and are uncertainly, the computing cost can be very high in practice to solve the problem using the existing algorithms of SimRank. So the paper presents an optimization algorithm. Introducing path threshold, which is suitable in both determined graph and uncertain graph,the algorithm merely considers the local neighborhood of a given query node instead of whole graph to prune the search space. To further improving efficiency, the algorithm adopts sample technology in uncertain graph. At the same time, theory and experiments interpret and verify that the optimization algorithm is efficient and effective.