Abstract:
Classification in networked data, which classifies entities based on their relationship information, is an important research issue in the data mining field. Previous methods usually assign a class to a node based on the classes of its neighboring nodes. These methods achieve high classification performance in networks with high homophily. However, many real-world networks have low homophily: in such networks, most pairs of connected nodes belong to different classes, and previous methods cannot assign the correct classes to their nodes. Therefore, this paper proposes a novel method for classification in networked data. The main idea of the proposed method is to build a new generative model for networks, in which the edges of the network are observed variables and the classes of the unlabeled nodes are latent variables. The values of the latent variables are calculated by fitting the generative model to the network, which yields the classes of the unlabeled nodes. Experimental results on real datasets show that the proposed method can provide better performance than previous methods in networks with low homophily.
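
To make the idea of treating edges as observed variables and unknown classes as latent variables concrete, the following is a minimal sketch only. The abstract does not specify the generative model or the fitting procedure, so this sketch assumes a simple block model (one edge probability per pair of classes) fitted by hard assignment, alternating between estimating the edge probabilities and reassigning the unlabeled nodes. The names `neighbor_vote` and `fit_block_model`, the smoothing constant `eps`, and the toy graph are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log


def neighbor_vote(n, edges, known):
    """Baseline: give each unlabeled node the majority class of its labeled neighbors."""
    nbrs = {i: [] for i in range(n)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    out = {}
    for i in range(n):
        if i not in known:
            votes = Counter(known[j] for j in nbrs[i] if j in known)
            out[i] = votes.most_common(1)[0][0] if votes else None
    return out


def fit_block_model(n, edges, known, k, iters=20, eps=1e-3):
    """Assumed stand-in model: edge (i, j) exists with probability p[class_i][class_j].

    Known classes stay fixed; unknown classes are latent and are set by
    alternating parameter estimation and hard reassignment.
    """
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    labels = [known.get(i) for i in range(n)]  # None marks a latent class

    for _ in range(iters):
        # Estimate the edge probability for every ordered pair of classes,
        # using only nodes that currently have a class.
        cnt = [[0.0] * k for _ in range(k)]
        tot = [[0.0] * k for _ in range(k)]
        for i in range(n):
            for j in range(n):
                if i != j and labels[i] is not None and labels[j] is not None:
                    tot[labels[i]][labels[j]] += 1
                    cnt[labels[i]][labels[j]] += adj[i][j]
        # Smoothing keeps every probability strictly between 0 and 1.
        p = [[(cnt[a][b] + eps) / (tot[a][b] + 2 * eps) for b in range(k)]
             for a in range(k)]

        # Reassign each unlabeled node to the class that best explains
        # its observed edges and non-edges.
        changed = False
        for i in range(n):
            if i in known:
                continue

            def loglik(c):
                return sum(
                    log(p[c][labels[j]]) if adj[i][j] else log(1 - p[c][labels[j]])
                    for j in range(n)
                    if j != i and labels[j] is not None
                )

            best = max(range(k), key=loglik)
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels


# Toy low-homophily network: every edge joins class 0 (nodes 0-2) to class 1 (nodes 3-5).
edges = [(u, v) for u in (0, 1, 2) for v in (3, 4, 5)]
known = {0: 0, 1: 0, 3: 1, 4: 1}  # nodes 2 and 5 are unlabeled

vote_result = neighbor_vote(6, edges, known)
model_result = fit_block_model(6, edges, known, k=2)
print(vote_result)
print(model_result)
```

On this toy graph no edge connects two nodes of the same class, so the neighbor vote assigns nodes 2 and 5 the class of their neighbors, which is the wrong one, while the block-model sketch learns that edges run between classes and places both nodes correctly. This illustrates the motivation stated in the abstract; it says nothing about the performance of the actual proposed method.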