Abstract:
With the rapid development of information technology, represented by the Internet, network data have become widespread in the real world, and classification of network data has become an important research topic in network data mining. Collective classification exploits the dependencies between nodes to classify related nodes simultaneously, and thereby obtains higher classification accuracy. It has attracted wide attention from researchers and has been applied in a variety of domains, such as hyperlinked document classification, protein interaction and gene expression data classification, and social network analysis. We present an active collective classification method that combines feature selection and link filtering. The algorithm first selects important attributes using the minimum-redundancy maximum-relevance (mRMR) feature selection method and constructs implicit links, then filters the original links to obtain explicit links, and finally integrates the explicit and implicit links to perform classification. We compare our method with several typical traditional classification methods and several typical collective classification methods. The results show that our method obtains higher accuracy, and that its advantage is more pronounced on sparsely labeled networks.
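
The three steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not necessarily the implementation evaluated in the paper: the abstract does not specify how implicit links are built or how the original links are filtered, so the sketch assumes an attribute-similarity threshold for both, uses a simple neighbor majority vote as the collective classifier, and omits the active (label-querying) component. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr(columns, labels, k):
    """Greedy mRMR: pick the feature maximizing relevance minus mean redundancy."""
    selected, remaining = [], list(range(len(columns)))

    def score(j):
        relevance = mutual_information(columns[j], labels)
        if not selected:
            return relevance
        redundancy = sum(mutual_information(columns[j], columns[s]) for s in selected)
        return relevance - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def similarity(a, b, selected):
    """Fraction of selected attributes on which two nodes agree (assumed measure)."""
    return sum(a[j] == b[j] for j in selected) / len(selected)


def build_links(features, original_links, selected, threshold):
    """Implicit links from attribute similarity, plus filtered explicit links."""
    implicit = {(u, v) for u, v in combinations(sorted(features), 2)
                if similarity(features[u], features[v], selected) >= threshold}
    explicit = {(min(u, v), max(u, v)) for u, v in original_links
                if similarity(features[u], features[v], selected) >= threshold}
    return implicit, explicit


def classify(nodes, links, known, rounds=5):
    """Iterative neighbor majority vote over the integrated link set."""
    neighbors = {v: set() for v in nodes}
    for u, v in links:
        neighbors[u].add(v)
        neighbors[v].add(u)
    labels = dict(known)
    for _ in range(rounds):
        updates = {}
        for v in nodes:
            if v in known:
                continue  # labeled nodes keep their given label
            votes = Counter(labels[u] for u in neighbors[v] if u in labels)
            if votes:
                updates[v] = votes.most_common(1)[0][0]
        labels.update(updates)
    return labels


# Toy network (hypothetical): six nodes, three binary attributes, two unlabeled nodes.
features = {0: (1, 0, 1), 1: (1, 1, 1), 2: (1, 0, 1),
            3: (0, 1, 1), 4: (0, 0, 1), 5: (0, 1, 1)}
original_links = [(0, 2), (2, 3), (4, 5), (1, 5)]
known = {0: "A", 1: "A", 3: "B", 4: "B"}

# Feature selection uses labeled nodes only.
labeled = sorted(known)
columns = [[features[v][j] for v in labeled] for j in range(3)]
selected = mrmr(columns, [known[v] for v in labeled], k=1)

implicit, explicit = build_links(features, original_links, selected, threshold=0.5)
predicted = classify(sorted(features), implicit | explicit, known)
print(selected, sorted(explicit), predicted[2], predicted[5])
```

In this toy setting, link filtering drops the two original links that join nodes with dissimilar selected attributes, and the implicit links give the unlabeled nodes additional labeled neighbors, which is the situation in which a sparsely labeled network would be expected to benefit most.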