ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (7): 1383-1395.doi: 10.7544/issn1000-1239.2019.20180641

Previous Articles     Next Articles

A Classification Method of Scientific Collaborator Potential Prediction Based on Ensemble Learning

Ai Ke, Ma Guoshuai, Yang Kaikai, Qian Yuhua   

  1. (Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006) (Key Laboratory of Computational Intelligence and Chinese Information Processing(Shanxi University), Ministry of Education, Taiyuan 030006) (School of Computer and Information Technology, Shanxi University, Taiyuan 030006)
  • Online:2019-07-01

Abstract: Scientific cooperation is a very important form of academic achievement. Many high-level researches are achieved through cooperation. Researching the collaboration potential can provide guidance for scholars to choose collaborators and maximize the efficiency of scientific research. However, the current outbursts of big data have hindered the effective choice of collaborators. In order to solve the problem, based on scholar-paper big data, after features analysis and optimization and comprehensively considering individual attributes and related attributes of scholars' papers, institutions, research interests, etc., sample features from various dimensions such as paper title, paper rank, paper number, time and coauthor order are constructed. Taking journal or conference level of papers as the sample tags of collaborators sequence pairs, which indicates the potential of current cooperators and make use of the strong learning characteristics of the ensemble methods, a scientific collaborator potential prediction model based on ensemble learning classification method is proposed. After analyzing and constructing the feature set that corresponds to the problem of scientific collaborator potential prediction, classification method is adopted to solve the problem. In experiments, the accuracy, recall rate, and F1 score are much higher than those of traditional machine learning methods and can converge to high values (above 80%) with few samples and little time, indicating the superiority of the proposed model.

Key words: scientific cooperation, potential prediction, feature construction, big scholar data, ensemble learning

CLC Number: