ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (7): 1383-1395. doi: 10.7544/issn1000-1239.2019.20180641

• Artificial Intelligence •

A Classification Method of Scientific Collaborator Potential Prediction Based on Ensemble Learning

Ai Ke, Ma Guoshuai, Yang Kaikai, Qian Yuhua

  1. (Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006) (Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006) (School of Computer and Information Technology, Shanxi University, Taiyuan 030006) (aike0229@163.com)
  • Published online: 2019-07-01
  • Supported by: 
    the National Natural Science Foundation of China (61672332, 61432011, U1435212); the Shanxi Province Research Project for Returned Overseas Scholars (2017023)

Abstract: Scientific collaboration is a very important way of producing academic results, and many high-level research results are achieved through collaboration. Studying collaboration potential can guide scholars in choosing collaborators and maximize research efficiency. However, the current explosion of big data hinders the effective choice of collaborators. To address this problem, scholar-paper big data are used: after feature analysis and optimization, scholars' individual and related attributes, such as their papers, institutions, and research interests, are considered together, and sample features are constructed along multiple dimensions, namely paper title, paper rank, number of papers, time, and coauthor order. The rank of the journal or conference in which a paper is published is taken as the sample label of a collaborator sequence pair and indicates how high the current collaborator's potential is. Exploiting the strong learning ability of ensemble methods, a scientific collaborator potential prediction model based on ensemble learning classification is proposed. Once the feature set corresponding to the collaborator potential prediction problem has been analyzed and constructed, a classification method is used to solve the problem. In experiments, the accuracy, recall, and F1 score are all much higher than those of traditional machine learning methods, and they converge to high values (above 80%) with relatively few samples and little time, which indicates the superiority of the proposed model.

Key words: scientific cooperation, potential prediction, feature construction, big scholar data, ensemble learning

CLC Number: