ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (9): 1903-1919. doi: 10.7544/issn1000-1239.2018.20180139

Special topic: 2018 Excellent Young Scientists Special Issue

• Survey •



  • Published: 2018-09-01
  • Funding: 
    This work was supported by the National Natural Science Foundation of China for Excellent Young Scientists (61222212), the National Natural Science Foundation of China (61806111), and the National High Technology Research and Development Program of China (863 Program) (2015AA124102).

A Survey on Scholar Profiling Techniques in the Open Internet

Yuan Sha1, Tang Jie1, Gu Xiaotao2   

  1. 1(Department of Computer Science and Technology, Tsinghua University, Beijing 100084); 2(Computer Science Department, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL 61801)
  • Online: 2018-09-01

Abstract: Scholar profiling on the open Internet has become an active research topic in recent years. Its goal is to extract attribute information about a scholar along multiple dimensions for use in information mining and analysis. Scholar profiling is a key technique that lets large-scale expert databases support functions such as expert finding and academic influence evaluation. On the open Internet, scholar profiling faces new challenges, including large data volumes, noisy data, and redundant data. As a result, traditional user profiling theories, models, and methods cannot be applied directly to user profiling systems in the open Internet environment. In response to these challenges, this paper summarizes and classifies existing scholar profiling techniques to provide a reference for further research. First, we analyze the scholar profiling problem and give a general overview of information extraction methods, the theoretical basis of scholar profiling, summarizing the available models and methods. Next, we describe in detail the three basic tasks of scholar profiling: scholar information annotation, research interest mining, and academic impact prediction. We then introduce AMiner, an application system for scholar profiling. Finally, we discuss open research issues and outline possible future research directions.

Key words: user profiling, scholar profiling, information extraction method, research interest mining, academic impact prediction