ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (2): 456-474. doi: 10.7544/issn1000-1239.2015.20131342

• Information Processing •

Open Web Knowledge Aided Information Search and Data Mining

Wang Yuanzhuo¹, Jia Yantao¹, Liu Dawei², Jin Xiaolong¹, Cheng Xueqi¹

  1. CAS Key Laboratory of Network Data Science & Technology (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190
  2. Institute of Network Technology/ICT (YANTAI), Chinese Academy of Sciences, Yantai, Shandong 264005
  (wangyuanzhuo@ict.ac.cn)
  • Published online: 2015-02-01
  • Funding:
    National Key Basic Research Program of China (973 Program) (2014CB340401, 2013CB329601); National Natural Science Foundation of China (61173008, 61100175, 61232010, 60933005, 61402442); Beijing Nova Program (Z121101002512063); Beijing Natural Science Foundation for Young Scholars (4154086)

Abstract: Network big data refers to the massive data that are generated by the interaction and fusion of the ternary human-machine-thing universe in cyberspace and are available on the Internet. These data are multi-sourced, heterogeneous, interactive, time-sensitive, social, bursty, and noisy; they are largely unstructured and strongly real-time. Behind network big data lies a wealth of complex, highly interconnected knowledge, and building large-scale knowledge bases for the open Web is an effective means of acquiring that knowledge. This paper compares the mainstream open Web knowledge bases in China and abroad and analyzes the key techniques for constructing them, fusing multi-source knowledge, and keeping the knowledge bases up to date. We then summarize the research status and main problems of information search, data mining, and system applications based on open Web knowledge bases, covering user intention understanding, query expansion, semantic question answering, clue mining, relation reasoning, and relation and attribute prediction. Finally, we discuss the development trends and main challenges of open Web knowledge bases.

Key words: network big data, open Web knowledge, ontology, information search, data mining
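The abstract names query expansion as one way an open Web knowledge base can aid information search. As a minimal sketch, assuming a toy knowledge base stored as subject-predicate-object triples (the facts, the predicate names such as `alias`, and the helpers `build_index` and `expand_query` are all hypothetical and not taken from the paper), expansion can be illustrated as appending knowledge-base facts about entities that appear in the query:

```python
from collections import defaultdict

# Hypothetical toy knowledge base of (subject, predicate, object) triples.
# All entities and predicate names are illustrative assumptions.
TRIPLES = [
    ("Beijing", "alias", "Peking"),
    ("Beijing", "capital_of", "China"),
    ("Beijing", "instance_of", "city"),
    ("ICT", "alias", "Institute of Computing Technology"),
    ("ICT", "part_of", "Chinese Academy of Sciences"),
]


def build_index(triples):
    """Group (predicate, object) pairs by subject for fast entity lookup."""
    index = defaultdict(list)
    for subj, pred, obj in triples:
        index[subj].append((pred, obj))
    return index


def expand_query(query, index, predicates=("alias",)):
    """Return the query terms plus KB objects linked to any query term.

    Only facts whose predicate is in `predicates` are used. Entity matching
    is exact, case-sensitive lookup of whitespace-separated tokens, a
    deliberate simplification of real entity linking.
    """
    terms = query.split()
    expanded = list(terms)
    for term in terms:
        # .get() does not insert missing keys into the defaultdict.
        for pred, obj in index.get(term, []):
            if pred in predicates and obj not in expanded:
                expanded.append(obj)
    return expanded


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(expand_query("Beijing weather", idx))
    print(expand_query("Beijing weather", idx, predicates=("alias", "capital_of")))
```

Restricting expansion to selected predicates keeps the added terms close in meaning to the original query; a practical system would also link entities more robustly and rank or weight the added terms, which this sketch omits.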
