ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (2): 235-247. doi: 10.7544/issn1000-1239.2017.20160847

Special topic: 2017 Special Topic on Scientific Big Data Management

Section: Software Technology

Scientific Big Data Management: Concepts, Technologies and System

Li Jianhui1, Shen Zhihong1, Meng Xiaofeng2

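  1 (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190); 2 (School of Information, Renmin University of China, Beijing 100872) (lijh@cnic.cn)
  • Publication date: 2017-02-01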
  • Funding: National Key Research and Development Program of China (2016YFB1000600)




Abstract: In recent years, as more and more large-scale scientific facilities have been built and major scientific experiments have been carried out, scientific research has entered an unprecedented big data era. In this era, scientific research is a process of big science, big demand, big data, big computing, and big discovery. Developing a data management system that supports the full life cycle of scientific big data is therefore of great significance. In this paper, we first introduce the background to the development of scientific big data management systems. We then describe the concept of scientific big data and its three key characteristics. After reviewing scientific data resource development projects and scientific data management systems, we propose a framework for the full life cycle management of scientific big data. We then analyze five key technologies of this framework: data fusion, real-time data analysis, long-term storage, the cloud service system, and mechanisms for open data sharing. Finally, we summarize the research progress in this field and look ahead to the application prospects of scientific big data management systems in scientific research.

Key words: scientific data, big data, data pipeline, full life cycle of data
