ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (2): 258-266. doi: 10.7544/issn1000-1239.2017.20160939

Special Issue: Scientific Big Data Management (2017)


Data Management Challenges and Event Index Technologies in High Energy Physics

Cheng Yaodong¹, Zhang Xiao², Wang Peijian², Zha Li³, Hou Di², Qi Yong², Ma Can⁴

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049
  2. Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049
  3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
  4. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100193
  • Online: 2017-02-01

Abstract: New-generation high energy physics facilities are producing ever larger volumes of scientific data. A single experiment alone can accumulate data at the petabyte (PB) or even exabyte (EB) scale, which poses major challenges for data management technologies, including data acquisition, storage, transmission, sharing, analysis and processing. The event is the basic unit of high energy physics data, and one large experiment can produce trillions of events. Traditional high energy physics data processing uses the file as the basic unit of data management, with each file containing thousands of events. This file-based approach reduces the complexity of the data management system. However, a single physics analysis task is usually interested in only a very small fraction of the events, so reading whole files leads to large amounts of redundant data transfer, I/O bottlenecks and low processing efficiency. To address these problems, this paper proposes an event-oriented method for managing high energy physics data, centered on efficient indexing of massive numbers of events. In this method, event data remain stored in ROOT files, while the events are indexed by selected properties and the index is stored in a NoSQL database. Experimental results demonstrate the feasibility of the method and show that an optimized HBase system can meet the requirements of event indexing.
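The abstract does not specify the index schema, the row-key design, or which event properties are indexed. As a rough illustration of the general idea only, the Python sketch below keeps event data "in" ROOT files (represented just by a file name and entry number) and stores a separate index that maps selected event properties to each event's location. A sorted in-memory list stands in for an HBase table, whose row keys are kept in lexicographic byte order. The `run:event` key format, the `n_tracks` property, and the names `EventIndex`, `EventPointer`, `row_key` and `group_by_file` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left, insort
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventPointer:
    """Hypothetical location of one event inside a ROOT file."""
    root_file: str  # name of the ROOT file that holds the event
    entry: int      # entry number of the event within that file


def row_key(run: int, event: int) -> str:
    # Assumed key format: zero-padded fixed-width fields, so that
    # lexicographic order (as used for HBase row keys) matches numeric order.
    return f"{run:08d}:{event:012d}"


class EventIndex:
    """In-memory stand-in for a sorted key-value table such as an HBase table."""

    def __init__(self):
        self._keys = []  # row keys, kept sorted
        self._rows = {}  # row key -> (indexed properties, EventPointer)

    def put(self, run, event, props, pointer):
        key = row_key(run, event)
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = (dict(props), pointer)

    def scan(self, start_run, stop_run, predicate):
        """Yield pointers for events with start_run <= run < stop_run
        whose indexed properties satisfy the predicate."""
        lo = bisect_left(self._keys, row_key(start_run, 0))
        hi = bisect_left(self._keys, row_key(stop_run, 0))
        for key in self._keys[lo:hi]:
            props, pointer = self._rows[key]
            if predicate(props):
                yield pointer


def group_by_file(pointers):
    """Group selected events by ROOT file, so each file is opened once
    and only the listed entries need to be read."""
    by_file = defaultdict(list)
    for p in pointers:
        by_file[p.root_file].append(p.entry)
    return dict(by_file)


# Toy usage (hypothetical data): 2 ROOT files of 1000 events each.
index = EventIndex()
for i in range(2000):
    pointer = EventPointer(f"run1_part{i // 1000}.root", i % 1000)
    index.put(1, i, {"n_tracks": i % 50}, pointer)

selected = group_by_file(index.scan(1, 2, lambda p: p["n_tracks"] >= 48))
print({f: len(entries) for f, entries in selected.items()})
# {'run1_part0.root': 40, 'run1_part1.root': 40}
```

In this toy example the query touches only 80 of the 2000 events, and the analysis would read just those entries from the two files instead of every event in them. This is the kind of saving in redundant transfer and I/O that the event-oriented approach aims for. How the real system filters (client-side as here, or inside the database) is not described in the abstract.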

Key words: high energy physics, data management, event index, HBase, query optimization
