Abstract:
Nowadays, more and more scientific data has been produced by new generation high energy physics facilities. The scale of the data can be achieved to PB or EB level even by one experiment, which brings big challenges to data management technologies such as data acquisition, storage, transmission,sharing, analyzing and processing. Event is the basic data unit of high energy physics, and one large high energy physics experiment can produce trillions of events. The traditional high energy physical data processing technology adopts file as a basic data management unit, and each file contains thousands of events. The benefit of file-based method is to simplify the complexity of data management system. However, one physical analysis task is only interested in very few events, which leads to some problems including transferring too much redundant data, I/O bottleneck and low efficiency of data processing. To solve these problems, this paper proposes an event-oriented high energy physical data management method, which focuses on high efficiency indexing technology of massive events. In this method, event data is still stored in ROOT file while a large amount of events are indexed by some specified properties and stored in NoSQL database. Finally,experimental test results show the feasibility of the method, and optimized HBase system can meet the requirements of event index.