ISSN 1000-1239 CN 11-1777/TP


    Meng Xiaofeng, Li Jianhui, Guo Yike
    Journal of Computer Research and Development    2017, 54 (2): 233-234.  
    Scientific Big Data Management: Concepts, Technologies and System
    Li Jianhui, Shen Zhihong, Meng Xiaofeng
    Journal of Computer Research and Development    2017, 54 (2): 235-247.   DOI: 10.7544/issn1000-1239.2017.20160847
    In recent years, as more large-scale scientific facilities have been built and major scientific experiments carried out, scientific research has entered an unprecedented big data era. Scientific research in the big data era is a process of big science, big demand, big data, big computing, and big discovery. It is important to develop a full-life-cycle data management system for scientific big data. In this paper, we first introduce the background to the development of scientific big data management systems. We then define the concept of scientific big data and specify its three key characteristics. After a review of scientific data resource development projects and scientific data management systems, we propose a framework for the full-life-cycle management of scientific big data. We further introduce the key technologies of the framework, including data fusion, real-time analysis, long-term storage, cloud services, and data opening and sharing. Finally, we summarize research progress in this field and look ahead to the application prospects of scientific big data management systems.
    Data Management Challenges and Real-Time Processing Technologies in Astronomy
    Yang Chen, Weng Zujian, Meng Xiaofeng, Ren Wei, Xin Rihui, Wang Chunkai, Du Zhihui, Wan Meng, Wei Jianyan
    Journal of Computer Research and Development    2017, 54 (2): 248-257.   DOI: 10.7544/issn1000-1239.2017.20170005
    In recent years, many large telescopes capable of producing petabytes or exabytes of data have come into operation. These telescopes help not only to discover new astronomical phenomena but also to confirm existing astrophysical models. However, the star tables they produce are so large that a single database cannot manage them efficiently. Take GWAC, which has 40 cameras and was designed by China, as an example: it takes high-resolution images every 15 s, and its database must make each star table available for querying within 15 s. Moreover, the database has to process multi-camera data, find abnormal stars in real time, query their recent historical data very quickly, and persist and query star tables offline as fast as possible. To address these problems, we first design a distributed data generator to simulate the GWAC working process. Second, we propose a two-level cache architecture that can not only process multi-camera data and find abnormal stars in local memory but also query star tables in a distributed memory system. Third, we propose a storage format named star cluster, which stores a group of stars in one physical file to balance persistence efficiency against query efficiency. Finally, our query engine, based on an index table, can query both the second-level cache and the star cluster format. Experimental results show that our distributed system prototype can satisfy the demands of GWAC on our server cluster.
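The idea of packing several stars into one physical file and locating them through an index table can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the class and function names, the fixed cluster size, and the rule that assigns a star to a cluster by integer division are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cluster_id(star_id, stars_per_cluster):
    """Assign a star to a cluster by integer division (an illustrative rule only)."""
    return star_id // stars_per_cluster


class StarClusterStore:
    """Toy store in which each 'file' holds the rows of several stars."""

    def __init__(self, stars_per_cluster=100):
        self.stars_per_cluster = stars_per_cluster
        # cluster id -> list of rows; each list stands in for one physical file
        self.files = defaultdict(list)
        # star id -> cluster id; this dict stands in for the index table
        self.index = {}

    def persist(self, star_id, timestamp, magnitude):
        """Append one observation to the file of the star's cluster."""
        cid = cluster_id(star_id, self.stars_per_cluster)
        self.files[cid].append((star_id, timestamp, magnitude))
        self.index[star_id] = cid

    def history(self, star_id):
        """Return (timestamp, magnitude) rows for one star, scanning only its cluster."""
        cid = self.index.get(star_id)
        if cid is None:
            return []
        return [(t, m) for sid, t, m in self.files[cid] if sid == star_id]


store = StarClusterStore(stars_per_cluster=2)
store.persist(0, 15, 11.2)
store.persist(1, 15, 9.8)
store.persist(0, 30, 11.3)
store.persist(5, 30, 12.0)
print(store.history(0))
```

In this sketch a larger cluster size means fewer files to write during persistence but more rows to scan per query, which is one way to read the trade-off the abstract mentions.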
    Data Management Challenges and Event Index Technologies in High Energy Physics
    Cheng Yaodong, Zhang Xiao, Wang Peijian, Zha Li, Hou Di, Qi Yong, Ma Can
    Journal of Computer Research and Development    2017, 54 (2): 258-266.   DOI: 10.7544/issn1000-1239.2017.20160939
    New-generation high energy physics facilities are producing more and more scientific data. A single experiment can generate data at the PB or even EB scale, which poses major challenges to data management technologies such as data acquisition, storage, transmission, sharing, analysis and processing. The event is the basic data unit of high energy physics, and one large high energy physics experiment can produce trillions of events. Traditional high energy physics data processing adopts the file as the basic data management unit, with each file containing thousands of events. The benefit of the file-based method is that it simplifies the data management system. However, a physics analysis task is interested in only very few events, which leads to problems including the transfer of too much redundant data, I/O bottlenecks and low data processing efficiency. To solve these problems, this paper proposes an event-oriented high energy physics data management method that focuses on efficient indexing of massive numbers of events. In this method, event data is still stored in ROOT files, while large numbers of events are indexed by specified properties and the index is stored in a NoSQL database. Finally, experimental results show the feasibility of the method and that an optimized HBase system can meet the requirements of event indexing.
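A minimal sketch of the event-index idea follows: events stay in their files, and a separate key-value index maps property values to event locations so that an analysis task fetches only the events it needs. The class names, the property names (`run`, `n_tracks`) and the in-memory dictionary standing in for the NoSQL store are all hypothetical; the paper's actual schema and HBase layout are not given in the abstract.

```python
from collections import defaultdict
from typing import NamedTuple


class EventLocation(NamedTuple):
    """Where an event lives: a file path plus the entry number inside that file."""
    file_path: str
    entry: int


class EventIndex:
    """In-memory stand-in for a key-value (NoSQL) index over event properties."""

    def __init__(self):
        # (property name, property value) -> set of event locations
        self._postings = defaultdict(set)

    def add(self, location, properties):
        """Index one event under each of its (name, value) property pairs."""
        for name, value in properties.items():
            self._postings[(name, value)].add(location)

    def query(self, **criteria):
        """Return sorted locations of events that match every criterion exactly."""
        if not criteria:
            return []
        sets = [self._postings.get((n, v), set()) for n, v in criteria.items()]
        return sorted(set.intersection(*sets))


index = EventIndex()
index.add(EventLocation("run1.root", 0), {"run": 1, "n_tracks": 2})
index.add(EventLocation("run1.root", 1), {"run": 1, "n_tracks": 4})
index.add(EventLocation("run2.root", 0), {"run": 2, "n_tracks": 4})
selected = index.query(run=1, n_tracks=4)
print(selected)
```

Only the locations returned by `query` would then need to be read from the underlying files, which is the point of indexing events rather than whole files.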
    Data Infrastructure for Remote Sensing Big Data: Integration, Management and On-Demand Service
    Li Guoqing, Huang Zhenchun
    Journal of Computer Research and Development    2017, 54 (2): 267-283.   DOI: 10.7544/issn1000-1239.2017.20160837
    The rapid growth of remote sensing data and geoscience research strongly drives the earth sciences and poses great challenges to data infrastructures for remote sensing big data, including collection, storage, management, analysis and delivery. The de facto remote sensing data infrastructures have become a bottleneck in remote sensing data analysis workflows because of their capability, scalability and performance. In this paper, data infrastructures for remote sensing big data are classified into 6 classes based on features such as basic service unit, distribution, heterogeneity, space-time continuity and on-demand processing. Architectures are then designed for all 6 classes of data infrastructures, and implementation technologies such as data collection and integration, data storage and management, data service interfaces, and on-demand data processing are discussed. With these architecture designs and implementation technologies, data infrastructures for remote sensing big data will provide PaaS (platform-as-a-service) and SaaS (software-as-a-service) services for developing many more remote sensing data analysis applications. With continuously growing data, tools and libraries in the infrastructures, users can easily develop analysis models to process remote sensing big data, create new applications based on these models, and exchange knowledge with each other by sharing models.
    Crowdsourcing-Based Scientific Data Processing
    Zhao Jianghua, Mu Shuting, Wang Xuezhi, Lin Qinghui, Zhang Xi, Zhou Yuanchun
    Journal of Computer Research and Development    2017, 54 (2): 284-294.   DOI: 10.7544/issn1000-1239.2017.20160850
    The ultimate goal of acquiring scientific data is to extract useful knowledge from the data according to specific needs and to apply that knowledge in specific areas to support decision making. As the volume of scientific data grows and its structure becomes more complex, as with semi-structured or unstructured data, it becomes difficult for computers to process the data automatically. By incorporating human computing power into data processing, crowdsourcing has become one of the solutions for big scientific data processing. By analyzing the characteristics of scientific data processing tasks crowdsourced to citizens, this paper studies three aspects: the talent selection mechanism, the task execution mode, and the result assessment strategy. A series of crowdsourcing-based remote sensing imagery interpretation experiments is then carried out. The results show not only that scientific data can be processed through the crowdsourcing paradigm, but also that high-quality data can be obtained by designing a reasonable procedure.
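As an illustration of what a result assessment step can look like, the sketch below aggregates several contributors' labels for one image tile by majority vote and rejects the result when agreement is too low. Majority voting is a common baseline chosen here for illustration; the abstract does not say which strategy the paper uses, and the function name and the 0.6 threshold are assumptions.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Return (label, agreement) for one task, or (None, agreement) if too contested.

    votes: list of labels submitted by different contributors for the same task.
    min_agreement: hypothetical threshold on the share of votes the top label needs.
    """
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement


accepted = aggregate_labels(["water", "water", "forest"])
contested = aggregate_labels(["water", "forest"])
print(accepted, contested)
```

Tasks that come back as `None` could be reissued to more contributors or routed to an expert, which is one simple way to trade cost against label quality.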