    Yang Chen, Weng Zujian, Meng Xiaofeng, Ren Wei, Xin Rihui, Wang Chunkai, Du Zhihui, Wan Meng, Wei Jianyan. Data Management Challenges and Real-Time Processing Technologies in Astronomy[J]. Journal of Computer Research and Development, 2017, 54(2): 248-257. DOI: 10.7544/issn1000-1239.2017.20170005

    Data Management Challenges and Real-Time Processing Technologies in Astronomy

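    • Abstract: The advent of very large astronomical observation facilities not only lets researchers observe new astronomical phenomena but also helps verify existing physical models. These discoveries rest on the near-real-time generation, management, and analysis of massive volumes of astronomical data, which poses new challenges for current data management systems. Take the ground-based wide-angle camera array (GWAC), a telescope developed independently in China, as an example: its 15 s sampling and processing cycle is among the world's best in short-timescale observation, but it raises many problems for the data management system, including managing the parallel output of multiple cameras, real-time transient discovery, second-level queries over data from the current observing night, data persistence, and fast offline queries. To address these problems, we design a distributed GWAC data simulator that reproduces the real GWAC data-production scenario and, based on the characteristics of the generated data, propose a two-level cache architecture: local memory handles multi-camera parallel output and real-time transient discovery, and distributed shared memory provides second-level queries. To balance persistence against query efficiency, we design a star-table cluster structure that partitions the whole star table and stores each partition together. Following the characteristics of astronomical workloads, we design an index-table-based query engine that queries star-table data from both the cache and the star-table clusters at low cost. Experiments show that the current scheme meets GWAC's requirements.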

       

      Abstract: In recent years, many large telescopes capable of producing petabytes or even exabytes of data have come into operation. These telescopes not only help discover new astronomical phenomena but also help confirm existing astrophysical models. However, the star tables they produce are so large that a single database cannot manage them efficiently. Take GWAC, a 40-camera array designed in China, as an example: it takes a high-resolution image every 15 s, and its database must make each star table queryable within 15 s. Moreover, the database has to process multi-camera data, find abnormal stars in real time, query their recent historical data very quickly, and persist star tables and query them offline as fast as possible. To address these problems, we first design a distributed data generator that simulates the GWAC working process. Second, we propose a two-level cache architecture that can not only process multi-camera data and find abnormal stars in local memory, but also query star tables in a distributed memory system. Third, we propose a storage format named star cluster, which stores a group of stars in one physical file to trade off persistence efficiency against query efficiency. Finally, our query engine, which is based on an index table, can query both the second-level cache and the star cluster format. Experimental results show that our distributed system prototype satisfies the demands of GWAC on our server cluster.
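
      The two-level cache idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `TwoLevelCache`, the `flux_jump` threshold, and the relative-flux-change test for "abnormal" stars are all hypothetical, and a plain in-process dictionary stands in for the distributed memory system.

```python
class TwoLevelCache:
    """Illustrative sketch only; every name and rule here is an assumption."""

    def __init__(self, flux_jump=0.5):
        # Level 1: per-camera local memory, camera_id -> {star_id: latest flux}.
        self.local = {}
        # Level 2: stand-in for distributed shared memory,
        # star_id -> [(timestamp, flux), ...] for the current observing night.
        self.shared = {}
        # Hypothetical threshold on relative flux change between two frames.
        self.flux_jump = flux_jump

    def ingest(self, camera_id, timestamp, star_table):
        """Accept one star table from one camera; return flagged star ids."""
        previous = self.local.get(camera_id, {})
        flagged = []
        for star_id, flux in star_table.items():
            old = previous.get(star_id)
            # Compare against the previous frame held in local memory.
            if old is not None and old > 0 and abs(flux - old) / old > self.flux_jump:
                flagged.append(star_id)
            # Publish the measurement to the second-level cache.
            self.shared.setdefault(star_id, []).append((timestamp, flux))
        # Keep only the newest frame per camera in level 1.
        self.local[camera_id] = dict(star_table)
        return flagged

    def recent(self, star_id, since):
        """Return measurements of one star with timestamp >= since."""
        return [(t, f) for t, f in self.shared.get(star_id, []) if t >= since]
```

      In this sketch, each camera's newest frame stays in level 1 so the comparison with the incoming frame needs no remote access, while the per-star history in level 2 serves the fast queries over recent data. A real deployment would replace the level-2 dictionary with a distributed in-memory store and use a detection rule chosen by the astronomers.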

       
