Zhou Enqiang, Zhang Wei, Lu Yutong, Hou Hongjun, Dong Yong. A Cache Approach for Large Scale Data-Intensive Computing[J]. Journal of Computer Research and Development, 2015, 52(7): 1522-1530. DOI: 10.7544/issn1000-1239.2015.20148073

A Cache Approach for Large Scale Data-Intensive Computing

  • Published Date: June 30, 2015
  • As HPC systems become widely used in modern scientific computing, more data-intensive applications are generating and analyzing ever larger datasets, which poses new challenges for HPC storage systems. After a comparison of different storage architectures and their corresponding file system approaches, a novel cache approach, named DDCache, is proposed to improve the efficiency of data-intensive computing. DDCache uses the distributed storage architecture as a performance booster for the centralized storage architecture by exploiting the node-local storage distributed across the system. To supply a much larger cache volume than a volatile memory cache can, DDCache aggregates the node-local disks into a large non-volatile cooperative cache. A high cache hit ratio is then achieved by keeping intermediate data in DDCache for as long as possible over the whole run of an application. To make node-local storage efficient enough to act as a data cache, a locality-aware data layout keeps cached data close to the compute tasks and evenly distributed. Furthermore, concurrency control throttles the I/O requests flowing into or out of DDCache so that node-local storage regains its particular advantage. Evaluations on typical HPC platforms verify the effectiveness of DDCache: I/O bandwidth scales in the well-known HPC checkpoint/restart scenario, and the overall performance of a typical data-intensive application improves by up to 6 times.
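
The two mechanisms named in the abstract, a locality-aware data layout and concurrency control on node-local storage, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not DDCache's actual design: the class `CooperativeCacheSketch`, its methods, the "spill to the least-loaded node" rule, and the per-node semaphore limit are all hypothetical choices made for illustration, and in-memory dictionaries stand in for node-local disks.

```python
import threading


class CooperativeCacheSketch:
    """Toy model (hypothetical): node-local stores aggregated into one cache."""

    def __init__(self, nodes, capacity_per_node, max_concurrent_io=2):
        self.capacity = capacity_per_node        # blocks each node may hold
        self.blocks = {n: {} for n in nodes}     # node -> {block_id: data}
        self.location = {}                       # block_id -> node holding it
        # Assumed throttle: one bounded semaphore per node caps the number
        # of I/O requests served concurrently by that node's local store.
        self.io_slots = {
            n: threading.BoundedSemaphore(max_concurrent_io) for n in nodes
        }

    def _place(self, block_id, writer_node):
        # A block that is already cached stays where it is.
        if block_id in self.location:
            return self.location[block_id]
        # Locality first: keep data on the node whose task produced it.
        if len(self.blocks[writer_node]) < self.capacity:
            return writer_node
        # Otherwise spill to the least-loaded node to keep the layout even.
        target = min(self.blocks, key=lambda n: len(self.blocks[n]))
        if len(self.blocks[target]) >= self.capacity:
            raise RuntimeError("cache full")
        return target

    def put(self, block_id, data, writer_node):
        node = self._place(block_id, writer_node)
        with self.io_slots[node]:                # throttle I/O into the cache
            self.blocks[node][block_id] = data
        self.location[block_id] = node
        return node

    def get(self, block_id):
        node = self.location.get(block_id)
        if node is None:
            return None                          # miss: go to central storage
        with self.io_slots[node]:                # throttle I/O out of the cache
            return self.blocks[node][block_id]


cache = CooperativeCacheSketch(["n0", "n1", "n2"], capacity_per_node=2)
cache.put("a", b"1", writer_node="n0")           # stays local on n0
cache.put("b", b"2", writer_node="n0")           # stays local on n0
spilled_to = cache.put("c", b"3", writer_node="n0")  # n0 is full, so it spills
```

In this sketch a miss returns `None`, standing in for a fallback read from the centralized file system; a real system would also need eviction and failure handling, which are left out here.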
