ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (2): 246-264. doi: 10.7544/issn1000-1239.2018.20170687

Special Issue: 2018 Special Topic on Data Management for New Hardware

Heterogeneous Memory Programming Framework Based on Spark for Big Data Processing

Wang Chenxi (1,2), Lü Fang (1,4), Cui Huimin (1), Cao Ting (1), John Zigman (3), Zhuang Liangji (1,2), Feng Xiaobing (1,2)

  1. State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190
  2. University of Chinese Academy of Sciences, Beijing 100049
  3. Australian Centre for Field Robotics (University of Sydney), Sydney, Australia 2006
  4. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, Jiangsu 214125
  • Online: 2018-02-01

Abstract: With the boom in big data applications, the amount of data processed by servers is growing rapidly. To improve processing and response speed, industry is deploying in-memory big data computing systems such as Apache Spark. However, DRAM alone cannot satisfy the large memory demands of these systems, for two reasons: first, DRAM can account for as much as 40% of total energy consumption; second, the scaling of DRAM manufacturing technology is approaching its limit. Heterogeneous memory that integrates DRAM with NVM (non-volatile memory) is therefore a promising candidate for future memory systems. Because NVM has longer latency and lower bandwidth than DRAM, data must be placed in the appropriate memory module to achieve good performance. This paper analyzes the memory access behavior of Spark applications and proposes a heterogeneous memory programming framework based on Spark. The framework can be applied to existing Spark applications without rewriting their code. Experiments on Spark benchmarks show that, with the framework, placing only 20% to 25% of the data in DRAM and the remainder in NVM achieves 90% of the performance obtained when all data is placed in DRAM. This improves the performance-per-dollar ratio relative to DRAM-only servers and offers potential support for larger-scale in-memory computing applications.

Key words: in-memory computing, Spark, heterogeneous memory, non-volatile memory (NVM), programming framework
