ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (2): 229-245.doi: 10.7544/issn1000-1239.2018.20170742

Special Issue: 2018面向新型硬件的数据管理专题

Previous Articles     Next Articles

NV-Shuffle: Shuffle Based on Non-Volatile Memory

Pan Fengfeng, Xiong Jin   

  1. (State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100049)
  • Online:2018-02-01

Abstract: In the popular big data processing platforms like Spark, it is common to collect data in a many-to-many fashion during a stage traditionally known as the Shuffle phase. Data exchange happens across different types of tasks or stages via Shuffle phase. And during this phase, the data need to be transferred via network and persisted into traditional disk-based file system. Hence, the efficiency of Shuffle phase is one of the key factors in the performance of the big data processing. In order to reducing I/O overheads, we propose an optimized Shuffle strategy based on Non-Volatile Memory (NVM)—NV-Shuffle. Next-generation non-volatile memory (NVM) technologies, such as Phase Change Memory (PCM), Spin-Transfer Torque Magnetic Memories (STTMs) introduce new opportunities for reducing I/O overhead, due to their non-volatility, high read/write performance, low energy, etc. In the big data processing platform based on memory computing such as Spark, Shuffle data access based on disks is an important factor of application performance, NV-Shuffle uses NVM as persist memory to store Shuffle data and employs direct data accesses like memory by introducing NV-Buffer to organize data instead of traditional file system.We implemented NV-Shuffle in Spark. Our performance results show, NV-shuffle reduces job execution time by 10%~40% for Shuffle-heavy workloads.

Key words: big data processing, Shuffle, NVM(non-volatile memory), non-volatile buffer, fault tolerance

CLC Number: