Heterogeneous Memory Programming Framework Based on Spark for Big Data Processing
Abstract: With the boom in big data applications, the amount of data that servers must process is growing rapidly. To process data promptly and respond to customers quickly, industry is widely deploying in-memory computing systems such as Apache Spark. However, terabyte-scale memory raises both server cost and power consumption, and traditional DRAM cannot satisfy the growing memory demand of these systems for two reasons: first, DRAM can account for as much as 40% of total energy consumption; second, the scaling of DRAM manufacturing technology is approaching its limit, which constrains capacity density. As a result, heterogeneous memory that combines DRAM with non-volatile memory (NVM) is a promising candidate for future memory systems, offering low cost, low power consumption, and high capacity density. Because NVM has longer latency and lower bandwidth than DRAM, however, data must be placed in the appropriate memory module to achieve good performance. This paper analyzes the memory access behavior of Spark applications, takes the memory usage characteristics of OpenJDK into account, and proposes a Spark-based programming framework that manages data placement between DRAM and NVM. Application developers place data in heterogeneous memory through simple calls to the interfaces the framework provides, so existing Spark applications can adopt it without being rewritten. Experiments on Spark benchmarks show that placing only 20%~25% of the data in DRAM and the remainder in NVM reaches about 90% of the performance obtained when all data is placed in DRAM. The framework thus supports the growing scale of in-memory computing by using heterogeneous memory effectively, and it improves the performance-to-price ratio several-fold compared with DRAM-only servers.