A Cross-Layer Memory Tracing Toolkit for Big Data Application Based on Spark

Xu Danya; Wang Jing; Wang Li; Zhang Weigong

doi:10.7544/issn1000-1239.2020.20200109


Citation: Xu Danya, Wang Jing, Wang Li, Zhang Weigong. A Cross-Layer Memory Tracing Toolkit for Big Data Application Based on Spark[J]. Journal of Computer Research and Development, 2020, 57(6): 1179-1190. DOI: 10.7544/issn1000-1239.2020.20200109



¹(Information Engineering College, Capital Normal University, Beijing 100048)
²(Beijing Engineering Research Center of High Reliable Embedded System (Capital Normal University), Beijing 100048)
³(Beijing Advanced Innovation Center for Imaging Theory and Technology (Capital Normal University), Beijing 100048)

Funds: This work was supported by the National Natural Science Foundation of China (61772350), the Beijing Nova Program (Z181100006218093), the Research Fund from Beijing Innovation Center for Future Chips (KYJJ2018008), the Construction Plan of Beijing High-level Teacher Team (CIT&TCD201704082), and the Capacity Building for Sci-Tech Innovation Fundamental Scientific Research Funds (19530050173).


Published Date: May 31, 2020


Abstract


Spark has been increasingly adopted in industry for big data analytics because of its efficient in-memory distributed programming model. Most existing optimization and analysis tools for Spark operate at either the application layer or the operating system layer in isolation, which separates Spark semantics from the underlying system behavior. For example, without knowing how operating system parameters affect performance at the Spark layer, users cannot tell how to tune those parameters to improve system performance. In this paper, we propose SMTT, a new Spark memory tracing toolkit that links the semantics of the upper-level application to the underlying physical hardware across the Spark layer, the JVM layer, and the OS layer. Based on the characteristics of Spark memory, we design separate tracing schemes for execution memory and storage memory. We then use SMTT to analyze the Spark iterative computation process and the usage of execution and storage memory. An experiment on RDD memory assessment shows that the toolkit can be used effectively for performance analysis and can guide optimization of the Spark memory system.
Keywords: big data, Spark, memory management, cross-layer analysis, memory tracing


