Xu Danya, Wang Jing, Wang Li, Zhang Weigong. A Cross-Layer Memory Tracing Toolkit for Big Data Application Based on Spark[J]. Journal of Computer Research and Development, 2020, 57(6): 1179-1190. DOI: 10.7544/issn1000-1239.2020.20200109

A Cross-Layer Memory Tracing Toolkit for Big Data Application Based on Spark

Funds: This work was supported by the National Natural Science Foundation of China (61772350), the Beijing Nova Program (Z181100006218093), the Research Fund from Beijing Innovation Center for Future Chips (KYJJ2018008), the Construction Plan of Beijing High-level Teacher Team (CIT&TCD201704082), and the Capacity Building for Sci-Tech Innovation Fundamental Scientific Research Funds (19530050173).
  • Published Date: May 31, 2020
  • Spark has been increasingly adopted in industry for big data analytics because of its efficient in-memory distributed programming model. Most existing optimization and analysis tools for Spark operate at either the application layer or the operating system layer in isolation, which disconnects Spark semantics from the underlying system behavior. For example, without knowing how operating system parameters affect performance at the Spark layer, users cannot tell how to tune those parameters to improve system performance. In this paper, we propose SMTT, a new Spark memory tracing toolkit, which links the semantics of the upper-level application to the underlying physical hardware across the Spark layer, JVM layer, and OS layer. Based on the characteristics of Spark memory, we design separate tracing schemes for execution memory and storage memory. We then use SMTT to analyze the Spark iterative computation process and the usage of execution and storage memory. An experiment on RDD memory assessment analysis shows that the toolkit can be used effectively for performance analysis and can guide optimization of the Spark memory system.