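• China Outstanding Sci-Tech Journal
  • CCF-recommended Class A Chinese-language journal
  • T1 high-quality sci-tech journal in the computing field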
Shen Yijie, Zeng Dan, Xiong Jin. A Benefit Model Based Data Reuse Mechanism for Spark SQL[J]. Journal of Computer Research and Development, 2020, 57(2): 318-332. DOI: 10.7544/issn1000-1239.2020.20190563

A Benefit Model Based Data Reuse Mechanism for Spark SQL

Funds: This work was supported by the National Key Research and Development Program (2016YFB1000202) and the National Natural Science Foundation of China (61379042).
  • Published Date: January 31, 2020
  • Analyzing massive data sets to discover their latent value can bring great benefits. Spark is a widely used engine for large-scale data analytics thanks to its good scalability and high performance, and Spark SQL is its most commonly used programming interface. Data analytics applications contain many redundant computations, which waste system resources and prolong query execution time. However, the current implementation of Spark SQL is not aware of redundant computations across queries and hence cannot remove them. To address this issue, we present Criss, a fine-grained, automatic data reuse mechanism based on a benefit model. Criss automatically identifies redundant computations among queries. It then uses an I/O-performance-aware benefit model to select the operator results with the greatest benefit and caches them in hybrid storage consisting of both memory and HDD. Moreover, cache management and data reuse in Criss operate on individual partitions rather than on the whole result of an operator. This fine-grained mechanism greatly improves query performance and storage utilization. We implement Criss in Spark SQL using a modified TachyonFS for data caching. Our experimental results show that Criss outperforms Spark SQL by 40% to 68%.
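
The abstract does not spell out the benefit formula, so the following is only a minimal sketch of what a partition-level, I/O-aware benefit model could look like. It is not the Criss implementation. Every name, field, and formula here (`Partition`, `Tier`, `benefit`, `plan_cache`, the bandwidth figures) is a hypothetical assumption made for illustration: the benefit of caching a partition is taken to be the recomputation time saved over its expected reuses, minus the time to read it back from the chosen tier and the one-time cost of writing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """One partition of an operator result (all fields are hypothetical)."""
    name: str
    size_mb: float        # size of the materialized partition, must be > 0
    recompute_s: float    # time to recompute the partition from its inputs
    expected_reuses: int  # number of future queries expected to read it


@dataclass(frozen=True)
class Tier:
    """One storage tier of the hybrid cache, e.g. memory or HDD."""
    name: str
    capacity_mb: float
    read_mb_per_s: float
    write_mb_per_s: float


def benefit(p: Partition, t: Tier) -> float:
    """Assumed benefit in seconds: each reuse swaps a recompute for a read,
    and caching costs one write up front."""
    read_s = p.size_mb / t.read_mb_per_s
    write_s = p.size_mb / t.write_mb_per_s
    return p.expected_reuses * (p.recompute_s - read_s) - write_s


def plan_cache(partitions, tiers):
    """Greedy placement by benefit per MB; returns {partition name: tier name}.

    Partitions with no positive benefit on any tier are left uncached.
    """
    free = {t.name: t.capacity_mb for t in tiers}
    candidates = sorted(
        ((benefit(p, t) / p.size_mb, p, t) for p in partitions for t in tiers),
        key=lambda c: c[0],
        reverse=True,
    )
    placement = {}
    for density, p, t in candidates:
        if density <= 0:
            break  # every remaining candidate would cost more than it saves
        if p.name in placement or free[t.name] < p.size_mb:
            continue
        placement[p.name] = t.name
        free[t.name] -= p.size_mb
    return placement
```

Under these assumptions, a small but expensive-to-recompute partition would tend to land in memory, a larger one would spill to HDD once memory is full, and a partition that is cheap to recompute would not be cached at all. A greedy benefit-per-MB ordering is only one simple policy choice; the paper's actual selection and eviction rules may differ.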