ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (3): 479-496. doi: 10.7544/issn1000-1239.2021.20200489

• Software Technology •

Energy-Efficient Strategy Based on Data Recovery in Storm

Pu Yonglin1, Yu Jiong1, Lu Liang2, Li Ziyang1, Guo Binglei1, Liao Bin3

  1 (School of Information Science and Engineering, Xinjiang University, Urumqi 830046); 2 (School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300); 3 (College of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012) (puyonglin1991@foxmail.com)
  • Published: 2021-03-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61862060, 61462079, 61562086, 61562078), the Research Innovation Project of Graduate Student in Xinjiang Uygur Autonomous Region (XJ2019G038), and the Doctoral Innovation Program of Xinjiang University (XJUBSCX-201902).

Abstract: Storm is one of the mainstream platforms for big data stream computing. It was designed with performance as the primary goal and with little attention to energy consumption, and its high energy consumption has begun to restrict the development of the platform. To address this problem, we establish a task allocation model, a topology information monitoring model, a data recovery model, and an energy consumption model, and on this basis propose an energy-efficient strategy based on data recovery in Storm (DR-Storm). The strategy consists of a throughput detection algorithm and a data recovery algorithm. The throughput detection algorithm calculates the cluster throughput from the topology information fed back by the topology information monitoring model, and uses that feedback to decide whether to terminate the tasks of the topologies across the cluster. The data recovery algorithm selects a backup node for data storage according to the data recovery model, and uses the feedback from the topology information monitoring model to decide whether the cluster topology should perform data recovery. In addition, DR-Storm recovers the data of the cluster topology from the memory of the backup node. We evaluate DR-Storm in terms of system latency and energy efficiency in big data stream computing. The experimental results show that, compared with existing work, DR-Storm reduces system computation latency and cluster power while effectively saving energy.

Key words: big data, stream computing, Storm, information monitoring, data recovery, energy consumption
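
The abstract names two decision steps (throughput detection and backup-node selection) and an energy evaluation, but gives no formulas or thresholds. The Python sketch below only illustrates the general shape such steps could take; it is not the DR-Storm algorithm. The `TopologySample` structure, the throughput floor `min_throughput`, the "most free memory" selection rule, and all function names are assumptions made for illustration. The only fixed fact used is that energy equals power integrated over time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TopologySample:
    """One monitoring report for a topology (hypothetical structure)."""
    tuples_processed: int    # tuples processed during the window
    window_seconds: float    # length of the monitoring window, must be > 0


def cluster_throughput(samples: List[TopologySample]) -> float:
    """Sum of per-topology throughput, in tuples per second."""
    return sum(s.tuples_processed / s.window_seconds for s in samples)


def should_terminate(throughput: float, min_throughput: float) -> bool:
    """Assumed rule: stop the topology's tasks when throughput falls below a floor."""
    return throughput < min_throughput


def select_backup_node(free_memory_mb: Dict[str, int], required_mb: int) -> Optional[str]:
    """Assumed rule: pick the node with the most free memory that can hold the data."""
    candidates = {node: mem for node, mem in free_memory_mb.items() if mem >= required_mb}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def energy_joules(power_watts: List[float], interval_seconds: float) -> float:
    """Approximate energy as the sum of evenly spaced power samples times the interval."""
    return sum(power_watts) * interval_seconds
```

A caller would feed monitoring reports into `cluster_throughput`, pass the result to `should_terminate`, and call `select_backup_node` only when a backup copy is needed. The paper's actual criteria may differ in every one of these steps.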

CLC Number: