Energy-Efficient Strategy Based on Data Recovery in Storm

Pu Yonglin; Yu Jiong; Lu Liang; Li Ziyang; Guo Binglei; Liao Bin

doi:10.7544/issn1000-1239.2021.20200489

Energy-Efficient Strategy Based on Data Recovery in Storm

Abstract

As one of the mainstream platforms for big data stream computing, Storm was designed with performance as its primary goal and neglected energy consumption, and its high energy consumption has begun to restrict the development of the platform. To address this problem, we establish a task allocation model, a topology information monitoring model, a data recovery model, and an energy consumption model, and on this basis propose an energy-efficient strategy based on data recovery in Storm (DR-Storm), which consists of a throughput detection algorithm and a data recovery algorithm. The throughput detection algorithm calculates the cluster throughput from the topology information fed back by the topology information monitoring model and uses this feedback to decide whether the tasks of the topologies across the cluster should be terminated. The data recovery algorithm selects backup nodes for data storage according to the data recovery model and uses the information fed back by the topology information monitoring model to decide whether the cluster topology should perform data recovery. In addition, DR-Storm recovers the data within the cluster topology from the memory of the backup nodes. We evaluate DR-Storm in terms of the system latency and energy efficiency of big data stream computing. The experimental results show that, compared with existing work, DR-Storm reduces system computation latency and cluster power while effectively saving energy.
