A Real-Time Task Availability Improving Fault-Tolerant Scheduling Algorithm on Heterogeneous Platform

孙健; 张兴军; 董小社

doi:10.7544/issn1000-1239.2015.20150721

A Real-Time Task Availability Improving Fault-Tolerant Scheduling Algorithm on Heterogeneous Platform

Abstract

With the rapid development of Internet Plus, cloud computing, big data, and related fields, heterogeneous platforms have become important for deploying key applications such as scientific computing, industrial control, and cloud storage. Because the processors in such a platform differ in performance and in software/hardware architecture, heterogeneous platforms offer good scalability and a high cost-performance ratio. However, as platforms grow larger and applications become more complex, the schedulability of real-time tasks on heterogeneous platforms degrades and system availability decreases. To address this problem, we propose an availability improving fault-tolerant scheduling algorithm (AIFSAL) for real-time tasks on heterogeneous platforms. Using processor utilization and availability cost as the basis, we design the overall task scheduling framework together with the processor, task, and scheduling models. The algorithm incorporates availability cost and achieves fault tolerance through the primary/backup copy (PB) method. Depending on processor utilization, a task's backup copy is executed in either passive or overlapping mode, which reduces redundancy overhead and improves schedulability. Both primary and backup copies are preferentially assigned to processors with lower availability cost, which improves system availability. A theoretical analysis of task allocation and schedulability demonstrates the feasibility of AIFSAL. Simulation experiments and comparative analysis show that, compared with the availability approached task scheduling algorithm (AATSAL), the task partition based fault-tolerant rate-monotonic algorithm (TPFTRM), and the earliest completion time algorithm (MinMin), AIFSAL effectively improves system availability without reducing schedulability, reduces the overall system overhead, and significantly improves overall performance.
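The processor-selection rule summarized in the abstract (place each primary and backup copy on the feasible processor with the lowest availability cost) can be illustrated with a minimal sketch. This is not the AIFSAL algorithm from the paper: the `Processor` and `Task` classes, the `speed` and `availability_cost` values, and the utilization bound of 1.0 are all hypothetical, and the passive/overlapping backup distinction is omitted entirely.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    speed: float              # hypothetical relative speed (heterogeneity)
    availability_cost: float  # hypothetical cost; lower is preferred
    load: float = 0.0         # utilization already committed


@dataclass
class Task:
    name: str
    wcet: float    # worst-case execution time on a speed-1.0 processor
    period: float


def utilization(task: Task, proc: Processor) -> float:
    """Utilization the task would add on this processor."""
    return task.wcet / (proc.speed * task.period)


def place(task, procs, exclude=None, bound=1.0):
    """Pick the feasible processor with the lowest availability cost.

    `bound` is an assumed utilization limit, not a value from the paper.
    """
    feasible = [
        p for p in procs
        if p.name != exclude and p.load + utilization(task, p) <= bound
    ]
    if not feasible:
        return None
    best = min(feasible, key=lambda p: p.availability_cost)
    best.load += utilization(task, best)
    return best


def schedule(tasks, procs):
    """Assign a primary and a backup copy of each task to distinct processors."""
    plan = {}
    for t in tasks:
        primary = place(t, procs)
        backup = place(t, procs, exclude=primary.name) if primary else None
        if primary is None or backup is None:
            raise ValueError(f"task {t.name} cannot be scheduled with a backup")
        plan[t.name] = (primary.name, backup.name)
    return plan


procs = [
    Processor("P1", 1.0, 0.1),
    Processor("P2", 2.0, 0.3),
    Processor("P3", 1.0, 0.2),
    Processor("P4", 1.0, 0.4),
]
tasks = [Task("T1", 3, 5), Task("T2", 1, 4), Task("T3", 2, 5)]
plan = schedule(tasks, procs)
print(plan)
```

In this toy run, T1 and T2 take the two cheapest processors (P1 for the primary, P3 for the backup); T3 no longer fits on either, so it falls back to P2 and P4. Reserving full utilization for every backup copy is the simplest possible choice here; the abstract indicates that AIFSAL instead selects passive or overlapping backups according to processor utilization to reduce this redundancy.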
