Duan Wenxue, Hu Ming, Zhou Qiong, Wu Tingming, Zhou Junlong, Liu Xiao, Wei Tongquan, Chen Mingsong. Reliability in Cloud Computing System: A Review[J]. Journal of Computer Research and Development, 2020, 57(1): 102-123. DOI: 10.7544/issn1000-1239.2020.20180675
1(Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062)
2(School of Economics and Finance, Shanghai International Studies University, Shanghai 200083)
3(School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094)
4(School of Information Technology, Deakin University, Melbourne, Australia VIC 3125)
Funds: This work was supported by the National Key Research and Development Program of China (2018YFB2101300) and the National Natural Science Foundation of China (61872147).
As a new computing paradigm, cloud computing has attracted extensive attention from both academia and industry. Based on resource virtualization technology, cloud computing provides users with services in the form of infrastructure, platform, and software in a “pay-as-you-go” manner. Meanwhile, because cloud computing provides highly scalable computing resources, more and more enterprises and organizations choose cloud computing platforms to deploy their scientific or commercial applications. However, as the number of cloud users grows, cloud data centers continue to expand and their architecture becomes increasingly complex, leading to a growing number of runtime failures in cloud computing systems. Ensuring reliability in large-scale cloud computing systems with complex architectures has therefore become a major challenge. This paper first summarizes the various failures in cloud systems, introduces several methods for evaluating the reliability of cloud computing, and describes some key fault management mechanisms. Since fault management techniques inevitably increase the energy consumption of cloud systems, this paper also reviews current research on the trade-off between reliability and energy efficiency in cloud computing. Finally, we outline some major challenges in current research on cloud computing reliability and conclude the paper.