Abstract:
Hard disk faults have become the main source of failure in data centers and seriously affect data reliability. Traditional fault-tolerance techniques usually rely on increasing data redundancy, which has some shortcomings. Proactive fault tolerance has become a research hotspot because it can predict hard disk failures and migrate data ahead of time. However, existing work mostly studies hard disk fault prediction and lacks research on collection, migration, and feedback, which makes commercialization difficult. This paper proposes a whole-process proactive fault-tolerance mechanism based on “Collection-Prediction-Migration-Feedback”, which includes a time-sharing hard disk information collection method, a sliding-window record merging and sample building method, a multi-type hard disk fault prediction method, a multi-disk joint data migration method, and a method for two-level validation of prediction results with fast feedback. Test results show that the impact of collecting hard disk information on front-end threads is only 0.96%, the recall rate of hard disk fault prediction is 94.66%, and data repair time is 55.10% less than with traditional methods. This work has been running stably in ZTE’s data center and meets the objectives of proactive fault tolerance technology, such as high reliability, high intelligence, low interference, low cost, and wide applicability.