Abstract:
With the rapid growth of data volumes and the ever-increasing data-transfer-rate requirements of enterprises, the demand for massive storage capacity and high network bandwidth in data centers has become a grand challenge in the networked-storage field. By exploiting the high redundancy present in application-specific datasets, data deduplication can greatly reduce storage capacity requirements, improve the efficiency of network bandwidth utilization, and lower IT costs; it has therefore become a major research focus in recent years. This paper first introduces the concepts, categories, and applications of data deduplication, describes the architecture and basic principle of deduplication storage systems, and contrasts them with traditional storage systems. It then analyzes and summarizes the current state of research on key deduplication techniques, including data partitioning (chunking) methods, I/O optimization techniques, high-reliability data placement strategies, and system scalability. Finally, it summarizes current research on data deduplication and points out directions for future work.
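The basic principle mentioned above can be illustrated with a minimal sketch (not from the paper; function and variable names are illustrative): each data chunk is fingerprinted with a cryptographic hash, only chunks with previously unseen fingerprints are stored, and the original stream is reconstructed from a "recipe" of fingerprints.

```python
import hashlib

def dedup_store(chunks):
    """Store each unique chunk once, keyed by its SHA-256 fingerprint.

    Returns the chunk index (fingerprint -> data) and the recipe
    (ordered list of fingerprints) needed to rebuild the original stream.
    """
    index = {}
    recipe = []
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in index:   # a duplicate chunk is stored only once
            index[fp] = chunk
        recipe.append(fp)
    return index, recipe

# Four incoming chunks, two of which are duplicates:
chunks = [b"aaaa", b"bbbb", b"aaaa", b"bbbb"]
index, recipe = dedup_store(chunks)
# Only 2 unique chunks are stored instead of 4.
restored = b"".join(index[fp] for fp in recipe)
```

The storage saving is the gap between the logical size (all chunks) and the physical size (unique chunks only); how chunk boundaries are chosen is exactly the "data partitioning" problem the paper surveys.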